Why Does Current Clustering Require Recoding?

Posted by Cliff on Tuesday September 13, 2005 @09:07AM from the reinventing-the-wheel dept.

AugstWest asks: "I've been doing some research into what the available clustering options are for pooling CPU resources, and it looks like most of the solutions I've found require that programs be re-written to take advantage of the cluster. Since there are virtualization apps like Bochs and VMware, where the applications just make use of a virtual CPU as if it were a real CPU, why aren't there clustering solutions that do this as well?"

4 of 75 comments

latency? by Johnny+Mnemonic · 2005-09-13 09:13 · Score: 5, Insightful

why aren't there clustering solutions that do this as well?
Because it's a lot faster to address a local CPU than it is to send that info down the wire to a remote CPU? And because of that latency, it's a lot easier to keep 2 or more local CPUs in sync than it is to keep 2 or more remote CPUs in sync?
You need to recode to work around the severe latency of going over a network cable, so you design your apps to minimize messaging between CPUs. Some apps can do this well: they don't need results from other CPUs to finish their own share of the work.
Other applications require CPUs to work in tandem, and for each CPU to have to wait while the results are served out over GigE would suck some serious ass, even if it might be technically possible.
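
To put rough numbers on the latency argument, here is a minimal back-of-the-envelope sketch. The cost model and every figure in it are assumptions for illustration, not measurements: `cluster_time` is a hypothetical helper that splits the compute evenly across nodes and charges one full network round trip for every message a node has to wait on.

```python
def cluster_time(compute_s, nodes, messages, latency_s):
    """Estimated wall-clock seconds for a job on a cluster.

    Assumes the compute divides evenly across nodes and that each
    message stalls the job for one full network round trip.
    """
    return compute_s / nodes + messages * latency_s


# Hypothetical figures, chosen only for illustration.
SINGLE_MACHINE_S = 100.0    # time for the whole job on one local CPU
NETWORK_LATENCY_S = 100e-6  # assumed round trip over the wire

# A job whose pieces never need each other's results.
independent = cluster_time(SINGLE_MACHINE_S, nodes=10, messages=0,
                           latency_s=NETWORK_LATENCY_S)

# The same job, but the pieces exchange a million intermediate results.
chatty = cluster_time(SINGLE_MACHINE_S, nodes=10, messages=1_000_000,
                      latency_s=NETWORK_LATENCY_S)

print(f"independent work on 10 nodes: {independent:.1f} s")
print(f"chatty work on 10 nodes:      {chatty:.1f} s")
```

Under these assumed numbers the independent job finishes in about a tenth of the single-machine time, while the chatty one ends up slower than not using the cluster at all.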

--
$tar -xvf .sig.tar
Mosix by NitsujTPU · 2005-09-13 09:22 · Score: 4, Insightful

You might want to try Mosix.

http://www.mosix.org/
TANSTAAFL by Julian+Morrison · 2005-09-13 09:23 · Score: 4, Insightful

Clustering exposes complications regarding: shared data, latency, concurrency, transactions, central control, security, failovers, and so forth. It's hard because it's hard.
This is a basic systems question. by stienman · 2005-09-13 10:20 · Score: 4, Informative

This is a basic systems question:

[Why must] programs be re-written to take advantage of the cluster.

The simple answer is that programs, in general, are written as single-threaded applications with shared state (memory). A cluster is the opposite of that: multiple parallel CPUs without shared state (or at least requiring you to be explicit about shared state, as opposed to simply declaring a variable).
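
The contrast between "simply declaring a variable" and being explicit about shared state can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the workers are threads on one machine standing in for cluster nodes, the queues stand in for the network, and the names `shared_state_sum`, `worker`, and `message_passing_sum` are made up for the example.

```python
import queue
import threading


def shared_state_sum(values):
    """Ordinary single-threaded code: the running total is just a variable."""
    total = 0
    for v in values:
        total += v
    return total


def worker(inbox, outbox):
    """A stand-in for one node: it sees only what arrives in its inbox."""
    chunk = inbox.get()
    outbox.put(sum(chunk))


def message_passing_sum(values, workers=4):
    """Cluster-style code: data and results move only as explicit messages."""
    outbox = queue.Queue()
    threads = []
    for i in range(workers):
        inbox = queue.Queue()
        inbox.put(values[i::workers])  # ship this worker its slice of the data
        t = threading.Thread(target=worker, args=(inbox, outbox))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    # Combine the partial sums that came back as messages.
    return sum(outbox.get() for _ in range(workers))


data = list(range(1000))
print(shared_state_sum(data), message_passing_sum(data))
```

Both functions compute the same total, but the second had to be restructured around who holds which data and when results get sent back, which is the kind of rewrite the question is asking about.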

Usually a program's algorithm has to be completely redesigned to take advantage of the cluster while mitigating these problems. At minimum the program must be parallelized. If you don't change the program to deal successfully with the latency of sharing memory across nodes, the cluster ends up no more powerful than a single fast computer running the program.
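
One standard way to quantify why parallelizing matters is Amdahl's law, which the post does not name but which fits the point: if only a fraction p of a program's work can be spread over n CPUs, the best possible speedup is 1 / ((1 - p) + p / n). A minimal sketch follows; the function name `amdahl_speedup` is made up for the example, and the formula ignores communication latency entirely, so real clusters do worse.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Upper bound on speedup when only part of the work runs in parallel.

    parallel_fraction: share of the work (0.0 to 1.0) that can be split up.
    cpus: number of CPUs the parallel part is spread across.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


# An unmodified single-threaded program: nothing is parallel.
print(amdahl_speedup(0.0, cpus=64))

# Half the work parallelized, still on 64 CPUs.
print(amdahl_speedup(0.5, cpus=64))

# 99% of the work parallelized, same 64 CPUs.
print(amdahl_speedup(0.99, cpus=64))
```

With nothing parallelized the bound is exactly 1, meaning the 64-CPU cluster is no faster than one machine, and even at half parallel the bound stays below 2.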

The reason you are asking this question is that you don't realize that a cluster is fundamentally different from a single (or dual- or quad-) CPU machine. The architecture is completely different. You can't expect to treat it like any old computer.

-Adam