Why Does Current Clustering Require Recoding?

Posted by Cliff on Tuesday September 13, 2005 @09:07AM from the reinventing-the-wheel dept.

AugstWest asks: "I've been doing some research into what the available clustering options are for pooling CPU resources, and it looks like most of the solutions I've found require that programs be re-written to take advantage of the cluster. Since there are virtualization apps like Bochs and VMWare, where the applications just make use of a virtual CPU as if it was a real CPU, why aren't there clustering solutions that do this as well?"

part of the issue by sfcat · 2005-09-13 09:21 · Score: 3, Informative

When making an application distributed, you must figure out how to replicate the memory the application uses to other machines and make sure that this replication and synchronization work is transparent to the logic of the application. But this replication and synchronization is far far far too expensive (computationally) if done naively. So either special system calls (which is what the recoding requires) or a redesign of how work is parcelled out to worker threads is necessary.
This is in addition to the handling of resources such as database connections and other shared resources across the distributed cluster. I'm not exactly sure what your specific needs are, but when you separate threads across different physical memory spaces, it creates significant problems to overcome. If you just want to virtualize the application (so one machine, many virtual machines, one physical memory), then the recoding should be trivial. And I agree, in this isolated case, no recoding should be necessary. But most of the time, clustering entails spanning multiple physical memories, and thus the application needs to be designed to handle these difficulties.
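
As a rough illustration of "a redesign of how work is parcelled out", here is a minimal Python sketch. It assumes the job can be cut into independent chunks, and it uses a thread pool as a stand-in for cluster nodes. The function names are made up for this example. The point is that each worker sees only the chunk it is handed and sends back a single partial result, so nothing has to be replicated or kept in sync behind the application's back.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def partial_sum_of_squares(chunk):
    # A worker touches only its own chunk and returns one number.
    return sum(x * x for x in chunk)

def sum_of_squares(items, n_workers=4):
    # The thread pool stands in for cluster nodes (an assumption of this sketch).
    chunks = split_into_chunks(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # Combining the partial results is the only synchronization point.
    return sum(partials)

data = list(range(1000))
total = sum_of_squares(data)
```

On a real cluster the chunks and partial results would travel over the network, which is why the split has to be designed in rather than discovered automatically.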

--
"Those that start by burning books, will end by burning men."
duh by jpmkm · 2005-09-13 09:23 · Score: 2, Informative

How is this magical cpu virtualizer going to know what it can split up and send to different computers? Like another poster mentioned, latency is the big issue. If your cpu virtualizer arbitrarily sends instructions over the network to other nodes, but the original program still expects them to be executed at local cpu speed then things are going to get fucked up fast. I wouldn't be surprised if the final result is actually slower than just running the job on one box.

Basically, what's wrong with this idea is the clustering software has no way of knowing what it can chunk up and spit out to other nodes unless the programmer of the software in question tells it. Some multithreaded programs can be run on clusters without a rewrite, but there is already clustering software for that application. What the OP is suggesting is similar to rerouting highway traffic by arbitrarily plucking cars off the highway and putting them on random side streets. They all may get there eventually and, at first, it may seem like they are moving faster, but in the end it just takes everyone a lot longer to reach their destination. Now, if the drivers themselves planned alternate routes to help alleviate congestion on the highways, then there's a good chance everyone would get to their destinations faster.
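
A back-of-the-envelope sketch of the latency argument, in Python. Both timing constants are assumed round numbers chosen for illustration, not measurements, and the model ignores bandwidth and scheduling entirely.

```python
# Assumed round numbers for illustration only, not measurements.
LOCAL_OP_SECONDS = 1e-9        # assumed: about 1 ns per locally executed operation
NETWORK_RTT_SECONDS = 100e-6   # assumed: about 100 microseconds per network round trip

def local_time(n_ops):
    """Everything runs on one box."""
    return n_ops * LOCAL_OP_SECONDS

def naive_remote_time(n_ops):
    """Every operation is shipped to another node and waited on."""
    return n_ops * (NETWORK_RTT_SECONDS + LOCAL_OP_SECONDS)

def chunked_time(n_ops, n_nodes):
    """The programmer split the work into n_nodes independent chunks:
    one round trip in total, and the chunks run concurrently."""
    return NETWORK_RTT_SECONDS + (n_ops / n_nodes) * LOCAL_OP_SECONDS

n = 10_000_000
print(local_time(n), naive_remote_time(n), chunked_time(n, 4))
```

Under these assumptions the naive scheme is several orders of magnitude slower than one box, while the chunked scheme only wins because someone declared the chunks independent.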
What type of cluster do you want? by Hast · 2005-09-13 09:30 · Score: 2, Informative

First off, it's not entirely clear what you want to do with it. If you want load balancing, then that's one problem. If you want parallel batch processing (such as rendering farms or compiling), then that's another problem. And the really juicy stuff, i.e., running a normal application distributed across multiple computers, is a third and very different problem.

But all of them require that you add something to the original program which distributes the work (load balancing/render farms). If you want your original program to run in parallel then that is a much harder problem to solve. Basically you'll have to remake it into something like the above.

The last problem would basically require the computer to extract threads out of your code. This is pretty much impossible to do automatically though.
openMosix by Codename_V · 2005-09-13 09:42 · Score: 2, Informative

Actually, I'd recommend openMosix. Granted Mosix is the original and is open source now as well, but it still seems like openMosix is more actively developed.

--
Free will is just an illusion
This is a basic systems question. by stienman · 2005-09-13 10:20 · Score: 4, Informative

This is a basic systems question:

[Why must] programs be re-written to take advantage of the cluster.

The simple answer is that programs, in general, are written as single threaded applications with shared state (memory). A cluster is the opposite of that - multiple parallel CPUs without shared state (or at least requiring one to be explicit about shared state, as opposed to simply declaring a variable).

Usually a program's algorithm has to be completely redesigned in order to take advantage of the cluster while mitigating the problems. At minimum the program must be parallelized. If you don't change the program to successfully deal with shared-memory latency, then the cluster ends up only about as powerful as a single fast computer running the program.

The reason you are asking this question is that you don't realize that a cluster is fundamentally different than a single (or dual or quad) CPU. The architecture is completely different. You can't expect to treat it like any old computer.
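
To make the "explicit about shared state" point concrete, here is a small Python sketch. It fakes a node boundary with a pickle round trip, which is an assumption of this example and not how any particular clustering product works. The helper names are hypothetical.

```python
import pickle

def run_on_remote_node(func, state):
    """Pretend to ship `state` to another machine.
    The pickle round trip gives the 'remote' side its own copy,
    just as a separate physical memory would."""
    remote_state = pickle.loads(pickle.dumps(state))
    return func(remote_state)

def increment(state):
    state["counter"] += 1
    return state["counter"]

state = {"counter": 0}

# Local call: caller and callee share memory, so the update is visible.
increment(state)
after_local = state["counter"]

# "Remote" call: the update happens on a copy and is invisible locally.
returned = run_on_remote_node(increment, state)
after_remote = state["counter"]

# The program has to copy the result back explicitly.
state["counter"] = returned
after_sync = state["counter"]
```

The last assignment is the kind of explicit step a single-machine program never had to write.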

-Adam
Re:Compilers by The+boojum · 2005-09-13 10:20 · Score: 2, Informative

"I'd love to see a language (or language extension) cleanly define a way to let me define a code block attributes which could affect how and where it gets executed. The runtime library could then distribute that block as the environment best allows."
Have a look at OpenMP. Granted, it's more for shared-memory systems than clusters, but it works similarly to what you describe.
Re:latency? by Frumious+Wombat · 2005-09-13 11:15 · Score: 3, Informative

Don't forget disk access issues as well. You now have file locking, non-local disk access, and race-condition issues to contend with.

Example from my work is that we tend to write several hundred meg to several gig scratch files, and then perform RW operations on them continually during a calculation. If the disk isn't local to the process, then you end up flooding the network, and bringing everything to a screeching halt.
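
A quick estimate of why that hurts, sketched in Python. The file size, network speed, disk speed, and number of passes are all assumed figures picked for illustration. None of them come from the workload described above.

```python
def transfer_seconds(size_bytes, bytes_per_second):
    """Time to move size_bytes at a sustained rate, ignoring latency and contention."""
    return size_bytes / bytes_per_second

# All four values are assumptions for illustration.
SCRATCH_BYTES = 2 * 10**9            # assumed: a 2 GB scratch file
NETWORK_BYTES_PER_S = 100e6 / 8      # assumed: 100 Mbit/s network, i.e. 12.5 MB/s
LOCAL_DISK_BYTES_PER_S = 50e6        # assumed: 50 MB/s local disk
PASSES = 10                          # assumed: the calculation sweeps the file 10 times

over_network = PASSES * transfer_seconds(SCRATCH_BYTES, NETWORK_BYTES_PER_S)
on_local_disk = PASSES * transfer_seconds(SCRATCH_BYTES, LOCAL_DISK_BYTES_PER_S)
print(over_network, on_local_disk)
```

With these numbers a single job spends four times as long on I/O over the network, and that is before several jobs start sharing the same link.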

In a Mosixish/Condor type environment, you then have to separate the processes that, given this disk limitation, can be migrated to other CPUs, or can let a second job start on their own CPU because they underutilize it, from those that need exclusive access to the CPU and near-exclusive access to the disk in order to keep the calc from bogging down.

Then, as the parent mentioned, you have the CPU-CPU communication issues, the network overhead, and memory access patterns, all of which are hard. In theory, had you written your code correctly in the first place, this would only be moderately annoying. But since most people's applications are single-threaded, most programming is taught in serial mode, and the tools for MPar work are still expensive and exotic, you get a situation where it's easy to run a compute farm (massive numbers of single-processor jobs) but hard to run a parallel cluster (one job aggregating resources).

--
the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
MOSIX License by Noksagt · 2005-09-13 11:18 · Score: 2, Informative

"Actually, I'd recommend openMosix."
Agreed.
"Granted Mosix is the original and is open source now as well,"
Not by OSI/DFSG/FSF standards. The license is still very restrictive. I think the kernel patches might be under GPL, but certainly not the user tools.
"it still seems like openMosix is more actively developed."
This is certainly true. Most talent jumped ship & openMosix does have a higher number of active developers (and is somewhat backed by AMD (though I think AMD can and should give more developers to the project)).
Re:openMosix by Noksagt · 2005-09-13 11:27 · Score: 2, Informative

I agree that you may have to make some of these kinds of design changes for a single application's processes to benefit. But you'd really have to make those if you use other clustering solutions too. With Mosix, though, you don't have to make implementation-specific changes.

(And, for your particular example, mosix has a number of schedulers & you can schedule manually. You can trivially send one postscript file to each node. Of course you can do this "braindead" clustering with a script, but it isn't as robust, easy, or flexible.)
"Mosix sounds good because you don't have to 'do' anything special but most apps won't benefit from it."
Somewhat agree for single apps, especially the edge cases that you point out. But if you have a large number of CPU-intensive processes, (open)Mosix is a good, fairly painless way to divide the load.
Your language doesn't support it by photon317 · 2005-09-13 12:37 · Score: 3, Informative

The only way you'll have source code that compiles and runs unmodified on architectures of widely varying parallelism efficiently is for the language itself to know about parallelism, and make it the compiler's (and even runtime-linker and kernel's) job to parallelize your code for you. An inherently parallel language would have ways for you to specify in your source code what can and cannot be executed in parallel, and what code absolutely depends on the serial execution of some previous code. Even then, we're really only talking about the SMP case. When you start involving network latencies and bandwidth restrictions, the decisions on when and how to parallelize become more challenging for the compiler/runtime, possibly requiring either more intelligence on its part and/or more meta-information in your source code.

Until you write code in a language like that, you can never expect to write code in a single-threaded mindset and then have it just magically take advantage of a parallel environment.
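
For a feel of what "specifying what can and cannot be executed in parallel" might look like, here is a minimal Python sketch. The dependency table and the `run_tasks` helper are invented for this example. It is a library-level imitation, not a language feature, and it only covers the shared-memory case.

```python
from concurrent.futures import ThreadPoolExecutor

def run_tasks(tasks, deps):
    """Run tasks in waves: every task whose dependencies are done runs in parallel.

    tasks: name -> callable taking a dict of its dependencies' results
    deps:  name -> list of task names it depends on (every task must be listed)
    """
    results = {}
    remaining = dict(deps)
    with ThreadPoolExecutor() as pool:
        while remaining:
            ready = [name for name, needed in remaining.items()
                     if all(d in results for d in needed)]
            if not ready:
                raise ValueError("cyclic or unsatisfiable dependencies")
            futures = {
                name: pool.submit(tasks[name], {d: results[d] for d in remaining[name]})
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                del remaining[name]
    return results

# "a" and "b" are declared independent; "c" must wait for both.
tasks = {
    "a": lambda r: 2,
    "b": lambda r: 3,
    "c": lambda r: r["a"] * r["b"],
}
deps = {"a": [], "b": [], "c": ["a", "b"]}
results = run_tasks(tasks, deps)
```

The scheduler never guesses. It can overlap "a" and "b" only because the source says they are independent, which is exactly the meta-information a single-threaded program does not carry.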

--
11*43+456^2
Re:Compilers by GileadGreene · 2005-09-13 14:41 · Score: 3, Informative

"I'd love to see a language (or language extension) cleanly define a way to let me define a code block attributes which could affect how and where it gets executed."
The venerable occam programming language requires that each block of code be specifically identified as being executable either in parallel or sequentially. Since PAR and SEQ constructs can be nested, it is easy to build up quite complex concurrent structures that can easily be distributed. Since the semantics of occam processes are derived from Hoare's CSP process algebra, the compositional nature of occam's parallelism is theoretically sound, and avoids many of the problems associated with the thread-based concurrency model that most people are familiar with.
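
As a loose analogy only, the nesting of PAR and SEQ can be imitated in Python with two small combinators. This is not occam and does not reproduce its semantics. In particular it leaves out channels, which are how occam processes communicate. The names `seq`, `par`, and `step` are made up for this sketch.

```python
import threading

def seq(*procs):
    """Return a process that runs procs one after another (in the spirit of SEQ)."""
    def run():
        for proc in procs:
            proc()
    return run

def par(*procs):
    """Return a process that runs procs concurrently and waits for all of them
    to finish (in the spirit of PAR)."""
    def run():
        threads = [threading.Thread(target=proc) for proc in procs]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return run

log = []
log_lock = threading.Lock()

def step(name):
    """A trivial process that records its name when it runs."""
    def run():
        with log_lock:
            log.append(name)
    return run

# "a" first, then "b" and "c" in either order, then "d".
program = seq(step("a"), par(step("b"), step("c")), step("d"))
program()
```

Because the structure says which blocks may overlap, a runtime is free to place the two branches of the `par` on different processors without analysing the code inside them.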