Why Does Current Clustering Require Recoding?

Posted by Cliff on Tuesday September 13, 2005 @09:07AM from the reinventing-the-wheel dept.

AugstWest asks: "I've been doing some research into what the available clustering options are for pooling CPU resources, and it looks like most of the solutions I've found require that programs be rewritten to take advantage of the cluster. Since there are virtualization apps like Bochs and VMware, where the applications just make use of a virtual CPU as if it were a real CPU, why aren't there clustering solutions that do this as well?"

9 of 75 comments

latency? by Johnny Mnemonic · 2005-09-13 09:13 · Score: 5, Insightful

why aren't there clustering solutions that do this as well?
Because it's a lot faster to address a local CPU than it is to send that info down the wire to a remote CPU? And because of that latency, it's a lot easier to keep 2 or more local CPUs in sync than it is to keep 2 or more remote CPUs in sync?
You need to recode because you want to work around the latency of working over a network cable, which is severe, so you design your apps to minimize messaging between CPUs. Some apps can do this well: their CPUs don't need results from other CPUs to complete their own share of the work.
Other applications require CPUs to work in tandem, and for each CPU to have to wait while the results are served out over GigE would suck some serious ass, even if it might be technically possible.
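A rough sketch of the latency argument above, as a toy cost model. The function name `run_time` and every number in it are assumptions chosen for illustration, not measurements: synchronizing CPUs inside one box is taken to cost about 100 nanoseconds, and a round trip over GigE about 100 microseconds.

```python
# Illustrative cost model; every number below is an assumption, not a measurement.
LOCAL_SYNC_S = 100e-9    # assumed cost of syncing two CPUs in one box (~100 ns)
NETWORK_SYNC_S = 100e-6  # assumed round trip over GigE (~100 microseconds)


def run_time(work_s: float, cpus: int, syncs: int, sync_cost_s: float) -> float:
    """Estimated wall time: evenly divided compute plus time spent waiting on syncs."""
    return work_s / cpus + syncs * sync_cost_s


WORK_S = 10.0  # assumed run time on a single CPU, in seconds

# A "chatty" job syncs a million times; a "quiet" job syncs only 100 times.
for label, syncs in (("chatty", 1_000_000), ("quiet", 100)):
    local = run_time(WORK_S, 4, syncs, LOCAL_SYNC_S)
    remote = run_time(WORK_S, 4, syncs, NETWORK_SYNC_S)
    print(f"{label}: 4 local CPUs {local:.2f}s, 4 networked CPUs {remote:.2f}s")
```

Under these assumed numbers the chatty job takes about 102.5 seconds on four networked CPUs, far worse than the 10 seconds it needs on a single CPU, while the quiet job gets close to the full fourfold speedup either way. That gap is why the recoding aims at reducing the number of messages.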

--
$tar -xvf .sig.tar
Performance by RAMMS+EIN · 2005-09-13 09:18 · Score: 2, Insightful

``Since there are virtualization apps like Bochs and VMware, where the applications just make use of a virtual CPU as if it were a real CPU, why aren't there clustering solutions that do this as well?''

Because it's virtualization, and thus hurts performance?

--
Please correct me if I got my facts wrong.
Because it's hard. by Elwood P Dowd · 2005-09-13 09:18 · Score: 2, Insightful

It's hard to take arbitrary code and decide which parts can be run on opposite ends of a network cable.

Sure, you could make a clustering application that would run arbitrary x86 code on separate machines, but it would be many orders of magnitude slower than just running the code on one big Xeon.

Hell, it's hard enough to take a single thread and spread work across multiple execution units in the CPU for out-of-order execution, and too hard to do it across multiple CPUs in a single box. Why would it be possible across a network cable? Have I completely misunderstood the question?

--

There are no trails. There are no trees out here.
cooking lessons by xutopia · 2005-09-13 09:20 · Score: 3, Insightful

Imagine telling a group of 10 cooks to make a huge roast. You wouldn't cut the roast into 10 pieces, have each cook prepare one piece separately, and then glue the pieces back together. It would be highly difficult to glue the 10 pieces back together. Instead, it would make more sense to ask each cook to do a separate task. A few could cut vegetables while another makes a sauce and another a salad.
As it stands today, an OS cannot easily share tasks. But there exist some tasks which are more easily shareable than others. I imagine within a century we'll be able to share tasks more easily, and I think the Cell chip is meant to ease this transition, but I could be wrong.
1. Re:cooking lessons by swmccracken · 2005-09-13 11:07 · Score: 3, Insightful
  
  See, is this High Availability clustering or performance clustering? The asker doesn't state, and it's a rather important distinction.
  
  If it's HA, you'd get 10 cooks each to make a roast. Sure, you'd end up cooking extra meat, but that doesn't matter - the goal here is to guarantee that a roast will be cooked no matter what. (I can imagine two copies of Bochs running on separate physical machines but linked to run in absolute lock-step. Performance might be impaired, but reliability will be there.)
  
  If it's performance, then you're right, you can't magically glue two computers together and get twice the performance.
Mosix by NitsujTPU · 2005-09-13 09:22 · Score: 4, Insightful

You might want to try Mosix.

http://www.mosix.org/
TANSTAAFL by Julian Morrison · 2005-09-13 09:23 · Score: 4, Insightful

Clustering exposes complications regarding: shared data, latency, concurrency, transactions, central control, security, failovers, and so forth. It's hard because it's hard.
Holy Grail by owlstead · 2005-09-13 11:18 · Score: 3, Insightful

This will be a bit difficult to explain fully. The other posts have already touched lightly on the problems involved (especially latency). But you are talking about the holy grail of parallel computing here: seeing a single system while the work is actually running all over the place. My best advice for you is to get a good book on parallel systems and get educated. This is something like asking a doctor why there are still diseases.
Re:Mosix - a great answer, but not for everything. by ancientt · 2005-09-14 16:11 · Score: 2, Insightful

I don't think it's quite as simple as one right answer. Sure, openMosix rocks, but it's only one kind of answer, not the final one. OpenMosix spreads the processes around but can't split a single process up to make it complete faster. It can send processes to the most likely CPU, but that still doesn't address the question of speeding up the time that the process will take to complete.
Beowulf clusters are typically designed for specific purposes, and software is written to take advantage of the design. You can't have two computers add 2+2 any faster than you can have one computer do it. You can, however, have two computers adding 2+2 and 0+1+1 at the same time to get two answers in half the time it would take one computer to do it.
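The two-sums point can be sketched in a few lines. This is only an illustration: two threads stand in for the two computers (an assumption, since a real cluster would pass the work over a network), and the helper names `add` and `chain` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def add(numbers):
    """One independent job: it needs nothing from any other job."""
    return sum(numbers)


def chain(x, steps):
    """A job that does not divide: each step needs the value before it."""
    for _ in range(steps):
        x = (x * x + 1) % 1_000_003
    return x


jobs = [(2, 2), (0, 1, 1)]  # the two sums from the paragraph above

# Two worker threads stand in for two computers (assumption for illustration).
with ThreadPoolExecutor(max_workers=2) as pool:
    answers = list(pool.map(add, jobs))

print(answers)  # [4, 2]
```

The two calls to `add` can be handed to different workers because neither needs the other's result. A call like `chain(2, 1000)` cannot be handed out the same way: a second worker would have nothing to do until the first one finished each step.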
I'm certainly no expert, but I have researched this a bit since I work in a department with a LOT of extra boxes lying around. They're slow individually but together add up to a good bit of processing power and memory. We want to put them to use, but the question is "what use?"
That question boils down to programs designed to use multiple threads versus splitting processes. If your needs involve running things that require lots of processes, then openMosix is a good bet, but if you're simply wanting to make your favorite software run faster, the answer might be to rebuild it to take advantage of a Beowulf cluster with more threads rather than trying to divvy up the processes. Fortunately, there are compile tools out there to make it a little easier, and specifically openMosix has some compile tools to make programs more multi-process friendly.
Despite all the tools, though, some programs just don't divide well without significant recoding. If you're faced with that type of problem, it's time to call in the coding gurus because openMosix can't help you. Other programs, like Apache and MySQL, were practically written to be shared.
OpenMosix may be the answer or not; it all depends on the question, which in this case isn't completely clear because the objective and software desired aren't discussed.
As to why clustering works this way, there are far more technical and probably much more accurate answers, but in simple terms, you can't make two computers do one thing faster than one computer can do it unless you can divide the job. Some jobs divide easily, some don't.
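The "unless you can divide the job" rule is commonly written down as Amdahl's law, which the comment above does not name. A minimal sketch follows, assuming the divisible part splits perfectly and ignoring network latency altogether; the function name `speedup` and the sample fractions are illustrative choices.

```python
def speedup(divisible_fraction: float, machines: int) -> float:
    """Amdahl's law: overall speedup when only part of a job can be divided."""
    serial = 1.0 - divisible_fraction
    return 1.0 / (serial + divisible_fraction / machines)


# Sample fractions are assumptions: nothing divisible, half, and 95 percent.
for fraction in (0.0, 0.5, 0.95):
    row = [round(speedup(fraction, n), 2) for n in (2, 10, 100)]
    print(f"divisible fraction {fraction}: speedup on 2/10/100 machines = {row}")
```

A job with nothing to divide stays at a speedup of 1 no matter how many boxes are added, and a job that is only half divisible can never reach 2x. Real clusters do worse than this estimate, because the model leaves out the cost of moving data between machines.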

--
B) Eliminate all the stupid users. This is frowned upon by society.