A New Approach To Linux Clusters

Posted by timothy on Monday August 13, 2001 @04:19AM from the but-you-can-still-use-your-imagination dept.

rkischuk writes: "InformationWeek has an article about a group of ex-Cray engineers working on a new architecture for clustering Linux systems. 'It's not easy trying to build scalable systems from commodity hardware designed for assembling desktop computers and small servers.' Per the article, 'As the number of CPUs in a Beowulf-style cluster - a group of PCs linked via Ethernet - increases and memory is distributed instead of shared, the efficiency of each processor drops as more are added,' but 'Unlimited's solution involves tailoring Linux running on each node in a cluster, rather than treating all the nodes as peers.'" Looks like Cray engineers think about clustering even when they're not at Cray.

Re:actually it shows why Cray always does so well. by fgodfrey · 2001-08-13 06:49 · Score: 3, Interesting

The Top 500 is not a list of who makes the best machines. It's a list of the real-world installations that run LINPACK the best. LINPACK is a benchmark, not a real piece of code. The T3E holds the record for *sustained* performance on real-world code. That, in my opinion, and probably the opinions of the people still buying them, is more valuable than any benchmark. The "systems" on the list above the T3E in question are all strongly connected clusters, not single system image machines. Well, the Hitachi and NEC boxes are probably SSI. That's not saying they are bad, mind you, just that they aren't single machines. There are still some codes that run best on vector-based SMP systems and for those codes, you buy a Cray. Also, for MPI code that communicates *a lot* between nodes, the T3E will run rings around a cluster.
Just to summarise my basic point: The Top 500 is a benchmark and is not necessarily a good indication of who makes better computers than who.

--
Go Badgers! -- #include "std/disclaimer.h"
this brings up something.. by xtermz · 2001-08-13 04:28 · Score: 3, Interesting

I have been thinking about for quite a while now. Beowulf clusters are all nice and well, but what about Joe Sixpack with some IT background like me who wants to get some sort of cluster going? What methods are available (be it a simplified Beowulf cluster or whatever...) for the guy with 3 or 4 old machines who wants to waste some electricity and try his hand at clustering some machines? Is it possible to do it without being a CS major, or is it just a matter of having enough time/resources?

--

I lost my concept of community when my community lost all concept of me.
1. Re:this brings up something.. by Raging+Idiot · 2001-08-13 04:52 · Score: 2, Interesting
  
  Normally I just troll, but this is important.
  MOSIX works fucking awesome. I used it to compress MP3s. I ripped waves and stored them on my bad-ass machine. Then I ran an at daemon on my slowest machine and ran the compression routines in the at queue in batch mode. Starting the jobs on the slowest machine in the group guaranteed that they would migrate to the faster machines, and the slowest machine remained as a "task manager". It would run instances until every machine was busy, then batch would hold jobs until a machine came free, then it would release the next one and on with the show.
  It works EXTREMELY well. Try MOSIX for some serious fun.
  
  --
  
  Stupidity never felt so good.
From the last Cray notice by HerrGlock · 2001-08-13 04:42 · Score: 2, Interesting

There were a few "Because Linux does not scale well with multiple servers" posts about why someone would use a mainframe as opposed to a Beowulf.

Well, it looks like there are people working on the task. But that's not the real point; the right tool for the right job is the point. A whole lot of processes that do not require another process to finish before the next one starts is where Beowulfs shine. If you want throughput or have processes with dependencies, then a mainframe is your best bet.

But it's still nice to have an alternative for those of us who cannot afford a mainframe.

DanH

--
Cav Pilot's Reference Page
UNIX - Not just for Vestal Virgins anymore
Request for help by tbo · 2001-08-13 04:37 · Score: 5, Interesting

I'm trying to design a specialized data-fitting program to be used for accelerator-based condensed matter physics (and maybe ultimately other branches of science as well). I need information on adding clustering support to this program. Here's a brief description of what the program does:

The user writes a small chunk of code that calculates the function they're trying to fit the data to. We require the user to code the function him/herself because speed is important, and some of these functions are too difficult for Mathematica or the like to fit. Once the user writes their function, it's linked (dynamically) with the rest of the code. The user then passes in a parameter file, and away it goes.

Many of these fits can take days, and, since they often have to be repeated many times with slight changes to the fitted function or initial parameters, this is a serious concern.

Can this new approach to Linux clusters be used here? We have tons of Linux boxes lying around that are being used for other things, but have lots and lots of spare cycles. We probably couldn't afford a dedicated processing farm, but we could easily live with something like distributed.net where the program transparently takes all the spare cycles.

I know the problem is parallelizable, since each node can calculate the value of the function at a few of the data points, then send back to the "master" the chi-squared contribution of those points. Each iteration of the fitting process, the master sends out the current parameter values, and then the nodes grind away... There's not too much communication required.
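The decomposition described above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the actual fitting program: `model` is a hypothetical stand-in for the user-written function (a straight line here), the names `partial_chi2`, `split` and `total_chi2` are made up for this example, and threads stand in for the remote nodes. On a real cluster the chunks would live on the nodes and only the parameters and the partial sums would cross the network.

```python
from concurrent.futures import ThreadPoolExecutor


def model(x, params):
    """Hypothetical user-supplied fit function: a straight line a*x + b."""
    a, b = params
    return a * x + b


def partial_chi2(params, chunk):
    """Work done on one node: chi-squared contribution of its data points."""
    return sum(((y - model(x, params)) / sigma) ** 2 for x, y, sigma in chunk)


def split(data, n_nodes):
    """Deal the data points out into n_nodes chunks, round-robin."""
    return [data[i::n_nodes] for i in range(n_nodes)]


def total_chi2(params, data, n_nodes=4):
    """Master step: hand out the current parameters, add up the partial sums."""
    chunks = split(data, n_nodes)
    # Threads are only a stand-in for remote nodes (an assumption of this
    # sketch); a real cluster would use message passing instead.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = pool.map(lambda chunk: partial_chi2(params, chunk), chunks)
        return sum(partials)


# Made-up data lying exactly on y = 2x + 1, with uncertainty 0.5 per point.
data = [(float(x), 2.0 * x + 1.0, 0.5) for x in range(10)]

print(total_chi2((2.0, 1.0), data))  # parameters that match the data
print(total_chi2((2.0, 0.0), data))  # parameters that are off by 1 in b
```

Per iteration, each node receives only the parameter tuple and returns a single number, which is why the communication cost stays small compared with the time spent evaluating the function.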

One of my big concerns is how to get the user-written function from the "master" computer to all the "slaves". It's unrealistic to expect the user to manually install it on all the machines each time something in the function gets tweaked and it's recompiled. Are there pre-existing standards on how to send code to nodes in a cluster, then have it executed?

Any advice or pointers to good starting places on distributed computing would be much appreciated.

BTW, as a hint to all the other comp sci geeks out there--physics is a great place to find new and challenging computing problems (I'm not claiming this is one). In particular, the particle physics people often have to deal with spectacular data rates, and do extremely complicated event reconstruction. Check it out some time.
1. Re:Request for help by Anonymous Coward · 2001-08-13 05:10 · Score: 1, Interesting
  
  This problem is trivially easy if you use any of the well-developed architectures for distributed parallel computing. MPI is the most popular; PVM was, but is going out of style. MPI is a cluster-based interprocess communication system plus remote program invocation. Look at the websites of the major open source implementations of the MPI standard, MPICH (from Argonne Natl. Labs) or LAM/MPI (Notre Dame Uni.) for more information. There are a ton of tutorials and training courses online. -Colin
Plan 9 style architecture? by Anonymous Coward · 2001-08-13 05:06 · Score: 1, Interesting

How in general is this different than the approach taken by Plan 9? (http://plan9.bell-labs.com/sys/doc/9.html)
Gnutella parallel... by Saeger · 2001-08-13 05:33 · Score: 2, Interesting

Sounds to me like they've rediscovered the concept of a supernode where it's acknowledged that not all peers are created equal.
(I know--not the best analogy)

--
Power to the Peaceful
if I understand correctly... by dario_moreno · 2001-08-13 05:22 · Score: 2, Interesting

what these guys want to do is to build, say, a cluster of 2-CPU systems where one of the CPUs only computes while the other manages I/O and communications. Indeed, the I/O part is really a problem on Beowulves, and dedicating a CPU to it and to communication can be cheaper than dedicated network cards like Myrinet (at $1000/port) or SCI, and hi-perf I/O like HIPPI. I wonder though if they can beat the price/performance ratio of the latter the way Beowulves beat traditional supercomputers on raw flops.

--
Google passes Turing test : see my journal
Huge Market for Supercomputers Will Come... by Louis+Savain · 2001-08-13 05:20 · Score: 3, Interesting

Mentioned with reverence, but still slowly going bust.

The reason that high speed computing has not taken off is that there are currently no consumer apps that require it. Only a few scientific, research and governmental organizations have a need for it. However, if there is a breakthrough in AI technology, it will require googols of CPUs and memory. And when that happens, the market will explode.

People are going to want their mechanical maids, baby sitters, gardeners, chauffeurs, lawyers, companions, stock market experts, and what not. I predict they are going to crave their mechanical servants to the point of pathological obsession.

Don't be so sure this won't happen in your lifetime. In fact, there is every reason to suppose that it might happen anytime. There are an awful lot of minds thinking about intelligence and an awful lot of money being spent on it right now. IMO, the solution to the intelligence problem is probably simple. As Dr. Rodney Brooks of MIT says, "Maybe this is wishful thinking, but maybe there really is something that we're missing." Any day now.

In conclusion, I would recommend that you don't sell your shares in the supercomputing sector just yet.