A New Approach To Linux Clusters

← Back to Stories (view on slashdot.org)

A New Approach To Linux Clusters

Posted by timothy on Monday August 13, 2001 @04:19AM from the but-you-can-still-use-your-imagination dept.

rkischuk writes: "InformationWeek has an article about a group of ex-Cray engineers working on a new architecture for clustering Linux systems. 'It's not easy trying to build scalable systems from commodity hardware designed for assembling desktop computers and small servers.' Per the article, 'As the number of CPUs in a Beowulf-style cluster-a group of PCs linked via Ethernet-increases and memory is distributed instead of shared, the efficiency of each processor drops as more are added,' but 'Unlimited's solution involves tailoring Linux running on each node in a cluster, rather than treating all the nodes as peers.'" Looks like Cray engineers think about clustering even when they're not at Cray.

4 of 143 comments (clear)

In the words of Seymour Cray: by LocalYokel · 2001-08-13 04:33 · Score: 5, Funny

If you were plowing a field, which would you rather use?
Two strong oxen or 1024 chickens?

--
--
E2 IN2 IE?
Request for help by tbo · 2001-08-13 04:37 · Score: 5, Interesting

I'm trying to design a specialized data-fitting program to be used for accelerator-based condensed matter physics (and maybe ultimately other branches of science as well). I need information on adding clustering support to this program. Here's a brief description of what the program does:

The user writes a small chunk of code that calculates the function they're trying to fit the data to. We require the user to code the function him/herself because speed is important, and some of these functions are too difficult for Mathematica or the like to fit. Once the user writes their function, it's linked (dynamically) with the rest of the code. The user then passes in a parameter file, and away it goes.

Many of these fits can take days, and, since they often have to be repeated many times with slight changes to the fitted function or initial parameters, this is a serious concern.

Can this new approach to Linux clusters be used here? We have tons of Linux boxes lying around that are being used for other things, but have lots and lots of spare cycles. We probably couldn't afford a dedicated processing farm, but we could easily live with something like distributed.net where the program transparently takes all the spare cycles.

I know the problem is parallelizable, since each node can calculate the value of the function at a few of the data points, then send back to the "master" the chi-squared contribution of those points. Each iteration of the fitting process, the master sends out the current parameter values, and then the nodes grind away... There's not too much communication required.

One of my big concerns is how to get the user-written function from the "master" computer to all the "slaves". It's unrealistic to expect the user to manually install it on all the machines each time something in the function gets tweaked and it's recompiled. Are there pre-existing standards on how to send code to nodes in a cluster, then have it executed?

Any advice or pointers to good starting places on distributed computing would be much appreciated.

BTW, as a hint to all the other comp sci geeks out there--physics is a great place to find new and challenging computing problems (I'm not claiming this is one). In particular, the particle physics people often have to deal with spectacular data rates, and do extremely complicated event reconstruction. Check it out some time.
1. Re:Request for help by san · 2001-08-13 04:48 · Score: 5, Informative
  
  Hi
  The normal way to operate a cluster is to have a shared (NFS) file system across all the systems, thereby solving the data distribution problem (please note though that this prevents you from doing too much file base IO because it's too slow, you might want to make a local /scratch directory on each node)
  Besides the NFS share you'll need some kind of parallel programming library like MPI or pvm, and a job scheduler of some sorts. The libraries you can find on the web (maybe in precompiler RPMS, look for the mpich MPI implementation for a start), and will provide you with a programming framework for doing all the networking and setting up the topology. The scheduler can be as simple as the one provided with MPI/pvm (ie. you name a few hosts and your job gets run on those), or, if there's a number of people accessing the cluster at the same time, you might want to try a real queuer (like gridware).
  The parallellization is something you'll have to do yourself and it's the hardest part of clustering.
  Hope this helps :-)
Re:this brings up something.. by baptiste · 2001-08-13 04:33 · Score: 5, Informative

Try a MOSIX Cluster This type of cluster spreads processes out to the machine with the least load. A Beowulf can be done, but to take advantage of it, you have to run custom software that is capable of parallel processing.

--
Top Most Bizarre/Disturbing Error Messages