Building a Linux Cluster from the Ground Up?
dooling asks: "How would one go about building a Linux cluster from the ground up? I read a lot about Linux clusters on /. and have been able to find some information on configuring the cluster, but have found little on how to assemble the hardware, i.e., what is necessary, how they should be connected, etc. So does anyone have reliable information on hardware assembly and configuration? Also, (if you've never done this before) is it worth building your own, or is it better to just buy one prebuilt and preconfigured? If you want specifics: 20-40 machines, Linux (probably RedHat 6.x), disk or diskless?, do not need video cards (but should we have them?), switch or hub (best way to hook them up). We will be doing pretty straightforward scientific computing (floating point number crunching)."
A friend of mine put together a cluster in our high school. He did things a bit differently :)
1. Custom-build
Definitely the way to go. You can get nice machines for a few hundred bucks each. But put cheap video cards in -- it makes maintenance much easier, and some motherboards may not boot without them. Spare ISA ones that are lying around should do the trick -- you'll never be taking them out of text mode. The machines we got had cheapie vidcards on the MB, which was fine.
2. HUB
Switched all the way.
3. Versions
If you want the latest versions, use the ones from Debian potato (unstable).
5. CPU
I recommend Celeron 450a's (300a's OC'd to 450MHz). You'll need a nice motherboard that will let you set core CPU voltages (or some Celerons may not OC, which happened to us). But there are some relatively inexpensive dual motherboards that let you set core voltage, and Celerons and slockets are still pretty cheap (our machines were before the slockets came out, so they're single CPUs).
Celerons are far faster than equivalently clocked K6's, so go with Intel unless you want to spring for K7's (now that would be slick!).
6. NICs
I'm not sure what the gigabit advantage would be -- probably depends on what you're crunching. Obviously if computation time is high relative to data quantity, you're fine. But gigabit equipment is expensive. Consider ATM equipment -- fast and cheap. ATM switches are way cheaper afaik, and there are a couple of ATM boards supported by Linux. It's ideal for this kind of application since only the head node (which would then need another NIC) needs to talk to the outside world.
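To put rough numbers on "computation time relative to data quantity", here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names, the 70% link efficiency, and the sample workload (60 s of crunching per 50 MB exchanged) are made up, so plug in your own figures.

```python
def transfer_seconds(megabytes, link_mbit_per_s, efficiency=0.7):
    """Rough time to move `megabytes` of data over one link.

    `efficiency` is an assumed fraction of the nominal bandwidth that
    you actually get; measure your own network for a real figure.
    """
    bits = megabytes * 8_000_000  # 1 MB taken as 10**6 bytes
    return bits / (link_mbit_per_s * 1_000_000 * efficiency)


def comm_fraction(compute_seconds, megabytes, link_mbit_per_s, efficiency=0.7):
    """Fraction of each compute+exchange step spent waiting on the network."""
    t = transfer_seconds(megabytes, link_mbit_per_s, efficiency)
    return t / (t + compute_seconds)


if __name__ == "__main__":
    # Hypothetical workload: 60 s of number crunching per 50 MB exchanged.
    for name, mbit in [("Fast Ethernet", 100), ("Gigabit", 1000)]:
        frac = comm_fraction(compute_seconds=60.0, megabytes=50,
                             link_mbit_per_s=mbit)
        print(f"{name}: {frac:.1%} of each step spent on the wire")
```

If the fraction is already small on the slower link, the faster network probably buys you very little for that workload.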
8. PGCC
I've seen no indication that
a) Pentium-optimized code is particularly better (and I suspect its stability
b) It's faster at all on PPro-based chips. Optimizing for PPro and Pentium are two very different things. I'd just go with a standard Linux distro -- it'll make your life easier.
It's irrelevant anyhow, since your code will make all the difference. It might be worth playing with different compilers to see what makes your stuff go fastest. Post the results for the rest of us!
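If you do try several compilers, a tiny timing harness keeps the comparison honest. This is only a sketch: `best_time` and `rank_builds` are names I made up, and the two commands in the demo are placeholders that just run the Python interpreter. Swap in your own binaries built with each compiler.

```python
import subprocess
import sys
import time


def best_time(cmd, runs=3):
    """Run `cmd` (a list of arguments) `runs` times and return the
    fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


def rank_builds(builds, runs=3):
    """`builds` maps a label to a command; returns (label, seconds)
    pairs sorted fastest first."""
    results = [(label, best_time(cmd, runs)) for label, cmd in builds.items()]
    return sorted(results, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Placeholder commands; replace with the same program built by
    # each compiler you want to compare.
    builds = {
        "short": [sys.executable, "-c", "sum(range(10**5))"],
        "long": [sys.executable, "-c", "sum(range(10**7))"],
    }
    for label, secs in rank_builds(builds):
        print(f"{label}: {secs:.3f} s")
```

Taking the best of several runs filters out some noise from other processes; run it on an otherwise idle node with your real input data.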
9. Overclocking
Any moron can OC a Celeron 300a to 450MHz with a decent motherboard. Beyond that takes guts and skill -- and may not be worth it, since a decently sized cluster (ours was 16 machines) will start to show some variance in chips -- as we found out. On the upside, the load balancing software should be able to compensate just fine if you have a few dud nodes. Many Beowulf clusters are heterogeneous.
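To illustrate why a few slow nodes need not hurt much, here is a toy simulation, not any particular Beowulf tool. It compares handing out tasks on demand (whichever node is free gets the next one) with splitting them evenly up front. The function names, node speeds and task costs are all invented for the example.

```python
import heapq


def simulate_pull_scheduling(task_costs, node_speeds):
    """Give each task to whichever node becomes free first.

    Returns (finish_time, tasks_done_per_node).
    """
    # Heap of (time the node becomes free, node index).
    free_at = [(0.0, i) for i in range(len(node_speeds))]
    heapq.heapify(free_at)
    done = [0] * len(node_speeds)
    finish = 0.0
    for cost in task_costs:
        t, i = heapq.heappop(free_at)
        t_end = t + cost / node_speeds[i]
        done[i] += 1
        finish = max(finish, t_end)
        heapq.heappush(free_at, (t_end, i))
    return finish, done


def static_split_finish(task_costs, node_speeds):
    """Deal tasks out round-robin regardless of speed; the slowest
    node then determines when the whole job finishes."""
    n = len(node_speeds)
    loads = [0.0] * n
    for k, cost in enumerate(task_costs):
        loads[k % n] += cost / node_speeds[k % n]
    return max(loads)


if __name__ == "__main__":
    tasks = [1.0] * 120            # 120 equal work units (hypothetical)
    speeds = [1.0, 1.0, 1.0, 0.5]  # one dud node at half speed
    dynamic, per_node = simulate_pull_scheduling(tasks, speeds)
    print("on-demand finish time:", dynamic, "tasks per node:", per_node)
    print("even-split finish time:", static_split_finish(tasks, speeds))
```

With on-demand distribution the dud node simply ends up doing fewer tasks, which only works if your job can be cut into many more pieces than you have nodes.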
Look at the Beowulf Underground for an excellent compilation of links, resources and software.
Also, if you're not gonna need some of the more specific software (kernel patches, ethernet channel bonding and the like that usually come in RPMs) but just want to implement generic MPI or PVM, I'd go with Debian next time instead of RH6.0, purely for maintenance reasons.
The big question you have to ask yourself though, is what kind of application you want implemented, and build the cluster to match it... if you don't have an application in mind, then you probably don't need a Beowulf...