Where to Spend $1M on a Cluster?
Natchswing asks: "My university has been given a $757,825 NSF grant to build, 'A 256 node (128 pair) Beowulf parallel computing cluster ... to improve the realism of gravity-wave modeling by permitting treatment of the three dimensional problem and multiple wave interactions.' They want to pay a company to just show up and drop off a functional cluster rather than build it themselves. Since word has leaked out regarding the purchase intent, every computer manufacturer under the sun (including Apollo himself) has called up trying to sell their cluster. Since I'm no cluster expert, I'm writing Slashdot. If you had $0.7 mil to buy a pre-built cluster who would you go with and why?"
Have the companies submit bids... then compare them and make a decision.
This isn't rocket science.
Conformity is the jailer of freedom and enemy of growth. -JFK
I'll do it ... send via paypal to my /. username @ yahoo.com ...
It is more productive to voice thoughtful opinions (reply) than to judge (moderate) others.
Penguin Computing does this kind of stuff for a living. I think they are an all open source shop, too... There may be others, too.
get 7 free Japanese lessons.
Does anyone else see the VERY obvious discrepancy between the submission title and the submission? Where are the editors? Last time I checked, 0.7 mil != 1 mil.
could you say, "Imagine a beowulf cluster of those!" and actually be asking a legitimate question!
Condemnant quod non intellegunt.
Shit, my department needs to take lessons from you guys. We need to specify the budget down to the last rubber foot and network cable just to have them review the application, and you got a grant without even having an idea what you were going to spend it on?
A cluster of storage? Perhaps you mean the Xserve itself.
They even have a page on clusters.
I think that you should look at your intended application.
- How much disk space are you going to need in total?
- How much disk space are you going to need per node?
- How much RAM is each node going to need?
- Is your application going to benefit from a low-latency or a high-bandwidth connection between nodes?
- What about CPU? Which CPU family will provide the best bang/$ for your calculations? PPC or x86? x86-64 maybe?
Once you know what you need, put it together in an RFP and send it out to every company that shows up under a Google search for "beowulf cluster".
Review the responses and pick the best.
Since you are asking this question here, I'm going to refrain from suggesting the better option which is to build your own.
Hector
3. Pocket the leftover $499.5K
I run a 48-node Microway beowulf and I must say that it is the most stable system available. Everything came assembled and ready to go (of course, I built the enclosure and did the networking, but they will do that for you if you'd like). If you're not very knowledgeable about beowulfs, how do you know you'll need so much power? Do you know how well the software you will be using will scale? Is it close to embarrassingly parallel or does it lose efficiency over X number of nodes? What type of resources and consumption does the program use? Is it extremely processor hungry, or does it deal with dense matrices and require low memory latency and high bandwidth, or both? Do you know if you will need the power of Myrinet or will you be able to get by on GigE?
These are important questions you must ask your researchers and yourself before you purchase this cluster. But, to answer your question, I believe Microway is the best choice and I plan on having them build our next cluster in the next fiscal year.
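To make the scaling question concrete, here is a rough sketch using Amdahl's law, a standard textbook model that is not something the poster named. The parallel fraction values are hypothetical; the real number has to come from profiling the gravity-wave code itself.

```python
# Rough sketch: Amdahl's law estimate of how well a job scales with node count.
# Assumption: parallel_fraction is a made-up input; measure it on the real code.

def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Ideal speedup on `nodes` nodes when `parallel_fraction` of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


def parallel_efficiency(parallel_fraction: float, nodes: int) -> float:
    """Speedup divided by node count: 1.0 means every node is fully used."""
    return amdahl_speedup(parallel_fraction, nodes) / nodes


if __name__ == "__main__":
    # Compare the 48-node and 256-node sizes mentioned in the thread.
    for fraction in (0.90, 0.99, 0.999):  # hypothetical parallel fractions
        for nodes in (48, 256):
            print(
                f"parallel={fraction:.3f} nodes={nodes:3d} "
                f"speedup={amdahl_speedup(fraction, nodes):7.1f} "
                f"efficiency={parallel_efficiency(fraction, nodes):.2f}"
            )
```

This model ignores communication cost entirely, so a fine-grained code on a slow interconnect will do worse than it predicts.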
-brian
Whoever you choose to go with (I'm partial to Apple, but that's just me - and just because they have sexy hardware), see if you can get them to give you either more for your money, or free implementation/consulting help, or something like that in exchange for using your implementation as a success story. I think Virginia Tech got a bunch of free stuff from Apple when they decided to build their supercomputer.
All these vendors want to be able to talk about their work. Letting them use you for marketing may help you get more for your money.
I currently maintain some Opteron-based Angstrom Microsystems Linux clusters. We've had them for less than a year, and already 30% of our nodes have had to be replaced. Support has been a nightmare.
Sadly, I was not around when the proposal was made, otherwise I would have rejected this cluster outright. There is no way to hook external storage up to this beast. There are no USB, FireWire, SCSI, external SATA, or Fibre Channel options. You can't even run an ATA cable out of the thing without drilling holes into the blade walls.
Personally? I'm looking at an Xserve or an IBM BladeCenter, but maybe it's just because I'd like some real support.
First of all, you really should put out an RFP for your cluster.
.75M, like maybe doubling your size and getting AMD64 nodes. Look at your primary problemset first, see if it's IO-intensive or CPU-intensive to figure out what you want in the way of disk/networking.
We've got a 128-node (1 CPU per node) cluster from Atipa http://www.atipa.com/ that cost CDN$ 0.25M.
128 P4 Xeon, 1GB RAM, 120GB IDE, Gigabit Ethernet.
I'd expect you to get a lot more for your USD$.
The only thing I don't like about it is Atipa's configuration of Red Hat 8 (they didn't offer anything newer at the time). Look for something newer there.
Atipa is one of the suppliers for SGI-branded clusters as well.
I'd really like a cluster from http://adelielinux.com/en/, but I wasn't aware of them at the time we did our RFP and cluster purchase.
ICQ# : 30269588
"I used to be an idealist, but I got mugged by reality."
I see you mentioned the problem set, which is good. To me and my only somewhat novice mind (I work with scientists all day, hear all kinds of stuff), this sounds suspiciously like a fine-grained problem. That is to say, there will be a lot of interprocess communication, so don't skimp on the network. I'm not talking "get GigE". I'm talking "look at Myrinet, or Quadrics, or InfiniBand".
Most people can do you up a 256-node cluster for under half a million, but doing up one with a high-speed, low-latency network is another story. That net costs bucks, around $1500 per machine (for a card, a cable, and a port on the switch).
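For a sense of scale, here is the back-of-the-envelope arithmetic as a small sketch. The grant amount, node count, and roughly $1500 per-node interconnect figure all come from this thread; the function names are just illustrative, and real vendor quotes will differ.

```python
# Back-of-the-envelope sketch using figures quoted in this thread.
# Assumption: the ~$1500/node interconnect price is the poster's rough estimate.

GRANT_USD = 757_825                # NSF grant from the submission
NODES = 256                        # node count from the submission
INTERCONNECT_PER_NODE_USD = 1_500  # card + cable + switch port, per the post


def interconnect_total(nodes: int, per_node_usd: float) -> float:
    """Total cost of the low-latency network across all nodes."""
    return nodes * per_node_usd


def remaining_per_node(grant_usd: float, nodes: int, per_node_usd: float) -> float:
    """Money left for each node's hardware after paying for the interconnect."""
    return (grant_usd - interconnect_total(nodes, per_node_usd)) / nodes


if __name__ == "__main__":
    net = interconnect_total(NODES, INTERCONNECT_PER_NODE_USD)
    left = remaining_per_node(GRANT_USD, NODES, INTERCONNECT_PER_NODE_USD)
    print(f"Interconnect: ${net:,.0f} ({net / GRANT_USD:.0%} of the grant)")
    print(f"Left per node for everything else: ${left:,.2f}")
```

On those numbers the network alone takes about half of the grant, before racks, storage, or support.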
Make sure you know your problem. If you understand how it works, then you can buy a cluster that meets the need much better. Make sure the nodes are not being starved for RAM if the problem is a RAM-hungry one (your researchers should be able to tell you, even from data off a single machine). Find out if it's heavily integer based or floating point based (my guess is that it's a floating point problem). Find out if it's a lot of vector and matrix manipulations.
Every machine type is a little better at something than the others. For instance, on integer-based problems, x86 will generally scorch everything else. On floating point, there is lots of good competition (Apple, Intel's Itanium, Opterons). Don't be afraid to say to the vendor "we want to run on it for a week before we buy".
As others have said, do an RFP, but get their specs. Get your tech guys to look it over hard. Ask them "what sucks".
All that having been said, I'm a fan of Apples and of Verari Systems. Dell is also quite good.
-- Who is the bigger fool? The fool or the fool who follows him? --
If you haven't already, Google for beowulf clusters at other universities and contact those departments.
You'll be back, believe me. You'll be back in no time.
..Dell
...IBM
and *talk* to a sales rep. I know how hard this is (not!) but asking Slashdot is kinda silly. Sure you might want some impartial advice but
ciao
When my research group decided to build one, I was in charge and opted for OpenMosix, which worked really well after a tweaking period. Now, with the various bootable CDs with OpenMosix (PlumpOS, BCCD, Quantian, ClusterKNOPPIX...), tests and upgrades are done by just pressing reset!
Of course with clusters your mileage may vary.
Non-Linux Penguins ?
While we're on the subject of spelling, I just thought I'd point out that you need a knew spellchecker... or maybe you already new that?
My other car is first.