Ask Slashdot: Clusters On the Cheap?
First time accepted submitter serviscope_minor writes "A friend of mine has recently started a research group. As usual with these things, she is on a shoestring budget and has computational demands. The computational task is very parallel (but implementing it on GPUs is an open research problem and not the topic of her research), and very CPU bound. Can Slashdotters advise on a practical way of getting really high bang for buck? The budget is about £4000 (excluding VAT/sales tax), though it is likely that the system will be expanded later. The computers will probably end up running a boring Linux distro and Sun Grid Engine to manage batch processing (with home directories shared over NFS)."
Why waste money on building a cluster when you can rent the best in the world *by the hour*?
Many universities/consortia have supercomputers available on which researchers can apply for (or buy) time. For example, my university is a member of VPAC, which has a big-arse cluster shared between a number of institutions. She might get much better bang for buck if she uses the money for that, rather than splashing out for dedicated hardware.
You can get a SuperMicro reseller to sell you a single workstation with four CPU sockets and a bunch of RAM. £4,000 = US$6,299.20.
That buys you a box with 4 x Opteron 6134 (32 cores), 128GB of RAM (32 x 4GB sticks) and some hard disks.
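To put that box in per-core terms, here is the arithmetic as a small Python sketch. It uses only the figures quoted above (the US$6,299.20 conversion, 4 x Opteron 6134 for 32 cores, 32 x 4GB of RAM) and assumes the whole budget goes on this one box.

```python
# Figures quoted in the post; assumption: the whole budget buys this one box.
budget_usd = 6299.20   # £4,000 converted at the quoted rate
sockets = 4
cores_per_socket = 8   # 4 x Opteron 6134 = 32 cores, per the post
ram_gb = 32 * 4        # 32 sticks of 4 GB

total_cores = sockets * cores_per_socket
usd_per_core = budget_usd / total_cores
ram_per_core_gb = ram_gb / total_cores

print(f"{total_cores} cores, {ram_gb} GB RAM")
print(f"${usd_per_core:.2f} per core, {ram_per_core_gb:g} GB RAM per core")
```

That works out to roughly $197 and 4GB of RAM per core, a handy yardstick for comparing the other options in this thread.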
The OP hasn't told us much beyond the budget. Since you are on such a tight budget, I would highly recommend doing some theoretical analysis first. Do you have a serial code? How much parallelism exists in it? You say the task is 'very parallel', but Amdahl's law (which is really common sense) will tell you that even a small serial fraction of the code will limit your speedup. You should also consider how long the code actually runs: achieving a speedup of 2 for a serial code that runs for one minute is nearly worthless.
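Amdahl's law makes this concrete: if a fraction p of the runtime parallelises perfectly over N cores, the best possible speedup is 1 / ((1 - p) + p/N), which can never exceed 1 / (1 - p). A minimal Python sketch follows; the 95% parallel fraction in the example is purely hypothetical, so substitute a figure measured by profiling the real serial code.

```python
def amdahl_speedup(parallel_fraction: float, n_procs: int) -> float:
    """Amdahl's law: S(N) = 1 / ((1 - p) + p / N)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_procs < 1:
        raise ValueError("n_procs must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_procs)


def amdahl_ceiling(parallel_fraction: float) -> float:
    """Upper bound on speedup as N grows without limit: 1 / (1 - p)."""
    serial = 1.0 - parallel_fraction
    return float("inf") if serial == 0.0 else 1.0 / serial


if __name__ == "__main__":
    # Hypothetical example: 95% of the runtime parallelises perfectly.
    p = 0.95
    for n in (4, 16, 32, 80):
        print(f"{n:3d} cores -> {amdahl_speedup(p, n):5.2f}x")
    print(f"ceiling: {amdahl_ceiling(p):.1f}x")
```

With p = 0.95, 32 cores give roughly 12.5x and 80 cores roughly 16x, against a ceiling of 20x no matter how much hardware is bought.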
After you estimate the speedup, do some rough calculations based on the average cost of a processor and the number of processors required. This should give you an estimate of the hardware cost. Compare that with the compute per dollar you'd get from a cloud service such as Amazon.
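The rough calculation suggested above can be done by inverting Amdahl's law to get the number of cores needed for a target speedup, then multiplying by an average cost per core. This is a sketch under those assumptions: the target speedup, parallel fraction and per-core price are placeholders, and real clusters are bought by the node, not by the core.

```python
import math


def cores_needed(parallel_fraction: float, target_speedup: float) -> int:
    """Invert Amdahl's law: smallest N with 1 / ((1 - p) + p / N) >= S.

    Solving for N gives N = p / (1/S - (1 - p)), which only has a solution
    when S is below the Amdahl ceiling 1 / (1 - p).
    """
    serial = 1.0 - parallel_fraction
    slack = 1.0 / target_speedup - serial
    if slack <= 0:
        raise ValueError("target speedup is at or above the Amdahl ceiling")
    # Subtract a tiny epsilon so floating-point noise doesn't push an
    # exact integer answer up to the next core.
    return math.ceil(parallel_fraction / slack - 1e-9)


def hardware_cost(n_cores: int, cost_per_core: float) -> float:
    """Very rough hardware estimate: cores times an average cost per core."""
    return n_cores * cost_per_core
```

For example, `cores_needed(0.95, 10)` returns 19, and at about $197 per core (US$6,299.20 / 32 cores for the SuperMicro box above) that is roughly $3,740 of hardware. Asking for 20x or more raises an error, because a 95%-parallel code can never get there.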
It's $1.60/hour for the largest non-GPU cluster instance, which also gives you fairly fast interconnects and the ability to scale across multiple instances.
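A quick way to compare is to ask how many instance-hours the hardware budget would buy. This sketch assumes the $1.60/hour figure above and the US$6,299.20 conversion quoted earlier in the thread. It deliberately ignores that one cloud instance and one purchased box do not have the same core count, and that owned hardware keeps running after the break-even point.

```python
def break_even_hours(hardware_cost: float, hourly_rate: float, instances: int = 1) -> float:
    """Hours of cloud time that cost the same as buying the hardware outright."""
    if hourly_rate <= 0 or instances < 1:
        raise ValueError("need a positive rate and at least one instance")
    return hardware_cost / (hourly_rate * instances)


if __name__ == "__main__":
    hours = break_even_hours(6299.20, 1.60)
    print(f"{hours:.0f} instance-hours, about {hours / 24:.0f} days of one instance")
    print(f"{break_even_hours(6299.20, 1.60, instances=4):.0f} hours with 4 instances")
```

`break_even_hours(6299.20, 1.60)` comes to about 3,937 instance-hours, or roughly 164 days of one instance running around the clock; four instances in parallel burn through the same money in about 984 hours.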
Only £4,000 in hardware would be a waste of money. You wouldn't have all that much computing power, and it would be obsolete immediately.
Don't forget to add up power, cooling, sysadmin time...
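Those running costs are easy to estimate once you know the average power draw. A minimal sketch; the wattage, electricity price, cooling overhead and admin cost are all hypothetical inputs that you would replace with local figures.

```python
HOURS_PER_YEAR = 24 * 365


def annual_running_cost(avg_watts: float, price_per_kwh: float,
                        cooling_overhead: float = 0.0,
                        admin_cost_per_year: float = 0.0) -> float:
    """Cost of running 24/7 for a year.

    cooling_overhead is extra energy as a fraction of the IT load
    (0.5 means cooling uses half as much again); admin is a flat sum.
    """
    kwh = avg_watts / 1000.0 * HOURS_PER_YEAR
    energy = kwh * price_per_kwh * (1.0 + cooling_overhead)
    return energy + admin_cost_per_year
```

For example, a cluster averaging 1 kW at a hypothetical 0.10 per kWh costs about 876 a year in electricity alone; a `cooling_overhead` of 0.5 raises that to about 1,314, before any sysadmin time.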
If the friend's research group is in an academic institution, power and cooling are outside of the acquisition budget, along with space, network, etc., as those are typically part of overhead. Depending on the institution, sysadmin services are too. Often the institution will even have embarrassingly large discounts with hardware and software vendors (at my institution, a licensed copy of Matlab, for example, is about $100 per seat per year).
GBP 4000 buys a rackful of modern computers that can be run as long as you want, and they can be used to explore ideas without worrying about cost. In contrast, once the GBP 4000 has been paid to a cloud service, the money is gone. Given that the pressures on a new researcher are already immense (and I speak from recent first-hand experience), not having to worry about running out of compute resources is worth a lot, even if it means the instantaneously available compute power is somewhat lower than what you could get from a cloud service.
If this new research group is going to be competing for research funds, for example, then the compute resource is going to be heavily utilized for the first 12-18 months to get preliminary results for grant applications. I can't imagine that GBP 4000 of cloud time is going to last long enough. Looking at Rackspace, as another poster suggested, they charge about USD 350 per month for a decent configuration (8GB RAM / 320 GB disk). That single server would use up the money in 18 months. If the memory demands of the computation aren't so large, the charges are lower, say USD 45 per month (1GB RAM / 40 GB disk), in which case you get to use 7 virtual machines for the same 18 months.
Given that a highly capable system can be purchased new for USD 500, the same money gives the researcher about a dozen real machines for 18 months, and beyond (buying off-lease machines can easily double the amount of hardware). From my perspective as a researcher, there's no comparison: when money is tight, buy your own hardware and take advantage of the services provided by your institution.
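The arithmetic in the last two paragraphs can be checked with a few lines of Python. This sketch takes the budget as the US$6,299.20 conversion quoted earlier and uses the Rackspace and $500-per-machine prices exactly as stated; it ignores the fact that purchased machines keep running after the 18 months are up.

```python
import math


def months_of_rental(budget: float, monthly_rate: float, n_servers: int = 1) -> float:
    """How many months the budget covers when renting n_servers."""
    return budget / (monthly_rate * n_servers)


def servers_for_period(budget: float, monthly_rate: float, months: int) -> int:
    """How many rented servers the budget keeps running for `months`."""
    return math.floor(budget / (monthly_rate * months))


def machines_to_buy(budget: float, price_per_machine: float) -> int:
    """How many machines the budget buys outright."""
    return math.floor(budget / price_per_machine)


if __name__ == "__main__":
    budget = 6299.20
    print(f"one 8GB server: {months_of_rental(budget, 350):.1f} months")
    print(f"1GB VMs for 18 months: {servers_for_period(budget, 45, 18)}")
    print(f"$500 machines bought outright: {machines_to_buy(budget, 500)}")
```

`months_of_rental(6299.20, 350)` is just under 18, `servers_for_period(6299.20, 45, 18)` is 7, and `machines_to_buy(6299.20, 500)` is 12, matching the figures above.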
The problem is the constraints. The cheap cluster in my old department cost £100k. £4k does not buy you a lot of hardware. You will probably find a lot more lying around in the undergrad labs. For some of my work as a PhD student, that's exactly what I used - each lab had 40 machines on a GigE network and closed overnight, and for work that wasn't that latency sensitive, I could distribute it across the machines there and run it at night without anyone minding.
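Spreading work over lab machines overnight needs little more than a list of hosts and a way to start jobs remotely. The sketch below only plans the work: it deals tasks out round-robin and builds (but does not run) one ssh command per host. The host names and the `./run_task.sh` script are hypothetical, and it assumes passwordless ssh to each lab machine plus a shared filesystem or pre-copied code.

```python
from itertools import cycle


def assign_round_robin(tasks, hosts):
    """Deal tasks out to hosts in turn; returns {host: [tasks]}."""
    if not hosts:
        raise ValueError("need at least one host")
    plan = {host: [] for host in hosts}
    for task, host in zip(tasks, cycle(hosts)):
        plan[host].append(task)
    return plan


def ssh_commands(plan, script="./run_task.sh", log="overnight.log"):
    """Build one ssh command per host that runs its tasks one after another.

    The remote shell is detached (nohup, backgrounded, all I/O redirected)
    so ssh returns immediately and the jobs keep going after logout.
    """
    commands = []
    for host, tasks in plan.items():
        if not tasks:
            continue
        # ';' rather than '&&' so one failing task doesn't stop the rest.
        remote = "; ".join(f"{script} {task}" for task in tasks)
        commands.append(
            ["ssh", host, f"nohup sh -c '{remote}' > {log} 2>&1 < /dev/null &"]
        )
    return commands


if __name__ == "__main__":
    hosts = [f"lab-pc{i:02d}" for i in range(1, 41)]  # hypothetical lab of 40 PCs
    plan = assign_round_robin(range(200), hosts)
    for cmd in ssh_commands(plan)[:2]:
        print(cmd)
```

Each command could then be launched with `subprocess.run(cmd)` from a machine allowed to ssh into the lab. Round-robin is the simplest choice; if task runtimes vary a lot, a work queue will balance the load better.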
If you're serious about needing a cluster, then you need to spend a lot more than £4K. If you only need a cluster for a short time, then £4K can buy you a chunk of time on someone else's hardware. Since this is the UK, they should contact the Manchester Supercomputing Centre, which provides this kind of service to UK universities at quite a reasonable price (and will also lend you people who are good at optimising code for their systems). If the university doesn't already have some clusters lying around, then you should get in contact with a few other research groups. £4K won't go very far, but if half a dozen research groups each put in £4K then that gives you enough for a reasonable cluster to share between the various users.
Or use a 16GB or 32GB USB flash drive (or better yet, a small SSD, since swapping to USB flash would suck) as the boot drive on most machines, and have one machine (the head node) with hard disks as a file server. NFS will do for small to medium-sized clusters (anywhere from a handful of nodes to a few hundred). The OP is going to need a head node anyway to run Slurm or Torque as the scheduler/resource manager (yes, I have built clusters before).
Put a second NIC in the head node so the compute nodes can run on a private 192.168.x.x network (you'll need a 24- or 48-port switch as well), and also install DHCP, TFTP and Apache on it. Set up those three to let the compute nodes netboot Clonezilla. Install everything you'll need on one compute node (Open MPI, libatlas, Octave, R, open-source and proprietary scientific software as needed, etc.) and use Clonezilla to mass-produce the rest (this also lets you quickly and easily add new nodes or replace failed ones). You'll need LDAP or NIS to share account/auth details between machines.
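The head-node bookkeeping for the private network (which node gets which address, and what the head node exports over NFS) is easier to generate than to type by hand. A minimal sketch, assuming a 192.168.1.0/24 subnet, the head node at .1, compute nodes numbered from .101, and hostnames node01, node02 and so on; all of those are hypothetical choices. It only emits `/etc/hosts` and `/etc/exports` lines; DHCP, TFTP and Clonezilla configuration are left to their own documentation.

```python
import ipaddress


def node_addresses(n_nodes, subnet="192.168.1.0/24", first_host=101, prefix="node"):
    """Return [(hostname, ip)] for compute nodes on the private subnet."""
    net = ipaddress.ip_network(subnet)
    plan = []
    for i in range(n_nodes):
        ip = net.network_address + first_host + i
        if ip not in net or ip == net.broadcast_address:
            raise ValueError(f"{subnet} has no room for {n_nodes} nodes from .{first_host}")
        plan.append((f"{prefix}{i + 1:02d}", str(ip)))
    return plan


def hosts_file(plan, head=("head", "192.168.1.1")):
    """/etc/hosts lines: head node first, then every compute node."""
    name, ip = head
    lines = [f"{ip}\t{name}"]
    lines += [f"{node_ip}\t{node_name}" for node_name, node_ip in plan]
    return "\n".join(lines) + "\n"


def exports_line(path="/home", subnet="192.168.1.0/24"):
    """One /etc/exports entry sharing `path` read-write with the private subnet."""
    return f"{path} {subnet}(rw,sync,no_subtree_check)"


if __name__ == "__main__":
    nodes = node_addresses(20)
    print(hosts_file(nodes), end="")
    print(exports_line())
```

Keeping the address plan in one place like this also makes it simple to regenerate the files when Clonezilla is used to add or replace nodes.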
I built something quite similar to this last year (but using some Sun Fire 1RU Opteron rackmount servers as the compute nodes).
I'd go for an X4 CPU; they're not that much more than an X3 and the extra core is useful. Get 8GB RAM too (2x4GB only costs about $40). Given the budget, it's probably not worth getting a custom power supply for the tray-mounted motherboards, so each will need its own dedicated PSU.
Each node is going to cost somewhere around $250 (very rough estimates: $50 for the motherboard, $40 for 8GB RAM, $50 for the CPU, $50 for the PSU and $60 for a 32GB SSD, but possibly a fair bit cheaper as a bulk purchase), and the head node will cost roughly triple that. You'll need a case with hot-swap bays for the drives: a Norco 4224 is probably overkill, but at well under $400 for 4RU with 24 SAS/SATA hot-swap bays, it would be hard to find a significantly cheaper case even with fewer drive bays. So for $6K you can build a cluster with 20 four-core compute nodes plus a good head node for the scheduler and file server. 80 compute cores for $6K is good, even considering that with cheap crap motherboards you'll have a noticeable failure rate. The cluster I built last year with name-brand hardware cost closer to $50K. I could build a better system today (far fewer nodes with a lot more cores and RAM each), also with name-brand hardware, for about $20K to $30K.
Trays for the motherboards, the rack(s) and cooling will cost extra, as will licenses for any proprietary software they might need to run (which could easily cost as much as the hardware, or more!). If the OP's friend is at a university, she can probably scavenge an old rack or two from another department, but even if she has to buy one new she could easily build 15+ compute nodes entirely within the $6K budget.
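Here is the budget arithmetic from the last two paragraphs as a short Python sketch. The part prices and the "head node costs roughly triple a compute node" rule are the rough figures from this post; the $1,500 allowance for trays, rack and cooling in the example is a hypothetical number chosen only to show how extras eat into the node count.

```python
import math

# Rough per-node part prices quoted above (USD).
NODE_PARTS = {
    "motherboard": 50,
    "ram_8gb": 40,
    "cpu_quad_core": 50,
    "psu": 50,
    "ssd_32gb": 60,
}
CORES_PER_NODE = 4
HEAD_NODE_MULTIPLIER = 3  # "the head node will cost roughly triple that"


def node_cost(parts=NODE_PARTS):
    """Cost of one compute node from its parts list."""
    return sum(parts.values())


def cluster_cost(n_nodes, extras=0):
    """Compute nodes + head node + any extras (rack, trays, cooling)."""
    per_node = node_cost()
    return n_nodes * per_node + HEAD_NODE_MULTIPLIER * per_node + extras


def max_nodes(budget, extras=0):
    """Largest number of compute nodes that fits the budget alongside a head node."""
    per_node = node_cost()
    remaining = budget - extras - HEAD_NODE_MULTIPLIER * per_node
    return max(0, math.floor(remaining / per_node))


if __name__ == "__main__":
    n = 20
    total = cluster_cost(n)
    cores = n * CORES_PER_NODE
    print(f"{n} nodes, {cores} cores: ${total} (${total / cores:.2f} per core)")
    print("nodes within $6K, no extras:", max_nodes(6000))
    print("nodes within $6K, $1,500 of extras:", max_nodes(6000, extras=1500))
```

Twenty nodes plus the head node come to $5,750 (80 cores at under $72 each), and `max_nodes(6000)` shows there is room for one more; with the hypothetical $1,500 of extras the count drops to 15, in line with the "15+" estimate.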