Ask Slashdot: Clusters On the Cheap?
First time accepted submitter serviscope_minor writes "A friend of mine has recently started a research group. As usual with these things, she is on a shoestring budget and has computational demands. The computational task is very parallel (but implementing it on GPUs is an open research problem and not the topic of research), and very CPU bound. Can slashdotters advise on a practical way of getting really high bang for buck? The budget is about £4000 (excluding VAT/sales tax), though it is likely that the system will be expanded later. The computers will probably end up running a boring Linux distro and Sun GridEngine to manage batch processing (with home directories shared over NFS)."
Why waste money on building a cluster when you can rent the best in the world * by the hour * ?
You have a limited budget, so it's more cost effective for you to lease time on someone else's equipment for now.
I've seen quite a few projects where people have stacked motherboards with spacers, using booting over Ethernet and a single power supply for multiple MBs. Google should be of use here, I'm trying to get my offspring to school so I'm cheating and not providing any links...
But the idea is that skipping the case and other components makes things cheaper. Leaving the rig exposed without a case also eliminates the need for most cooling.
.: Max Romantschuk
Actually, that's a good question... Assuming no time constraints, at what point does it make sense to buy hardware rather than use the cloud? Take that budget above (roughly US$6K) and the best hardware you can get for that price: How many months would you need to run it, flat out, to equal the number of floating-point ops EC2 would give you for that cost?
Infiniband is out in that budget. But you could see how far you could get buying some cheap quad cores and interconnecting them with GbE. You can take a look at TomsHardware cpu charts (e.g. for 3dsmax rendering since this is a similar task: http://www.tomshardware.com/charts/desktop-cpu-charts-2010/Image-Rendering-3DS-Max-2010,2420.html) and get the most bang for your buck.
why not try it?
Many universities/consortia have supercomputers available on which researchers can apply for (or buy) time. For example, my university is a member of VPAC, which has a big-arse cluster shared between a number of institutions. She might get much better bang for buck if she uses the money for that, rather than splashing out for dedicated hardware.
Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)
Is this program still around? http://research.yahoo.com/node/1884
You don't specify whether or not your friend would be working out of a colo. If so, space will be at a premium.
My needs are high reliability, low cost, and high density. (colo)
I've been providing an excellent bang/buck ratio using whitebox 1U rackmounts made by SuperMicro. For about $1,000 I can get a late model CPU with a decent amount of quality ECC RAM, dual Gbit Ethernet ports, SCSI / SATA3 interfaces with a chipset highly compatible with CentOS Linux. (my distro of choice)
This is server-grade equipment, optimized for I/O throughput and reliability over raw processing power. You may be looking for raw computational power with a higher tolerance of downtime, in which case you'd want to try something else.
I have no problem with your religion until you decide it's reason to deprive others of the truth.
First thing is to run some benchmarks to find out which architecture is best. Next, figure out how much memory you need per core and buy the systems with the highest performance per $. We recently went through this for our highly-parallel workload and found that the Intel Nehalem processors were faster by a factor of 2 or so than Opteron 6100 CPUs. However, the Opterons were cheaper. We ended up getting boxes with 4 Opteron 6100 CPUs that have 12 cores each. We need about 1 GB per core, so ended up with 32 GB of memory. We have an existing cluster with disk servers, etc., so went with 1U boxes since we are somewhat space-constrained. Our communication needs are modest so we are simply using gigabit Ethernet, though we are experimenting with channel bonding. We have found that it is worth buying name brand systems, though not by top-tier manufacturers. Our systems cost about $7k apiece as I recall. For us one of the benefits of buying more expensive systems was that they are considered equipment so we can use funds originally budgeted for facilities (overhead) expenses in our grant to pay for the computers. Without this the calculation might have been different. In the past the sweet spot for price/performance has been 1U boxes with 2 CPUs. We run CentOS on our cluster and like the stability combined with security fixes. We buy our systems with 3 year warranties and do not pay for service contracts. By the end of 3 years we have usually added some new systems and if the old ones die it isn't a big deal. The whole cluster thing works very well for us since we can add computing power in relatively small increments as funds are available.
Imagine a cluster of cheapness!
Table-ized A.I.
Check out an FPGA-based solution such as NI's FlexRIO. Moving computation over to hardware makes things much, much faster.
It runs sun grid engine with a NFS master on EC2.
http://web.mit.edu/stardev/cluster/
Why buy your own when you can use existing GRID infrastructure? For 4k you can't do much more than get a few decent desktops for yourself and a few grad students and/or postdocs. Rather than blow it on a massively underpowered cluster, use the grid. I know the UK has massive clusters available to researchers, so find out how to get an account and resources on them and use those. For test jobs, interactive analysis and other low latency tasks use your desktop.
You can get a SuperMicro reseller to sell you one workstation with 4 sockets of CPUs and a bunch of RAM. £4000 = US$6,299.20
That buys you a box with 4 x Opteron 6134 (32 cores) and 128GB RAM (32 x 4GB sticks). And some hard disks.
A near-example of what Max is talking about can be found at the Home Linux Render Cluster. The builder threw six dual-CPU motherboards and Gig-E into a small, gutted filing cabinet. Cheap, expandable.
However, if your friend hasn't got a very good idea how much mmmph she needs, the AWS EC2 rental idea has merit.
Luke, help me take this mask off
You can't really get a cluster for that kind of money. You can barely get one decent box.
But you should be able to rent a lot of computer time in the cloud for that kind of money, or use it to buy time on someone else's cluster.
I do not fail; I succeed at finding out what does not work.
EC2 is really expensive, brah.
Have you considered commercial services like Amazon? I believe some are pay as you use.
OP hasn't mentioned a lot except budget. Since you are on such a tight budget, I would highly recommend doing some theoretical analysis first. Do you have a serial code? How much parallelism exists in the code? You say the task is 'very parallel', but Amdahl's law (which is really common sense) will tell you that even for small amounts of serial sections of code, your speedup will be limited. You should also consider the amount of time the code actually runs. Achieving a speedup of 2 for a serial code that runs for one minute is near worthless.
After you estimate speedup, do some rough calculations on the basis of the average cost of a processor and the number of processors required. This should give you an estimate of the hardware cost required. Compare that with the CPU cycles per dollar you get using a cloud service such as Amazon.
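For anyone who wants to play with the numbers, here is a rough sketch of the Amdahl's law estimate mentioned above. The parallel fractions and worker counts are made-up illustrations, not measurements of the OP's code, and the function name is just something I picked.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the run time parallelizes.

    parallel_fraction: share of the serial run time that can be spread
    across workers (0.0 to 1.0). workers: number of cores/processors.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative values only: how quickly a small serial section caps the gain.
for p in (0.90, 0.99, 0.999):
    for n in (8, 32, 128):
        print(f"parallel={p:.3f} workers={n:4d} "
              f"speedup<={amdahl_speedup(p, n):7.2f}")
```

With 10% serial code the speedup can never reach 10x no matter how many cores you buy, which is why it pays to measure the serial fraction before spending the budget.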
$1.60 / hour for the largest non-GPU cluster instance. This also provides you with rather fast interconnects and scalability with multiple instances.
Only £4,000 in hardware would be a waste of money. You wouldn't have all that much computing power, and it would be obsolete immediately.
It will cost her more than 4000GBP in grad student time to configure/manage the cluster. (Even using a turnkey installation system like ROCKs.) She should use the money instead to buy time on a national/university cluster system.
Study the design of the "microwulf" and its relatives. Considering that hardware prices have dropped since 2009, your task might be achievable.
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
HP? Is that you?
Assuming a 1.5 to 1 correspondence with the USD, you're either getting a decent CPU box and no storage, or a reasonable amount of storage and no CPU. I build/run supercomputing clusters for molecular dynamics simulations at a university in upstate New York, and I wouldn't even consider attempting a cluster for less than $25,000.
Since the OP didn't specify if this was massively parallel or not, I'm going to assume this is so I can use AMD chips for cheapness.
First off, storage. Computational output adds up quick. You're looking at $7,000 USD for 24TB raw storage from the likes of IBM or HP or Dell. Yes, you can whitebox it for cheaper, but considering if you lose this box, nothing else matters (And I doubt you have the funds for proper backups), it pays to get hardware that's been tested and is from a vendor you can scream at when it breaks.
Second, interconnect. A cheap netgear will work, but reasonable internode communication is not cheap, especially if moving largish amounts of data. This could run $1000 to $3000
Finally, the compute hardware itself. A decent node will run $3000 to $5000 depending on the core count, socket count, GHz, and to a lesser extent RAM.
Assuming you want 128 cores, you're looking at 8 machines for compute ($32,000 right there assuming $4K/node, and dual 8 core chips), plus another $7K for the file server/landing pad, and finally add $1500 for a decent switch that can let those nodes talk to each other at line speed and allow room for future growth. Total cost: $40,500 USD or 27,000 pounds assuming the 1 pound:1.5 USD ratio.
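The arithmetic above, spelled out as a quick sketch so you can swap in your own quotes. All prices and the 1.5 exchange rate are the figures from this post; the variable names are mine.

```python
import math

USD_PER_GBP = 1.5          # exchange rate assumed in the post

cores_wanted = 128
cores_per_node = 2 * 8     # dual 8-core chips per node
node_cost_usd = 4000       # mid-range of the $3000-$5000 quoted per node
storage_cost_usd = 7000    # file server / landing pad
switch_cost_usd = 1500     # switch with room for growth

# Round up so a partial node still counts as a whole machine.
nodes = math.ceil(cores_wanted / cores_per_node)
total_usd = nodes * node_cost_usd + storage_cost_usd + switch_cost_usd
total_gbp = total_usd / USD_PER_GBP

print(f"{nodes} nodes, ${total_usd:,} total, about {total_gbp:,.0f} pounds")
```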
Try a computer recycling centre, most tend to be short on storage and are happy to sell a large number of desktop machines at a lower than normal price per unit. Community operated ones tend to be more helpful than business ones though.
A game has objectives and is competitive, anything else is just play
Buy a small chunk of something that looks like the big machines she will be using. As others have said, with that little money you aren't going to get legitimate computational resources. But she will certainly qualify - or already has - on some of the larger public machines. In my experience, it is really nice to have a small cluster, i.e. two or three nodes, to test and benchmark code. You can look at things like parallel performance on a single node versus across nodes, whether the code plays well with shared memory, whether it can reasonably mix shared and non-shared parallelization schemes, and so forth.
46 & 2
They are going for like $3 an hour.
If your friend doesn't mind tinkering with the OS, it might be worth buying a bunch of PS3s. They disabled the old Other OS feature, but I'd imagine it's not too hard to mod them, or do something along those lines.
$35/element, runs a boring Linux distro, runs very cool, low power consumption (less than 1w), onboard Ethernet.
Sorted!
Raspberry Pi
She could also consider creating a BOINC project. She could then do some publicity locally and on forums, to get people to choose her project. I've never tried creating a BOINC project, so I don't know how hard this is. However, I do run the client as a background task, and I imagine many other people do as well.
Enjoy life! This is not a dress rehearsal.
Spend the money on a programmer to parallelize the algorithm on standard CPUs, and put it out on BOINC. People volunteer their spare cycles for BOINC projects that are barely more interesting than the chemistry of aardvark snot. She would likely get volunteers if there's anything of even passing interest in her research.
If your friend doesn't want to do a lot of engineering work, then for this price I would just buy 10 or so PCs (depending on memory/CPU tradeoffs) from wherever has a special offer, plus a gigabit switch and put them on shelves. If you need a lot of memory, or can usefully share memory then that would be a bit different, but you can buy a usable headless PC for £300-£400. This will also not be terribly power efficient, nor will components like motherboards be of the highest quality, but you get more bang for the buck that way than almost anything else except second-hand. At the other extreme, you could probably buy a single 24-core AMD box for the money with quite a lot of RAM and just run a lot of processes on it.
Talking of second-hand, the other thing to do is to see if anyone has a cluster they can't feed (i.e. power) any more. Our applied maths dept is about to shut down a 3 year old 1000-core cluster because they can't afford the power to run it and their newer 2000 core cluster. A slice of that would be great and someone locally might be able to help you in a similar way.
Microwulf.
"Tongue tied and twisted, just an Earth bound misfit
And how much space and air conditioning do you have? Depending on the answers to these questions, the optimal* solution might be 'get a bunch of 5 year old computers nearly for free.'
* Optimal for your friend, not for her university.
Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
I was in a similar situation setting up a research group. Wanted an expandable setup for a research group that would meet the approval of local IT sysadmins (some remote management options, vendor support). For 2.6K pounds a pop I got a Dell PowerEdge T410 server with 2 6-core CPUs and 24GB RAM. I'm never one to push a Dell (been purchasing IBM/HP for years) but this is a decent machine for a decent price. I tried various cloud solutions using virtual machines on Amazon and similar frameworks, but for the kind of work we do (frequent software updates, massive amounts of data that need to be stored locally and can't be transferred easily), those don't scale. We use Condor as a job submission engine. Not that we don't like SGE but with Oracle's plans (http://en.wikipedia.org/wiki/Oracle_Grid_Engine) one can never tell. PS: remind your friend to invest in a QNAP NAS or similar for backups / disaster recovery.
I don't think any of the posters recommending EC2 have ever looked at the economics of EC2 versus self-hosting.
If you have long-term compute needs (as opposed to needing to throw lots of cores at a problem to get fast results in a short time), you're better off buying a Dell.
An EC2 Quadruple Extra Large instance is $1.60/hour. You have around $6500 USD, so you could buy 169 days of computer time at EC2 (ignoring the cost of I/O and network bandwidth).
This instance has 23GB of RAM and is equivalent to 2 x Intel Xeon X5570 CPU's.
For around $5000, you could buy a Dell R710 with dual X5647's, 32GB RAM, RAID-1 1TB SATA drives (depending on your storage needs, you might want to move to faster SAS disks). As long as you have a suitable office to host the server, your only recurring hosting cost is electricity (around $70/month) and maybe you'll need to spend $500 on a UPS. If you need to pay for hosting/colocation somewhere, that will definitely change the economics.
So, with your budget, you get one node + UPS + electricity for a year. All for the price of around 5 months of EC2 time.
You come out ahead even if you want to throw away the server every 6 months and start fresh.
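Here's the comparison above as a back-of-the-envelope sketch. It uses only the figures quoted in this post ($1.60/hour, $6500 budget, $5000 server, $500 UPS, $70/month electricity) and assumes, as the post does, that the box runs flat out 24/7 and that the Dell and the EC2 instance do roughly the same work per hour. That equivalence is an assumption, not a benchmark.

```python
HOURS_PER_DAY = 24

ec2_rate_per_hour = 1.60        # quoted EC2 price
budget_usd = 6500.0             # quoted budget
server_usd = 5000.0             # quoted Dell R710 price
ups_usd = 500.0                 # quoted UPS price
electricity_per_month = 70.0    # quoted power cost

# How long the whole budget lasts if spent on EC2 running 24/7.
ec2_cost_per_day = ec2_rate_per_hour * HOURS_PER_DAY
ec2_days = budget_usd / ec2_cost_per_day

# What the owned box costs over its first year.
first_year_owned = server_usd + ups_usd + 12 * electricity_per_month

# Day on which cumulative EC2 spend passes the cost of owning the box.
electricity_per_day = electricity_per_month * 12 / 365
break_even_days = (server_usd + ups_usd) / (ec2_cost_per_day - electricity_per_day)

print(f"EC2 time for the budget: {ec2_days:.0f} days")
print(f"Owned server, first year: ${first_year_owned:,.0f}")
print(f"Break-even after about {break_even_days:.0f} days of 24/7 use")
```

If the machine sits idle most of the time the break-even point moves out accordingly, since you only pay EC2 for the hours you run.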
You can save a few bucks by building your own (or going to a custom whitebox builder), but the Dell comes with 3 years of next business day support. Last time I priced out a whitebox builder, they beat Dell's best discounted price by about 10% and only offered a 1 year warranty.
Mums are nice.
You're probably excited about being able to help her burn some money on BUILDING some ultra-cheap Beowulf-style HPC cluster, but your friend is probably more interested in just actually USING a working production HPC cluster to solve her very parallel, very CPU-bound scientific/mathematical problem ASAP, and not so much into having to set up and administer the HPC cluster.
Based on the little bits of info you've provided, I'm going to guess your friend is something like a new assistant professor/lecturer at some small research university? Maybe located in the UK or some other European country that prefers to use pounds sterling instead of euros?
Let's see here...4K in GBP is about 6.3K in USD right now. For that amount of money, I would check if the university or a larger affiliated academic institution might have some sort of HPC cluster that you can just pay for compute time on. Best bet is if she can talk to her colleagues involved in her specific research area who might already be aware of what's available. A Google search for "UK high performance computing" comes up with some possible good hits.
In the U.S., there are organizations like XSEDE (http://www.xsede.org/) which has a system where academic researchers can apply for an allocation of compute time to use on various HPC clusters at institutions affiliated with XSEDE. Some of these institutions will also independently rent out compute time (sometimes it's easier to just wave some money at some supercomputing center than deal with the hassle of applying for a formal allocation based on the proposed merits of your project, hoping that somebody will think it's worthy). See the Triton Resource (http://tritonresource.sdsc.edu/) at SDSC (http://www.sdsc.edu/us/tapp/tapp_pricing.html) as an example.
When your friend is actually comfortable just using an HPC cluster to run her compute/simulation jobs, publishes her results, gets noticed for her work by more senior peers and funding agencies, applies for more grant funding, and when serious amounts of money start rolling in ($50K-$1M+), then it's time to start building her own (which is another long story).
The main question is how specialized the software is. If you are sure which compute kernels are required and they are fixed over the cluster's lifetime, go with an FPGA or a custom-made CPU with IP logic extensions. If that is not possible and you need to allow moderate variation in the compute kernels, DSPs give the best bang for the buck; they have also been going multicore lately (for example from TI). If both of those fail, x86 variants offer the biggest raw power, but with high energy costs. If the task is easily separable into tiny tasks and energy costs do matter, hundreds or thousands of separate small ARM cores of A8 or A9 grade (multicore A9 CPUs are planned and starting production) on gigabit Ethernet links might suffice.
If you are extremely data heavy, the cloud becomes quickly much more expensive than buying your own. The Broad Institute made some recent experiments on Amazon analyzing genome experiments, and they said Amazon was 4x more expensive.
But for her cpu-heavy workloads the cloud would work perfectly.
more things to consider:
If she buys her own hardware there are a lot of extra costs to the raw hardware:
1 someone needs to set the thing up, administer it, and support it with patches, etc. even for a small cluster this is a good percentage of a person. if she has a slave student doing that, great! although if the guy leaves there will be an issue. if she needs to spend money on a person, then amazon will be much, much cheaper.
2 there is an electricity bill and you need space, probably cooled space. if it is available, great! otherwise it can be a showstopper (fire hazard in a lab)
These costs are included when people talk about total cost of ownership. If you factor them in, the cloud suddenly becomes veeery interesting. Btw, EC2 is not the only player on the market now, there is Azure and also the IBM SmartCloud, with competitive pricing.
For normal cluster computing you can go to a number of startups that will build an on-demand cluster for you based on amazon, my favorite in the research domain is cloudbroker - http://www.cloudbroker.com/ - who will actually render the software you need into a SaaS based on EC2 or the IBM cloud. you just launch your hpc cloud app, with your uploaded data and you pay for the workload you did. the bill includes just amazon or ibm costs, licenses if any and a surcharge by cloudbroker which is totally worth the money because now you do not even need to set up the software and the virtual machines anymore.
Somebody upthread mentioned BOINC, which is a great idea for many parallel-oriented compute-bound problems. However, while making your project compatible with BOINC is necessary, it's usually not sufficient. The problem is marketing, to convince enough people to run your work. World Community Grid, sponsored by IBM, is free and is an excellent way to solve that problem. You can submit a proposal, and if approved you'll quickly have lots of BOINC-powered computing working on your problem.
At least you'll be running on the bare metal, not some virtualized piece of cloud. http://www.hector.ac.uk/
Every end has half a stick.
Assuming the latter, check among Supermicro & Dell servers. Last time I needed to set up a cluster, the Dell R610s were a good pick, giving great manageability over the LAN, low volume and decent features (balanced storage space along with CPU capacity, around 8 cores + 8 TBs per 1U blade). Also don't rule out options like Shuttle XPCs; they are damned robust in thermal aspects (hey, you'll be running these continuously, won't you?). Finally, don't underestimate the need for local sysadmining; you will likely need to set up a queueing system (Torque, *PBS*, SLURM, SGE, LSF, NQS, Condor) and manage the whole thing. This won't happen automatically, take a note on that. If you run something of the PBS or SGE family I can happily help with setting up a tool called qtop.
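To give a feel for what the queueing side looks like for an embarrassingly parallel job, here is a minimal sketch that writes out an SGE array-job script. The job name, task count and the `./simulate --task` command are hypothetical placeholders; the `#$` directives (`-S`, `-N`, `-cwd`, `-j y`, `-t`) and the `SGE_TASK_ID` variable are standard Grid Engine, but check them against your own installation.

```python
def make_array_job(name: str, n_tasks: int, command: str) -> str:
    """Return the text of an SGE array-job script with n_tasks tasks.

    Each task runs `command` with its own task index appended, so the
    program can pick its slice of the work from that number.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    lines = [
        "#!/bin/bash",
        "#$ -S /bin/bash",          # shell used to run the job
        f"#$ -N {name}",            # job name shown in qstat
        "#$ -cwd",                  # run from the submission directory
        "#$ -j y",                  # merge stderr into stdout
        f"#$ -t 1-{n_tasks}",       # array job: task ids 1..n_tasks
        "",
        f"{command} $SGE_TASK_ID",  # hypothetical program, one task id each
        "",
    ]
    return "\n".join(lines)


# Hypothetical example: 200 independent runs of a program called simulate.
script = make_array_job("sweep", 200, "./simulate --task")
print(script)
```

Saved to a file such as sweep.sh, this would be submitted with `qsub sweep.sh`, and the scheduler hands out the 200 tasks to whatever slots are free.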
Call the IT depts in some big companies. They may be throwing out old desktops. If you are an institution then you can honestly turn what would have been a scrappage deal (which tends to be regulated and cost money) into a £1 sale from one institution to another.
Just beware the liability for scrapping the computers after the project. But hey, that probably comes out of someone else's budget.
Is the research group in a university? Most universities have a lot of computing power that sits idle for large proportions of the time in their undergraduate computing laboratories. There's a significant resource that could be exploited simply by deploying jobs to idle machines.
Sent from my Tianhe-2 (MilkyWay-2).
In the UK there are academic grids that research groups can use like the ngs or gridpp (for free or next to nothing) http://ngs.ac.uk/use-ngs I used to work at the Center for Parallel Computing where I am sure some people would talk to her. http://www.cpc.wmin.ac.uk/cpcsite/index.php/Main_Page
give some thought to data security as well please. If the research done is sensitive, don't use clouds.
On the other hand, costs of self-hosting are indeed underestimated very quickly, which is not a good thing when budget is low.
Also, while manycore machines seem cost-effective, look at the solutions you are using for computation; it is hard to press a 48-core machine to peak performance, much harder than driving more standard distributed-memory supercomputers. But this depends on your application.
Buying time on an existing cluster (local university, or a dedicated HPC company) seems the surest way, and also reasonably secure when done at a trusted institute or company.
Look up the MicroWulf project. Pretty much exactly the thing you want if you really really really do need to build it yourself.
Self-building clusters needs some careful planning before you start:
1) Don't do it if you don't have to. Lots of resources around that you can use, especially if you're in academia.
2) Find/read benchmarks for the key components to your research (CPU in this case) and then design your cluster modules around it. Keep in mind your price/performance ratio for each entire module, not just the core component. Fewer faster modules is usually the best way to go.
3) Cut out as much hardware as you can but don't skimp on important specs. Integrated NICs on motherboards are good but check they're PCI-Ex connected instead of vanilla PCI if you want the best transfer rates with lowest latency. If you don't need lots of data lying around on each module, ditch local hard drives - dual purpose a module as a data server, maybe a Netboot server if you fancy it, or consider booting from cheap USB keys.
4) Don't forget to factor in the price of network switches, Cat-6, power leads, etc - it can mount up pretty quickly!
First, see what's available. Many departments have computing options available that depend upon scheduling and departmental budgets.
If that doesn't work out, what are you doing? Serial processing or based upon existing software, then 'contract it out' (if that's an option). It's easier, and probably cheaper, especially in the early stages.
Parallel processing though needs more serious consideration. Cheap SIMD was being offloaded onto GPU's the last time I looked (which, admittedly, is probably too long ago). So it may be best to look to 'homebrew' configurations in that case.
I'm sure there are a lot of bitcoin mining rigs on the market right now. Which sounds like exactly what you want.
Bitcoin mining rigs are purpose-built machines to run OpenCL apps as fast as possible, with multiple high-end AMD GPUs.
Recently the bottom has fallen out of the bitcoin price, making mining unprofitable for a large number of people, and they will be dumping their rigs.
Check on ebay.
We got a customized 8-core (2x E5504) box with 32 gigs of memory from Leaseweb for about 200 euros a month. I'd say it's a great deal. I think if you talk to them, they could get you a custom cluster too. Definitely cheaper than EC2 if you use it constantly.
I think an innovative solution, assuming you have a defined timeframe for this project, is the following: Set aside a small portion of your budget (£250-£500 - I'm guessing) and set up a BOINC server on an EC2 instance; http://boinc.berkeley.edu/trac/wiki/CloudServer . It will probably not cost as much as I have suggested above as it won't be *nearly* as intensive as actually doing the computing required for analysis of the experiments, but you can use some of the budget to pay a server admin to set it up for you if you are not very confident. Although, I am certain, if you looked around any of the big communities involved in grid projects (overclock.net, evga, etc.), someone would be willing to assist you for free. Go around to the major forums posting a message in their grid computing projects asking for assistance and offering a £2000 prize or donation to a charity of their choosing for the group that completes the most work units over the project's life. This may sound like a lot of hard work, but these groups are fiercely competitive and are extremely willing to help any cause, and it will not be as difficult as it seems when reading this. At the very least, I can guarantee you about 20 users from a grid forum I am part of that will contribute - at £0 cost. Best of luck!
You have a limited budget, so it's more cost effective for you to lease time on someone else's equipment for now.
In a fair and logical world this would be true.
In an academic setting there can be problems with what you're allowed to spend the money on. If the £4000 is in the 'Computer Hardware' section of the grant, buying AWS could count as a service and have to come out of the 'Consumables' section of the grant.
The other important question that springs to mind is whether £4000 would last till the end of the grant. Knowing the typical research program I think it would be very hard to estimate this, and if they underestimate (or suddenly find they need to recompute the last year's work) they may end up without their processing power at the end of the grant when they really need to get something pretty to go in the next grant application.
If they buy hardware it may be more expensive, but they can absolutely rely on it being available till the end of the grant (assuming there is a 3 year warranty and the University has some sort of building contents insurance) and if they find they need more power they could still go to AWS.
This does seem prime for outsourcing to computing on demand, but if you really want to build it:
Dual core all in one mini-itx boards have a low power connection, can run quite happily as dual core and emit a small amount of heat.
You can work on a module cost of around £100 for a dual core unit at standard retail costs even without volume discounts.
30 x Intel Dual core atom 4GB RAM No HDD ~ £100 per unit
48 port Gigabit switch £400
1x Storage array (dual or quad gigabit ethernet) £600 depending on the requirements
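Adding up the parts list above as a quick sanity check against the £4000 budget. Prices are the rough retail figures quoted in this post; two cores per board follows from "dual core".

```python
# (description, quantity, unit price in GBP) as quoted in the list above
parts = [
    ("dual-core Atom board, 4GB RAM, no HDD", 30, 100),
    ("48-port gigabit switch", 1, 400),
    ("storage array", 1, 600),
]

total_gbp = sum(quantity * price for _, quantity, price in parts)
total_cores = 30 * 2  # 30 boards, two cores each

print(f"Total: {total_gbp} pounds for {total_cores} cores")
print(f"About {total_gbp / total_cores:.2f} pounds per core, "
      "switch and storage included")
```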
If you wanted to expand into GPU processing later on you may want to consider ION2 boards as that gives a huge advantage but at a cost.
Having a standard voltage/power/heat and slot in form factor goes a long way in building large scalable systems. Heat management is going to be the major issue in the physical design.
I see a lot of people suggesting the use of cluster instances on AWS. At first blush this is what they are built for, but it's not a gimme that they are the most cost-efficient option. From the description, the job is not targeting GPU, and it's also not network-bound. Some of the high-cpu instances are more economical if you don't need the gobs of RAM or 10 Gigabit pipes. The cluster instances do have somewhat faster CPUs.
AWS offers a MapReduce layer that supports all of these instance types (http://aws.amazon.com/elasticmapreduce/).
Cluster xLarge (GPU) = $2.10 / hour = $0.26 / hour / core = $0.063 / hour / cpu unit
Cluster xLarge = $1.60 / hour = $0.20 / hour / core = $0.048 / hour / cpu unit
High CPU - medium = $0.17 / hour = $0.085 / hour / core = $0.034 / hour / cpu unit
High CPU - large = $0.68 / hour = $0.085 / hour / core = $0.034 / hour / cpu unit
Throw in:
* Spot instances are discounted by over 50%. If your jobs can work on a range of instances, bid on a variety of cheap CPUs first.
* Reserved instances come out ahead after about 6 months of 24/7 usage, if you're going to use it that way.
All together, you could do something like this, with many possible variations. This gets you roughly 10 CPUs running 24/7 for 6 months, plus 3 hours a day of cluster compute time. And of course you don't pay for any time that you're not running so that could be reallocated.
5000 hours High CPU (medium) = $850 = 10,000 CPU hours
5000 hours High CPU (large) = $3400 = 40,000 CPU hours
250 hours spot instance (Cluster) = $150 = 2,000 CPU+ hours
250 hours spot instance (Cluster GPU) = $200 = 2,000 CPU+ hours
---
Roughly 55,000 CPU hours for $4500, leaving about $1800 for bandwidth, storage, or more compute time.
Point being, just like you can customize the heck out of a box you buy, you can carefully craft a cloud approach more efficiently than just buying cluster time. If you just throw it at GPU cluster boxes, you could get half the work done (or less)...
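If you want to try other mixes, here is a small sketch that totals the plan above. On-demand rates and core counts come from the price list earlier in this post (the core counts are inferred from the per-core prices), and the spot lines use the lump sums quoted for 250 hours. Summed line by line it comes to about $4,600 and 54,000 core-hours, so the totals quoted above are rounded.

```python
# (label, instance hours, total cost in USD, cores per instance)
plan = [
    ("High CPU (medium)", 5000, 5000 * 0.17, 2),
    ("High CPU (large)", 5000, 5000 * 0.68, 8),
    ("Cluster, spot", 250, 150.0, 8),
    ("Cluster GPU, spot", 250, 200.0, 8),
]

total_cost = sum(cost for _, _, cost, _ in plan)
total_core_hours = sum(hours * cores for _, hours, _, cores in plan)

for label, hours, cost, cores in plan:
    print(f"{label:20s} {hours:5d} h  ${cost:8.2f}  "
          f"{hours * cores:6d} core-hours")
print(f"Total: ${total_cost:,.2f} for {total_core_hours:,} core-hours "
      f"(${total_cost / total_core_hours:.3f} per core-hour)")
```

Swap in current prices before relying on any of this, since spot prices in particular move around.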
I think it's doable, especially with Android-based phones. Data rates might be an issue, but some things like SETI@home didn't have real high data rates; with Wi-Fi enabled phones this could be mitigated by only working when Wi-Fi is active.
100's of millions of phones moving in and out of a global cluster.
I think I just had a nerd moment.
google "32 trillion offshore needs IRS attention"
Spend the money on a programmer to parallelize the algorithm on standard CPUs, and put it out on BOINC.
No, DON'T use BOINC. It will take six months to set up the project and publicise it, and a further year to learn how to deal with the volunteers so as to get work done reasonably reliably. Half or more of the work you send out will never come back. Newbs will bother you with questions about BOINC and the sixteen other projects they're 'donating time' to, not your project. (Virus scanners cause people grief.) RAC hunters will badger you incessantly about minor discrepancies in points awarded.
Running a project through BOINC and getting value out of it means committing a *lot* of time and effort just on the PR side of the project: setting up and maintaining a website and the BOINC infrastructure, and doing regular news updates, answering emails, getting rid of spammers, and so on.
Unless, of course, you don't get any volunteers.
Unless your work is very interesting to the general public and you can use a big PR machine, trying to use BOINC is a good way to waste a year and achieve nothing.
From http://hadoop.apache.org/ The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Does the UK/Europe have federally funded, shared computational resources for researchers? In the US we have what used to be called the Teragrid (now XSEDE) which is a network of supercomputers that are available for researchers. You have to write a proposal for machine time, but they're not all that difficult to get. The main disadvantage is that you have to submit your jobs via a queueing system, so your jobs usually don't start right away (having your own hardware does have its advantages) but the big shared resources have their advantages - you don't have to worry about maintenance, they usually have reliable archival resources, and every X years they usually replace the hardware with something faster.
I had to use Windows and Monte Carlo my sim "locally". I sat two Optiplex 790s w/ 3.4GHz i7-quad cores on a rack shelf for under $3000. Their small form factor is a sweet chassis (can't say that about their Precision desktop). I moved the boot disk into the optical drive's bay, and installed a sub-$190 3TB 3.5" drive internally. No power supply for a serious graphics card but native is good enough for my sim. Many-hour runs are 20-25% shorter than my older X9650 and E5540 so I'm happy.
What Stoney says. Especially point 1.
If you're past point 1, then at the risk of starting a flame war:-
For a compute-bound problem that *can't go on GPUs* and needs fast turnaround, you'd be silly to use anything other than Sandy Bridge at the moment. Core i7 2600 if you can tolerate consumer grade (keep a spare) or the equivalent Xeon, or better. (See, I told you, flames! Look, guys, I like AMD, I want them to succeed, but benchmarks are benchmarks.) For the combination of raw speed, FLOPS/watt, and minimum idle power, Sandy Bridge beats everything else at present. If you're paying for the power, you'll maximise your total computation per pound with SNB. And be much more likely to have your overnight jobs finished and waiting for you when you come in in the morning.
Regarding second-hand stuff, even if you can get it for free, if it's older/smaller than Core 2 Quad Q6600 or the equivalent Xeon, pass on it. There aren't enough cpu instructions executed per second or per kilowatt-hour in anything older. Some of the older cases can be re-used, though.
This is what I did:
- buy four headless boxes, Standard case, standard 500W PSU, Gigabyte GA-P67A-UD3P-B3, 500GB WD Green
- put an i7-2600K plus a good cooler (Arctic Freezer 13 Pro) on each mainboard
- add 3 PCI-E Intel NICs (Intel PRO/1000 GT) *per box* (12 total)
- buy 3 x 2 x 1 (6 total) short CAT-5E GB-Lan Ethernet cables
- buy 4 long CAT-5E cables
- buy a conventional 8-port GB-lan switch
- use an old box for the server (contains some big disk shared to the nodes
and the programs running on the nodes)
- connect the four boxes via long cables to the 8-port switch
- install the boxes (by temporarily adding a videocard) w/Linux of your choice + ssh + OpenMPI + libs necessary
- connect the boxes with one another (each contains 3 extra NICs!) w/short cables like the edges of a tetrahedron (6 edges)
- configure the networks between the boxes, each edge is another subnet (10.0.1.0 to 10.0.6.0)
If all works out, you'll have a 4 x 8 = 32 (by threads) process parallel (MPI) machine.
I chose the said boards because of the high power throughput; an i7 o/c to 4.2GHz
running 24/7 at 8 x 100% load *does* require a stable mainboard (12 power phases).
Open-MPI will figure out your network config. Your aggregated bandwidth within
the cluster should be 4 x 2 x GB/sec (if full duplex).
Never ever include the onboard (to the switch) NIC into the MPI-allowed
interfaces. This won't work. The outer interface is disk-transfer and
c&c only.
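The tetrahedron wiring described above can be sketched as a short planning script: four boxes, one direct link per pair, each link on its own subnet from 10.0.1.0 to 10.0.6.0. The host names, the /24 netmask and the choice of the first two addresses on each link are assumptions for illustration; the post only gives the subnet range.

```python
from itertools import combinations
import ipaddress

# Hypothetical host names; the post does not name the boxes.
NODES = ["node1", "node2", "node3", "node4"]


def tetrahedron_links(nodes):
    """Return one point-to-point link per pair of nodes, each on its own subnet.

    Assumption: every link uses a /24 (10.0.1.0/24 ... 10.0.6.0/24) and the
    two ends take the first two usable addresses on that subnet.
    """
    links = []
    for index, (a, b) in enumerate(combinations(nodes, 2), start=1):
        subnet = ipaddress.ip_network(f"10.0.{index}.0/24")
        hosts = iter(subnet.hosts())
        links.append({"subnet": subnet, a: next(hosts), b: next(hosts)})
    return links


links = tetrahedron_links(NODES)
for link in links:
    print(link)

# Each box should sit on exactly 3 links, one per extra NIC.
nics_per_node = {n: sum(1 for link in links if n in link) for n in NODES}
print(nics_per_node)
```

Four nodes give six pairs, which matches the six short cables and the three extra NICs per box.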
Should be within budget.
Regards
rbo
Don't reinvent the wheel: using existing clusters as a service or hosting is quickest, with guaranteed performance.
But as an option, I could suggest two products. If you have access to a large collection of desktops, i.e. the office machines, they could be scheduled at night to be computational nodes in either a Windows HPC cluster or an open source parallel computing cluster. If need be, virtualize the server. It's the only way to do it on the budget: beg and borrow resources and funding. In my environment I have over 1000 desktops.
Also, you will need a tech person to run it.
With a hosted solution, they look after the tech resources needed to create and manage the hardware.
http://linux.slashdot.org/story/11/09/13/2111210/Ask-Slashdot-Best-Use-For-a-New-Supercomputing-Cluster ...who posted that he was allegedly just getting the budget to build "...the largest x86_64-based supercomputer on the east coast of the U.S. (at least until someone takes the title away from us). It's spec'ed to start with 1200 dual-socket six-core servers..." but apparently has no idea what he's going to use it for.
If true, he'll have lots of cycles to sell for cheap and/or his organization is clearly not value-oriented so he'll probably sell time without much concern over price.
-Styopa
I do not know about your research/project. But here at the university where I do my research, once you get "money for something" you must use it for that. If the money is to buy a few computers for parallel programming, it must be used that way, and you cannot use the same money to rent cloud/grid time by the hour.
1 - is your program cpu intensive only?
2 - is your program memory intensive?
3 - how about disk usage?
4 - Are you guys "driving" a pre-made program/package/solution (like Gaussian/Columbus) or developing one?
5 - How many people are going to use your cluster?
Best practices for any of those situations, in case you are using "boring Linux", are:
1 - customized kernel for better performance
2 - allocation of resources using CGroups
3 - if you are developing your own program, you could read about "intrinsic functions" and make your program faster.
OK, I won't be too hard on the discussions above, but I read enough to try to give some real help to the OP. I get that this is basically an embarrassingly parallel application. So, that means a gigabit network is fine. That also means that single core performance is the ONLY indicator of the speed of the application. That means investing in anything AMD is a mistake. The best bang for the buck is quad-core Sandy Bridge CPUs. 4000 pounds is about $6300. I can build a quad-core 2.8 GHz Sandy Bridge node (2GB/core in a desktop case) for under $400 each. Cables, Gbit switch, and 15-16 nodes (60-64 Sandy Bridge cores total) will fit in the budget without too much effort.
OK...so, it isn't ECC memory. And it isn't general purpose. And it isn't going to run most parallel applications worth a crap due to the gbit network, but the point of building a cluster is to design it to match the application. 64 Sandy Bridge cores will run rings around any Magny Cores solution you can build for the same price.
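The node count above is easy to sanity-check. This is a minimal sketch using the poster's figures ($6300 budget, under $400 per quad-core node); the $300 set aside for the switch and cables is an assumption, since the comment gives no price for them.

```python
BUDGET_USD = 6300        # from the comment: 4000 pounds is about $6300
NODE_COST_USD = 400      # upper bound quoted for one quad-core node
CORES_PER_NODE = 4
NETWORK_COST_USD = 300   # assumption: Gbit switch plus cables, not from the comment


def max_nodes(budget, node_cost, network_cost):
    """Number of whole nodes that fit after paying for the network gear."""
    return (budget - network_cost) // node_cost


nodes = max_nodes(BUDGET_USD, NODE_COST_USD, NETWORK_COST_USD)
cores = nodes * CORES_PER_NODE
print(f"{nodes} nodes, {cores} cores")
```

With these numbers the budget covers 15 nodes and 60 cores, the low end of the range quoted; reaching 16 nodes needs the per-node price to come in a little under $400.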
something like this
I'm assuming the poster is in a university environment. If so, see if you can use Condor (http://www.cs.wisc.edu/condor/) installed on the university's desktops and just scavenge CPU cycles from them when they are idle. Alternatively, I'm guessing you are also in the UK, in which case you can take advantage of the National Grid Service (NGS) and get compute time for free. See http://www.ngs.ac.uk/ for more information.
If you do go the buy-your-own-hardware route (or even if you use EC2?) look into using the ROCKS cluster suite (http://rocksclusters.org). Basically, you do a regular linux install on the head node, and everything else is set up so you just PXE boot all the compute nodes and it "just works."
I built a cluster with 36 nodes this way some years ago, and now it's rocking away with over 2100 cores.
ca 11keur
It took me three attempts to parse that before I realized you meant "ca. EUR 11k" ("circa eleven thousand euro.")
never thought I'd see the day where the general consensus was "just rent it!". this is slashdot, how can we not do this better, cheaper, faster and more "free" than Amazon's EC2?
Everyone wants to get the most for their money be it in good times or bad.
As others pointed out: use cloud first.
If you want your "own" grid, try to team up with other departments. Likely another one has either done the same and can share resources with you, or there is demand and they are waiting for someone to set up a grid.
If your friend works in academia (from the currency mentioned, I assume you are from Europe), she can also try applying for access to the HPC-Europa infrastructure.
See http://www.hpc-europa.org/
And to this end, the compute task is CPU bound. Is it actually memory and thread bound or is it core CPU tick bound?
If it is memory bound, not thread heavy, and not a lot of floating point math, I would suggest a cluster of Atom dual core ITX boards. 2 gig of RAM per node, boot off a USB key and share all the data over GbE. That is likely to be the most active cores for the buck, but if there is a lot of floating point math or thread/context switches... ouch.
Another option is to go with a lot of slightly older used 1u Xeon/Opteron boxes. Likely can be had fairly cheap from a surplus dealer. This is used kit so no warranty, which may be an issue with the grant, but again will get more processing for the buck.
Final option:
Call up IBM/Dell/HP/Cray/SGI and ask if they would like to help by "selling" a small cluster for £4K. They might just do it, knowing that you are very likely to expand later by buying from the same vendor that you started with (fewer integration problems).
http://www.beowulf.org
This kind of question comes up periodically, and as all the posts indicate, there are a lot of options, but depending on the type of computing you're doing, the answers tend to clump into several general approaches: buy compute cycles, a loosely coupled cluster using Ethernet (good for embarrassingly parallel), and a tightly coupled cluster using a high performance interconnect (Myrinet, InfiniBand, etc.).
Then, there's decisions about diskless nodes or not, etc.
I've built a few small clusters at various scales (some years ago, I published a design for "how to build a cluster entirely from things you buy at Wal-Mart for $2000").
There is appeal in having your own cluster that YOU control and you don't have to negotiate or pay for changes in resources. It's there when you want it. You don't have to worry about someone else having your data in their hands, etc. OTOH, you have all the infrastructure (electricity, HVAC, maintenance).
http://www.clustermonkey.net has articles on building a cheap cluster using commodity components (a true Beowulf)
There's also the whole problem of computers getting faster. If you have a year's worth of computing to do on a computer you can buy today, it's possible that you could do no computing for 1/2 year, buy your computer then, and get twice as fast a computer, and still finish at the same time.
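The buy-now versus buy-later trade-off above comes down to one line of arithmetic. In this minimal sketch, the doubling of speed after half a year is the poster's hypothetical, not a measured figure, and the function name is made up for illustration.

```python
def finish_time(work_years, wait_years, speedup):
    """Years until the job is done if you wait, then run on a machine
    that is `speedup` times faster than the one you can buy today."""
    return wait_years + work_years / speedup


# One year of work on today's hardware, started immediately.
buy_now = finish_time(work_years=1.0, wait_years=0.0, speedup=1.0)

# Same work, but wait six months for a machine that is twice as fast.
buy_later = finish_time(work_years=1.0, wait_years=0.5, speedup=2.0)

print(buy_now, buy_later)
```

Both cases finish after one year, so under that assumption waiting costs nothing in wall-clock time and leaves you with the newer machine.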
For $6k, one can go out and buy a pretty powerful multicore/multiCPU desktop machine with all the mod cons. And it would probably be faster than any sort of 4-8 node cluster you build for the same price.
One reason to build your own small low performance cluster is if you are doing a proof of concept for something that will eventually be scaled up massively. That 6 node cluster will provide a platform to develop your parallel code and give you a chance to make actual measurements on what the various performance limits are (is it interconnect? Is it memory bandwidth?) so that when you scale up to 1000 nodes (whether rented, bought, etc.) you can be an intelligent and informed consumer. You'll have programs to run as benchmarks that are "your real problem" as opposed to trying to figure out if some sort of SPECmark or Dhrystone number relates. If you ARE buying a 1000 node cluster, the cluster vendor(s) will typically have some machines you could run your code on to try it out.
However, be aware that rolling your own cluster is not a "plug and play" 1 hour setup kind of thing. You're going to spend a week getting it all going the first time, and then several months fooling with it to get your tools and configurations the way you like it, etc. It's a heck of a lot of fun, if you're into that sort of thing, and it is substantial geek-cred to say "oh yeah, I have a cluster supercomputer in my garage, but I don't use it any more, because ..."
Had second thoughts about £4000 lasting long enough. If they can find a service that costs ~15p/hour (and it seems reasonable that they could) that would provide 24/7 computing power for a 3 year grant and better than that, any unused time/money could be kept and used on special occasions/rush jobs...
The point about restrictions on how the grant is spent still stands though :-(
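The ~15p/hour figure above can be checked with integer arithmetic. This is a minimal sketch assuming a flat rate, a single instance running around the clock, and 365-day years; none of those details come from a real price list.

```python
RATE_PENCE_PER_HOUR = 15      # the ~15p/hour quoted above
GRANT_YEARS = 3
BUDGET_PENCE = 4000 * 100     # 4000 pounds

# Assumption: 365-day years, one instance running 24/7.
hours = 24 * 365 * GRANT_YEARS
total_pence = RATE_PENCE_PER_HOUR * hours
remaining_pence = BUDGET_PENCE - total_pence

print(f"{hours} hours cost {total_pence / 100:.2f} pounds, "
      f"leaving {remaining_pence / 100:.2f} pounds")
```

Under these assumptions three years of continuous use comes to £3942, just inside the £4000 budget.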
4000 pounds is like what, $6000? Not much you can get with that. Do you really want to build a cluster? As others said, you can rent a cluster, but HPC in science is hopelessly inefficient and comes in with large datasets (bandwidth is usually metered) and spits out even larger datasets (bandwidth is metered + online storage is metered) and you'll be paying through the nose because somebody (the researcher) doesn't know how to program correctly. First of all, I would definitely recommend you look around in your institution or in peer institutions to see whether there is already a cluster you can rent (or use for free). Most large institutions have a supercomputer, and even smaller departments (like Physics, Astronomy and Imaging or Visual Sciences) have their own small cluster that is not 100% used. You may need to work within the framework of your local politics about that and make concessions as far as time allotments and constraints go, but it's cheaper (or free).
If you want to go the DIY route, why don't you just buy a machine from Supermicro (their 3U's are both towers and rack mount) and fill it with a good amount of RAM (at least 2GB/core) and 2-4 processors (e.g. 6-core or 8-core AMD Opterons), a couple of 2TB hard drives (mirror) and you're pretty much through your budget, especially if you want to throw in an nVidia Tesla card. If you want to, you could use virtual machines on that system with Xen or VirtualBox (whichever fits better).
The advantages of this approach are:
- Much less maintenance even if you go the virtual machine route
- Much faster interprocess and device (storage etc.) communications - interprocess communication over gigabit kills performance unnecessarily on a small cluster. Larger clusters have InfiniBand.
- You can still expand it later with another machine like it and use cluster software then.
- Footprint is smaller. You can fit these machines in 3 or 4U and they come with a 1.5kW power supply (to support the GPU's). You can buy about 6 1U systems for roughly the same price but you've doubled your rack usage and the power supplies are together about 3 kW because now you've got to power the motherboards and peripherals of 6 devices.
- GPU computing cannot be done in cheap 1U devices. And even if you can (and spend a little more and get only 4 machines), only 1 fits, rarely 2. The 3U solutions fit 4 (sometimes 5) GPUs perfectly and are built (power supply, cooling) for it. Even if you don't do GPU computing right now, even MATLAB can offload certain functions to gaming GPUs. They have a little less memory than the Teslas but they only cost $150-250 (compared to the $700-1200 for a Tesla (EDU discount)).
- No need to maintain cluster software (and it can be a pain in the neck)
- It fits under a desk and doesn't take up rack space if you don't want it to. No need to pay for hosting or cooling, no extra noise.
cheap to do with existing botnet systems.
depending on the nature of your research group (academic, government, military, private...) you may well be able to have cluster time free for the asking. from my experience you may have to 'apply for a grant' which is really filling out a form. the cluster i have access to has many nodes with 64 gig of ram and all nodes are stuffed with gpus as well. it never makes sense to spend money on computer equipment that you will spend a year or more learning to use. do as much development on borrowed equipment as you can and when you have working software already implemented, buy hardware as needed. good luck!
In November the $25 Raspberry Pi computer will become available. Check it out.
Another option is to use instant messenger based distributed computing solutions.
http://ulno.net/f2f/
It will work fine for a few CPUs. But if you get hundreds writing to NFS at the same time, expect bad results. Many simple IO strategies do just that.
Remember, a supercomputer is a device for changing a CPU bound process into an IO bound process. Make sure your IO can handle it.
For our HPC clusters, we run torque on Linux (CentOS), which is descended, I believe, from beowulf. No scaling problems at all. Get servers with the most cores you can afford, put this on, and away you go.
I will note that the code has to be aware of parallelism, and fork.
mark
I'm assuming this is at a university - are there other facilities available already?
How long will the CPU-burning requirements last? Does it make sense to buy hardware, or to rent time on Amazon's cloud? Is it worth spending a month of programmer time to port to GPU/DSP if it saves you three months of computation? Have you done any models on what you need?
When you say "CPU-bound", what do you mean? Is it fixed-point or floating-point? What precision? Is it large-memory or small-memory? Is it a standard problem space, like image processing or cryptography? For some problems, e.g. small memory fixed-point, you can buy DSP boards that will be several orders of magnitude faster than generic PC hardware, and won't require much application porting.
Do you have a spare grad student to do hardware/sysadmin grunt work? For 4000 pounds, you can probably buy about 40 sets of motherboard+power supply, if you have a grunt to build boxes for them, or about 20 sets of pre-built desktop PCs, or about 4 high-end Dell rack servers.
Bill Stewart
Look up something like Condor. It will let you use the spare cycles kicking around in the workstations in your friend's lab and other labs that are willing to install the client. It can be set up to only run when the computer is idle, to run on her lab's computers first before using others, etc. Also, once it is set up, any lab in the building/network/world could use it, provided that the admin approved it.
An NSF grant has been used to develop open source heterogeneous computer cluster software at the University of California San Diego:
http://www.rocksclusters.org/wordpress/
at $25 per 'box' you'd get around 216 of them, which is more than a fair start
Is this an MPP problem or SMP problem?
Does she have access to high speed network equipment or will she have to buy everything on her own?
Is this an embarrassingly parallel problem?
Does the problem or the math parallelize well/easily?
Does the problem or the math SCALE well/easily with more CPUs?
Does each partition/portion of the problem run independently of the next (very little inter-core communication)?
Is she writing with OpenMP or MPI?
Is latency and synchronization a problem?
For about 4000 GBP, she can probably build a dual Xeon X5690 system with 64 GB of RAM and two 240 GB SSDs.
Does the simulation generate a LOT of data?
Does it make sense for her to use CUDA or OpenCL (GPU accelerated computing)?
What language is she writing in?
Here is something I may do one day: 10 nodes, each with an Intel Core i7 2600K (around 100 gflops). If you network boot and have headless nodes, then each can easily be built (with 8 GB RAM) for £400. So in conclusion: approximately 1000 gflops (more if you overclock) for at most £4000 (that's including VAT).
Just get with the guy who did the Ask Slashdot the other day that didn't know what to do with his supercomputer.
Are you sure? Might want to take a look at this.
Instead of hardware why not program software for distributed computing (like SETI@home) then you need only write the software and get people to install it on their systems. (yes I said only)
Perhaps set up the software so that other research groups could utilize the spare processing cycles thus getting a huge distributed processing set up for use by multiple Universities with different research needs brought together by a common processing software.
Make it modular; not all research has the same needs.
With the correct media spin it could happen, and I have always wondered why schools don't use their lab computers for distributed computing at night, like I did by installing Vue Infinite render nodes on all the math lab computers.
No one ever noticed they were running.
"If any question why we died, Tell them because our fathers lied."
Seems the thread has it all pretty well nailed,
- if this is a 'typical' academic environment, it means you probably have (effectively) free electricity, cooling, and (more or less) labour (undergrad/grad students; and the row of PCs can be stuck under a bench in a spare office or something similar).
- in such a scenario, it is hard to beat COTS boxes - I would probably lean towards a standard consumer-grade 6-core Phenom CPU with 16 gigs RAM per node to get the best bang for your buck (and commodity gig-ether interconnect). Such a system is well under $1k Cdn, or even less depending on how much (little) disk you need to toss into the machines; so you will end up with half a dozen boxes and 36 cores with your budget (maybe more)
- Just throw on a standard tin-opener HPC platform (Pelican HPC, ROCKS, ... etc ...) and you are good to go.
If you are in an environment where power:cooling:labour factor in significantly, clearly the economics of operation costs become bigger quickly.
Certainly if you have access to free HPC cycles, that is a great option as well; so long as the effort involved in getting your data to the presumably-remote-site; then queuing jobs; and retrieving output .. is not prohibitive.
Certainly fewer SMP boxes (dual-12 core opteron or similar) has significantly more versatility for jobs which are SMP rather than serial:parallel in their nature. But the initial question posted seems quite clear on this.
Cloud based services (Amazon, etc) tend to be a bit overpriced when compared to environments that have heavily subsidized cost structures (ie, often, academia for example) - "The Cloud" often makes 'the best sense' for 'peaky' workloads (think: Xmas sales spikes) rather than "CPU Saturating behemoth job batches"; and are priced to be competitive with "enterprise data centre hosting services".
Like most projects, the requirements are well known and won't change. Until they do. :-)
--Mr.Tim
Parallel implies a parallel programming environment.
First look at Rocks and MPI. Next look
at Hadoop. Start with four hefty multi core desktops
that folk also work on. Boxes where high end GFX
cards would be cool and happy. Explore programming languages
and quality compilers. A good compiler can reduce
hardware costs. I have seen compilers improve results
by 70%.... in some cases. Language choice can matter.
Build check point and restart into your application design.
Watch data storage and distribution. I/O can keep many
processors idle waiting for bits to chew on.
Well a research group should do some research.
Security and revision control are important.
When starting a research group one important
and largish investment is the desktops and local
storage to manage the code and the data.
A startup should start with dual purpose resources
when possible. Code design should begin with
some notion of progress and checkpoint and restart.
Building reliable infrastructure is a royal PITA.
The desktop tools and cluster tools should play well
together.
Do research the various cloud resources. Optimum
use of cloud resources can depend on the smallest
initial design decisions.
As always read Jon Louis Bentley's "Programming Pearls"
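The checkpoint-and-restart advice above could look something like the following minimal sketch. The function names, the JSON file format and the sum-of-squares workload are all made up for illustration; a real job would save whatever state it needs to resume.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the checkpoint,
    so a crash never leaves a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"next_item": 0, "total": 0}


def run(path, n_items, stop_after=None):
    """Sum the squares of 0..n_items-1 (a stand-in workload),
    checkpointing after every item.

    stop_after simulates the job being killed after that many items.
    """
    state = load_checkpoint(path)
    done = 0
    while state["next_item"] < n_items:
        if stop_after is not None and done >= stop_after:
            return state
        state["total"] += state["next_item"] ** 2
        state["next_item"] += 1
        save_checkpoint(path, state)
        done += 1
    return state
```

Calling `run` a second time with the same checkpoint path picks up at `next_item` instead of starting over, which is what lets a batch scheduler kill and requeue the job safely. Checkpointing after every item is only sensible here because the items are tiny; a real job would checkpoint less often.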
Has someone already mentioned creating a cluster out of Sony PS3s? There are some universities and research labs in the US that make use of clusters of Sony PS3 machines, and they do serious research with them.