Slashdot Mirror


Ask Slashdot: Building a Cheap Computing Cluster?

New submitter jackdotwa writes "Machines in our computer lab are periodically retired, and we have decided to recycle them and put them to work on combinatorial problems. I've spent some time trawling the web (this Beowulf cluster link proved very instructive) but have a few reservations regarding the basic design and air-flow. Our goal is to do this cheaply but also to do it in a space-conserving fashion. We have 14 E8000 Core2 Duo machines that we wish to remove from their cases and place side-by-side, along with their power supply units, on rackmount trays within a 42U (19", 1000mm deep) cabinet." Read on for more details on the project, including some helpful pictures and specific questions. jackdotwa continues: "Removing them means we can fit two machines into 4U (as opposed to 5U). The cabinet has extractor fans at the top and the PSUs and motherboard fans (which pull air off the CPU and remove it laterally; see images) face in the same direction. Would it be best to orient the shelves (and thus the fans) in the same direction throughout the cabinet, or to alternate the fan orientations on a shelf-by-shelf basis? Would there be electrical interference with the motherboards and CPUs exposed in this manner? We have a 2 ton (24000 BTU) air-conditioner which will be able to maintain a cool room temperature (the lab is quite small), judging by the guide in the first link. However, I've been asked to place UPSs in the bottom of the cabinet (they will likely be non-rackmount UPSs as they are considerably cheaper). Would this be, in anyone's experience, a realistic request (I'm concerned about the additional heating in the cabinet itself)? The nodes in the cabinet will be diskless and connected via a rack-mountable gigabit ethernet switch to a master server. We are looking to purchase rack-mountable power distribution units to clean up the wiring a little. If anyone has any experience in this regard, suggestions would be most appreciated."

26 of 160 comments

  1. Imagine by BumbaCLot · · Score: 5, Funny

    A beowulf cluster of these! FP

    1. Re:Imagine by Ogi_UnixNut · · Score: 5, Interesting

      Yeah, except back in the 2000's people would have thought it was a cool idea, and there would have been at least 4 other people who had recently done it and could give tips.

      Now it is just people saying "Meh, throw it away and buy newer more powerful boxes". True, and the rational choice, but still rather bland...

      I remember when nerds here were willing to do all kinds of crazy things, even if they were not a good long term solution. Maybe we all just grew old and crotchety or something :P

      (Spoken as someone who had a lot of fun building an openmosix cluster from old AMD 1.2GHz machines my uni threw out.)

    2. Re:Imagine by CanHasDIY · · Score: 4, Insightful

      You should put the money towards modern 1U nodes rather than a bunch of low volume and high cost chassis parts to try to assemble your frankenrack of used equipment.

      Methinks you've missed the key purpose of using old equipment one already owns...

      --
      An enigma, wrapped in a riddle, shrouded in bacon and cheese
  2. Don't do it by damn_registrars · · Score: 4, Insightful

    Seriously, it isn't worth your effort - especially if you want something reliable. People who set out to make homemade clusters find out the hard way about design issues that reduce the life expectancy of their cluster. There are professionals who can build you a proper cluster for not a lot of money if you really want your own, or even better you can rent time on someone else's cluster.

    --
    Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
    1. Re:Don't do it by Impy+the+Impiuos+Imp · · Score: 3, Insightful

      Get an older, CUDA-capable card and have whoever writes your code target that instead. I doubled all my SETI work units over 10 years in just 2 weeks. A CPU is just a farmer throwing food to the racehorse nowadays.

      --
      (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
    2. Re:Don't do it by Anonymous Coward · · Score: 5, Insightful

      Seriously, it isn't worth your effort - especially if you want something reliable. People who set out to make homemade clusters find out the hard way about design issues that reduce the life expectancy of their cluster. There are professionals who can build you a proper cluster for not a lot of money if you really want your own, or even better you can rent time on someone else's cluster.

      If the goal of this is reliable performance, you're absolutely right. But if the goal is to teach yourself about distributed computing, networking, diskless booting, all the issues that come up in building a cluster, on the cheap - then this is a great idea. Just don't expect much from the end product - you'll get more performance from a modern box with 10s of cores on a single MB.

    3. Re:Don't do it by cbiltcliffe · · Score: 4, Informative

      For general purpose computing, you are correct. It wouldn't be pessimistic at all for one computer to go malfunctioning every week.

      Huh? E8000 Core2 Duos are not that old. I've got a rack of a half dozen Pentium IIIs that I've run for years without problems. What kind of crap hardware do you run where you're expecting 1 failure out of 14 machines every week?

      This is assuming, of course, that when you set up the cluster in the first place, that you check motherboards for bad caps, loose cooling fans, etc, and discard/repair anything that looks even like it might possibly fail. Considering the effort this guy seems to be going to, that's probably (but I've been wrong about that kind of thing before) a given.
      From the pics, these are BTX machines, which in my experience have better cooling than ATX, and are less likely to have overheated, failing caps in the first place.

      --
      "City hall" in German is "Rathaus" Kinda explains a few things......
  3. don't rule out by v1 · · Score: 5, Insightful

    throwing gear away or giving it away. Just because you have it doesn't mean you have to, or should, use it. If energy and space efficiency are important, you need to carefully consider what you are reusing. Sure, what you have now may have already fallen off the depreciation books, but if it's going to draw twice the power and take double the space that newer used kit would, it may not be the best option, even when the other options involve purchasing new or newer-used gear.

    Not saying you need to do this, just recommending you keep an open mind and don't be afraid to do what needs to be done if you find it necessary.

    --
    I work for the Department of Redundancy Department.
    1. Re:don't rule out by eyegor · · Score: 4, Interesting

      Totally agree. We had a bunch of dual dual-core server blades that were freed up and after looking at the power requirements per core for the old systems we decided it would be cheaper in the long run to retire the old servers and buy a fewer number of higher density servers.

      The old blades drew 80 watts/core (320 watts) and the new ones which had dual sixteen-core Opterons drew 10 watts/core for the same amount of overall power. That's a no brainer when you consider that these systems run 24/7 with all CPUs pegged. More cores in production means your jobs finish up faster, you'll be able to have more users and more jobs running and use much less power in the long run.
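The per-core figures in the paragraph above can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the comment (about 320 W per blade for both generations); the function name is made up for illustration.

```python
def watts_per_core(total_watts: float, sockets: int, cores_per_socket: int) -> float:
    """Average power draw per core for one server."""
    return total_watts / (sockets * cores_per_socket)

# Figures quoted in the comment: both generations draw about 320 W per blade.
old = watts_per_core(320, sockets=2, cores_per_socket=2)    # dual dual-core blade
new = watts_per_core(320, sockets=2, cores_per_socket=16)   # dual sixteen-core Opteron

print(f"old: {old:.0f} W/core, new: {new:.0f} W/core, ratio: {old / new:.0f}x")
```

At the same wall power, the newer box delivers eight times as many cores, which is the whole argument in one number.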

      --

      Don't anthropomorphize computers, they don't like it.
    2. Re:don't rule out by nine-times · · Score: 4, Insightful

      I agree. I've been doing IT for a while now, and this is the kind of thing that *sounds* good, but generally won't work out very well.

      Tell me if I'm wrong here, but the thought process behind this is something like, "well we have all this hardware, so we may as well make good use out of it!" So you'll save a few hundred (or even a few thousand!) dollars by building a cluster of old machines instead of buying a server appropriate for your needs.

      But let's look at the actual costs. First, let's take the costs of the additional racks, and any additional parts you'll need to buy to put things together. Then there's the work put into implementation. How much time have you spent trying to figure this out already? How many hours will you put into building it? Then troubleshooting the setup, and tweaking the cluster for performance? Now double the amount of time you expect to spend, since nothing ever works as smoothly as you'd like, and it'll take at least twice as long as you expect.

      That's just startup costs. Now factor in the regular costs of additional power and AC. Then there's the additional support costs from running a complex unsupported system, which is constructed out of old unsupported computer parts with an increased chance of failure. This thing is going to break. How much time will you spend fixing it? What additional parts will you buy? Will there be any loss of productivity when you experience down-time that could have been avoided by using a new, simple, supported system? What's the cost of that lost productivity?

      That's just off the top of my head. There are probably more costs than that.

      So honestly, if you're doing this for fun, so that you can learn things and experiment, then by all means have at it. But if you are looking for a cost-effective solution to a real problem, try to take an expansive view of all the costs involved, and compare *all* of the costs of using old hardware vs. new hardware. Often, it's cheaper to use new hardware.

    3. Re:don't rule out by ILongForDarkness · · Score: 4, Interesting

      Great point. Back in the day I worked on an SGI Origin mini/supercomputer (not sure which it qualifies as; a 32-way symmetric multiprocessor is still kind of impressive nowadays, I guess, since even a 16-way Opteron isn't symmetric, I don't think). Anyway, at the time (~2000) there were much faster cores out there. Sure, we could use this machine for free for serial loads (yeah, that is a waste), but we had to wait 3-4X as long as on a modern core. You ended up having to ssh in to start new jobs in the middle of the night so you didn't waste an evening of runs, versus getting 2-3 in during the day and firing off the fourth before you go to bed. Add to that the fact that the IT guys had to keep a relatively obscure system around, provide space and cooling for this monster, etc.; they would have been better off just buying us 10 dual-socket workstations (~1GHz at the time, I guess).

    4. Re:don't rule out by Farmer+Pete · · Score: 5, Insightful

      But you're missing the biggest reason to do this...The older hardware is already purchased. New hardware would be an additional expense that requires an approval/budgeting process. Electricity costs lots of money, but depending on the company, that probably isn't directly billed to the responsible department. Again, it's hard to go to your management and say that you want them to spend X thousand dollars so that they will save X thousand dollars that they don't think they need to spend in the first place.

    5. Re:don't rule out by i.r.id10t · · Score: 4, Insightful

      On the other hand, depending on what kind of courses you teach (tech school, masters degree comp sci, etc) keeping them around for *students* to have experience building a working cluster and then programming stuff to run in parallel on them may be a good idea. Of course, this means the boxes wouldn't be running 24/7/365 (more likely 24/7 for a few weeks per term) so the power bill won't kill you, and it could provide valuable learning experience for students... especially if you have them consider the power consumption and ask them to write a recommendation for a cluster system.

      --
      Don't blame me, I voted for Kodos
  4. Once you solve the hardware challenges..... by eyegor · · Score: 5, Informative

    You'll need to consider how you're going to provision and maintain a collection of systems.

    Our company currently uses the ROCKS cluster distribution, which is a CentOS-based distribution that provisions, monitors and manages all of the compute nodes. It's very easy to have a working cluster set up in a short amount of time, but it's somewhat quirky in that you can't fully patch all pieces of the software without breaking the cluster.

    One thing that I really like about ROCKS is their provisioning tool which is called the "Avalanche Installer". It uses bittorrent to load the OS and other software on each compute node as it comes online and it's exceedingly fast.

    I installed ROCKS on a head node, then was able to provision 208 HP BL480c blades within an hour and a half.

    Check it out at www.rocksclusters.org

    --

    Don't anthropomorphize computers, they don't like it.
    1. Re:Once you solve the hardware challenges..... by pswPhD · · Score: 3, Informative

      I can recommend Rocks as well, although you WILL need the slave nodes to have disks in them (you could scrounge some ancient 40GB drives from somewhere...). You seem to want hardware information, so...

      First point is to have all the fans pointing the same way. Large HPC sites arrange cabinets back-to-back, so you have a 'hot' corridor and a 'cold' corridor, which enables you to access both sides of the cabinet and saves some money on cooling.
      My old workplace had two clusters and various servers in an air conditioned room, with all the nodes pointing at the back wall. Probably similar to what you have.
      Don't know anything about the UPS, but I would assume having it on the floor would be OK.

      Good luck with your project. Write a post in the future telling us how it goes.

  5. Really? by Russ1642 · · Score: 5, Funny

    Slashdotters only imagine building Beowulf clusters. This is the first time anyone's been serious about it.

  6. Probably not worth your time by MetricT · · Score: 5, Interesting

    I've been working in academic HPC for over a decade. Unless you are building a simple 2-3 node cluster to learn how a cluster works (scheduler, resource broker and such things), it's not worth your time. What you save in hardware, you'll lose in lost time, electricity, cooling, etc.

    If you're interested in actual research, take one computer, install an AMD 7950 for $300, and you will almost certainly blow the doors off a cluster cobbled from old Core 2 Duo's, and you'll save more than $300 in electricity.

    1. Re:Probably not worth your time by serviscope_minor · · Score: 3, Interesting

      I've been working in academic HPC for over a decade. Unless you are building a simple 2-3 node cluster to learn how a cluster works (scheduler, resource broker and such things), it's not worth your time. What you save in hardware, you'll lose in lost time, electricity, cooling, etc.

      I strongly disagree. I actually had a very similar Ask Slashdot a while back.

      The cluster got built, and has been running happily since.

      If you're interested in actual research, take one computer, install an AMD 7950 for $300, and you will almost certainly blow the doors off a cluster cobbled from old Core 2 Duo's, and you'll save more than $300 in electricity

      Oh yuck!

      But what you save in electricity, you'll lose in postdoc/developer time.

      Sometimes you need results. Developing for a GPU is slow and difficult compared to (e.g.) writing prototype code in MATLAB/Octave. You save heaps of vastly more expensive person and development time by being able to run those codes on a cluster. And also, not every task out there is even easy to put on a GPU.

      --
      SJW n. One who posts facts.
    2. Re:Probably not worth your time by MetricT · · Score: 3, Informative

      You *do* know that Matlab has been supporting GPU computing for some time now? We bought an entire cluster of several hundred nVidia GTX 480's for the explicit purpose of GPU computing.

  7. Sounds interesting... by Mysticalfruit · · Score: 4, Informative

    I'm routinely mounting things in 42U cabinets that ought not be mounted in them, so I've got *some* insight.

    The standard for airflow is front to back and upwards. Doing some sticky note measurements, I think you could mount 5 of these vertically as a unit. I'd say get a piece of 1" thick plywood and dado cut channels 1/4" top and bottom to mount the motherboards. This would also give you a mounting spot where you could line up the power supplies in the back. This would also put the Ethernet ports at the back. Another thing this would allow would be easy removal of a dead board.

    Going on this idea, you could also make these as "units" and install two of them two deep in the cabinet (if you used L rails).

    Without doing any measuring, I'm suspecting this would get you 5 machines for 7U or 10 machines if you did 2 deep in 7U.
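As a rough check on the space estimates, the sketch below compares this layout with the submitter's two-machines-per-4U trays for the 14 machines in question. The function name is hypothetical, and the block sizes are the unmeasured guesses from the thread, not verified dimensions.

```python
import math

def rack_units_needed(machines: int, machines_per_block: int, units_per_block: int) -> int:
    """Rack units consumed when machines are packed into fixed-size blocks."""
    blocks = math.ceil(machines / machines_per_block)
    return blocks * units_per_block

MACHINES = 14  # from the original submission

# (machines per block, rack units per block); all figures are estimates from the thread
layouts = {
    "2 per 4U tray (submitter's plan)": (2, 4),
    "5 vertical per 7U": (5, 7),
    "10 per 7U, two deep": (10, 7),
}

for name, (per_block, units) in layouts.items():
    print(f"{name}: {rack_units_needed(MACHINES, per_block, units)}U")
```

Partially filled blocks still cost their full height, which is why the result is rounded up per block rather than per machine.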

    --
    Yes Francis, the world has gone crazy.
  8. Inter-node communication by plus_M · · Score: 4, Informative

    What do you intend to use for inter-node communication? Gigabit ethernet? You need to realize that latency in inter-node communication can cause *extremely* poor scaling for non-trivial parallelization. Scientific computing clusters typically use infiniband or something like it, which has extremely low latency, but the equipment will cost you a pretty penny. If you are interested in doing computations across multiple computing nodes, you should really set up just two nodes and benchmark what kind of speed increase there is between running the job on a single node and on two nodes. My guess is that you are going to get significantly less than a 2x speedup. It is entirely possible that the calculation will be *slower* on two nodes than on just one. Of course, if you are just running a massive number of unrelated calculations, then inter-node communication becomes much less important, and this won't be an issue.
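To see why a second node can fail to help, here is a deliberately crude model: compute time divides evenly across nodes, and every message exchanged adds one network latency. The latencies and message counts below are illustrative assumptions, not measurements of any real interconnect, and a two-node benchmark of the actual job remains the only trustworthy answer.

```python
def modelled_speedup(nodes: int, compute_s: float, messages: int, latency_s: float) -> float:
    """Toy model: perfect division of compute, plus one latency per message.

    A single node exchanges no messages, so its speedup is 1 by definition.
    """
    if nodes == 1:
        return 1.0
    parallel_time = compute_s / nodes + messages * latency_s
    return compute_s / parallel_time

COMPUTE_S = 100.0        # hypothetical single-node runtime in seconds
SLOW_LINK = 50e-6        # assumed per-message latency on a commodity network
FAST_LINK = 2e-6         # assumed per-message latency on a low-latency fabric

for messages in (0, 1_000_000, 2_000_000):
    slow = modelled_speedup(2, COMPUTE_S, messages, SLOW_LINK)
    fast = modelled_speedup(2, COMPUTE_S, messages, FAST_LINK)
    print(f"{messages:>9} messages: slow link {slow:.2f}x, fast link {fast:.2f}x")
```

With no messages the model gives the ideal 2x; with enough chatter over the slow link the two-node run drops below 1x, i.e. slower than a single node.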

  9. So reusing old hardware by MerlynEmrys67 · · Score: 3, Insightful

    There is a reason that old hardware should be gotten rid of. Depending on the exact config of the 14 servers (processor/whatever) you could probably replace them with 1, maybe 2 servers. The current generation of Jefferson Pass servers holds 4 servers in a 2U sled - so you could replace this whole thing with a 2U solution that isn't exposed to the elements like you are proposing. It would be new, under warranty and faster than all get out.

    Your solution will take 14 servers, connect them with ancient 1GbE interconnect and hope for the best. The interconnect for clusters REALLY matters, many problems are network bound - and not only network bound but latency bound as well. Look at the list of fastest supercomputers and you will barely see Ethernet anymore (especially at the high end) and definitely not 1GbE. Your new boxes will probably come with 10GbE that will definitely help... Especially since there will be fewer nodes to have to talk to (only 2, maybe 4)

    The other problem that you will run into is your system will take about 20x the power and 20x the air conditioning bill (yeah - that is a LOT of power there), the modern new system will pay for itself in 9-12 months (and that doesn't include the tax deduction for donating the old systems and making them Someone Else's Problem)

    Recycling old hardware always seems like fun. At the end of a piece of hardware's life cycle look at what it will actually cost to keep it in service - Just the electricity bill will bite you hard, then you have the maintenance, and fun reliability problems.

    --
    I have mod points and I am not afraid to use them
  10. Re:Just use Amazon AWS by hawguy · · Score: 5, Informative

    It's 2013 don't build your own cluster just use AWS EC2 spot instances.

    An EC2 "High CPU Medium" instance is probably close to his Core 2 Duos: it has 1.7GB RAM + two cores of 2.5 EC2 compute units each (each ECU is equivalent to a 2007-era 1.2GHz Xeon).

    Current spot pricing is $0.018/hour, so a month would cost him around $12.96. (not including storage, add about a dollar for 10GB of EBS disk space).

    If his computers use 150W of power each, at $0.12/kWh, they'll cost exactly $0.018/hour -- the same price as an EC2 instance excluding storage.

    However spot pricing is not guaranteed, so he'll have to be prepared to shut down his instances when the spot price rises above what he's willing to pay -- full price for the instance is $0.145/hour, but he could get that down to $0.09/hour if he's willing to pay $161 to reserve the instance for 3 years.
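The comparison above is easy to reproduce. The sketch below uses only the prices and wattage quoted in this comment (2013 figures, so treat them as historical); the helper names are made up, and spreading the reservation fee evenly over three years is an added assumption.

```python
HOURS_PER_MONTH = 24 * 30          # 30-day month, matching the $12.96 figure
HOURS_PER_3_YEARS = 24 * 365 * 3

def electricity_per_hour(watts: float, dollars_per_kwh: float) -> float:
    """Hourly electricity cost of one machine."""
    return watts / 1000 * dollars_per_kwh

def reserved_effective_rate(hourly: float, upfront: float, term_hours: int) -> float:
    """Hourly rate with the upfront reservation fee spread over the whole term."""
    return hourly + upfront / term_hours

spot = 0.018                                    # quoted spot price, $/hour
local = electricity_per_hour(150, 0.12)         # one old PC at 150 W, $0.12/kWh
reserved = reserved_effective_rate(0.09, 161, HOURS_PER_3_YEARS)

print(f"spot instance, per month:  ${spot * HOURS_PER_MONTH:.2f}")
print(f"local power, per hour:     ${local:.3f}")
print(f"reserved, effective rate:  ${reserved:.4f}/hour")
```

Note that the local figure covers electricity only; cooling, hardware failures and admin time are not in it, and neither is EBS storage on the EC2 side.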

  11. sell them and buy new.... by Brit_in_the_USA · · Score: 5, Informative

    SPECfp2006 rate results:
    e8600 34
    i7-3770 130
    roughly 4x the performance

    ...sell the E8xxx series PCs in boxes for $100 apiece with Windows licence
    and use the $1400 towards buying 4 LGA1155 motherboards (4x$80), 4 unlocked K series i7s (4x$230), 4x8GB of DDR3 RAM (4x$40) and 4 budget ~300-400W power supplies (4x$30) = $1520

    Use a specialized clustering OS (linux) and have a smaller, easier to manage system, with lots more DDR 3 memory and lower electricity (and AC electricity) bill....
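The budget and the performance claim can both be tallied quickly. This sketch uses only the prices and SPECfp2006 rate scores quoted in this comment; summing rate scores across separate boxes is a rough assumption for comparison, not how SPEC results are formally combined.

```python
OLD_PCS = 14
RESALE_PER_PC = 100                      # quoted resale price per boxed PC
NEW_NODES = 4
PARTS_PER_NODE = {"motherboard": 80, "cpu": 230, "ram": 40, "psu": 30}

resale = OLD_PCS * RESALE_PER_PC
build_cost = NEW_NODES * sum(PARTS_PER_NODE.values())
shortfall = build_cost - resale

SPECFP_RATE = {"e8600": 34, "i7-3770": 130}   # scores as quoted in the comment
per_box_ratio = SPECFP_RATE["i7-3770"] / SPECFP_RATE["e8600"]
old_total = OLD_PCS * SPECFP_RATE["e8600"]
new_total = NEW_NODES * SPECFP_RATE["i7-3770"]

print(f"resale ${resale}, build ${build_cost}, shortfall ${shortfall}")
print(f"per-box speedup: {per_box_ratio:.1f}x")
print(f"aggregate score: old {old_total}, new {new_total}")
```

On these numbers the four new boxes roughly match the aggregate throughput of all fourteen old ones, so the gain is mostly in power, space and management rather than raw speed.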

  12. I've built one, it works, but there are caveats by Anonymous Coward · · Score: 3, Interesting

    We have a cluster at my lab that's pretty similar to what the submitter describes. Over the years, we've upgraded it (by replacing old scavenged hardware with slightly less old scavenged hardware) and it is now a very useful, reasonably reliable, but rather power-hungry tool.

    Thoughts:

    - 1GbE is just fine for our kind of inherently parallel problems (Monte Carlo simulations of radiation interactions). It will NOT cut it for things like CFD that require fast node-to-node communication.

    - We are running a Windows environment, using Altair PBS to distribute jobs. If you have Unix/Linux skills, use that instead. (In our case, training grad students on a new OS would just be an unnecessary hurdle, so we stick with what they already know.)

    - Think through the airflow. Really. For a while, ours was in a hot room with only an exhaust fan. We added a portable chiller to stop things from crashing due to overheating; a summer student had to empty its drip bucket twice a day. Moving it to a properly ventilated rack with plenty of power circuits made a HUGE improvement in reliability.

    - If you pay for the electricity yourself, just pony up the cash for modern hardware, it'll pay for itself in power savings. If power doesn't show up on your own department's budget (but capital expenses do), then by all means keep the old stuff running. We've taken both approaches and while we love our Opteron 6xxx (24 cores in a single box!) we're not about to throw out the old Poweredges, or turn down less-old ones that show up on our doorstep.

    - You can't use GPUs for everything. We'd love to, but a lot of our most critical code has only been validated on CPUs and is proving very difficult to port to GPU architectures.

    (Posting AC because I'm here so rarely that I've never bothered to register.)

  13. Actual experience by Anonymous Coward · · Score: 3, Interesting

    I've done this. Starting with a couple of racksful of PS/2 55sx machines in the late '90s and continuing on through various iterations, some with and some without budgets. I currently run an 8-member heterogeneous cluster at home (plus file server, atomic clock, and a few other things), in the only closet in the house that has its own AC unit. It's possible I know something about what you're doing.

    Some of what I'll mention may involve more (wood) shop or electrical engineering than you want to undertake.

    My read of your text is that there is a computer lab that will be occupied by people that will also contain this rack with dismounted Optiplex boards and P/Ss. This lab has an A/C unit that you believe can dissipate the heat generated by new lab computers, occupants, these old machines in the rack, and the UPSs. I'll take your word, but be sure to include all the sources of heat in your calculation, including solar thermal loading if, like me, you live in "the hot part of the country". Unfortunately, this eliminates the cheapest/easiest way of moving heat away from your boards -- 20" box fans (e.g. http://www.walmart.com/ip/Galaxy-20-Box-Fan-B20100/19861411 ) mounted to an assembly of four "inward pointing" boards. These can move somewhat more air than 80 mm case fans, especially as a function of noise. One of the smartest thermal solutions I've ever seen tilted the boards so that the "upward slope" was along the airflow direction -- the little bit of thermal buoyancy helped air arriving at the underside of components to flow uphill and out with the rest of the heated air. I.e., this avoided a common problem of unmodeled airflow systems of having horizontal surfaces that trapped heated air and allowed it to just get hotter and hotter.
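A heat budget of the kind described above can be sketched as a sum of sources converted to BTU/h (1 W is about 3.412 BTU/h) and compared with the 24000 BTU air conditioner from the submission. Every wattage below is a placeholder assumption to be replaced with measured values; solar loading is left out because it depends entirely on the room.

```python
BTU_PER_HOUR_PER_WATT = 3.412
AC_CAPACITY_BTU_PER_HOUR = 24000   # the 2 ton unit from the submission

def heat_load_btu_per_hour(sources_watts: dict) -> float:
    """Total heat load of all listed sources, in BTU/h."""
    return sum(sources_watts.values()) * BTU_PER_HOUR_PER_WATT

# Hypothetical figures for illustration only; measure the real ones.
sources = {
    "cluster nodes (14 x 150 W)": 14 * 150,
    "new lab computers (10 x 150 W)": 10 * 150,
    "occupants (10 x 100 W)": 10 * 100,
    "UPS losses": 200,
}

load = heat_load_btu_per_hour(sources)
print(f"estimated load: {load:.0f} BTU/h of {AC_CAPACITY_BTU_PER_HOUR} BTU/h available")
print(f"headroom: {AC_CAPACITY_BTU_PER_HOUR - load:.0f} BTU/h")
```

Adding a line to the dictionary for each extra source keeps the calculation honest as the lab changes.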

    Nevertheless, the best idea is to move the air from "this side" to "that side" on every shelf. Don't alternate directions on successive shelves. If you're actually worried about EMI, then you must have an open sided rack (or you shouldn't be worried). One option is to put metal walls around it, which will control your airflow. Another option that costs $10 is to make your own Faraday cage panels however you see fit. (I've done chicken wire and I've done cardboard/Al foil cardboard sandwiches. Both worked.)

    You should probably consider dual-mounting boards to the upper *and* lower sides of your shelves. Another layout I've been very happy with is vertical board mounts (like blades) with a column of P/Ss on the left or right.

    A *really* good idea for power distribution is to throw out the multiple discrete P/Ss and replace them with a DC distribution system. There's very little reason to have all those switching power supplies running to provide the same voltages over 6 feet. The UPSs are the heaviest thing in your setup; putting them at the bottom of the rack is probably a good idea. They generate some heat on standby (not much) and a lot more when running. Of course, when they're running, the AC is (worst case) also off and at least one machine should have gotten the "out of power" message and be arranging for all the machines to "shutdown -h now".
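The "one machine tells the rest to shut down" step might look something like the sketch below. It assumes key-based ssh access as root to every node and hypothetical hostnames node01 to node14; how the script gets triggered depends on the UPS monitoring software in use and is not shown. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Hypothetical node names; substitute whatever identifiers the cluster uses.
NODES = [f"node{n:02d}" for n in range(1, 15)]

def shutdown_command(host: str, user: str = "root") -> list:
    """Build the ssh invocation that halts one node."""
    return [
        "ssh",
        "-o", "BatchMode=yes",       # never stop to ask for a password
        "-o", "ConnectTimeout=5",    # do not hang on a node that is already down
        f"{user}@{host}",
        "shutdown", "-h", "now",
    ]

def shutdown_all(hosts, dry_run: bool = True) -> list:
    """Halt every node; return the hosts whose ssh call failed."""
    failed = []
    for host in hosts:
        cmd = shutdown_command(host)
        if dry_run:
            print(" ".join(cmd))     # show what would run, execute nothing
            continue
        if subprocess.run(cmd).returncode != 0:
            failed.append(host)
    return failed

if __name__ == "__main__":
    shutdown_all(NODES)  # dry run by default
```

With diskless nodes booting from a master server, it probably makes sense to halt the nodes first and the master last.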

    You only plan on having two cables per machine (since your setup seems KVM-less and headless), so wire organization may not be that important. (Yes, there are wiring nazis. I'm not one.) Pick Ethernet cables that are the right length (or get a crimper, a spool, and a bag of plugs and make them to the exact length). You'll probably get everything you need from 2-sided Velcro strips to make retaining bands on the left and right columns of the rack. Label both ends of all cables. Really. Not kidding. While you're at it, label the front and back of every motherboard with its MAC(s) and whatever identifiers you're using for machines.