Slashdot Mirror


Ask Slashdot: Building a Cheap Computing Cluster?

New submitter jackdotwa writes "Machines in our computer lab are periodically retired, and we have decided to recycle them and put them to work on combinatorial problems. I've spent some time trawling the web (this Beowulf cluster link proved very instructive) but have a few reservations regarding the basic design and airflow. Our goal is to do this cheaply but also to do it in a space-conserving fashion. We have 14 E8000 Core2 Duo machines that we wish to remove from their cases and place side-by-side, along with their power supply units, on rackmount trays within a 42U (19", 1000mm deep) cabinet." Read on for more details on the project, including some helpful pictures and specific questions. jackdotwa continues: "Removing them means we can fit two machines into 4U (as opposed to 5U). The cabinet has extractor fans at the top, and the PSUs and motherboard fans (which pull air off the CPU and exhaust it laterally; see images) face in the same direction. Would it be best to orient the shelves (and thus the fans) in the same direction throughout the cabinet, or to alternate the fan orientations on a shelf-by-shelf basis? Would there be electrical interference with the motherboards and CPUs exposed in this manner? We have a 2-ton (24,000 BTU/h) air conditioner which, judging by the guide in the first link, will be able to maintain a cool room temperature (the lab is quite small). However, I've been asked to place UPSs in the bottom of the cabinet (they will likely be non-rackmount UPSs, as they are considerably cheaper). Would this be, in anyone's experience, a realistic request? I'm concerned about the additional heat in the cabinet itself. The nodes in the cabinet will be diskless and connected via a rack-mountable gigabit Ethernet switch to a master server. We are looking to purchase rack-mountable power distribution units to clean up the wiring a little. If anyone has any experience in this regard, suggestions would be most appreciated."
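To put rough numbers on the heat and space questions, here is a minimal Python sketch. It assumes essentially all electrical power drawn ends up as heat in the room (1 W is about 3.412 BTU/h; 1 ton of cooling is 12,000 BTU/h). The per-node draw and the extra load for the switch, master server and UPS losses are hypothetical placeholders, not measurements; substitute readings from a power meter.

```python
# Rough heat-load and rack-space estimate for the proposed cabinet.
# All wattages passed in are assumptions; measure real draw with a power meter.

WATTS_TO_BTU_PER_HOUR = 3.412   # 1 W of electrical load is about 3.412 BTU/h of heat
BTU_PER_HOUR_PER_TON = 12_000   # 1 ton of refrigeration = 12,000 BTU/h


def heat_load_btu_per_hour(nodes: int, watts_per_node: float, extra_watts: float = 0.0) -> float:
    """Heat output of the cabinet, treating all electrical input as heat."""
    total_watts = nodes * watts_per_node + extra_watts
    return total_watts * WATTS_TO_BTU_PER_HOUR


def ac_fraction_used(load_btu_per_hour: float, ac_tons: float) -> float:
    """Fraction of the air conditioner's rated capacity consumed by this load."""
    return load_btu_per_hour / (ac_tons * BTU_PER_HOUR_PER_TON)


def rack_units(nodes: int, nodes_per_shelf: int = 2, units_per_shelf: int = 4) -> int:
    """Rack units needed if nodes are mounted nodes_per_shelf to a shelf."""
    shelves = -(-nodes // nodes_per_shelf)  # ceiling division
    return shelves * units_per_shelf


if __name__ == "__main__":
    # Hypothetical: 100 W per node under load, plus 200 W for switch, master and UPS losses.
    load = heat_load_btu_per_hour(14, 100, extra_watts=200)
    print(f"heat load: {load:.0f} BTU/h, {ac_fraction_used(load, 2):.0%} of a 2-ton unit")
    print(f"rack space: {rack_units(14)}U bare vs {rack_units(14, units_per_shelf=5)}U cased")
```

For 14 nodes the 4U-versus-5U layout works out to 28U versus 35U. Note that this room-level arithmetic says nothing about hot spots inside the cabinet, which depend on airflow and fan orientation.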

160 comments

  1. Imagine by BumbaCLot · · Score: 5, Funny

    A beowulf cluster of these! FP

    1. Re:Imagine by operagost · · Score: 2

      Awesome... it feels like /. circa 2000 again.

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    2. Re:Imagine by Anonymous Coward · · Score: 1

      It's been a long time since "Imagine a beowulf cluster of those!" made any degree of sense, or even appeared on /.

      Natalie Portman's Hot Grits to you!

    3. Re:Imagine by K.+S.+Kyosuke · · Score: 1

      Don't forget the Petrification Award for unlocking the first post achievement!

      --
      Ezekiel 23:20
    4. Re:Imagine by Ogi_UnixNut · · Score: 5, Interesting

      Yeah, except back in the 2000s people would have thought it was a cool idea, and there would be at least 4 other people who had recently done it and could give tips.

      Now it is just people saying "Meh, throw it away and buy newer more powerful boxes". True, and the rational choice, but still rather bland...

      I remember when nerds here were willing to do all kinds of crazy things, even if they were not a good long term solution. Maybe we all just grew old and crotchety or something :P

      (Spoken as someone who had a lot of fun building an openmosix cluster from old AMD 1.2GHz machines my uni threw out.)

    5. Re:Imagine by Anonymous Coward · · Score: 0

      Not meaning to be crude, but...
      HOW THE FUCK is this offtopic?!?

    6. Re:Imagine by Anonymous Coward · · Score: 1

      The difference is that we take the clustering part for granted now. The question wasn't something interesting like how do I do supercomputer-like parallel activities on regular PCs or solve operational issues. It was just about physically putting a bunch of random parts into a rack on a low budget.

      But we won already... now, the mainstream is commodity rack parts. You should put the money towards modern 1U nodes rather than a bunch of low volume and high cost chassis parts to try to assemble your frankenrack of used equipment. You get subsidized server guts by buying a 1U server instead of just some unusual empty rack case. Even more, half the value of the rack equipment is that it has an optimized cooling plan to work in that density.

    7. Re:Imagine by Anonymous Coward · · Score: 2, Funny

      Back then, people read slashdot at -1, nested, and laughed at the trolls. Right now, I wouldn't be surprised if I'm modded -1 within about 15 minutes by an editor with infinite mod points. Post something the group-think disagrees with, get downmodded. Post something anonymous, no one will read it. Post something mildly offensive, get downmodded.

      We didn't have fucking flags back then and the editors didn't delete posts. Now they do. Fuck what this site has become.

    8. Re:Imagine by CanHasDIY · · Score: 4, Insightful

      You should put the money towards modern 1U nodes rather than a bunch of low volume and high cost chassis parts to try to assemble your frankenrack of used equipment.

      Methinks you've missed the key purpose of using old equipment one already owns...

      --
      An enigma, wrapped in a riddle, shrouded in bacon and cheese
    9. Re:Imagine by Cramer · · Score: 1

      No, we haven't. What you and many others (including the poster) miss is how much time and effort -- and yes, money -- will go into building this custom, already obsolete, cluster. His first mistake is keeping Dell's heat tower and fan -- that's designed for a DESKTOP where you need a large heatsink so a slow (quiet) fan can move enough air to keep it cool; in a rack cluster, that's not even remotely a concern. (density trumps noise)

      (I'm in the same boat -- as I'm sure everyone else is. I have stacks of old, obsolete machines. Difference is, *I* know they're junk which is why they're stacked in a corner... spare parts for the few we still use (read: never replaced))

    10. Re:Imagine by Anonymous Coward · · Score: 0

      The submitter was talking about racks and chassis of some kind to hold the desktop parts he removed from their original cases. I assumed he was going to be purchasing this stuff, and didn't already have it sitting around as well...

      Just the ancillary equipment to assemble a cluster of already owned motherboards is significant budget, and it is budget better spent on just a few modern rack nodes in many cases.

    11. Re:Imagine by CanHasDIY · · Score: 2

      Since this is obviously a 'pet' project, i.e. something he's doing just to see if it can be done, time and effort costs don't really factor in, IMO. Like when I work on my own truck, I don't say, "it cost me $300 in parts and $600 in labor to fix that!"

      His first mistake is keeping Dell's heat tower and fan -- that's designed for a DESKTOP where you need a large heatsink so a slow (quiet) fan can move enough air to keep it cool; in a rack cluster, that's not even remotely a concern. (density trumps noise)

      I find the idea of jury-rigging up a rackmount a bit specious myself... But again, this appears to be a 'can we do it' type project, so I don't feel compelled to criticize like I would if he were trying to do this with some mission-critical system.

      --
      An enigma, wrapped in a riddle, shrouded in bacon and cheese
    12. Re:Imagine by Anonymous Coward · · Score: 0

      Openmosix user here too!
      In my case there were 3 boxes: a K6-2, a Pentium 66 and a Pentium MMX...
      Performance-wise it wasn't a good idea, but it was indeed fun...

    13. Re:Imagine by Anubis350 · · Score: 2

      Hell, this is still done in unis. I used to run a test cluster for my uni's chem dept that was basically retired lab machines on Home Depot wire shelving in our machine room. The only things that cost the dept money were the headnode (which was also used for staging jobs for the big clusters, and as a file server for storing job output), the ProCurve switches, and my time, I suppose. It was a useful cluster for testing things before they got run on the big clusters, where time was more metered, and for getting grad students familiar with a clustered environment, and all that cost was still less than building a 1U or 2U (or, hell, blade) based system just for this purpose.

      To the OP: I used a Debian install on the headnode with Torque for queue management. I netbooted all the machines with the boot image stored there as well, had a couple of kernels compiled for the different machines, and each clump was separated in the Torque queue. All that made it absurdly easy to update the whole system quickly, but you can also boot the machines locally, and for speed you might want to. The main thing you do need to worry about is heat and power consumption: having a lot of machines in one place will eat a lot of power and put out a lot of heat. I already had a machine room to toss it all in; I don't know if you do.

      --
      "goodbye and hello, as always" ~Prince Corwin, from Zelazny's Amber series
    14. Re:Imagine by Cramer · · Score: 2

      Again, you miss the point. You did it "right"... took the old machines from desks and sat them on a shelf. Translation: the absolute minimum amount of time and effort. The poster is taking the Dell OptiPlexes apart to make a "Google cluster" (i.e. motherboards bolted to a sheet), thus making them take (marginally) less space. He's putting in a whole lot of work for very little gain.

      (For the record, I've built clusters on the uber-cheap using 1U (quad-core opteron) rack mount servers from ebay sellers. Actually have a small pile of them on the conference table right now -- shipping was more than the actual machines.)

    15. Re:Imagine by Anonymous Coward · · Score: 0

      Sometimes you have to kill the puppy to save it.

    16. Re:Imagine by Anonymous Coward · · Score: 0

      Again, you miss the point. You did it "right"... took the old machines from desks and sat them on a shelf. Translation: the absolute minimum amount of time and effort. The poster is taking the dell optiplex's apart to make a "google cluster" (i.e. motherboard bolted to a sheet) thus, making them take (marginally) less space. He's putting in a whole lot of work for very little gain.

      (For the record, I've built clusters on the uber-cheap using 1U (quad-core opteron) rack mount servers from ebay sellers. Actually have a small pile of them on the conference table right now -- shipping was more than the actual machines.)

      I 100% agree with that - taking the machines and just stacking them up in a rack, maybe buying a few shelves, is cheap and easy, and the slow(er) desktop fan really doesn't matter - the room itself is probably cool enough (with a real A/C) compared to the non-cooled office the machines are meant to run in. Trying to make "2 machines fit into 4U instead of 5U" is a lot of work for very little space gain - and *no* performance gain - especially while keeping the existing fans, etc. Honestly, 4U bare-metal rackmount boxes themselves aren't "cheap" (let's say $100), plus the labor of taking machines apart, drilling holes, and mounting plates/brackets that may have to be custom made... and this is for "obsolete" (i.e., being replaced) machines. I'd bet that for the same cost you could go on eBay and pick up an equivalent (same vintage/speed/memory) pile of 2U servers.

      If he were talking about taking the motherboards out, overclocking them with water cooling (or liquid nitrogen), and really messing with the hardware... it would still probably be overly expensive, but it would at least have a 'cool' factor.

    17. Re:Imagine by strikethree · · Score: 1

      I remember when nerds here were willing to do all kinds of crazy things, even if they were not a good long term solution. Maybe we all just grew old and crotchety or something :P

      Actually, I suspect it is the young risk-averse special snowflakes that are all saying to throw money at it. D'oh.

      --
      "Someone needs to talk to the tree of liberty about its ghoulish drinking problem." by ohnocitizen
    18. Re:Imagine by Anonymous Coward · · Score: 0

      I remember when nerds here were willing to do all kinds of crazy things, even if they were not a good long term solution. Maybe we all just grew old and crotchety or something :P

      Actually, I suspect it is the young risk-averse special snowflakes that are all saying to throw money at it. D'oh.

      I think it's a great and noble idea - but I don't think it would ever be financially viable, and therefore it's not worth doing. A lot of people are referring to uni labs, which is great - I am by no means saying they aren't technical environments. They just can't be compared to business sense. The cost isn't the kit, it's the time your company is paying you to be doing something worthwhile. So I guess it depends on who will do the work - if you get a desktop engineer on it, perhaps it's viable? But if you asked a sysadmin to spend a week fiddling around with all these computers, that same week of wages would buy you a better solution, while your sysadmin was improving business continuity and performance on more meaningful tasks.

      People seem to be missing the real cost - time. You may have the time at work, but if that's the case your company should probably be considering giving you more tasks. Every business is different and I am trying not to sound so corporate and boring, but it kinda doesn't make sense. (I am young and do throw money at my problems... in personal life too! :))

    19. Re:Imagine by Anonymous Coward · · Score: 0

      I think its a great and noble idea - But i dont think it would ever be financially viable and therefore not worth doing.

      Hi special snowflake. Not everything is a fucking business nor does everything have to be viewed through the eyes of what is most directly profitable. Perhaps in setting it up, he has an insight into interconnectedness and goes on to revolutionize everything about communications; thereby saving the world trillions of dollars and saving billions of lives through avoiding unnecessary wars?

      Yeah, everything is on the absolute extreme edge of poverty and no time can be given to frivolous undertakings because it could cause the imminent destruction of the entire fucking world in the next 15 seconds. Lighten up. Geez. Let him play. Let YOURSELF play.

  2. Don't do it by damn_registrars · · Score: 4, Insightful

    Seriously, it isn't worth your effort - especially if you want something reliable. People who set out to make homemade clusters find out the hard way about design issues that reduce the life expectancy of their cluster. There are professionals who can build you a proper cluster for not a lot of money if you really want your own, or even better you can rent time on someone else's cluster.

    --
    Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
    1. Re:Don't do it by Impy+the+Impiuos+Imp · · Score: 3, Insightful

      Get an older, CUDA-capable card and have whoever writes your code target it instead. I doubled all my SETI work units over 10 years in just 2 weeks. A CPU is just a farmer throwing food to the racehorse nowadays.

      --
      (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
    2. Re:Don't do it by Anonymous Coward · · Score: 0

      Running something like Hadoop would work with this kind of setup. Hadoop is designed to allow machines to break without anything happening to the map-reduce code that runs on it, other than slight delays. Configuration of a new machine should be as simple as installing a disk image with the configurations in place.

      For general purpose computing, you are correct. It wouldn't be pessimistic at all to expect one computer to malfunction every week. If that disrupts what you are trying to run, or if you don't have resources to maintain the system, don't do it.
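      The property described above, a failed node merely delaying the job, comes from the framework re-running a failed task on another node. Below is a toy, single-process Python simulation of that idea using a word count split into chunks; it does not use Hadoop's API, and `NodeFailure`, `run_job` and the worker names are hypothetical.

```python
from collections import Counter
from typing import Callable


class NodeFailure(Exception):
    """Raised by the (simulated) executor when a node dies mid-task."""


def word_count_map(chunk: str) -> Counter:
    """Map step: count words in one input chunk."""
    return Counter(chunk.split())


def run_job(chunks: list[str], workers: list[str],
            execute: Callable[[str, str], Counter]) -> Counter:
    """Run one map task per chunk, re-running a task on the next worker if its node fails."""
    partials = []
    for i, chunk in enumerate(chunks):
        for attempt in range(len(workers)):
            worker = workers[(i + attempt) % len(workers)]
            try:
                partials.append(execute(worker, chunk))
                break  # task succeeded, move on to the next chunk
            except NodeFailure:
                continue  # node died: reschedule the same task elsewhere
        else:
            raise RuntimeError(f"no live worker could run chunk {i}")
    # Reduce step: merge the partial counts from every map task.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    dead = {"node02"}  # pretend this machine is down

    def execute(worker: str, chunk: str) -> Counter:
        if worker in dead:
            raise NodeFailure(worker)
        return word_count_map(chunk)

    print(run_job(["a b", "b c", "c c"], ["node01", "node02", "node03"], execute))
```

      The job still produces the full result with one node down; it only loses the time spent on the failed attempt, which is the "slight delay" mentioned above.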

    3. Re:Don't do it by Anonymous Coward · · Score: 5, Insightful

      Seriously, it isn't worth your effort - especially if you want something reliable. People who set out to make homemade clusters find out the hard way about design issues that reduce the life expectancy of their cluster. There are professionals who can build you a proper cluster for not a lot of money if you really want your own, or even better you can rent time on someone else's cluster.

      If the goal of this is reliable performance, you're absolutely right. But if the goal is to teach yourself about distributed computing, networking, diskless booting, and all the issues that come up in building a cluster on the cheap, then this is a great idea. Just don't expect much from the end product - you'll get more performance from a modern box with tens of cores on a single motherboard.

    4. Re:Don't do it by Anonymous Coward · · Score: 1

      I agree with this poster. After building a homebrew HPC environment and then working with a vendor-engineered solution, I can tell you that taking old hardware is really not worth it other than as a learning exercise. Nevertheless, building it would be fun, just not practical. So from a learning perspective, knock yourself out.

      From a pragmatic point of view, the hardware is old and not very efficient in terms of electricity. Also consider that a single Tesla card can deliver anywhere from 2 to 4 teraflops, while you would be lucky to see even 1 teraflop from this entire arrangement. However, this does not preclude you from introducing Tesla cards into the environment if you have a compliant PCI-E slot and the power to run them.

      Also, it depends on what you are trying to achieve. Anyway, have fun with your engineering challenge!!
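      For scale, a back-of-envelope theoretical peak for the 14 nodes, as a Python sketch. It assumes about 3.0 GHz per core (E8000-series clocks vary by model) and 4 double-precision FLOPs per cycle per core for the Core 2 microarchitecture via SSE. Sustained throughput on real code is usually well below peak, and a gigabit interconnect lowers it further for tightly coupled jobs.

```python
def peak_gflops(nodes: int, cores_per_node: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak GFLOPS = nodes x cores x clock (GHz) x FLOPs issued per cycle."""
    return nodes * cores_per_node * ghz * flops_per_cycle


if __name__ == "__main__":
    # Assumptions: 14 dual-core nodes at ~3.0 GHz, 4 double-precision FLOPs/cycle/core.
    cluster = peak_gflops(nodes=14, cores_per_node=2, ghz=3.0, flops_per_cycle=4)
    print(f"theoretical peak: {cluster:.0f} GFLOPS ({cluster / 1000:.2f} TFLOPS)")
```

      Under those assumptions the whole cabinet peaks at roughly a third of a teraflop, consistent with the estimate above that it would struggle to reach 1 teraflop.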

    5. Re:Don't do it by Anonymous Coward · · Score: 0

      I agree. Try to get 4 cheap 8-core boxes and use those instead of the 14-node frankenstein.
      It'll save you a bundle on electricity and UPS capacity.
      You can also install much more RAM since they'll take DDR3.

      You're looking at $1200 or so if you go AMD.
      That gets you 32 CPU cores: 4 motherboards plus RAM (whatever $100 gets).
      You already have PSUs - salvage those if they can handle 300 W.
      You don't need cases since you're going to stick it in trays.

      Add stronger power supplies if you want to use GPUs.
      Add Infiniband if you want a real cluster and GPU to GPU communication.

    6. Re:Don't do it by cbiltcliffe · · Score: 4, Informative

      For general purpose computing, you are correct. It wouldn't be pessimistic at all for one computer to go malfunctioning every week.

      Huh? E8000 Core2 Duos are not that old. I've got a rack of a half dozen Pentium IIIs that I've run for years without problems. What kind of crap hardware do you run where you're expecting 1 failure out of 14 machines every week?

      This is assuming, of course, that when you set up the cluster in the first place, you check motherboards for bad caps, loose cooling fans, etc, and discard/repair anything that looks even like it might possibly fail. Considering the effort this guy seems to be going to, that's probably (but I've been wrong about that kind of thing before) a given.
      From the pics, these are BTX machines, which in my experience have better cooling than ATX, and are less likely to have overheated, failing caps in the first place.

      --
      "City hall" in German is "Rathaus" Kinda explains a few things......
    7. Re:Don't do it by sneakyimp · · Score: 2

      Nonsense! Home-built clusters can be cheap and very educational. http://helmer.sfe.se/

    8. Re:Don't do it by stymy · · Score: 1

      Also, always calculate GHz per watt or whatever, as newer processors are more efficient and can sometimes pay for themselves pretty fast through a lower electricity bill.
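      A minimal Python sketch of that comparison. Core count times clock speed is a crude proxy that ignores per-clock improvements between CPU generations, so it tends to understate newer chips; the machines and wattages below are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    cores: int
    ghz: float
    watts: float  # measured wall draw under load, not the PSU rating

    def core_ghz_per_watt(self) -> float:
        """Aggregate core-GHz delivered per watt of power drawn."""
        return self.cores * self.ghz / self.watts


if __name__ == "__main__":
    old = Machine("old desktop (hypothetical)", cores=2, ghz=3.0, watts=100)
    new = Machine("new server (hypothetical)", cores=16, ghz=2.9, watts=300)
    for m in (old, new):
        print(f"{m.name}: {m.core_ghz_per_watt():.3f} core-GHz/W")
    print(f"new/old efficiency ratio: {new.core_ghz_per_watt() / old.core_ghz_per_watt():.1f}x")
```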

    9. Re:Don't do it by eyegor · · Score: 2

      GPU-based computing's a great idea, but not appropriate for all problems. There's also significantly more work managing memory and all that with a GPU.

      We have about 50 M2070 GPUs in production and virtually no one uses them. They depend instead on our CPU resources since they're easier to program for.

      --

      Don't anthropomorphize computers, they don't like it.
    10. Re:Don't do it by csumpi · · Score: 1

      How do you check motherboards for bad capacitors?

    11. Re:Don't do it by Anonymous Coward · · Score: 2, Informative

      How do you check motherboards for bad capacitors?

      Bad caps will swell or bulge at the top. Eventually they will leak electrolyte and corrosion will occur on the tops.

      FYI the capacitors are the ones shaped like cylinders or tiny soda cans. Sometimes there will be '+' or 'x' perforation on the tops where the swelling usually happens.

    12. Re: Don't do it by Anonymous Coward · · Score: 0

      Use your fave search engine to find "capacitor plague wiki".

      Useful info on an industry-wide issue we still run into.

      Usually they are visibly bulging, or worse.

    13. Re:Don't do it by Maximum+Prophet · · Score: 2

      How do you check motherboards for bad capacitors?

      Bulges are bad. Leaks are bad. If the smoke has been released, doubly bad.

      Other than that, you have to know what voltage is supposed to be on them, and measure it. If you still suspect something, use a scope. Worst case, you have to desolder it, then check its value and ESR. Mostly, I don't bother; I just replace suspect caps until whatever it is works.

      --
      All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
    14. Re:Don't do it by dbIII · · Score: 1

      It's actually not all that hard anymore. It's not just that it isn't that difficult to do from scratch, there are also distros like ROCKS designed to run with almost no configuration.
      As for the hardware side, so long as you stick to nothing more exotic than gigabit copper it's not hard. Taking things out of their chassis like the poster suggests is asking for a bit of trouble since they are designed to channel the incoming air, so that would need ugly measures like large diameter fans to force air through the rack plus a few baffles. That is not difficult even if it adds a bit of time and expense.

      I wouldn't completely dismiss the idea since in some misguided orgs new equipment is far harder to justify than wages, so some poor sods have to make do with the gear that people like me took out of service a few years ago (still have them since there must be something you can do with 22 1U cases with a working power supply even if the CPU is 32 bit and dog slow).

    15. Re:Don't do it by Fallen+Kell · · Score: 2

      Huh? E8000 Core2 Duos are not that old. I've got a rack of a half dozen Pentium IIIs that I've run for years without problems.

      What are you smoking? E8000 Core2 Duos are ancient. These are all five-year-old CPUs. Five years in which Intel has been focusing specifically on better power efficiency, which in turn leads to better cooling efficiency, all the while increasing the number of cores per chip. The 14 E8000 systems which are going to take up 42U of rack space can and should be replaced by a single 1U Dell R620 with 2x E5-2690 processors (8c + hyperthreading), with 8x16GB 1600MHz ECC DIMMs (which will probably be more memory than you have in all 14 of those old systems combined), and, if you feel ambitious enough to develop your code to take advantage of them, 1 or 2 GPUs.

      Not only will the 2x E5-2690 completely blow away all 14 E8000 CPUs combined, the single 1U system will use less than 1/4 of the power and cooling of the 14 systems (the R620 without GPUs will only need a 750W power supply; 1100W if you go with the GPUs, but that is still less than the draw of the 14x255W power supplies the OptiPlexes use). Given the power savings, some quick ROI numbers should be done:

      Dell R620 power usage 1yr: 6570 kWh
      14x Optiplex power usage 1yr: 31273 kWh
      R620 will save you approx $2470 assuming $0.10 a kWh utility cost.

      So in as little as 3 years the R620 will have paid for itself in power consumption alone, not even factoring in that it is also going to produce only 1/8th the heat, meaning the HVAC in the room will not be using as much power to cool the space either. And you have not even factored in the time you are wasting trying to construct a shelf system to hold those old desktop towers, which are going to take up 42U of your precious rack space. All of this is manpower time, which costs money that could have been spent on the R620, which is designed to be rack mounted and will take all of 30 minutes for a single person to pull out of the box and mount into the rack.
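      The arithmetic above can be reproduced with a short Python sketch. Like the comment, it assumes each box draws its full power-supply rating around the clock (750 W for the R620, 255 W per OptiPlex) at $0.10/kWh; real desktops usually draw well below their PSU rating, so actual savings are probably smaller. The server's purchase price is not given in the thread, so `payback_years` takes it as an input.

```python
HOURS_PER_YEAR = 24 * 365  # 8760; ignores leap years


def annual_kwh(watts: float) -> float:
    """Energy used in a year by a constant load, in kWh."""
    return watts * HOURS_PER_YEAR / 1000


def annual_savings(old_watts: float, new_watts: float, dollars_per_kwh: float) -> float:
    """Yearly electricity cost difference between the old and new setups."""
    return (annual_kwh(old_watts) - annual_kwh(new_watts)) * dollars_per_kwh


def payback_years(purchase_price: float, yearly_savings: float) -> float:
    """Years until the power savings cover the purchase price (ignores cooling)."""
    return purchase_price / yearly_savings


if __name__ == "__main__":
    old, new = 14 * 255, 750  # watts, nameplate ratings as in the comment
    print(f"R620: {annual_kwh(new):.0f} kWh/yr, 14x OptiPlex: {annual_kwh(old):.0f} kWh/yr")
    print(f"savings: ${annual_savings(old, new, 0.10):.0f}/yr")
```

      This reproduces the 6570 kWh, 31273 kWh and roughly $2470/yr figures quoted above.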

      --
      We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
    16. Re:Don't do it by fluffy99 · · Score: 1

      Ditto. The labor alone will kill any perceived saving.

      Sell those old boxes for $100 a pop to students and buy something nice. In my experience if you have a Dell Optiplex of that era with the original power supply and motherboard, it's just waiting to die. Check the MB caps. Dell had to replace 2/3rd of the power supplies and 1/2 of the motherboards in the 100 that we owned within the first 2 years.

    17. Re:Don't do it by fluffy99 · · Score: 1

      Nonsense!

      Home-built cluster can be cheap and very educational.

      http://helmer.sfe.se/

      Perhaps as a hobby. But generally they aren't cost-effective if you're paying for someone's labor to implement it with 5-year-old hardware in a cluster-fuck (pun intended), jammed into a rack in a haphazard arrangement, when there isn't even a clear need or requirement for it.

    18. Re:Don't do it by Anonymous Coward · · Score: 0

      Ditto. The labor alone will kill any perceived saving.

      Sell those old boxes for $100 a pop to students and buy something nice. In my experience if you have a Dell Optiplex of that era with the original power supply and motherboard, it's just waiting to die. Check the MB caps. Dell had to replace 2/3rd of the power supplies and 1/2 of the motherboards in the 100 that we owned within the first 2 years.

      Yup, you'd be far better off just spending a little time wiping them and maybe re-installing bare-OS, and selling them cheap to students who don't have a computer (or have something lesser), or heck, even donating them to the local HS or something and taking a tax writeoff on them (and they can give them away, or use them to teach people on, or whatever).

    19. Re:Don't do it by caferace · · Score: 1

      I just left a place that had over 100 M2075s, used for R&D and test, dualed up and all on the same network... During the day they were mostly running hot, all day. Some jobs ran overnight or over the weekend. As they say, YMMV.

    20. Re:Don't do it by hazeii · · Score: 1

      If no-one's using the M2070s, a project like Einstein@home certainly could.

      --
      All your ghosts are just false positives.
    21. Re:Don't do it by strikethree · · Score: 1

      Seriously, it isn't worth your effort - especially if you want something reliable.

      Huh? Maybe they just want to do it because it is possible? Kind of like climbing a mountain...

      There are professionals who can build you a proper cluster

      And how did those professionals become professionals? How did they learn the pitfalls about building clusters? What happened to learning something yourself because it is possible?

      *sigh* Just like the "scientists" who recently discovered that people with different values have... different values. I despair.

      --
      "Someone needs to talk to the tree of liberty about its ghoulish drinking problem." by ohnocitizen
    22. Re:Don't do it by eyegor · · Score: 1

      Unfortunately, most of our clusters are on closed networks.

      --

      Don't anthropomorphize computers, they don't like it.
    23. Re:Don't do it by Anonymous Coward · · Score: 0

      Huh? E8000 Core2 Duos are not that old. I've got a rack of a half dozen Pentium IIIs that I've run for years without problems. What kind of crap hardware do you run where you're expecting 1 failure out of 14 machines every week?

      The age doesn't matter. When you are running even a small cluster of computers doing intensive computation, they will break very often, even if they are brand new hardware. A failure once a week is not completely unrealistic. Note that a "failure" in this case means "anything that disrupts computing and requires manual fixing".
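      One way to see why a small cluster sees failures "often" even with sound hardware: failure rates add across nodes. A Python sketch assuming independent nodes with a constant (exponential) failure rate; the per-node mean time between disruptions is a hypothetical input you would estimate from your own logs.

```python
import math


def cluster_failures_per_week(nodes: int, node_mtbf_weeks: float) -> float:
    """Expected disruptions per week: independent failure rates simply add."""
    return nodes / node_mtbf_weeks


def p_any_failure(nodes: int, node_mtbf_weeks: float, weeks: float = 1.0) -> float:
    """Probability of at least one disruption in the window (Poisson/exponential model)."""
    return 1 - math.exp(-cluster_failures_per_week(nodes, node_mtbf_weeks) * weeks)


if __name__ == "__main__":
    # Hypothetical: each node needs manual attention about once every 14 weeks.
    print(f"{cluster_failures_per_week(14, 14):.1f} expected disruptions/week")
    print(f"P(at least one this week) = {p_any_failure(14, 14):.2f}")
```

      Under this model, "one failure a week" across 14 nodes corresponds to each individual node needing attention only about once every 14 weeks.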

    24. Re:Don't do it by cbiltcliffe · · Score: 1

      Absolutely nowhere in my comment did I state anything about power efficiency. My simple point was reliability.

      If you're expecting a 7% failure rate per week, then you run crap hardware. As I said, I've got a half dozen Pentium IIIs, which are at least 3 times as old as these, that have been running flawlessly for years, so how does your setup suck so badly?

      --
      "City hall" in German is "Rathaus" Kinda explains a few things......
    25. Re:Don't do it by cbiltcliffe · · Score: 1

      The whole point of the thread was that the old machines would give a high failure rate, of 1/14 per week.
      If buying new machines is still going to give a high failure rate, then that's not a reason to upgrade.

      I realize power and performance are still valid reasons, but that wasn't where this particular thread was going.

      --
      "City hall" in German is "Rathaus" Kinda explains a few things......
  3. don't rule out by v1 · · Score: 5, Insightful

    throwing gear away or giving it away. Just because you have it doesn't mean you have to, or should, use it. If energy and space efficiency are important, you need to carefully consider what you are reusing. Sure, what you have now may have already fallen off the depreciation books, but if it's going to draw twice the power and take double the space that newer used kit would, it may not be the best option, even when the other options involve purchasing new or newer used gear.

    Not saying you need to do this, just recommending you keep an open mind and don't be afraid to do what needs to be done if you find it necessary.

    --
    I work for the Department of Redundancy Department.
    1. Re:don't rule out by eyegor · · Score: 4, Interesting

      Totally agree. We had a bunch of dual dual-core server blades that were freed up and after looking at the power requirements per core for the old systems we decided it would be cheaper in the long run to retire the old servers and buy a fewer number of higher density servers.

      The old blades drew 80 watts/core (320 watts) and the new ones which had dual sixteen-core Opterons drew 10 watts/core for the same amount of overall power. That's a no brainer when you consider that these systems run 24/7 with all CPUs pegged. More cores in production means your jobs finish up faster, you'll be able to have more users and more jobs running and use much less power in the long run.
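      The per-core comparison above, as a quick Python check. The core counts follow from the description (two dual-core sockets versus two sixteen-core sockets), and the per-core wattages are the figures quoted in the comment.

```python
def node_power(sockets: int, cores_per_socket: int, watts_per_core: float) -> tuple[int, float]:
    """Return (total cores, total watts) for one node at a given power cost per core."""
    cores = sockets * cores_per_socket
    return cores, cores * watts_per_core


if __name__ == "__main__":
    old_cores, old_watts = node_power(2, 2, 80)    # dual dual-core blade, 80 W/core
    new_cores, new_watts = node_power(2, 16, 10)   # dual sixteen-core Opteron, 10 W/core
    print(f"old: {old_cores} cores at {old_watts:.0f} W; new: {new_cores} cores at {new_watts:.0f} W")
    print(f"{new_cores // old_cores}x the cores for the same power")
```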

      --

      Don't anthropomorphize computers, they don't like it.
    2. Re:don't rule out by nine-times · · Score: 4, Insightful

      I agree. I've been doing IT for a while now, and this is the kind of thing that *sounds* good, but generally won't work out very well.

      Tell me if I'm wrong here, but the thought process behind this is something like, "well we have all this hardware, so we may as well make good use out of it!" So you'll save a few hundred (or even a few thousand!) dollars by building a cluster of old machines instead of buying a server appropriate for your needs.

      But let's look at the actual costs. First, let's take the costs of the additional racks, and any additional parts you'll need to buy to put things together. Then there's the work put into implementation. How much time have you spent trying to figure this out already? How many hours will you put into building it? Then troubleshooting the setup, and tweaking the cluster for performance? Now double the amount of time you expect to spend, since nothing ever works as smoothly as you'd like, and it'll take at least twice as long as you expect.

      That's just startup costs. Now factor in the regular costs of additional power and AC. Then there's the additional support costs from running a complex unsupported system, which is constructed out of old unsupported computer parts with an increased chance of failure. This thing is going to break. How much time will you spend fixing it? What additional parts will you buy? Will there be any loss of productivity when you experience down-time that could have been avoided by using a new, simple, supported system? What's the cost of that lost productivity?

      That's just off the top of my head. There are probably more costs than that.

      So honestly, if you're doing this for fun, so that you can learn things and experiment, then by all means have at it. But if you are looking for a cost-effective solution to a real problem, try to take an expansive view of all the costs involved, and compare *all* of the costs of using old hardware vs. new hardware. Often, it's cheaper to use new hardware.

    3. Re:don't rule out by ILongForDarkness · · Score: 4, Interesting

      Great point. Back in the day I worked on an SGI Origin mini/supercomputer (not sure if it qualifies; a 32-way symmetric multiprocessor is still kind of impressive nowadays, I guess (even a 16-way Opteron isn't symmetric, I don't think). Anyway, at the time (~2000) there were much faster cores out there. Sure, we could use this machine for free for serial loads (yeah, that is a waste), but we had to wait 3-4x as long as on a modern core. You ended up having to ssh in to start new jobs in the middle of the night so you didn't waste an evening of runs, versus getting 2-3 in during the day and firing off the fourth before you go to bed. On top of that, the IT guys had to keep a relatively obscure system around, provide space and cooling for this monster, etc. They would have been better off just buying us ten dual-socket workstations at ~1GHz (the speeds of the time, I guess).

    4. Re:don't rule out by korgitser · · Score: 2

      Agreed. Once the OP calculates the TCO of the system, it might turn out that the free stuff isn't worth it. First you should find someone who has done something similar before. Then you can start from the actual bottlenecks and play out some alternative scenarios.
      What requirements do your calculations have? CPU vs. I/O? The TDP of an E8000, 65W, is not bad; this puts your presumed rack short of the 2kW range. How much would that electricity cost you in a year? If your calculations are I/O-bound, you will have to spend on additional RAM and maybe SSDs, or the CPUs will be mostly occupied with wasting electricity/money. It might make sense to buy Atom boards instead. At the opposite end, it might make sense to buy some real cruncher CPUs or even GPUs.
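      The yearly electricity question is simple arithmetic: kilowatts drawn, times 8,760 hours, times the price per kWh. A minimal sketch follows; the 140 W per node and $0.12/kWh inputs are hypothetical placeholders (plug in your own measured draw and tariff), and air-conditioning overhead is deliberately left out.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 h; assumes the nodes run 24/7


def annual_energy_cost(nodes: int, watts_per_node: float, price_per_kwh: float) -> float:
    """Yearly electricity cost of `nodes` machines each drawing `watts_per_node` continuously."""
    kilowatts = nodes * watts_per_node / 1000
    return kilowatts * HOURS_PER_YEAR * price_per_kwh


# Hypothetical inputs: 14 nodes at ~140 W each under load, $0.12/kWh.
cost = annual_energy_cost(nodes=14, watts_per_node=140, price_per_kwh=0.12)
print(f"about ${cost:,.0f} per year, before cooling overhead")
```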
      You also have to calculate the labor involved. Setting the system up is not too much, but maintaining it, supporting it? If your lab is 14 people, and we presume everyone will ask for support once a week, you will have 3 people every day bugging you about the cluster. Add to this regular maintenance, replacing failed parts (desktop-grade hardware will fail regularly under heavy load), keeping track of the general state of your software stack upstream... You might find that you will spend most of your time on the cluster and not on your job, which means you need to hire an extra pair of hands. It might be cheaper just to buy your slice of time on an actual (commodity) science cluster.

      --
      FCKGW 09F9 42
    5. Re:don't rule out by Farmer+Pete · · Score: 5, Insightful

      But you're missing the biggest reason to do this...The older hardware is already purchased. New hardware would be an additional expense that requires an approval/budgeting process. Electricity costs lots of money, but depending on the company, that probably isn't directly billed to the responsible department. Again, it's hard to go to your management and say that you want them to spend X thousand dollars so that they will save X thousand dollars that they don't think they need to spend in the first place.

    6. Re:don't rule out by i.r.id10t · · Score: 4, Insightful

      On the other hand, depending on what kind of courses you teach (tech school, master's degree in comp sci, etc.), keeping them around for *students* to get experience building a working cluster and then programming stuff to run in parallel on it may be a good idea. Of course, this means the boxes wouldn't be running 24/7/365 (more likely 24/7 for a few weeks per term), so the power bill won't kill you, and it could provide a valuable learning experience for students... especially if you have them consider the power consumption and ask them to write a recommendation for a cluster system.

      --
      Don't blame me, I voted for Kodos
    7. Re:don't rule out by Anonymous Coward · · Score: 1

      While I don't agree that this project is a GOOD idea, this! A thousand times, THIS!

      I just spent 6 months convincing the management here that we can update our 7-year-old servers with 50% less equipment, save 75% on power and cooling, and pay for the project in about 18 months, without mentioning that our userbase/codebase has grown to the point that we are paying people to stare at screens.

      Read this article about the Titan upgrade to the Oak Ridge supercomputer: http://www.anandtech.com/show/6421/inside-the-titan-supercomputer-299k-amd-x86-cores-and-186k-nvidia-gpu-cores FLOPS per watt has come a long way in the past 5 years. Running old hardware will cost you a lot, especially if it idles for any significant period of time.

    8. Re:don't rule out by Anonymous Coward · · Score: 0

      The Origin wasn't symmetric. It was demonstrating that NUMA was good enough for many people to be fooled.

    9. Re:don't rule out by pseudofrog · · Score: 2

      Because I know I'm not the only one who is bothered by this: )

    10. Re:don't rule out by Anonymous Coward · · Score: 0

      If we run with that constraint, it's rather freeing. Given: the TCO is not worth it.

      But if there is "free training" in deploying and redeploying the cluster, and power is just not an issue at this scale, then the redneck solution would probably be the following:
      _____________
      Don't waste your rack space on this, for two reasons:
              First, Home Depot has wire rack shelving sets that stand alone and have rollers on the base; they will let you roll your cluster into someone's garage when you're done and get it out of the way.
              Second, you don't want to re-engineer the heat flow across the components. Trust me, I've done this; it's a bitch to get right, and unless you replace all the fans with high-output, high-noise fans, you won't be able to engineer much space out of the Dell design.
      _____________
      Face all the computers the same way. Have a hot and a cold side, so you can feed AC toward one side and increase the total path distance for air that comes out the back to get back to the A/C unit. For my implementation of this, I had a fan blowing air out the window behind the computers, and an A/C unit on the other side of the room blowing cold air across those seated, then through the computers to get out the window. This worked well in the summer, and in the winter they turned the exhaust fan off and heated the office with that special PCB-and-dust-smelling air.
      ______________
      Wire racks have a near-infinite number of places to wire-tie cables. So buy cables (or have college kids solder on the ends so they are the right length) and bundle them together so there is a drop at the back of each machine with all the cables it needs. When you want to pull a computer out, you roll the entire rack away from the wall, pull 5 cables from the back (mouse, keyboard, video, Ethernet, and power), then slide the computer out the front (with less than a half inch of space between each one). This takes 60 seconds. When you want to put it back, with all the cables dangling there, it's about 90 seconds.
      ______________
      Single-drive 5.25" to 3.5" drive caddies are cheap and worth the effort. Buy one more than you need so you can keep its "sleigh" stocked with a replacement drive. Make a USB key with a Fedora Kickstart that restores a cluster node, and a failed drive becomes 2 minutes' worth of work and 30 minutes' worth of waiting.
      ______________
      I did this for years with a 10-node system; 14 would fit on a single one of these $100 rolling shelf units. Then I got well funded and became enchanted with HP's iLO interface.

    11. Re:don't rule out by Maximum+Prophet · · Score: 1

      That's the best idea so far. A few machines that students can trash are invaluable. If the students spend most of the time tearing it down, and rebuilding the cluster, it's not going to use much power.

      You could have them predict what changing CPU/Memory/Interconnect will do to performance, then make them try it out. Put some *Science* into Computer Science.

      --
      All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
  4. Easy... by Anonymous Coward · · Score: 1

    1. buy malware at a shady virus exchange to create a beowulf botnet
    2. ???
    3. profit!!!

  5. Mounting these bare horizontally by Anonymous Coward · · Score: 0

    Is your second mistake. How much memory is available and what will your interconnects be?

  6. GPUs by ThatsNotPudding · · Score: 1

    I thought some folks had switched to GPUs for heavy number-crunching... though the custom hardware setups no doubt render this a moot point.

    Glad I could help :\

    1. Re:GPUs by Farmer+Pete · · Score: 1

      That's because it's a hell of a lot faster to use a GPU. The problem is that a decent GPU uses a lot more power than those PSUs can probably support, but even a semi-proficient GPU may be a wise investment.

    2. Re:GPUs by Anonymous Coward · · Score: 0

      The advantage is if a GPU is just sitting there. I don't have the link at the moment, but there was a comparison a while back... CPUs do comparably once you take equal care in optimizing things for cache sizes and such.

    3. Re:GPUs by dbIII · · Score: 1

      They are very much memory-limited, so they don't work for everything. However, the biggest showstopper in most cases is when somebody wrote some cool code and then left, so it's stuck on whatever platform it was compiled for. Intel have recently brought out an x86-based, highly parallel, GPU-style card to catch this market.

  7. Once you solve the hardware challenges..... by eyegor · · Score: 5, Informative

    You'll need to consider how you're going to provision and maintain a collection of systems.

    Our company currently uses the ROCKS cluster distribution, which is a CentOS-based distribution that provisions, monitors and manages all of the compute nodes. It's very easy to have a working cluster set up in a short amount of time, but it's somewhat quirky in that you can't fully patch all pieces of the software without breaking the cluster.

    One thing that I really like about ROCKS is their provisioning tool, which is called the "Avalanche Installer". It uses BitTorrent to load the OS and other software onto each compute node as it comes online, and it's exceedingly fast.

    I installed ROCKS on a head node, then was able to provision 208 HP BL480c blades within an hour and a half.

    Check it out at www.rockclusters.org

    --

    Don't anthropomorphize computers, they don't like it.
    1. Re:Once you solve the hardware challenges..... by clark0r · · Score: 1

      How does this play with SGE / OGE? Can you centrally configure each node to mount a share? How about install a custom kernel, modules, packages, infiniband config and Lustre mount? If it can do these then it's going to be useful for real clusters.

    2. Re:Once you solve the hardware challenges..... by Anonymous Coward · · Score: 1

      Correct website is -> www.rocksclusters.org

    3. Re:Once you solve the hardware challenges..... by pswPhD · · Score: 3, Informative

      I can recommend Rocks as well, although you WILL need the slave nodes to have disks in them (you could scrounge some ancient 40GB drives from somewhere...). You seem to want hardware information, so...

      First point is to have all the fans pointing the same way. Large HPC installations arrange cabinets back-to-back, so you have a 'hot' corridor and a 'cold' corridor, which lets you access both sides of the cabinets and saves some money on cooling.
      My old workplace had two clusters and various servers in an air-conditioned room, with all the nodes pointing at the back wall, probably similar to what you have.
      Don't know anything about the UPS, but I would assume having it on the floor would be OK.

      Good luck with your project. Write a post in the future telling us how it goes.

    4. Re:Once you solve the hardware challenges..... by Anonymous Coward · · Score: 0

      Are you sure the link page is available?
      Here it reports host not found.

    5. Re:Once you solve the hardware challenges..... by eyegor · · Score: 1

      My bad: www.rocksclusters.org

      --

      Don't anthropomorphize computers, they don't like it.
    6. Re:Once you solve the hardware challenges..... by eyegor · · Score: 1

      It comes with a pretty recent version of SGE and Open MPI installed. It's fully capable of using NFS shares, and many people have used it with InfiniBand. Cluster monitoring is done with Ganglia. The kernel is customizable, and you can add your own modules as "rolls" and manage packages either as a post-install step or by building them into the kickstart for each node. We use Isilon for our shared storage, but we're probably going to be setting up a Gluster storage cluster too.

      Rocks is a great way for an organization to get their feet wet with high performance computing, but we're beginning to find some limitations especially when it comes to security patching.

      We're working on a next-gen cluster architecture where we will provide the same user interface and resources as Rocks, but will use Cobbler or something similar for provisioning, Puppet for configuration management, and either SGE, OGE or Univa Grid Engine for the scheduler. We plan on using Ganglia and Nagios for monitoring and will eventually extend our provisioning, patching and monitoring to cover the rest of the enterprise.

      --

      Don't anthropomorphize computers, they don't like it.
  8. Really? by Russ1642 · · Score: 5, Funny

    Slashdotters only imagine building Beowulf clusters. This is the first time anyone's been serious about it.

    1. Re:Really? by Anonymous Coward · · Score: 0

      Yes, the first time. Seriously.

    2. Re:Really? by Anonymous Coward · · Score: 0

      If I had a nickel for every wet dream I've had of a Beowulf cluster....

    3. Re:Really? by Anonymous Coward · · Score: 0

      You'd be blind?

    4. Re:Really? by Anonymous Coward · · Score: 0

      With so many beowulf cluster "experts" here, you wouldn't think the Wikipedia page would be in need of an expert on the subject.

  9. Don't. by Anonymous Coward · · Score: 0

    Besides the cost of electricity and cooling (which you will either pay yourself or share with others), the hassle of maintaining your own cluster is not worth it. I set up a purpose-built 50-blade cluster as a grad student and it ate up my time like nothing else. Not a good idea.

  10. Don't waste time on /. by Anonymous Coward · · Score: 0

    Trust me, why not try asking in the LQ forum? I am sure someone will come up with something good there; here on /. you have to filter hundreds of replies/comments to find an answer to your original question :)
    Some comments will make you wonder whether the poster was fucking drunk or asleep while posting.

    1. Re:Don't waste time on /. by Anonymous Coward · · Score: 0

      Glad you threw that random smile face there to make your comment whimsical.

  11. Probably not worth your time by MetricT · · Score: 5, Interesting

    I've been working in academic HPC for over a decade. Unless you are building a simple 2-3 node cluster to learn how a cluster works (scheduler, resource broker and such things), it's not worth your time. What you save in hardware, you'll lose in lost time, electricity, cooling, etc.

    If you're interested in actual research, take one computer, install an AMD 7950 for $300, and you will almost certainly blow the doors off a cluster cobbled together from old Core 2 Duos, and you'll save more than $300 in electricity.

    1. Re:Probably not worth your time by Anonymous Coward · · Score: 0

      Is OpenCL the choice for the 7950? If so, how accessible is it for learning, vs. CUDA?

    2. Re:Probably not worth your time by Anonymous Coward · · Score: 0

      They probably have no budget for additional computing hardware but instead are capable of sinking electricity and facility costs into the operating costs of the institution. That said, the 28 old cores can serve as a maker movement style project for learning to build bigger things out of unreliable components in the future.

    3. Re:Probably not worth your time by ILongForDarkness · · Score: 1

      Absolutely right about HPC users. Unless you are a gluten for punishment, generally you need to get results fast before you know what is next. So users will avoid your cluster nodes because they can get 2-3x the speed from a modern desktop. What you will get is the people who have an endless queue of serial jobs (been there: my last computational project was about 250,000 CPU hours of serial work), but generally you'll have a lot of idle time. People will fire off a job and it will finish partway through the night. Your system is so slow they won't bother to log in to submit new jobs until the morning, etc.

    4. Re:Probably not worth your time by MetricT · · Score: 1

      It depends very specifically on the application. There are some fields that are currently tied to nVidia due to "legacy" code (a strange term for code that can't be more than 1-2 years old) that is written in CUDA. If so, you can buy an equivalent nVidia card.

      If you're writing your own app (which if they're studying combinatorics seems likely) then rewriting the core loop in OpenCL is reasonable.

      OpenCL is a higher-level abstraction, and you do lose some performance compared to CUDA, but it's worth it in my opinion simply for portability.

    5. Re:Probably not worth your time by Anonymous Coward · · Score: 2, Informative

      I'm a glutton for correcting grammar mistakes, and I believe you meant to use the word "glutton" where you used the word "gluten." Gluten is a wheat-based protein, and a glutton is someone who exhibits a desire to overeat.

    6. Re:Probably not worth your time by serviscope_minor · · Score: 3, Interesting

      I've been working in academic HPC for over a decade. Unless you are building a simple 2-3 node cluster to learn how a cluster works (scheduler, resource broker and such things), it's not worth your time. What you save in hardware, you'll lose in lost time, electricity, cooling, etc.

      I strongly disagree. I actually had a very similar Ask Slashdot a while back.

      The cluster got built, and has been running happily since.

      If you're interested in actual research, take one computer, install an AMD 7950 for $300, and you will almost certainly blow the doors off a cluster cobbled together from old Core 2 Duos, and you'll save more than $300 in electricity

      Oh yuck!

      But what you save in electricity, you'll lose in postdoc/developer time.

      Sometimes you need results. Developing for a GPU is slow and difficult compared to (e.g.) writing prototype code in MATLAB/Octave. You save heaps of vastly more expensive person and development time by being able to run those codes on a cluster. And also, not every task out there is even easy to put on a GPU.

      --
      SJW n. One who posts facts.
    7. Re:Probably not worth your time by MetricT · · Score: 3, Informative

      You *do* know that Matlab has been supporting GPU computing for some time now? We bought an entire cluster of several hundred nVidia GTX 480s for the explicit purpose of GPU computing.

    8. Re:Probably not worth your time by cbiltcliffe · · Score: 1

      So what if you're a glutton for gluten?
      Well, besides meaning the Atkins Diet is probably not for you, that is....

      --
      "City hall" in German is "Rathaus" Kinda explains a few things......
    9. Re:Probably not worth your time by CanHasDIY · · Score: 1

      Unless you are a gluten for punishment.

      If you're anything like my wife, gluten is punishment.

      Thank you, thank you, I'll be here all week! Enjoy the veal!

      --
      An enigma, wrapped in a riddle, shrouded in bacon and cheese
    10. Re:Probably not worth your time by serviscope_minor · · Score: 2

      You *do* know that Matlab has been supporting GPU computing for some time now?

      Yes, but only for specific builtins. If you want to do something a bit more custom, it goes back to being very slow.

      --
      SJW n. One who posts facts.
    11. Re:Probably not worth your time by ThePeices · · Score: 1

      So what if you're a glutton for gluten?

      Then you are a......glutton for gluten?

    12. Re:Probably not worth your time by L4t3r4lu5 · · Score: 1

      I see this as being similar to using a WYSIWYG HTML editor. It's much faster to use Dreamweaver (or whatever the new favourite is) to get your site up and pretty, but it's also limiting and pumps out bloated and inefficient code. It's much more elegant and streamlined to code yourself, as well as more configurable, but it takes more time.

      You do your due diligence, you pick the best solution for you. Same with parallel computing. Same with everything.

      --
      Finally had enough. Come see us over at https://soylentnews.org/
    13. Re:Probably not worth your time by serviscope_minor · · Score: 1

      Same with parallel computing. Same with everything.

      Sure. Ideally, you'd know in advance exactly what jobs you want to run and tailor the hardware to the jobs. Less ideally, you'd have a general idea of the class of jobs.

      I would advocate a cluster of PCs in general. Firstly, because it's not easy to GPUify many jobs, and if you can GPUify it, then for a few hundred dollars per node you can upgrade the cluster to be a GPU cluster.

      The nice thing is that schedulers generally understand that machines have different capabilities, so you don't even have to upgrade the entire thing.

      --
      SJW n. One who posts facts.
  12. Just use Amazon AWS by Anonymous Coward · · Score: 0

    It's 2013: don't build your own cluster, just use AWS EC2 spot instances.

    1. Re:Just use Amazon AWS by hawguy · · Score: 5, Informative

      It's 2013: don't build your own cluster, just use AWS EC2 spot instances.

      An EC2 "High CPU Medium" instance is probably close to his Core 2 Duos (it has 1.7GB RAM and two cores of 2.5 EC2 Compute Units each; each ECU is equivalent to a 2007-era 1.2GHz Xeon).

      Current spot pricing is $0.018/hour, so a month would cost him around $12.96 (not including storage; add about a dollar for 10GB of EBS disk space).

      If his computers use 150W of power each, at $0.12/kWh, they'll cost exactly $0.018 per hour, the same price as an EC2 instance excluding storage.

      However, spot pricing is not guaranteed, so he'll have to be prepared to shut down his instances when the spot price rises above what he's willing to pay. Full price for the instance is $0.145/hour, but he could get that down to $0.09/hour if he's willing to pay $161 to reserve the instance for 3 years.
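      The price comparison above can be reproduced in a few lines of Python. All prices are the ones quoted in the comment (2013 figures, not current ones); the 720-hour month and the assumption that a reserved instance runs every hour of its 3-year term are simplifications made for this sketch.

```python
HOURS_PER_MONTH = 720        # 30-day month, matching the $12.96 figure above

# Prices quoted in the comment (USD, 2013)
spot_per_hour = 0.018
on_demand_per_hour = 0.145
reserved_per_hour = 0.09
reservation_fee = 161.0      # paid up front for a 3-year term

# Local machine: 150 W at $0.12/kWh, as assumed in the comment
local_per_hour = 0.150 * 0.12

spot_month = spot_per_hour * HOURS_PER_MONTH

# Spread the up-front fee over every hour of the term (assumes 24/7 use).
term_hours = 3 * 365 * 24
effective_reserved = reserved_per_hour + reservation_fee / term_hours

print(f"local power:          ${local_per_hour:.3f}/h")
print(f"spot, one month:      ${spot_month:.2f}")
print(f"reserved (amortized): ${effective_reserved:.4f}/h vs on-demand ${on_demand_per_hour:.3f}/h")
```

Amortizing the fee adds well under a cent per hour, so the reservation roughly cuts the hourly cost by a third only if the instance really is kept busy for the whole term.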

    2. Re:Just use Amazon AWS by Guspaz · · Score: 2

      The "Cluster Compute" instances might be better suited to cluster computing, although they're not cheap. But a single one of them, with dual eight-core Xeon E5-2670 CPUs (dedicated, so they don't list EC2 compute units), probably has more computing power than the entire Core 2 Duo cluster being proposed.

      But as I said, not cheap. It comes out to $400 per month for a reserved instance. A spot instance could be slightly cheaper. Then again, taking the 150W of power usage you specified for each of the 14 machines, times 1.8 for the industry-typical datacenter power usage effectiveness (PUE, which accounts for air conditioning, UPS losses, and other overhead), we get 3,780W, which in a single month is 2,721.6 kilowatt-hours, and at $0.12/kWh that amounts to $326.59 in power alone!

      So, it seems that the Amazon server at $400 per month, is barely more expensive than the power required to run those 14 Core 2 machines!
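      The power figure above works out as follows. This sketch uses the comment's own inputs (150 W per machine, a PUE of 1.8, $0.12/kWh, a 720-hour month); none of them are measurements of the OP's hardware.

```python
NODES = 14
WATTS_PER_NODE = 150   # figure used in the parent comment
PUE = 1.8              # facility watts per IT watt (cooling, UPS losses, etc.)
HOURS_PER_MONTH = 720  # 30-day month
PRICE_PER_KWH = 0.12

facility_watts = NODES * WATTS_PER_NODE * PUE           # 3,780 W
monthly_kwh = facility_watts * HOURS_PER_MONTH / 1000   # 2,721.6 kWh
monthly_cost = monthly_kwh * PRICE_PER_KWH              # about $326.59

print(f"{facility_watts:,.0f} W -> {monthly_kwh:,.1f} kWh/month -> ${monthly_cost:,.2f}")
```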

    3. Re:Just use Amazon AWS by Cyberax · · Score: 1

      But you're also forgetting that you actually need to buy hardware, network connectivity and fast storage, and support all of it. If you factor that in, AWS simply can't be beaten on price. And right now the spot market for high-end cluster computing instances is very sweet if you can tolerate (short) periods of unavailability.

      Also, starting 1000 nodes at once for a task is nothing short of awesome.

  13. Sounds interesting... by Mysticalfruit · · Score: 4, Informative

    I'm routinely mounting things in 42U cabinets that ought not to be mounted in them, so I've got *some* insight.

    The standard for airflow is front to back and upwards. Doing some sticky-note measurements, I think you could mount 5 of these vertically as a unit. I'd say get a piece of 1" thick plywood and dado-cut 1/4" channels top and bottom to mount the motherboards. This would also give you a mounting spot where you could line up the power supplies at the back. This would also put the Ethernet ports at the back. Another thing this would allow is easy removal of a dead board.

    Going on this idea, you could also make these as "units" and install two of them two deep in the cabinet (if you used L rails).

    Without doing any measuring, I suspect this would get you 5 machines in 7U, or 10 machines if you went 2 deep in 7U.
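    To compare the layouts discussed in this thread, here is a rough rack-unit count for 14 machines. The group sizes come from the thread: 2 cased machines per 5U and 2 bare boards per 4U from the original question, plus this comment's unmeasured estimates of 5 (or 10, two deep) vertical boards per 7U. The function name is hypothetical, and the count ignores space for the switch, PDUs and UPSs.

```python
import math


def rack_units_needed(machines: int, per_group: int, units_per_group: int) -> int:
    """Rack units used when machines are packed in fixed-size groups (partial groups round up)."""
    groups = math.ceil(machines / per_group)
    return groups * units_per_group


MACHINES = 14
# (machines per group, rack units per group), taken from the thread's estimates
layouts = {
    "cased, 2 per 5U": (2, 5),
    "bare trays, 2 per 4U": (2, 4),
    "vertical, 5 per 7U": (5, 7),
    "vertical two deep, 10 per 7U": (10, 7),
}

for name, (per_group, units) in layouts.items():
    print(f"{name:30s} {rack_units_needed(MACHINES, per_group, units):2d}U of 42U")
```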

    --
    Yes Francis, the world has gone crazy.
    1. Re:Sounds interesting... by hawguy · · Score: 1

      The standard for airflow is front to back and upwards. Doing some sticky-note measurements, I think you could mount 5 of these vertically as a unit. I'd say get a piece of 1" thick plywood and dado-cut 1/4" channels top and bottom to mount the motherboards. This would also give you a mounting spot where you could line up the power supplies at the back. This would also put the Ethernet ports at the back. Another thing this would allow is easy removal of a dead board.

      That sounds like a fire hazard, not to mention a source of dust - do people really put wooden shelves in their datacenters?

    2. Re:Sounds interesting... by Anonymous Coward · · Score: 0

      I've been waiting for a good day to stop reading slashdot. This is it.

      Anyhow, thank you, 533341, for representing an admirable attitude in this thread. Unfortunately the place has been overrun with some pestilent breed of sophisticate.

      My last slash .02: if no single task actually requires a cluster, and you really just want to retain the computing power, consider mounting your mobos in light-box style picture frames, behind nice art or posters. With minor attention to your fan quality and mounting, and replacing platter HDDs with SSDs, you end up with nearly silent computational mass on your walls.

      Best of luck.

    3. Re:Sounds interesting... by cbiltcliffe · · Score: 1

      That sounds like a fire hazard, not to mention a source of dust - do people really put wooden shelves in their datacenters?

      The autoignition temperature for generic cheapo plywood is somewhere on the order of 300 degrees C. If you went with pine, which is still pretty cheap, it goes up to 427 degrees C.

      How hot do you think computers run?

      The dust, I could give you, if the wood used was cheap chipboard, balsa, or something else soft. Something even moderately hard like pine it wouldn't be a problem, as long as you properly cleaned off the sawdust from the cutting process. If you went all out and used oak, it's probably harder than the circuit boards you'd be mounting in it, with the added benefit that it raises the autoignition temperature up to 482 degrees C.

      Of course, you could use some edge rails in the wood, and eliminate the dust problem regardless of wood used, and they'd probably be cheap enough that you could get them through petty cash, and not need budget approvals, too.

      --
      "City hall" in German is "Rathaus" Kinda explains a few things......
    4. Re:Sounds interesting... by hawguy · · Score: 1

      That sounds like a fire hazard, not to mention a source of dust - do people really put wooden shelves in their datacenters?

      The autoignition temperature for generic cheapo plywood is somewhere on the order of 300 degrees C. If you went with pine, which is still pretty cheap, it goes up to 427 degrees C.

      How hot do you think computers run?

      It's not normal operation that would concern me with wooden rack shelves, but failures like this:

      http://www.theregister.co.uk/2012/11/26/exploding_computer_vs_reg_reader/
      http://ronaldlan.dyndns.org/index.php
      http://www.tomshardware.com/reviews/inadequate-deceptive-product-labeling,536.html

      One bad power supply could set the whole cabinet on fire -- and perhaps worse, set off the server room fire suppression system.

  14. Inter-node communication by plus_M · · Score: 4, Informative

    What do you intend to use for inter-node communication? Gigabit Ethernet? You need to realize that latency in inter-node communication can cause *extremely* poor scaling for non-trivial parallelization. Scientific computing clusters typically use InfiniBand or something like it, which has extremely slow latency, but the equipment will cost you a pretty penny. If you are interested in doing computations across multiple computing nodes, you should really set up just two nodes and benchmark what kind of speed increase there is between running the job on a single node and on two nodes. My guess is that you are going to get significantly less than a 2x speedup. It is entirely possible that the calculation will be *slower* on two nodes than on just one. Of course, if you are just running a massive number of unrelated calculations, then inter-node communication becomes much less important, and this won't be an issue.
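    The effect described above can be illustrated with a toy bulk-synchronous model: each step's work is split evenly across the nodes, then the nodes exchange results in ceil(log2 n) message rounds, each costing one network latency. This is a simplification (bandwidth, contention and software overhead are ignored), and the ~50 µs gigabit Ethernet and ~2 µs InfiniBand latencies are assumed ballpark values, not benchmarks; measure your own as suggested.

```python
import math


def speedup(nodes: int, compute_per_step_s: float, latency_s: float) -> float:
    """Speedup over one node for a toy bulk-synchronous job.

    Assumed model: per step, compute is divided by `nodes`, then a tree-style
    exchange takes ceil(log2(nodes)) rounds of `latency_s` each.
    """
    rounds = math.ceil(math.log2(nodes)) if nodes > 1 else 0
    parallel_time = compute_per_step_s / nodes + rounds * latency_s
    return compute_per_step_s / parallel_time


# Assumed ballpark latencies; fine-grained (20 us) vs coarse-grained (10 ms) steps.
for label, lat in [("GigE", 50e-6), ("InfiniBand", 2e-6)]:
    for step in (20e-6, 10e-3):
        s2 = speedup(2, step, lat)
        s14 = speedup(14, step, lat)
        print(f"{label:10s} step={step * 1e6:>7.0f} us  2 nodes: {s2:5.2f}x  14 nodes: {s14:5.2f}x")
```

In this model, fine-grained steps over gigabit Ethernet run slower on two nodes than on one, while coarse-grained steps scale reasonably on either network, which matches the comment's point about unrelated or loosely coupled calculations.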

    1. Re:Inter-node communication by plus_M · · Score: 2

      And of course by "slow latency" I mean "low latency".

    2. Re:Inter-node communication by Anonymous Coward · · Score: 0

      The University of Kentucky built a cluster using a matrix switch layout that ensured there was a single hop between any two nodes. This was excellent for minimizing latency, but a tad complex to wire up.

    3. Re:Inter-node communication by MerlynEmrys67 · · Score: 1

      Actually, by slow latency you mean high latency. High latency is bad, just as low bandwidth is bad. You want the lowest latency numbers that you can afford. I know of people who count the speed of light going down a cable in their latency calculations because it matters to them (~5ns/m).
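      As a rough cross-check on cable delay: signals in copper or fiber travel at a fraction of the speed of light (the cable's velocity factor, roughly 0.6 to 0.7 for common links), so each meter costs about 5 nanoseconds. A minimal sketch, assuming a velocity factor of 0.67:

```python
C = 299_792_458  # speed of light in vacuum, m/s


def propagation_delay_ns(length_m: float, velocity_factor: float) -> float:
    """One-way propagation delay of a signal through a cable, in nanoseconds."""
    return length_m / (velocity_factor * C) * 1e9


# Assumed velocity factor of 0.67; check the datasheet of the actual cable.
print(f"{propagation_delay_ns(1.0, 0.67):.2f} ns per meter")
print(f"{propagation_delay_ns(5.0, 0.67):.1f} ns for a 5 m run")
```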

      --
      I have mod points and I am not afraid to use them
    4. Re:Inter-node communication by bmxeroh · · Score: 2

      I think the point was that they had made a typo in saying "slow latency" and really meant to type "low". But thanks for explaining exactly what they didn't mean.

      --
      Central Ohio Home Theater Installation - The Theater People
    5. Re:Inter-node communication by Anonymous Coward · · Score: 0

      10Gb InfiniBand is getting pretty cheap on eBay... I almost bought some just because. 20Gb is still a little more than I am willing to pay on there.

    6. Re:Inter-node communication by Anonymous Coward · · Score: 0

      Goddamn, are you dense.

  15. Reliability, space, and efficiency by Peter+Simpson · · Score: 2

    It may initially seem like a good idea, but if the population isn't homogeneous, you could find your time eaten up looking for spares. With a single type of PC, a node can be sacrificed to keep others running. But these are systems near the end of their design lifetime (and loaded with dust -- and who knows what else?) so components (fans, HDDs, power supplies) are going to be starting to fail more frequently. And the rats' nest of power cables! Perhaps a bunch of multiprocessor, multicore server blades would be a better choice? They go pretty cheaply, and you'd get more cores per power supply, and use less floor space to boot, by rack mounting them.

    Scientific American article: http://www.scientificamerican.com/article.cfm?id=the-do-it-yourself-superc

  16. So reusing old hardware by MerlynEmrys67 · · Score: 3, Insightful
    There is a reason that old hardware should be gotten rid of. Depending on the exact config of the 14 servers (processor/whatever), you could probably replace them with 1, maybe 2 servers. The current generation of Jefferson Pass servers holds 4 servers in a 2U sled - so you could replace this whole thing with a 2U solution that isn't exposed to the elements like you are proposing. It would be new, under warranty and faster than all get out.

    Your solution will take 14 servers, connect them with ancient 1GbE interconnect and hope for the best. The interconnect for clusters REALLY matters, many problems are network bound - and not only network bound but latency bound as well. Look at the list of fastest supercomputers and you will barely see Ethernet anymore (especially at the high end) and definitely not 1GbE. Your new boxes will probably come with 10GbE that will definitely help... Especially since there will be fewer nodes to have to talk to (only 2, maybe 4)

    The other problem that you will run into is your system will take about 20x the power and 20x the air conditioning bill (yeah - that is a LOT of power there), the modern new system will pay for itself in 9-12 months (and that doesn't include the tax deduction for donating the old systems and making them Someone Else's Problem)
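    Whether "pays for itself in 9-12 months" holds depends entirely on your numbers. The sketch below is a hypothetical helper for running that arithmetic; every input (hardware price, wattages, electricity price, cooling overhead) is an assumption to be replaced with measured or quoted values.

```python
def payback_months(new_hw_cost: float, old_watts: float, new_watts: float,
                   price_per_kwh: float, cooling_overhead: float = 1.0) -> float:
    """Months until electricity savings cover the cost of new hardware.

    cooling_overhead multiplies the IT load to account for air conditioning
    (e.g. 1.5 means each watt of IT load costs another 0.5 W in cooling).
    """
    hours_per_month = 24 * 365 / 12                      # 730 h, running 24/7
    saved_kw = (old_watts - new_watts) * cooling_overhead / 1000
    saved_per_month = saved_kw * hours_per_month * price_per_kwh
    if saved_per_month <= 0:
        return float("inf")                              # never pays for itself
    return new_hw_cost / saved_per_month


# Illustrative inputs only: 14 old boxes vs. 2 new servers, guessed wattages.
months = payback_months(new_hw_cost=6000.0, old_watts=14 * 200.0,
                        new_watts=2 * 300.0, price_per_kwh=0.12,
                        cooling_overhead=1.5)
print(round(months, 1))
```

    With these made-up inputs the answer comes out around 21 months rather than 9-12, which is the point: plug in real wattages and prices before deciding either way.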

    Recycling old hardware always seems like fun. At the end of a piece of hardware's life cycle look at what it will actually cost to keep it in service - Just the electricity bill will bite you hard, then you have the maintenance, and fun reliability problems.

    1. Re:So reusing old hardware by Anonymous Coward · · Score: 0

      This. These old E8000 machines will get killed by a third as many Ivy Bridge boxes. That alone will pay for itself very quickly.

    2. Re:So reusing old hardware by Anonymous Coward · · Score: 0

      Mind explaining to us how 1-2 high class servers will allow you to learn how a large cluster functions together?

      Not performing actual work, not accomplishing any goal outside of learning, not running software, not running OSes, not learning virtualization... But learning HPC clustering specifically?

      Sounds like you are attempting to get him to spend a few thousand dollars on something that will accomplish zero goals, instead of spending a few bucks on power and spending some time to accomplish all the stated goals.

    3. Re:So reusing old hardware by MerlynEmrys67 · · Score: 1
      What I am saying is that running these 14 nodes will quickly cost more than running a 4-node cluster that provides better performance. Server systems (especially OLD server systems) are real power hogs. When you are drawing close to 500 W/node, that is 7 kW running 24/7. All of that power adds up. On top of that, the stated goal was space saving. He is taking a whole rack to provide the compute power of ~4U. The new server approach provides a 10x savings in space, a 10x savings in power cost - and another 10x savings in cooling cost (the other big cost of running a cluster).
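      The 7 kW figure is just 14 nodes x 500 W. A small sketch of turning that into a yearly energy figure; 500 W/node is the commenter's estimate (so measure your own boxes at the wall), and the $0.10/kWh price is a placeholder.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_kwh(nodes: int, watts_per_node: float, duty_cycle: float = 1.0) -> float:
    """kWh per year for a cluster; duty_cycle=1.0 means running 24/7."""
    return nodes * watts_per_node * duty_cycle * HOURS_PER_YEAR / 1000


def annual_cost(kwh: float, price_per_kwh: float) -> float:
    """Electricity cost only; cooling comes on top of this."""
    return kwh * price_per_kwh


old = annual_energy_kwh(14, 500)   # the 7 kW cluster from the comment
print(old)                          # 61320.0
print(annual_cost(old, 0.10))       # placeholder price of $0.10/kWh
```

      The `duty_cycle` parameter is there because a learning cluster that is only switched on for experiments costs proportionally less than one running around the clock.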

      It is nice to put old hardware to use occasionally, but it is almost never cost-effective; people don't realize that the main cost of a cluster is not acquisition cost, but cooling and power.

    4. Re:So reusing old hardware by Anonymous Coward · · Score: 0

      ...but the point of it is to learn how to do it, not to build a high-end system.

  17. cabinet UPS by Anonymous Coward · · Score: 0

    You're wasting your time with a UPS if you don't have a cabinet-sized supply. The MTBF, maintenance, efficiency, etc. just don't make sense.
    Put real money into making your cluster redundant, or don't have a UPS at all.

    You ought to consider what the cost of doing this with Amazon EC2 or similar services might be.
    I have a feeling you don't have any specific computation goals, though, so it will be difficult to measure success.

    (Former builder of an 80-node, 2-way Pentium III 1GHz cluster back in the day.)
    Clusters are still very valuable, but be sure to accurately describe the computational cost of what you have planned, because while you're building your cluster, current tech keeps getting so cheap that you might be able to just sell your equipment and lease time on someone else's HPC for half the money.

  18. sell them and buy new.... by Brit_in_the_USA · · Score: 5, Informative

    SPECfp2006 rate results:
    E8600: 34
    i7-3770: 130
    (~4x the performance)

    ...sell the E8xxx-series PCs in their cases for $100 apiece with a Windows licence,
    and use the $1400 towards buying 4 LGA1155 motherboards (4x $80), 4 unlocked K-series i7s (4x $230), 4x 8GB of DDR3 RAM (4x $40) and 4 ~300-400W budget power supplies (4x $30) = $1520

    Use a specialized clustering OS (Linux) and you'll have a smaller, easier-to-manage system, with lots more DDR3 memory and a lower electricity (and AC electricity) bill....
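    The arithmetic above can be checked in a few lines. The prices and SPECfp2006 rate figures are the ones quoted in the comment; summing per-box rate scores assumes the work is independent jobs with no inter-node communication, and the parts list leaves out cases, storage and networking.

```python
# Figures quoted in the comment (US dollars, SPECfp2006 rate scores).
OLD_NODES, OLD_RATE, RESALE_EACH = 14, 34, 100   # E8600-class boxes
NEW_NODES, NEW_RATE = 4, 130                     # i7-3770-class builds

PARTS_PER_NODE = {"motherboard": 80, "cpu": 230, "ram_8gb": 40, "psu": 30}

new_build_cost = NEW_NODES * sum(PARTS_PER_NODE.values())
resale_income = OLD_NODES * RESALE_EACH
out_of_pocket = new_build_cost - resale_income

# Aggregate throughput, assuming rate scores simply add across boxes.
old_throughput = OLD_NODES * OLD_RATE
new_throughput = NEW_NODES * NEW_RATE

print(new_build_cost, resale_income, out_of_pocket)   # 1520 1400 120
print(old_throughput, new_throughput)                 # 476 520
```

    On those assumptions, four new boxes slightly out-run all fourteen old ones for about $120 net, before counting any power savings.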

    1. Re:sell them and buy new.... by Anonymous Coward · · Score: 0

      This is probably the best answer.

    2. Re:sell them and buy new.... by Anonymous Coward · · Score: 0

      Why not buy AMD 8320 CPUs? They're usually as fast or faster in HPC workloads as a 3770, but for half the price. $40 mobo, $20 4GB ECC RAM and a $40 SSD (to reduce network load, or more RAM, whatever you want). You could run 2-4 off a single regular PSU.

        You could build a box with, say, 10 CPUs (80 cores) for what, $2k? Which would be 150% faster than your suggested boxen.

        Agreed though, new boxen = win. Computing is so cheap now. A single 3770 or 8350 would be as fast as 4 older dual core setups.

  19. Donate them to local charity by Anonymous Coward · · Score: 0

    I know that as Windows 7 desktops for office use, those machines will still work fine. And I am fairly sure a local Boys & Girls Club / YMCA / choose-your-charity would take them, even if only for re-donation to their clients.

    Not sure if .edus need tax write-offs, but at least they will go to a better use.

    Then get a modern high end video card for less than this will cost to build, use it for compute, and have a faster end solution.

  20. Not another cluster... by bobbied · · Score: 2

    Unless you have a large number of identical machines capable of PXE booting and the necessary network hardware to wire them all together, you are really just building a maintenance nightmare. It might be fun to play with a cluster, but you'd do better to buy a couple of machines with as many cores as you can. It will take less space, less power, less fumbling around with configurations, less time and likely be cheaper than trying to cram all the old stuff into some random rack space.

    If you insist on doing this, I suggest the following:
    1. Only use *identical* hardware (or at least hardware that can run on exactly the same kernel image, modules and configurations), with the maximum memory and fastest networks you can get.
    2. Make sure you have well-engineered power supplies and cooling.
    3. PXE boot all but one machine, and make sure your cluster "self-configures" based on the hardware that shows up when you turn it on, because you will always have something broken.
    4. Don't use local storage for anything more than swap; everything comes over the network...
    5. Use multiple network segments, split between a storage network and an operational network.
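    One way to approach point 3 is to have the master build its node list at boot from whichever machines actually answer. This is a hypothetical sketch: the `nodeNN` naming scheme is assumed, "alive" is approximated as "accepts a TCP connection on the SSH port", and feeding the result into a scheduler's node list is left out.

```python
import socket
from typing import Callable, Iterable, List


def tcp_alive(host: str, port: int = 22, timeout: float = 1.0) -> bool:
    """True if host accepts a TCP connection on port (sshd by default)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:   # refused, timed out, or the name did not resolve
        return False


def live_nodes(candidates: Iterable[str],
               probe: Callable[[str], bool] = tcp_alive) -> List[str]:
    """This boot's node list: only the machines that answered the probe."""
    return [host for host in candidates if probe(host)]


# Usage (hypothetical host names node01..node13 behind the master):
#   print(live_nodes([f"node{i:02d}" for i in range(1, 14)]))
```

    Passing the probe in as a parameter keeps the list-building logic testable and lets you swap in a different health check (ping, a job-scheduler daemon port) without touching the rest.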

    By the way... For the sake of any local radio operations, please make sure you don't just unpack all the hardware from its cases and spread it out on the workbench. Older hardware can be a really big RFI generator. Consider keeping it in a rack that offers at least some shielding.

    --
    "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
  21. Cold side, hot side by raymorris · · Score: 1

    Would it be best to orient the shelves (and thus the fans) in the same direction throughout the cabinet, or to alternate the fan orientations on a shelf-by-shelf basis?

    Keep them all the same, so that the system works as one big fan, pulling cool air from one side of the cabinet and exhausting hot air from the other. It's easiest to visualize if you imagine the airflow with a simple scenario. Imagine you had all of the even numbered shelves facing backward, blowing hot air to the front of the rack, while all the odd numbered shelves were trying to suck cool air from the front. That would totally fail because the odd numbered shelves would be sucking in hot air blown out from the even ones and vice-versa. You'd just be blowing hot air around the rack, not moving air through the rack. The same generally applies to other less simple configurations - if different units are arranged differently, they'll work against each other to some extent, rather than working as one team.
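    A rough way to see why mixed directions hurt: air passing a shelf warms by roughly ΔT = P / (ρ·c_p·Q), so any shelf breathing another shelf's exhaust starts that much hotter. The sketch below uses textbook values for air; the wattage and fan airflow are hypothetical, and the "recirculate" case is a deliberately crude worst case, not a real airflow model.

```python
from typing import List

AIR_DENSITY = 1.2           # kg/m^3, room air near sea level
AIR_CP = 1005.0             # J/(kg*K), specific heat of air
CFM_TO_M3S = 0.000471947    # 1 cubic foot per minute in m^3/s


def air_temp_rise(watts: float, cfm: float) -> float:
    """Temperature rise (K) of air carrying `watts` of heat at `cfm` airflow."""
    mass_flow = AIR_DENSITY * cfm * CFM_TO_M3S   # kg/s
    return watts / (mass_flow * AIR_CP)


def inlet_temps(room_c: float, shelves: int, watts: float, cfm: float,
                recirculate: bool) -> List[float]:
    """Inlet temperature seen by each shelf, bottom to top.

    recirculate=False: every shelf draws room air (all fans facing one way).
    recirculate=True: crude worst case where each shelf breathes the previous
    shelf's exhaust, as can happen when directions alternate.
    """
    rise = air_temp_rise(watts, cfm)
    temps, t = [], room_c
    for _ in range(shelves):
        temps.append(t)
        if recirculate:
            t += rise
    return temps


# Hypothetical shelf: 150 W of boards cooled by 30 CFM of fan airflow.
print(round(air_temp_rise(150, 30), 1))                    # 8.8
print(inlet_temps(22.0, 4, 150, 30, recirculate=False))    # [22.0, 22.0, 22.0, 22.0]
print(inlet_temps(22.0, 4, 150, 30, recirculate=True))
```

    Real racks mix air rather than passing it cleanly from shelf to shelf, so the true answer lies somewhere between the two lists; the direction of the effect is what matters.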

  22. Watts X 3 = BTU by raymorris · · Score: 1

    have a 2 ton (24000 BTU) air-conditioner which will be able to maintain a cool room temperature (the lab is quite small)

    1 BTU is 0.29 watt/hour. So take your total power usage and multiply by three. That's how many BTU of heat the rack will dissipate (all power eventually turns to heat). That's how much ADDITIONAL cooling you'll need beyond what's already used to keep the room cool.
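    The conversion is easy to script. More precisely, 1 W of continuous heat is about 3.412 BTU/h (the "multiply by three" rule is the rounded version), and one ton of cooling is 12,000 BTU/h, so the 2-ton unit in the question is 24,000 BTU/h. The wattages in the usage line are hypothetical; `existing_watts` should include people, lights and everything else already in the room.

```python
BTU_PER_HOUR_PER_WATT = 3.412   # 1 W of continuous heat ~= 3.412 BTU/h
BTU_PER_HOUR_PER_TON = 12_000   # one "ton" of refrigeration


def heat_btu_per_hour(watts: float) -> float:
    """Heat load in BTU/h for a continuous electrical load (all of it ends up as heat)."""
    return watts * BTU_PER_HOUR_PER_WATT


def ac_headroom_btu_per_hour(ac_tons: float, existing_watts: float,
                             rack_watts: float) -> float:
    """Cooling capacity left after the room's existing load plus the rack.
    A negative result means the air conditioner is undersized."""
    capacity = ac_tons * BTU_PER_HOUR_PER_TON
    return capacity - heat_btu_per_hour(existing_watts + rack_watts)


# Hypothetical loads: 2 kW already in the lab, 14 nodes at 150 W each.
print(ac_headroom_btu_per_hour(2, existing_watts=2000, rack_watts=14 * 150))
```

    With those guesses the unit has roughly 10,000 BTU/h to spare; with 500 W per node it would not.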

    1. Re:Watts X 3 = BTU by Anonymous Coward · · Score: 0

      proper units nazi edit
      1 Btu = 0.29 Wh

    2. Re:Watts X 3 = BTU by Anonymous Coward · · Score: 0

      So the air-conditioner is a "6.96 kWh" air-conditioner. What does that mean? That it will remove 6.96 kWh of heat in its lifetime?

  23. Better ways to spend your time by Anonymous Coward · · Score: 0

    Don't build it, rent it. For the cluster size (number of cores) you are proposing, it will be much faster, easier, and cheaper to rent the resources you need from Amazon Web Services. Then use MIT StarCluster to build the software infrastructure, run your cluster jobs, and shut the whole thing down. If you want to learn about building small clusters, that's a fun academic exercise. If you want to get work done, rent a cluster by the hour.

  24. Sounds like a waste of time by sdguero · · Score: 1

    Messing with old hardware to try and make it rack mountable? Pfft. Save the effort. Buy a few mid-range servers and you'll get similar compute performance compared to that energy hog of a cluster. If you really want to use that hardware, don't remount it. Just stack the servers in a corner, plug them in, and install ROCKS. It's still gonna be an energy hog and have crappy performance though.

    1. Re:Sounds like a waste of time by Anonymous Coward · · Score: 0

      Exactly. Disassembling everything from the cases is a waste of time. Seriously, OP, you think you can design better airflow for this concept than Dell already did when they made the machines? I doubt it. Plus, looking at those OptiPlex workstations, which I've disassembled before, I know you're gonna have a bad time trying to mount that CPU fan. Not to mention you're going to have to somehow create a mounting mechanism with stand-offs for the motherboards, oh, and since it's Dell, they won't follow any of the ATX/mATX/BTX standards for mounting holes.

      tl;dr - It's not worth your time. Donate them to a school or something and just buy new systems.

  25. It depends on the problem. by Anonymous Coward · · Score: 0

    A few people are saying don't bother. I'd like to expand on that a little.

    If your problem is embarrassingly parallel, you might get some good mileage out of your cluster. If not, don't bother.
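    A sketch of what "embarrassingly parallel" means for a combinatorial search like the one in the question: fix the first few yes/no choices, and each fixed prefix becomes a chunk that needs nothing from any other chunk. The subset-sum counter here is a hypothetical stand-in for the real problem; `mapper` defaults to the built-in `map`, and on a cluster each chunk would instead be submitted as its own job (or handed to something like `multiprocessing.Pool.map` on a single box).

```python
from functools import partial
from itertools import product
from typing import Callable, Iterable, List, Sequence, Tuple

Chunk = Tuple[bool, ...]


def count_subsets_with_sum(items: Sequence[int], target: int, fixed: Chunk) -> int:
    """Count subsets of items summing to target, where the include/exclude
    decision for the first len(fixed) items is already made."""
    base = sum(x for x, take in zip(items, fixed) if take)
    rest = items[len(fixed):]
    count = 0
    for choice in product((False, True), repeat=len(rest)):
        if base + sum(x for x, take in zip(rest, choice) if take) == target:
            count += 1
    return count


def split_work(items: Sequence[int], prefix_len: int) -> List[Chunk]:
    """2**prefix_len independent chunks covering the whole search space."""
    prefix_len = min(prefix_len, len(items))
    return list(product((False, True), repeat=prefix_len))


def solve(items: Sequence[int], target: int, prefix_len: int = 3,
          mapper: Callable[..., Iterable[int]] = map) -> int:
    """Farm out every chunk, then add up the partial counts (the only 'communication')."""
    work = partial(count_subsets_with_sum, items, target)
    return sum(mapper(work, split_work(items, prefix_len)))


print(solve([1, 2, 3, 4, 5], target=5))   # 3: {5}, {1, 4}, {2, 3}
```

    If your real problem can't be cut up this way, and chunks need each other's intermediate results, that is the "don't bother" case.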

  26. cheaper and faster by Anonymous Coward · · Score: 0

    It would be cheaper and faster to replace those 14 E8000s with 4 i7-3900s with DDR3. Old hardware should be retired; it is a pain to maintain, and worse yet,
    no one carries those IDE PATA drives.

    1. Re:cheaper and faster by CanHasDIY · · Score: 1

      it would be cheaper and faster to replace those 14 computers you already own with 4 brand new computers whose processors alone cost more than $500 each

      FTFY.

      Strange idea of "cheaper" you've got there.

      --
      An enigma, wrapped in a riddle, shrouded in bacon and cheese
  27. Raspberry Pi. by faldore · · Score: 1

    Raspberry Pi.

    http://www.tomshardware.com/news/Raspberry-Pi-Supercomputer-Legos-Linux,17596.html

  28. 14 cpu's from 5 years ago by viperidaenz · · Score: 1

    Why not give them away and buy 2 i7-26xx or better CPUs for the same performance? You could fit that in 1U instead of a 42U rack. No switch required, smaller UPS required, less aircon load, less electricity.

  29. Microwulf by xkrebstarx · · Score: 2

    Check out the Microwulf work. It's not necessarily what you're looking for, but the community has produced some creative custom cases/racks. It might give you some fresh ideas.

  30. I've built one, it works, but there are caveats by Anonymous Coward · · Score: 3, Interesting

    We have a cluster at my lab that's pretty similar to what the submitter describes. Over the years, we've upgraded it (by replacing old scavenged hardware with slightly less old scavenged hardware) and it is now a very useful, reasonably reliable, but rather power-hungry tool.

    Thoughts:

    - 1GbE is just fine for our kind of inherently parallel problems (Monte Carlo simulations of radiation interactions). It will NOT cut it for things like CFD that require fast node-to-node communication.

    - We are running a Windows environment, using Altair PBS to distribute jobs. If you have Unix/Linux skills, use that instead. (In our case, training grad students on a new OS would just be an unnecessary hurdle, so we stick with what they already know.)

    - Think through the airflow. Really. For a while, ours was in a hot room with only an exhaust fan. We added a portable chiller to stop things from crashing due to overheating; a summer student had to empty its drip bucket twice a day. Moving it to a properly ventilated rack with plenty of power circuits made a HUGE improvement in reliability.

    - If you pay for the electricity yourself, just pony up the cash for modern hardware; it'll pay for itself in power savings. If power doesn't show up on your own department's budget (but capital expenses do), then by all means keep the old stuff running. We've taken both approaches, and while we love our Opteron 6xxx (24 cores in a single box!), we're not about to throw out the old PowerEdges, or turn down less-old ones that show up on our doorstep.

    - You can't use GPUs for everything. We'd love to, but a lot of our most critical code has only been validated on CPUs and is proving very difficult to port to GPU architectures.
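    For the kind of Monte Carlo work described in the first point, each task only needs its own random stream and sends back a single number, which is why 1GbE is no bottleneck. A toy sketch, estimating π rather than anything to do with radiation transport: seeding tasks with `base_seed + t` is a simplification for illustration, and real work should use generators designed for independent parallel streams (NumPy's `SeedSequence.spawn`, for example).

```python
import random


def hits_in_quarter_circle(seed: int, samples: int) -> int:
    """One independent task: a private RNG, no shared state, one integer out."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(tasks: int, samples_per_task: int, base_seed: int = 12345) -> float:
    # Each task could run on any node, in any order; only its hit count
    # has to travel back over the network.
    total = sum(hits_in_quarter_circle(base_seed + t, samples_per_task)
                for t in range(tasks))
    return 4.0 * total / (tasks * samples_per_task)


print(estimate_pi(tasks=8, samples_per_task=20_000))
```

    Fixed per-task seeds also make a run reproducible, which helps when you are validating results across CPU types.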

    (Posting AC because I'm here so rarely that I've never bothered to register.)

  31. Summary of Responses by Anonymous Coward · · Score: 0

    1. "As a commentator I reject your premise."
    2. "You shouldn't want what you state you want."
    3. "You should spend additional money to pay for more efficient machines rather than the computer you already have which are paid for, because money grows on trees, I place no value on your learning exercise, and I assume the electricity comes right out of your departmental budget exactly the same way purchase hardware would."
    4. "I will ignore your very specific and detailed description of your setup, because screw you, that's why."

    1. Re:Summary of Responses by MerlynEmrys67 · · Score: 1

      3. "You should spend additional money to pay for more efficient machines rather than the computer you already have which are paid for, because money grows on trees, I place no value on your learning exercise, and I assume the electricity comes right out of your departmental budget exactly the same way purchase hardware would."

      I always love that - I work for someone, and the goal is to get the largest value out of the money spent... regardless of whose budget it is. This is how we end up with a bureaucracy that does very stupid things like deploying old hardware that will cost more in power in 6 months than an updated environment will cost, including new systems, their electricity and their cooling. The money for power does not grow on trees; it is a real cost to the whole organization.

  32. Go ask the guys by ArhcAngel · · Score: 2

    Go ask the guys over at Microwulf. They appear to have licked this particular challenge and link to others who have as well.

    --
    "A person is smart. People are dumb, panicky dangerous animals and you know it." - K
  33. Racks... by Hymer · · Score: 1

    Racks are built for air flow from front to back; you'll need to turn the boards 90 degrees unless you remove the side panels... No, you do not want to alternate airflow; you want a hot side and a cool side, as it makes cooling easier. If you can, try to vent the hot air out instead of cooling it down; it is cheaper. Btw, did you consider putting 4 or 5 boards vertically in 2 rows behind each other?

  34. Good Luck by Anonymous Coward · · Score: 0

    I set up a cluster using Beowulf several years ago, as it came from my home state. But latency is a HUGE issue with it. Besides, all cables have to be the same length; you have to make sure you have 0 latency, otherwise you just have a bunch of computers connected to each other rather than one cluster.

  35. 1998 called ... by Anonymous Coward · · Score: 0

    and it wants its 'ask slashdot' story back!

  36. PicoPSU? by Anonymous Coward · · Score: 0

    Why not use PicoPSUs (a 160W version is available) and a large 12VDC power supply to drive all of them at once? It would save space and let you consolidate several large components into one beefy component.
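    If you go the PicoPSU route, the shared 12 V supply has to carry every board's draw plus conversion losses. A back-of-the-envelope sketch: the per-board wattages, the 90% PicoPSU efficiency and the 25% headroom are all assumptions, and the 160 W check just mirrors the model mentioned above.

```python
from typing import List, Tuple

PICOPSU_MAX_W = 160.0   # the 160 W model mentioned in the comment


def size_12v_supply(node_watts: List[float], picopsu_efficiency: float = 0.9,
                    headroom: float = 1.25) -> Tuple[float, float]:
    """Return (watts, amps) a shared 12 V supply should be rated for."""
    for w in node_watts:
        if w > PICOPSU_MAX_W:
            raise ValueError(f"{w} W board exceeds a {PICOPSU_MAX_W:.0f} W PicoPSU")
    from_12v_rail = sum(node_watts) / picopsu_efficiency   # losses in each PicoPSU
    rated_watts = from_12v_rail * headroom                  # margin for start-up surges
    return rated_watts, rated_watts / 12.0


# Hypothetical: 14 boards measured at ~90 W each under full load.
watts, amps = size_12v_supply([90.0] * 14)
print(f"{watts:.0f} W, {amps:.0f} A at 12 V")   # 1750 W, 146 A at 12 V
```

    Currents in that range are why a single big 12 V supply needs heavy-gauge distribution wiring; splitting it into a few supplies per shelf may be easier to wire.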

    For interconnect, 4Gb Fibre Channel is only about $750 each (if you count the card + switch ports). Myrinet used to be the cheap fast way (faster than Ethernet), but the cards aren't really available anymore. The Intel chipset on your motherboard might be capable of GAMMA or DET, which is probably just as good as Myrinet and a whole lot cheaper.

  37. Erlang, MPI, Loadtesting, etc. by Anonymous Coward · · Score: 0

    OK, slow computers that probably lack memory. You could run Erlang to scale a system and experiment with fault tolerance. Virtualization setups are probably not worth spending time on. Experimenting with MPI can be fun. To keep the situation simple, use a network boot over TFTP with a PXE binary, so that a single node is responsible for the image (a lean Linux distribution) and all the other nodes can be diskless. You can probably get the cheap machines up to 1GB, and that is enough for experimentation. You will waste money on electricity and cooling, but knowledge is priceless, and you don't necessarily need to run them continuously... especially if you use your cluster for load testing (with Tsung) like I do.

  38. Actual experience by Anonymous Coward · · Score: 3, Interesting

    I've done this. Starting with a couple of racksful of PS/2 55sx machines in the late '90s and continuing on through various iterations, some with and some without budgets. I currently run an 8-member heterogeneous cluster at home (plus file server, atomic clock, and a few other things), in the only closet in the house that has its own AC unit. It's possible I know something about what you're doing.

    Some of what I'll mention may involve more (wood) shop or electrical engineering than you want to undertake.

    My read of your text is that there is a computer lab that will be occupied by people that will also contain this rack with dismounted Optiplex boards and P/Ss. This lab has an A/C unit that you believe can dissipate the heat generated by new lab computers, occupants, these old machines in the rack, and the UPSs. I'll take your word, but be sure to include all the sources of heat in your calculation, including solar thermal loading if, like me, you live in "the hot part of the country". Unfortunately, this eliminates the cheapest/easiest way of moving heat away from your boards -- 20" box fans (e.g. http://www.walmart.com/ip/Galaxy-20-Box-Fan-B20100/19861411 ) mounted to an assembly of four "inward pointing" boards. These can move somewhat more air than 80 mm case fans, especially as a function of noise. One of the smartest thermal solutions I've ever seen tilted the boards so that the "upward slope" was along the airflow direction -- the little bit of thermal buoyancy helped air arriving at the underside of components to flow uphill and out with the rest of the heated air. I.e., this avoided a common problem of unmodeled airflow systems of having horizontal surfaces that trapped heated air and allowed it to just get hotter and hotter.

    Nevertheless, the best idea is to move the air from "this side" to "that side" on every shelf. Don't alternate directions on successive shelves. If you're actually worried about EMI, then you must have an open sided rack (or you shouldn't be worried). One option is to put metal walls around it, which will control your airflow. Another option that costs $10 is to make your own Faraday cage panels however you see fit. (I've done chicken wire and I've done cardboard/Al foil cardboard sandwiches. Both worked.)

    You should probably consider dual-mounting boards to the upper *and* lower sides of your shelves. Another layout I've been very happy with is vertical board mounts (like blades) with a column of P/Ss on the left or right.

    A *really* good idea for power distribution is to throw out the multiple discrete P/Ss and replace them with a DC distribution system. There's very little reason to have all those switching power supplies running to provide the same voltages over 6 feet. The UPSs are the heaviest thing in your setup; putting them at the bottom of the rack is probably a good idea. They generate some heat on standby (not much) and a lot more when running. Of course, when they're running, the AC is (worst case) also off and at least one machine should have gotten the "out of power" message and be arranging for all the machines to "shutdown -h now".
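    A minimal sketch of the "shutdown -h now" fan-out, assuming passwordless SSH from the master to each node. The node names are hypothetical, and how the script gets triggered (for example from a UPS monitoring daemon such as NUT or apcupsd) is not shown; `dry_run=True` only prints the commands so you can check them first. The master would shut itself down after this returns.

```python
import subprocess
from typing import Iterable, List


def shutdown_commands(nodes: Iterable[str], user: str = "root") -> List[List[str]]:
    """One ssh invocation per node; BatchMode stops ssh prompting for a password."""
    return [["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
             f"{user}@{node}", "shutdown", "-h", "now"]
            for node in nodes]


def on_power_failure(nodes: Iterable[str], dry_run: bool = True) -> List[List[str]]:
    cmds = shutdown_commands(nodes)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
            continue
        try:
            # check=False: a node that is already down must not stop the others.
            subprocess.run(cmd, timeout=30, check=False)
        except subprocess.TimeoutExpired:
            print(f"timed out: {cmd[-4]}")
    return cmds


on_power_failure([f"node{i:02d}" for i in range(1, 14)])   # prints only
```

    Sequential calls are the simplest thing that works; with 13 nodes and short timeouts the whole pass should finish well within typical UPS runtime, but measure that on your own hardware.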

    You only plan on having two cables per machine (since your setup seems KVM-less and headless), so wire organization may not be that important. (Yes, there are wiring nazis. I'm not one.) Pick Ethernet cables that are the right length (or get a crimper, a spool, and a bag of plugs and make them to the exact length). You'll probably get everything you need from 2-sided Velcro strips to make retaining bands on the left and right columns of the rack. Label both ends of all cables. Really. Not kidding. While you're at it, label the front and back of every motherboard with its MAC(s) and whatever identifiers you're using for machines.
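    The MAC labels double as the input for handing out fixed addresses to diskless nodes. A hypothetical sketch that turns a label inventory into dnsmasq `dhcp-host=` lines (dnsmasq is one common DHCP/TFTP server for PXE setups and is an assumption here; the MACs, host names and 10.0.0.x addresses are made up).

```python
import re
from typing import Dict, List

MAC_RE = re.compile(r"[0-9a-f]{2}(?::[0-9a-f]{2}){5}")


def dhcp_host_lines(inventory: Dict[str, str], subnet_prefix: str = "10.0.0.",
                    first_host: int = 11) -> List[str]:
    """inventory maps MAC -> hostname, exactly as written on the labels.
    Addresses are handed out in hostname order starting at first_host."""
    lines = []
    by_name = sorted(inventory.items(), key=lambda kv: kv[1])
    for offset, (mac, name) in enumerate(by_name):
        mac = mac.strip().lower()
        if not MAC_RE.fullmatch(mac):
            raise ValueError(f"bad MAC on the label for {name}: {mac!r}")
        lines.append(f"dhcp-host={mac},{name},{subnet_prefix}{first_host + offset}")
    return lines


labels = {"00:1A:2B:3C:4D:01": "node01", "00:1a:2b:3c:4d:02": "node02"}
print("\n".join(dhcp_host_lines(labels)))
# dhcp-host=00:1a:2b:3c:4d:01,node01,10.0.0.11
# dhcp-host=00:1a:2b:3c:4d:02,node02,10.0.0.12
```

    Keeping one inventory file as the single source for both the printed labels and the DHCP config means a swapped board only has to be fixed in one place.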

    1. Re:Actual experience by Anonymous Coward · · Score: 0

      Skip the wood and use ribbed aluminum. EMI shielding and fireproof structural support in one. You can probably get it with the rib spacings close enough together to slide the boards straight in without modification and even if you don't a table saw can work aluminum fairly easily. Just don't forget to dust it all out of your hair before handling electronics again.

  39. What's the Goal? by TreeInMyCube · · Score: 2

    Functioning computer systems are rarely useless; the E8000 systems the OP has will run software just like they did a few years ago when they were purchased. The most important question is: what do you want this cluster to do? If you want the experience of building it, including solving the HW issues of racking and stacking, and the software issues of cluster management software, job scheduling and resource management, then don't throw the equipment away. There are many opportunities for making decisions that require problem-solving and resourcefulness. Plenty of FOSS solutions, even while using only the built-in network connections for an interconnect. If you have some HPC or scientific cluster-aware software in mind that you want to run, tailor your software configuration to run that. The folks who built Beowulf clusters in the early 2000s had a goal in mind; often, that goal was to provide an environment to develop their own MPI software to simulate some phenomena they were interested in. Are you a programmer, or want to learn parallel programming? Are you offering your cluster to folks who are learning parallel programming? http://www.open-mpi.org/ has good information, and FOSS implementations for Linux distributions. There are also Windows clustering solutions, if that's what your user base requires; not free, obviously. So, what do you want this cluster to do?

  40. Depends by dbIII · · Score: 1

    If you have datasets of more than trivial size you don't want to be spending time waiting while you are shuffling them back and forth over the internet.
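    A quick way to put numbers on that: transfer time is just size over usable bandwidth. The 80% efficiency factor and the example link speeds below are assumptions for illustration.

```python
def transfer_hours(dataset_gb: float, link_mbit_s: float, efficiency: float = 0.8) -> float:
    """Hours to move dataset_gb (decimal GB) over a link of link_mbit_s Mbit/s,
    with `efficiency` standing in for protocol overhead and contention."""
    bits = dataset_gb * 8e9
    seconds = bits / (link_mbit_s * 1e6 * efficiency)
    return seconds / 3600


# Hypothetical 500 GB dataset: a 20 Mbit/s uplink vs. local gigabit Ethernet.
print(round(transfer_hours(500, 20), 1))     # 69.4
print(round(transfer_hours(500, 1000), 1))   # 1.4
```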

  41. post on the beowulf list by Anonymous Coward · · Score: 0

    Dude, this kind of question is what the Beowulf mailing list is all about. Post your questions there and you'll get lots of answers. www.beowulf.org

    Bear in mind that because of Moore's law and related phenomena, clustering old computers is usually a bad deal compared to buying a single new computer that's faster and draws a lot less power. However, it's a great way to learn about clustering, running MPI, etc. and all sorts of cluster management things.

    You're at a nice spot to learn. A 4-node cluster is too small. A 100-node one is too big.

  42. It is very sad... by Anonymous Coward · · Score: 0

    That none of these posts are helpful. I would bet the majority of submitters cut their teeth on similar setups.

    And my question: why not do it? Personally, I originally built an ISP out of 45 486s, designed a similar rack, used box fans for circulation, and it worked wonderfully for 7 years. I see no reason it couldn't work for you. I actually built a box, installed a rack into it, and, using 2x4s, mounted L brackets to hold the motherboards in place. A box fan on top and UPSes in the bottom. The horrifying part? I had no air conditioner to cool it, and yet I still had minimal hardware issues.

  43. openssi. by Anonymous Coward · · Score: 0

    http://openssi.org/cgi-bin/view?page=features.html:

    All of the machines in the cluster see one cluster wide filesystem, one cluster wide process space, cluster wide IPC space, etc. Processes can be migrated around the cluster. TCP/IP is cluster-wide (connections migrate with processes, and the initnode can load balance connections around).

    About the only thing missing is shared memory, where all nodes have direct read/write access to all of the other nodes. For that, you'd need hardware set up for "NUMA", such as the high end stuff sold by HP, Cray, IBM, etc.

    As much as Slashdot loves Beowulf, it's just an implementation of a 1970s-era design: writing software explicitly for a library that passes messages. OpenSSI, like a NUMA cluster, looks like one big Unix machine to software. Processes that fork() can migrate or be migrated to other nodes and can continue talking over pipes they already have open, continue reading/writing files they already have open, talking over sockets they have open, etc.

  44. why not build a virtual datacenter? by Anonymous Coward · · Score: 0

    I agree with those who propose new CPUs to get some MIPS/FLOPS: less power, less space, less maintenance. I do see some usefulness in building a farm for KVM/Xen virtual machines. Having lots of motherboards lets you better distribute load and tolerate hardware failures. You could use Sheepdog for storage, and 1Gb should be enough. The virtual farm gives you a tool to quickly deploy different servers for tests/experiments. You can still do some number crunching just to test your software, and when it is stable, run it on better-performing (FLOPS/watt) hardware.

  45. Why? by Vrtigo1 · · Score: 1

    Unless you're specifically undertaking this project to learn more about building a cluster, don't build a cluster. Over time it would be cheaper in terms of power, cooling, manpower and space to toss the old equipment and replace it with something more powerful, or better yet, just toss everything and spin up cluster resources on a cloud platform as needed. AWS, for example, has very good support for cluster computing and can put you in or very near supercomputer territory for $1,000/hr.

  46. Better than cooling, extract the hot air by Anonymous Coward · · Score: 0

    If possible, it's always better to extract the hot air instead of warming up the room and then cooling it down. Do you have any air extractor that you can use? Have you considered isolating hot and cold areas? A bunch of extractors and tubes with plastic panels will be sufficient.

    Keep in mind that if this is commodity hardware, it may well be possible that, once you get rid of the heat, you can actually run the cluster at ambient temperature. That will save you a good amount of money every year.

    About other settings, I'd go for NFS with DRBD and HA for /home and /scratch (if you use one). Perhaps a controller node with WOL and a few modifications to the job scheduler would allow you to boot/shut down the nodes on demand. SLURM+MUNGE would be my queue manager/scheduler of choice: it's extremely simple and powerful.
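    For the "controller node with WOL" idea, the wake-up side is small: a Wake-on-LAN magic packet is 6 bytes of 0xFF followed by the target MAC repeated 16 times, sent as a UDP broadcast. A sketch, assuming the boards have WOL enabled in the BIOS; calling it from the scheduler (SLURM's `ResumeProgram`/`SuspendProgram` power-saving hooks are one option worth checking in its documentation) is not shown, and the MAC is hypothetical.

```python
import socket


def magic_packet(mac: str) -> bytes:
    """Wake-on-LAN payload: 6 x 0xFF, then the 6-byte MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"not a MAC address: {mac}")
    return b"\xff" * 6 + mac_bytes * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on UDP port 9 (port 7 is also common)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))


# wake("00:1a:2b:3c:4d:01")   # hypothetical MAC of a sleeping node
```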

    X.

  47. As someone who's built and manages clusters... by whitroth · · Score: 2

    Pulling the system out of the case seems... odd. Are you that short on space that you can't have another rack?

    Several reasons:
          1. dust
          2. static
          3. a. cooling: real servers have plastic shrouds to guide the air from the fans through the heat sinks. Without that, the cooling won't be anywhere near as good, and possibly not good enough to keep them from shutting down when they're being run hard.
             b. DO NOT ALTERNATE directions. In data centers, in server rooms, etc., you have everything in a row facing the same way, blow your cool air towards the front, and let it get somewhat warm behind. This is how they're designed to be used.

    UPSes on the bottom: sure. I've put some in the middle of the rack, but those are rack-mount. MAKE SURE that you leave clearance to open 'em up when you need to replace the batteries.

    NOTE: when you buy replacement batteries for these UPSes, UNDER NO CIRCUMSTANCES BELIEVE ANY MANUFACTURER OR RESELLER. TELL THEM THAT IF THEY DON'T SEND YOU HR - HIGH RATE - BATTERIES, YOU WILL SEND THEM BACK. APC rackmounts WILL NOT ACCEPT *A*N*Y*T*H*I*N*G* but an HR battery, and will continue to tell you that you need to replace it, forever.

    I'm assuming you'll be running linux. I'm also assuming that you're using this for heavy duty computing, not load balancing or H/A (high availability).

    For clustering, also check out TORQUE, which is a standard clustering package, though it does need the jobs to be parallel-processing aware.

    For the person who mentioned "time" as a cost: I'd assume that the OP was asked to do this "as time permitted", and it's certainly something useful to do, as opposed to playing solitaire while waiting for something to need work....

                      mark

  48. Plan9 by SampleFish · · Score: 1

    Whatever happened to Plan9?

    If you are serious about this project I would use Plan 9 because it is designed to use all of your hardware transparently. They can always use more members in this small community. You might find this underrated platform quite delightful:

    http://plan9.bell-labs.com/plan9/

    Ignore all the naysayers. Just having experience with Plan 9 makes this experiment worth it.

  49. Diskless?? by niftymitch · · Score: 1
    Diskless oh my... that tells me that they have a gosh-gob lot of memory.

    I have never seen diskless clusters work very well. As soon as they come close to using all of the system memory (very easy to do), they go sideways down a GigE link. I have been wrong before, so sure, go try it.
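    Since running out of memory is the failure mode described above, one cheap guard is to check a node's available memory before launching a job on it. A minimal sketch, assuming Linux nodes (the MemAvailable field in /proc/meminfo exists since kernel 3.14); the 10% reserve and the function names are hypothetical choices, not part of any scheduler.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; memory fields are in kB."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def has_headroom(text: str, job_kb: int, reserve_fraction: float = 0.10) -> bool:
    """True if the job fits in MemAvailable while keeping reserve_fraction of MemTotal free."""
    info = parse_meminfo(text)
    reserve = int(info["MemTotal"] * reserve_fraction)
    return job_kb + reserve <= info["MemAvailable"]


def read_local_meminfo() -> str:
    """Read the current node's memory statistics (Linux only)."""
    with open("/proc/meminfo") as f:
        return f.read()
```

    On a node this would be called as `has_headroom(read_local_meminfo(), job_kb=...)`, with the job's expected footprint supplied by whoever submits it.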

    You will notice that all motherboards have mounting holes in about the same place. Bolt them together and have at it. Airflow is a royal pain to control without a chassis. Memory and other large packages in the system often have airflow/cooling issues once the board leaves a standard chassis. The closer together the boards, the higher the airflow velocity can be, so that may prove to be a plus.
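    The airflow point can be put into rough numbers with the common sea-level rule of thumb CFM = 3.16 x watts / temperature rise (°F), and average velocity = CFM / channel cross-section. A minimal sketch; the wattage, temperature rise, and channel dimensions are hypothetical, and it assumes the fans still move the same volume through a narrower gap, which real fans (working against higher back-pressure) will only partly manage.

```python
def required_cfm(watts: float, delta_t_f: float) -> float:
    """Rule-of-thumb airflow (cubic feet per minute) to remove `watts` of heat
    with an intake-to-exhaust temperature rise of delta_t_f degrees Fahrenheit.
    3.16 comes from 3.412 BTU/h per watt divided by 1.08 (sensible heat of air)."""
    return 3.16 * watts / delta_t_f


def air_velocity_fpm(cfm: float, width_in: float, gap_in: float) -> float:
    """Average velocity (feet per minute) through a width_in x gap_in inch channel."""
    area_ft2 = (width_in * gap_in) / 144.0  # 144 square inches per square foot
    return cfm / area_ft2


# Hypothetical board: 100 W, 20 °F allowed rise, 12-inch-wide channel
cfm = required_cfm(100, 20)                 # about 15.8 CFM
wide_gap = air_velocity_fpm(cfm, 12, 2.0)   # 2-inch gap between boards
narrow_gap = air_velocity_fpm(cfm, 12, 1.0) # halving the gap doubles the velocity
```

    The design point is only that, for a given volume of air, a tighter stack forces it past the components faster; whether that helps depends on the fans actually delivering that volume.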

    Also look at ROCKS clustering.

    You can just stack what fits on shelves and benches and get started. That will tell you if the performance is even close to your needs.

    As you benchmark your codes, you may find that beyond N boxes you get no further speedup, and that compacting boards together to gain a couple of U has little value.
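    One standard way to model the "more than N boxes stops helping" effect is Amdahl's law: if a fraction s of the work is serial, the speedup on n nodes is 1 / (s + (1 - s) / n). This is my choice of model, not the commenter's, and the 10% serial fraction below is hypothetical. Amdahl's law also ignores communication over the GigE link, so measured scaling is usually worse than this.

```python
def amdahl_speedup(serial_fraction: float, n_nodes: int) -> float:
    """Amdahl's law: speedup on n_nodes when serial_fraction of the work can't be parallelised."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_nodes)


def last_useful_node(serial_fraction: float, max_nodes: int, min_gain: float = 0.05) -> int:
    """Return the node count n at which adding one more node first improves
    the speedup by less than min_gain (relative), or max_nodes if that never happens."""
    for n in range(1, max_nodes):
        gain = amdahl_speedup(serial_fraction, n + 1) / amdahl_speedup(serial_fraction, n) - 1.0
        if gain < min_gain:
            return n
    return max_nodes


# Hypothetical code that is 10% serial, on the 14 available machines
speedups = [amdahl_speedup(0.10, n) for n in range(1, 15)]
cutoff = last_useful_node(0.10, 14)  # beyond this, each extra node adds under 5%
```

    With s = 0.10 the speedup can never exceed 10x no matter how many nodes you add, which is why benchmarking your own codes first is worth it.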

    Measure and track AC power as well as cooling. Neither is free, and if you can jump a couple of hardware generations, the cost of the upgraded hardware may vanish once power and cooling are considered. It may be that AC and cooling are a fixed cost to you and thus a don't-care. But measure...
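    To see what "measure" feeds into, here is a rough estimate of heat load and yearly electricity cost for the 14 nodes in the question, checked against its 24000 BTU/h air conditioner. A minimal sketch: 1 W of load is about 3.412 BTU/h of heat, but the 150 W per node, the $0.10/kWh price, and the air conditioner's coefficient of performance (COP, heat removed per unit of electricity) of 3 are all hypothetical; replace them with numbers measured at the wall.

```python
BTU_PER_HOUR_PER_WATT = 3.412  # 1 W of electrical load is about 3.412 BTU/h of heat


def heat_load_btu_per_hour(n_nodes: int, watts_per_node: float) -> float:
    """Heat the nodes dump into the room, in BTU/h."""
    return n_nodes * watts_per_node * BTU_PER_HOUR_PER_WATT


def annual_energy_cost(n_nodes: int, watts_per_node: float, price_per_kwh: float,
                       cooling_cop: float = 3.0, duty_cycle: float = 1.0) -> float:
    """Yearly cost of running the nodes plus the extra electricity the AC
    spends removing their heat (assumed COP: heat removed / electricity used)."""
    it_kw = n_nodes * watts_per_node / 1000.0
    cooling_kw = it_kw / cooling_cop
    hours = 24 * 365 * duty_cycle
    return (it_kw + cooling_kw) * hours * price_per_kwh


AC_CAPACITY_BTU_PER_HOUR = 24000  # the 2-ton unit mentioned in the question

load = heat_load_btu_per_hour(14, 150)       # hypothetical 150 W per node
cost = annual_energy_cost(14, 150, 0.10)     # hypothetical $0.10/kWh, running 24/7
fits = load < AC_CAPACITY_BTU_PER_HOUR
```

    Under these assumptions the nodes produce roughly 7200 BTU/h and cost on the order of $2,450 a year to run and cool; the UPSes and switch add to both figures.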

    --
    Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
  50. the pinnacle of home built awesomeness by crutchy · · Score: 1
  51. Why not use them? by Anonymous Coward · · Score: 0

    Most of the posts say throw them away, they're not worth the cost of operation. They assume there is $$$ for buying something new.
    Some are saying it's worthwhile as a learning exercise.

    Power: At most places I've worked, my department did not pay for electricity. So from my dept's point of view, it's not a concern. If I was running it at home or in a colo or had to account for power, it would be a concern. So paying for new hardware out of the power savings isn't a solution.

    Cooling: If the room isn't overloaded, you don't need to buy more AC. It sounds like the OP had enough extra. If you have to worry about the cooling because you get charged or have to add more, it's a concern.

    CPU Cycles: Any modern desktop will blow this away. So will GPU cards. But they have to be budgeted, approved and purchased. I've been places where that is a minimum of 6 months. Lots of things got expensed on a credit card. As a learning exercise, the speed isn't the issue. You want it to work like a real system.

    Rack space: many places I worked had extra real estate for spare machines.

    Labor: There are *many* managers that will spend $5000 in labor to save a $1000 PC. For DIY or student work, labor is free.

    If learning is the only need, you could simulate a cluster in a virtual environment, with multiple VMs acting as nodes on a modern CPU. I wonder if a new desktop system w/ a quad-core CPU, lots of RAM (enough for each node, plus the host) and disk to put images on would work for student programmers.
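    The "enough RAM for each node, plus the host" sizing is simple arithmetic, and it is worth checking CPU oversubscription at the same time. A minimal sketch; the node count, per-node RAM, and the 4 GB host reserve are hypothetical placeholders.

```python
def host_ram_gb(n_vm_nodes: int, ram_per_node_gb: float, host_reserve_gb: float = 4.0) -> float:
    """RAM the host needs: every virtual node's allocation plus a reserve for the host OS/hypervisor."""
    return n_vm_nodes * ram_per_node_gb + host_reserve_gb


def vcpus_per_core(n_vm_nodes: int, vcpus_per_node: int, physical_cores: int) -> float:
    """Ratio of virtual CPUs to physical cores; above 1.0 the virtual nodes compete for CPU time."""
    return n_vm_nodes * vcpus_per_node / physical_cores


# Hypothetical: 4 virtual nodes with 2 GB and 1 vCPU each on a quad-core desktop
ram_needed = host_ram_gb(4, 2.0)       # 12 GB total
cpu_ratio = vcpus_per_core(4, 1, 4)    # 1.0: one vCPU per physical core
```

    For a teaching cluster a ratio above 1.0 is usually tolerable, since correctness rather than speed is the point, but RAM should not be oversubscribed or the host will start swapping.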

    It does sound like there is some $$ here for a UPS, a rack, a network switch in the rack and labor to build it. If $$ is tight, I'd compare that $$ against a VM host w/ multiple virtual nodes. You'd lose the learning experience of physical building, but you'd save future labor on maintaining old hardware.