Slashdot Mirror


Ask Slashdot: Best Use For a New Supercomputing Cluster?

Supp0rtLinux writes "In about 2 weeks time I will be receiving everything necessary to build the largest x86_64-based supercomputer on the east coast of the U.S. (at least until someone takes the title away from us). It's spec'ed to start with 1200 dual-socket six-core servers. We primarily do life-science/health/biology related tasks on our existing (fairly small) HPC. We intend to continue this usage, but to also open it up for new uses (energy comes to mind). Additionally, we'd like to lease access to recoup some of our costs. So, what's the best Linux distro for something of this size and scale? Any that include a chargeback option/module? Additionally, due to cost contracts, we have to choose either InfiniBand or 10Gb Ethernet for the backend: which would Slashdot readers go with if they had to choose? Either way, all nodes will have four 1Gbps Ethernet ports. Finally, all nodes include only a basic onboard GPU. We intend to put powerful GPUs into the PCI-e slot and open up the new HPC for GPU related crunching. Any suggestions on the most powerful Linux friendly PCI-e GPU available?"

71 of 387 comments

  1. Lost some funding? by turkeyfeathers · · Score: 5, Funny

    Start with the cheapest backend that'll get the system up and running, then use your supercomputer to mine Bitcoins for a few days, then use all the money you'll make to buy the InfiniBand backend (you'll probably have enough money left over to buy Monster cables to hook everything up).

    1. Re:Lost some funding? by Anonymous Coward · · Score: 3, Informative

      Maybe the mods are a little more aware than you of the engineering and scientific FACTS about Monster Cable. Some things that you said:

      Monster cables are only worth the investment for speakers and line-level / mic stuff (i.e. analogue signals). [...] But 44.1KHz 16-bit sound, converted to analogue in the transport and sent to the amp via line leads WILL benefit from Monster / premium cables, as will speaker cables of any kind.

      are, I'm afraid, complete nonsense. Counterfactual, in fact. And yes, there's real science to support that. Let me gloss over it...

      A 44.1 kHz sample rate before the DAC means the maximum frequency component the cables need to handle is 22 kHz. (This is due to the Nyquist limit, as in the Nyquist-Shannon Sampling Theorem.) 22 kHz is low. Really low. Practically any old piece of wire can carry audio frequencies with perceptually flat response across the audible range and nearly no loss as long as the cable lengths are as short as they are in a typical home stereo system. The only thing you need is large diameter wire for your speaker cables to ensure they're very low resistance so that the higher currents involved in powering a speaker don't cause resistive loss in the cable.
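The arithmetic behind that paragraph can be sketched in a few lines of Python. This is only an illustration: the 3 m cable run, the 2.5 mm² and 0.2 mm² cross-sections, the 8 ohm speaker load, and the helper names are assumptions picked for the example, not figures from the comment.

```python
def nyquist_frequency_hz(sample_rate_hz):
    """Highest frequency a sampled signal can represent: half the sample rate."""
    return sample_rate_hz / 2.0


def cable_loss_fraction(length_m, area_mm2, load_ohms, resistivity=1.68e-8):
    """Fraction of amplifier power dissipated in a two-conductor copper cable.

    resistivity is that of copper in ohm-metres; the other values are
    assumptions supplied by the caller.
    """
    # Current travels out on one conductor and back on the other,
    # so the conductor length is twice the cable run.
    resistance = resistivity * (2 * length_m) / (area_mm2 * 1e-6)
    # Cable and speaker form a series circuit, so power splits by resistance.
    return resistance / (resistance + load_ohms)


nyquist = nyquist_frequency_hz(44_100)
thick = cable_loss_fraction(3.0, 2.5, 8.0)   # assumed heavy-gauge speaker wire
thin = cable_loss_fraction(3.0, 0.2, 8.0)    # assumed thin hookup wire

print(f"Nyquist limit: {nyquist:.0f} Hz")
print(f"Loss in thick cable: {thick:.2%}")
print(f"Loss in thin cable:  {thin:.2%}")
```

Under these assumed numbers the thick cable loses roughly half a percent of the power and the thin one several percent, which is the whole argument for large-diameter speaker wire and has nothing to do with brand.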

      As for low-power line level signals (such as CD player to amp), the most likely source of problems is actually ground loops, where the source equipment has a different ground reference than the destination. (A lesser concern is interference.) The pros don't solve this with stupid Monster Cable, they solve it by using pro equipment with balanced (differential) signaling, which both eliminates the need for the source and destination to have a common ground and provides some noise immunity.

      For home stereo systems, however, making sure that everything is grounded to the same point (3 prong plugs all plugged into a single grounded power strip) is generally good enough, and noise is rarely (if ever) a significant problem.

    2. Re:Lost some funding? by webmistressrachel · · Score: 2

      To be fair, some people make very good recordings of power tools! And if I'm a "proper" audiophile I'll still enjoy it, will I?

      --
      This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
    3. Re:Lost some funding? by Radworker · · Score: 2

      Not to mention balanced input power and line conditioners where appropriate. Audiophiles can go to extremes to get that last 2%.

    4. Re:Lost some funding? by ls671 · · Score: 2

      Maybe he is, I have always assumed that on Slashdot, nicknames like "webmistressrachel" could very well be owned by males ;-)

      --
      Everything I write is lies, read between the lines.
    5. Re:Lost some funding? by Arrepiadd · · Score: 2

      I work in computational chemistry and there are currently two or three codes out there using the GPU. Granted that number will only increase, but at this point having GPUs is almost useless (these codes don't do 10% of what other codes, or a combination of them, can do).

      Your mileage may vary, but assuming someone is a moron just because he isn't doing what fits you perfectly is moronic itself.

    6. Re:Lost some funding? by Outtascope · · Score: 2

      I would be more interested in if my spectrum analyzer could tell the difference. I know how I would bet and it wouldn't be with the audiophile.

      Why does everyone keep spelling alchemist incorrectly?

  2. I call Shenanigans!!! by sconeu · · Score: 5, Insightful

    No way in hell a project that big gets approved without a rationale.

    And no way in hell the administrator of such a project would ask Slashdot what to do with it.

    --
    General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
    1. Re:I call Shenanigans!!! by Anonymous Coward · · Score: 2, Informative

      Truth!

      Two weeks away and still at the “thinking of cool shit to use it for” and “picking out hardware” stages? How does that even happen? Is this some kind of tax scam to burn as much money as possible?

      I get that the submitter already has a primary use... but I imagine if I was ever given that kind of budget I’d probably have to account for every CPU cycle every hour of the day (especially since I’m a programmer and should have no business with something like this ;p). I can’t imagine a budget for something like this comprised of “and hopefully we’ll be able to recoup the millions of dollars by leasing it out to some TBD people”.

      Also, the first person to mention bitcoin as an option gets to have their teeth rotated. I’m not joking.. we will find you..

    2. Re:I call Shenanigans!!! by Anonymous Coward · · Score: 2, Funny

      Yes, it probably is a tax scam. It is now the US Federal Year End. Someone wrote a really good funding proposal and got it approved to get money for a HPC cluster to do *something*. Doesn't really matter. The grant application will have focused on broad ideas like # of cores and what not and not the details. A bit surprising that the network wasn't spec'd because that is such a major cost item, but whatever, maybe the grant application's work loads are not network bound.

      So, now that the money is approved the task to build the thing falls to the inexperienced IT group who make all kinds of dumb choices, then will claim they are massively over worked/underfunded trying to get the thing to run and end up with shitty performance and a lot of wasted time and money. Oops. Your Money At Work.

      The system you spec'd should be around the 250TF range, if you set it up properly with QDR IB and do all the work to get MPI optimized. If you are good. The correct way to design the network is to match a 36 port IB switch with 18 servers, and then correctly spread the resulting 1200 uplink ports across a pair of 686 port core switches. The cost of IB cables alone will be shocking and you'll regret not using a HPC blade server arrangement for this.
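The switch layout described above can be checked with a quick calculation. This is a rough sketch, assuming each 36-port leaf switch splits its ports evenly (18 down to servers, 18 up to the core) and that uplinks are divided equally between the two core switches; the function and key names are made up for the example.

```python
import math


def leaf_core_plan(nodes, leaf_ports=36, nodes_per_leaf=18, core_switches=2):
    """Size a two-tier InfiniBand fabric from per-leaf port counts."""
    # Ports not used for servers become uplinks to the core.
    uplinks_per_leaf = leaf_ports - nodes_per_leaf
    # A partially filled leaf switch still has to be bought.
    leaf_switches = math.ceil(nodes / nodes_per_leaf)
    total_uplinks = leaf_switches * uplinks_per_leaf
    return {
        "leaf_switches": leaf_switches,
        "total_uplinks": total_uplinks,
        "uplinks_per_core": math.ceil(total_uplinks / core_switches),
        "total_cables": nodes + total_uplinks,
    }


plan = leaf_core_plan(1200)
for key, value in plan.items():
    print(f"{key}: {value}")
```

For 1200 servers this gives 67 leaf switches and about 1200 uplinks, matching the figure in the comment, plus a cable count in the low thousands, which is where the "shocking" cable bill comes from.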

      Considering the questions you are asking, you should have gone with an HPC focused integrator that could provide the full system for you. 4 GIGE's on every box? Waste of money. The IB equipped blade designs from SGI/Bull, etc are very nice, space and power efficient and much more cost effective. They even come with a pre-integrated and tested OS ready to go, working boot over IB and other long term cost saving features.

      Gosh I hope you bought a few PB of HPC focused storage as well, otherwise you won't find anyone who can even use your machine for their problems.

    3. Re:I call Shenanigans!!! by AdamHaun · · Score: 2

      It did have one. Right there in the submission:

      We primarily do life-science/health/biology related tasks on our existing (fairly small) HPC. We intend to continue this usage, but to also open it up for new uses (energy comes to mind). Additionally, we'd like to lease access to recoup some of our costs.

      --
      Visit the
    4. Re:I call Shenanigans!!! by DrgnDancer · · Score: 2

      Also who the Hell buys hardware like this without vendor support? OS and backend choices should have been part of integration from the vendor. No one buys 3000 rack mount servers, a bunch of switches, some racks and some storage and builds "the largest x86_64-based supercomputer on the east coast of the U.S."

      OP, if you are in any way serious about this stop now. You don't want the largest supercomputer on the East Coast, you want a computer that works. Call SGI, IBM, Cray, or even (ewww) Oracle/Sun and get them to sell you a smaller system with full integration support. Trust me, I've done HPC, been there, done that, literally got the t-shirt (several of them). Even with integration support you'll be lucky if the vendor gets the thing up and running on schedule and as advertised. There's a thousand blades, 200 switches, a million cables... get the experts to figure out integration and you'll be a much happier camper.

      --
      I don't need a million points of light, just two points of multi-mode fiber and a 10 Gig-E router.
    5. Re:I call Shenanigans!!! by Doc+Ruby · · Score: 2

      They are buying a supercomputer because their lucrative medical research is too big for the smaller HPC, but not (yet) big enough for the biggest supercomputer of its type in the region. So they're also looking for some other apps to use the extra capacity instead of it going to waste.

      That might not be true - this is just a Slashdot assertion. But there's nothing inconsistent in there to suggest it's false. It's perfectly plausible.

      You are just one of the modern type of people who make up your mind on your preconceptions, say something out loud, then refuse to listen to any reason you could be wrong or might reconsider. Denial feels so powerful, who cares what's true, right?

      --
      make install -not war

    6. Re:I call Shenanigans!!! by Doc+Ruby · · Score: 2

      Whether or not this is a true story, or whether or not it's a government project, there is as much budget-reserving in private industry like what you described as there is in government. Probably more, since government is more transparent than private business, and so more people have access to exposing that little game, which tends to inhibit it some.

      --
      make install -not war

    7. Re:I call Shenanigans!!! by Nimey · · Score: 2

      Modern? Your faith in your elders is cute.

      --
      Hail Eris, full of mischief...

      E pluribus sanguinem
    8. Re:I call Shenanigans!!! by mapsjanhere · · Score: 2

      Everyone assumes this is a government funded project. I see an administrator at a start-up, running a bunch of promising biochemical/medical simulations stuff on a 20 machine cluster using some linux-based code. Now they got some serious venture capital investments, and venture capital wants fast scale-up for fast flipping. If the researchers say they can do their work in 3 years on the 20 machines or in 1 year with a couple million in new hardware, the couple million will not even cause a blink to a major investor. So the hardware gets ordered, and someone runs the numbers and finds out they don't have enough stuff to run on their machines, the machines not only cost a couple million but also serious dollars in maintenance/cooling etc, let's find something commercial to do with them (also generating a nice non-project bound income stream). And now the admin realizes that while their code runs mostly on the local nodes since it was written and optimized 15 years ago on Red Hat 5.0, that won't impress people who pay the big bucks for extra cycles. And since they told everyone "the OS is free, it's Linux" they don't have money and don't have time to get someone to explain all the little details of running a HPC for profit. Voila, your slashdot post of "Help, I'm in an airplane, how do you land this thing".

      --
      I'm aging rapidly, I bought a new game and had no idea if my machine was good for it.
  3. Ummm two things by Sycraft-fu · · Score: 3, Insightful

    1) Something with 10gb really isn't a "supercomputer" it is a cluster. Fine, but call it what it is. I really wouldn't call a cluster with Infiniband a supercomputer either.

    2) You really should maybe get someone who knows more about your project and someone who knows more about clusters/supercomputers. The questions you are asking are not ones I would want to see from the guy making the choices on a multimillion dollar project.

    1. Re:Ummm two things by Anonymous Coward · · Score: 2, Interesting

      You clearly have no idea what you're talking about. I was just part of a million-euro EU project consisting of a large partnership of universities and companies. Given the fact that none of them ever did anything, my professor gave up and defined the project on his own.
      I coded the entire project on little more than minimum wage while I was also attending classes. I managed a couple of helpers who did web design and documentation, and dealt with the rest of the partners on my own, even interacting with fancypants EU higher-ups at some point. I was also in charge of administrative work such as financial reports. I dealt with the university accounting department directly as well as their administrative staff. I booked flights and physically walked over to the traveling agency. I represented the project at every single conference where it was demo'ed. As part of its end goal of meeting an audience target of a few thousand people, I took the initiative of aggressively promoting the project and was met with huge success.
      The vast majority of the cash was spent on people who did absolutely nothing other than throwing one or two opinions in the 18 months the project lasted. Our university's share was used to buy new chairs and tables and repaint the walls etc.
      Life in academia is serious research. Very serious. Investing in "science" will solve the world's problems.

    2. Re:Ummm two things by Anubis350 · · Score: 2

      1) You haven't been to any computer conference (like, say, SC) have you? or worked on a supercomputer? Most supercomputers these days are clusters, and hell, one of the most common interconnects is still gigE, not even 10gigE, though that's slowly changing (check the top500 stats if you don't believe me, but I've been at SC's top500 announcement every year for the past 4, and it's been mentioned each time. For that matter I run jobs on a gig based cluster every day, and for many types of work it's not necessarily a hangup).

      2)I'm going with "article is fake", no-one commits the resources to spec, build, and power a cluster of that size without a projected use. You should see the hoops you have to go to to spec machines a fraction of the size ::shudders::

      --
      "goodbye and hello, as always" ~Prince Corwin, from Zelazny's Amber series
    3. Re:Ummm two things by Sycraft-fu · · Score: 2

      They may call them "supercomputers" but in my mind that is mislabeling things. They work for cluster operations, where there's not a ton of inter-node communication and no need for access to memory outside your node. Well, that is what supercomputers were made for. So in a real supercomputer, you have the ability to do that. That is also why real supercomputers cost more.

      I think it is an important distinction for that reason. While a supercomputer can do all a cluster can, the reverse is not true. Same with distributed computing vs a cluster. If you have something that takes basically no inter node communication, just occasional communication with a server, then you can distribute it all over the net, using low bandwidth links, unreliable nodes, and so on. A cluster can do that stuff too, but there are things a cluster can do that distributed computing cannot.

  4. Uh oh.. by joib · · Score: 4, Insightful

    Shouldn't you have figured out answers to all these (simple) questions before ordering several million $$$ worth of hardware? Sheesh.. As for your specific questions:

    - IB vs. 10GbE: IB hands down. Much better latency and more mature RDMA software stacks (e.g. for MPI and Lustre). Cheaper and higher BW as well.
    - GPU: NVidia Fermi 2090 cards. CUDA is far ahead of everything else at the moment.

    1. Re:Uh oh.. by Savantissimo · · Score: 2

      I'll assume you know more about this than me, but he did say that the nodes are going to be wired with 4x GigE. Might there be a penalty bridging from that to IB rather than 10GigE?

      Anyway, to get low latency those GigE links to the nodes need to be optimized. I thought this was interesting:

      High performance network technologies such as InfiniBand use a kernel by-pass method to improve performance. This capability is also available for Ethernet, but is not widely used outside of the HPC community. One such methodology is Intel® Direct Ethernet Transport (DET), which works by providing a User Direct Access Programming Library (uDAPL) interface like InfiniBand. uDAPL defines a single set of user APIs for all Remote Direct Memory Access (RDMA)-capable transports. DET includes a kernel module and an uDAPL library for Ethernet and will work on almost any Ethernet NIC. It can be linked with any software requiring a uDAPL library, such as an MPI version.

      Another popular kernel by-pass effort is the Open-MX project. Open-MX is based on the Myrinet MX protocol. Essentially, any software that links to the Myricom MX library should be able to link with Open-MX. Currently, Open MPI, MPICH2, and the PVFS2 file system have all been shown to work with Open-MX. While Open-MX will work with almost all GigE and 10-GigE chip-sets without modifying drivers, it does require kernel 2.6.15 or higher to work. Depending on the chip-set Open-MX latencies as low as 10 microseconds for GigE have been reported.

      (From The Ethernet Cluster)

      For 10GigE here's a recent low-latency benchmark:
      Audited STAC-M2 Benchmark of IBM LLM on an IBM-BNT G8264 switch, using IBM x3550 servers and Mellanox MNPH29C-XTR ConnectX®-2 EN with RoCE
      "Using standard Ethernet and RoCE protocols, at the base message rates set by the specs, the mean latency of the solution did not exceed 7 microseconds, while standard deviation of latency was measured at 1 microsecond. At the highest tested rate of 2.3 million messages/second, the mean latency of the solution was just 13 microseconds while the standard deviation of latency was measured at 2 microseconds."

      Chelsio claims 3 microsecond latency using RDMA over 10G Ethernet on their "T4" model: "Chelsio T4 Unified Wire adapters can run iWARP RDMA, TCP, iSCSI and FCoE simultaneously with full offload and deliver full wire speed throughput and extremely low latency between the computing nodes, the storage resources, and the user and cluster management nodes in any HPC environment." Not sure how much that really costs compared to IB, though.
      They also say:

      "Since IB lacks congestion management and adaptive routing, it quickly hits hot spots even in clusters of moderate size. iWARP over Ethernet, in contrast, achieves reliability via TCP, which results in a lower effective latency for useful applications."

      *"10Gb IB link is effectively 8Gb. Furthermore, InfiniBand cards, like Ethernet cards, are limited by PCIeGen2 x8. Independently of how many 10Gb or 40Gb ports an adapter exposes, the aggregate bandwidth is limited to about 26Gbps in each direction. Therefore, Chelsio’s T4 based adapters and the leading IB adapters offer the SAME bandwidth."

      *"Ethernet switch port prices have reached parity. The same can be said about adapter prices. However, an IB cluster further requires an Ethernet switch for management, a gateway for routing, and expensive IB storage available from a limited set of suppliers, as well as specialized IT personnel."
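The "10Gb is effectively 8Gb" claim quoted above comes from 8b/10b line encoding, where every 8 payload bits travel as 10 bits on the wire. A minimal sketch of that arithmetic follows; the function name is made up for the example. PCIe Gen2 signals at 5 GT/s per lane with the same encoding, and the gap between the 32 Gbps computed here and the roughly 26 Gbps in the quote is presumably packet and protocol overhead, which this sketch does not model.

```python
def effective_gbps(signal_gbps, payload_bits=8, line_bits=10):
    """Usable data rate after line encoding (8b/10b by default)."""
    return signal_gbps * payload_bits / line_bits


# A 10 Gbps InfiniBand link carries 8 Gbps of data.
ib_10g = effective_gbps(10.0)

# PCIe Gen2 x8: 8 lanes at 5 GT/s each, also 8b/10b encoded.
pcie_gen2_x8 = effective_gbps(5.0 * 8)

print(f"10Gb IB link data rate: {ib_10g} Gbps")
print(f"PCIe Gen2 x8 data rate before protocol overhead: {pcie_gen2_x8} Gbps")
```

Whatever the adapter advertises on its ports, the host slot sets the ceiling, so a 40Gb port on a Gen2 x8 card cannot be fed at full rate.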

      --
      "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
    2. Re:Uh oh.. by Savantissimo · · Score: 2
      --
      "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
  5. Crysis 2 by Arnos · · Score: 2

    Perhaps this can actually run (gasp) Crysis?

  6. Re:So little detail by oobayly · · Score: 2, Funny

    Indeed, it's a bit like somebody writing in to Dear Deirdre and saying "I've a 13 inch cock, how can I make girls aware of this, and what's the best way to make use of it?"

  7. Pong by Vandilzer · · Score: 2

    One really smooth and accurate game of pong! or asteroids if that suits your fancy... though it will require a bit more computing power :)

  8. Re:Riiiiight by PPH · · Score: 3, Funny

    Happens to me when I visit Costco all the time.

    --
    Have gnu, will travel.
  9. EPIC TROLLING by jpedlow · · Score: 4, Insightful

    Wow, he just TROLLED THE CRAP out of slashdot. We mad, bros!

  10. What we do ... by Anonymous Coward · · Score: 4, Informative

    Similar size setup in bio-informatics in Europe. We run redhat 6.1, was centos 5 and LSF. single 1gbit to each server (blades). No need for 10gb or IB unless huge mpi which no one uses. 32GB to 2TB per node - some people like enormous R datasets. All works well for our ~500 users.

  11. Re:So little detail by webmistressrachel · · Score: 3, Interesting

    No it's not, some really ugly, nerdy guy out there has a big cock and nobody is interested in him - he can't just flop it out in public, so that might be a very real problem for him! Or maybe he does, and girls only want him for that?

    Back on topic, it's not like that at all because the computer is probably real, and if not, it's just another hypothetical "Ask Slashdot" for us to fantasize over. "What would you do if you had...". What's wrong with that? Just my 2 pence!

    --
    This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
  12. Did someone say Bitcoin!? BUY! BUY! by recrudescence · · Score: 2

    Holy crap! Someone mentioned the word "Bitcoins" on slashdot again! It's only a matter of time before its value hits the roof again! Quick! BUY! BUY!

    1. Re:Did someone say Bitcoin!? BUY! BUY! by blair1q · · Score: 2

      Fuck that. What's the ticker symbol for "Beowulf Cluster"?

  13. Monkeys! by eljefe6a · · Score: 2

    How about helping me out with some computing power for my monkeys project? http://www.jesse-anderson.com/2011/08/a-few-more-million-amazonian-monkeys/

  14. Totally believable. by khasim · · Score: 3, Interesting

    I totally believe the submitter's question.

    Next up on Ask Slashdot:
    I just got permission to buy the biggest fleet of trucks on the east coast ... and I was wondering if anyone on Slashdot had any ideas what I should do with them.

    Followed by,
    The company I work for just purchased 10,000 acres of land on the east coast and I was wondering if anyone on Slashdot had any idea what we should do with it.

    Happens all the time!

    1. Re:Totally believable. by blair1q · · Score: 3, Interesting

      Actually, it does.

      I remember taking possession of a spanking-new Thinking Machines cluster some <mumble> years ago.

      The principal investigator got it to do one particular calculation, and promised the excess would be put to good use.

      We spent our time trying to figure out what "good use" meant in that context.

      It hasn't got much easier.

      I say if you run out of numbers to crunch of your own, these days, just hook it up to some lucky grid-computing project and let it swamp the stats.

    2. Re:Totally believable. by blair1q · · Score: 2

      Things like that generally cost more to shut down and power back up than the power you use letting them run the screensaver.

    3. Re:Totally believable. by itamblyn · · Score: 2

      Right, and it's bad to turn off a car even for a second, and you're better off running the AC with the window open.

    4. Re:Totally believable. by ls671 · · Score: 2

      Well, shutting your car down and powering it up excessively will cause a car gas engine to wear faster since it is generally accepted that an important part of a car engine's wear occurs when you power it up. For a short period, oil isn't evenly distributed and this causes excessive wear and stress compared to while it is running smoothly.

      For the rest things like:
      -"not shutting your water heater when you leave for 3 months will save you money because it will cost more in the end to heat the water when you get back"
      -"it costs less in fuel or electricity to leave an engine running because restarting it will burn so much more fuel/electricity"
      etc. etc.
      are mostly urban legend.

      --
      Everything I write is lies, read between the lines.
    5. Re:Totally believable. by TheRaven64 · · Score: 2

      The boot time for an SGI Altix is about 6 hours (I was at a fun talk by the guy at SGI doing the Xen port - he'd boot half a dozen machines so that he had one to work on when he'd crashed the last one). If you power a machine like this down when it's idle, then you're basically making it unavailable for a large category of jobs. If you can do the work in 6 hours on your computer or 10 minutes on the supercomputer, it's faster to do it on your computer because the supercomputer will still be booting up when you're finishing.
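The comparison in that example is simple turnaround arithmetic, sketched below with the figures from the comment (6 hour boot, 6 hour local run, 10 minute run on the big machine). The function name is an assumption for illustration.

```python
def turnaround_hours(run_hours, boot_hours=0.0):
    """Wall-clock time until results: any boot delay plus the run itself."""
    return boot_hours + run_hours


workstation = turnaround_hours(6.0)                          # already powered on
cold_cluster = turnaround_hours(10 / 60, boot_hours=6.0)     # powered down when idle
warm_cluster = turnaround_hours(10 / 60)                     # left running

print(f"Workstation:        {workstation:.2f} h")
print(f"Cluster, cold boot: {cold_cluster:.2f} h")
print(f"Cluster, kept warm: {warm_cluster:.2f} h")
```

Powered down, the big machine only wins for jobs whose local run time exceeds the boot time plus its own run time; kept warm, it wins almost immediately.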

      --
      I am TheRaven on Soylent News
  15. hardly the biggest by zeldor · · Score: 2

    Amazon's HPC cluster there in Virginia I suspect is way bigger than your little toy..
    plus all the agencies.

    --
    If I could walk that way I wouldnt need cologne.
  16. While I find this highly doubtful.... by xzvf · · Score: 3, Interesting

    I've seen government institutions have unallocated money at the end of some budget cycle, that was so micro-managed that it could only be spent on a certain type of widget. I can see a university get a late grant, that had to be spent in 30 days, could only be spent on technology, that can only come out of a pre-approved catalog, and some administrative type that just saw a Top 500 super-computer list with competing university names on it, bring up in a meeting that we should build a super computer, and some grad assistant saying how easy it would be. They found a room with a window in it and ordered a bunch of parts, and will walk prospective students and their parents by it saying "This is the largest super-computer on the east coast".

    1. Re:While I find this highly doubtful.... by geekmux · · Score: 2

      I've seen government institutions have unallocated money at the end of some budget cycle, that was so micro-managed that it could only be spent on a certain type of widget. I can see a university get a late grant, that had to be spent in 30 days, could only be spent on technology, that can only come out of a pre-approved catalog, and some administrative type that just saw a Top 500 super-computer list with competing university names on it, bring up in a meeting that we should build a super computer, and some grad assistant saying how easy it would be. They found a room with a window in it and ordered a bunch of parts, and will walk prospective students and their parents by it saying "This is the largest super-computer on the east coast".

      Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose? Like we have to wonder how the hell we ended up trillions of dollars in debt.

      Sad to say, I've seen Government "last-minute" spending like this too, but not exactly to this level of magnitude. This is a shitload of money "left over". This may have come from somewhere, but "budget" obviously had nothing to do with it.

    2. Re:While I find this highly doubtful.... by kcitren · · Score: 2

      Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose?

      Nope, I never wonder because the answer is obvious. If they don't spend it this year, they won't get it next year.

    3. Re:While I find this highly doubtful.... by Zancarius · · Score: 2

      Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose? Like we have to wonder how the hell we ended up trillions of dollars in debt.

      In some parts of the DoD it's so bad that, due to the way the finances work, if there are unallocated parts of the budget they'll be removed for the following fiscal year, sending everyone into a scramble to spend whatever's left of their budget before the axe drops. It's no secret then that most divisions will then spend exactly their share (or request more) simply so that they don't receive a budget shortfall in the case they actually need the money.

      If you think about it, it's really just a symptom of a broken system. Government budgets should probably be based more on need than on historical performance; it makes sense that those divisions who don't really "need" the money this year would be willing to spend it all just on the offhand chance they have a bigger project next year and would otherwise become underfunded.

      Also, there was an excellent article on Kuro5hin a number of years ago detailing why bureaucratic red tape in departments like the DoD often leads to spending more rather than less. I can't seem to find it anymore, but perhaps someone with a better memory than I could link it. I haven't any idea how truthful it was, but I recall that it didn't seem all that unusual.

      --
      He who has no .plan has small finger. ~ Confucius on UNIX
    4. Re:While I find this highly doubtful.... by maxwell+demon · · Score: 2

      Which is total bullshit. The PROPER way to do it at the end of the year is if any money is left over you decrease the budget by 1/2 the difference for the next year.

      Even that is a bad idea. If not spending all the money means a decrease in future budget, however tiny that decrease is, there will be efforts to spend that money, even if it doesn't make sense. OTOH, money not spent is money not spent, even if it had been allowed to be spent.

      Indeed, it would make more sense to reward those who do not spend all the money, by increasing their next year budget. Of course that extra budget part should not be included in the determination if they were below budget (i.e. if they are above the normal budget, but below the increased budget, they don't get an increased budget next year).
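The two rules being compared can be written out as a small sketch. The quoted rule cuts next year's budget by half of whatever was left unspent; the reward rule adds a bonus instead, and judges under-spending against the normal budget only. The comment does not say how large the bonus should be, so the `bonus_rate` of 25% of the savings is purely an assumption, as are the function names.

```python
def next_budget_halving(budget, spent):
    """Quoted rule: cut next year's budget by half of any unspent amount."""
    unspent = max(budget - spent, 0.0)
    return budget - unspent / 2


def next_budget_reward(base_budget, spent, bonus_rate=0.25):
    """Reward rule: add a bonus for money left unspent.

    Savings are measured against the normal (base) budget only, so a
    department that spends above base but below last year's bonus-inflated
    budget earns nothing extra. bonus_rate is an assumed parameter.
    """
    saved = max(base_budget - spent, 0.0)
    return base_budget + bonus_rate * saved


print(next_budget_halving(100.0, 80.0))   # under-spender is penalised
print(next_budget_reward(100.0, 80.0))    # under-spender is rewarded
print(next_budget_reward(100.0, 102.0))   # above base: no bonus
```

Under the halving rule every unspent dollar costs fifty cents of future budget, so the incentive to spend it anyway remains; under the reward rule the incentive points the other way.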

      --
      The Tao of math: The numbers you can count are not the real numbers.
    5. Re:While I find this highly doubtful.... by robotkid · · Score: 3, Insightful

      Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose? Like we have to wonder how the hell we ended up trillions of dollars in debt.

      Sad to say, I've seen Government "last-minute" spending like this too, but not exactly to this level of magnitude. This is a shitload of money "left over". This may have come from somewhere, but "budget" obviously had nothing to do with it.

      Yeah, I used to wonder that too. Then my wife got a job in state government. And the answer became painfully obvious judging by the maximum pace at which stuff gets done even when you have people willing to work hard and important problems sitting right in front of you. If you allowed unspent money to roll over indefinitely, that would create an irresistible incentive to do the cheapest job that won't get you in trouble and then hoard, hoard that money. Heck, you could stretch that 3-year project into a 5-year one by doing it very slowly. You could build up a war chest and use it on pet projects that no one approved. Or you could wait till no one even remembers the project existed anymore and then embezzle it.

      So as inefficient as it is, the blanket rule that all money must be spent the year in which it is allocated is a simple way to increase transparency and accountability across the board. It may even be one of the driving forces behind anything getting done remotely on schedule in an environment where purchasing a USB cable requires 2 requisition forms, 3 vendor quotes, the signature of your boss (who is in an all-day meeting), your boss's boss (who is talking with legislators today and can't be disturbed), and pre-approval from someone in accounting (who just went on vacation yesterday).

      Of course, it would be great if getting the job done on time and under-cost were somehow rewarded. But that's incentivizing success, that's the profit maximizing, the corporate bottom line, whereas the Gub'ment bottom line is minimizing "embarrassment" (be it from the media, the voting public, and especially legislators on the appropriations committee). You use a Gub'ment bureaucracy for things you can't trust the for-profit world to do on their own, so the service provided has to be somewhat divorced from the revenue stream if you want to ensure more reliable results than just contracting out to a private company. (I'm sure Ron Paul would beg to differ, but then again he also probably enjoys being able to drink water out of the tap without getting sick). You wouldn't pay a health inspector, for example, just based on the number of sites inspected per day because that encourages as cursory a job as possible on as many sites as possible. Instead, you set a minimum quota they have to fulfill, and then make it known you'll have their head on a platter if a restaurant shows up in the news for salmonella poisoning the week after you've signed off on it. That's the Gub'ment way...

  17. As a cluster admin myself.... infiniband!!! by Fallen Kell · · Score: 2

    I can not stress this enough. As good as 10gb ethernet is, the latency is still horrible compared to infiniband.

    As for distributions, really, that depends on what you are doing and how your current applications are built/designed. Rocks cluster is fairly nice. Unfortunately we have not been able to deploy that due to our FOSS policies, which have really been hurting this project. So we have a mixed Red Hat and Solaris cluster using Grid Engine.

    --
    We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
  18. Re:Imagine Beowulf of those! by blair1q · · Score: 2

    I was imagining partitioning it into an enormous brigade of heterogenous virtual machines, then hooking those up as a Beowulf cluster.

  19. Cluster software & GPU experience by PAPPP · · Score: 5, Informative

    I assume this is an epic troll, but am going to give an honest answer anyway, because there are some legitimate questions buried in there.

    I work with aggregate.org, a university research group which has a decent claim to having built the very first Linux PC Cluster, set some records with them (KLAT2 and KASY0 were both ours), and still operates a number of Linux clusters, including some containing GPUs, so I feel like I have some idea of the lay of cluster technology. It is *way* overdue for an update (and one is in progress, we swear!), but we also maintain TLDP's widely circulated Parallel Processing HOWTO, which was the go-to resource for this kind of question for some time.

    In a cluster of any size, you do _not_ want to be handling nodes individually. There are several popular provisioning and administration systems for avoiding doing so, because every organization with a large number of machines needs such a tool. The clusters I deal with are mostly provisioned with Perceus with a few ROCKS holdovers, and I'm aware of a number of other solutions (xCat is the most popular that I've never tinkered with). Perceus can pass out pretty much any correctly-configured Linux image to the machines, although it is specifically tailored to work with Caos NSA (Redhat-like), or GravityOS (a Debian derivative) payloads. Infiscale, the company that supports Perceus, releases the basic tools and some sample modifiable OS images for free, and makes their money off support and custom images, so it is a pretty flexible option in terms of required financial and/or personnel commitment. The various provisioning and administration tools are generally designed to interact with various monitoring tools (e.g. Warewulf or Ganglia) and job management systems (see next paragraph).
    Accounting and billing users is largely about your job management system. Our clusters aren't billed this way, so I can't claim to be closely familiar with the tools, but most of the established job management systems like Slurm and GridEngine (to name two of many) have accounting systems built in.
    The "standard" images or image-building tools provided with the provisioning systems generally provide for a few nicely integrated combinations of tools, which make it remarkably easy to throw a functioning cluster stack together.

    As for GPUs... be aware that the claimed performance for GPUs, especially in clusters, is virtually unattainable. You have to write code in their nasty domain-specific languages (CUDA or OpenCL for Nvidia, just OpenCL for AMD) and there isn't really any concept of IPC baked in to the tools to allow for distributed operations. Furthermore, GPUs are also generally extraordinarily memory and memory bandwidth starved (remember, the speed comes from there being hundreds of processing elements on the card, all sharing the same memory and interface), so simply keeping them fed with data is challenging. GPGPU is also an unstable area in both relevant senses: the GPGPU software itself has a nasty tendency to hang the host when something goes wrong (which is extra fun in clusters without BMCs), and the platforms are changing at an alarming clip. AMD is somewhat worse in the "moving target" regard - they recently deprecated all 4000 series cards from being supported by GPGPU tools, and have abandoned their CTM, CAL, and Brook+ environments before settling on OpenCL, and only OpenCL. Nvidia still supports both their C

  20. Yes, this is legit and no, we're not idiots by Supp0rtLinux · · Score: 5, Informative

    For everyone that thinks I trolled slashdot... here's the quick backstory behind my question(s): Our organization received a grant to pay for this from a private philanthropist that has a medical issue that is currently being researched by one of our labs (this happens to us not too infrequently). We have an existing HPC of roughly 300 nodes and 1200 cores that's all 1Gbps connected and running Rocks 5.1. The grant money came in in two different payments. We used the first payment to buy the nodes (which are en route to arrive in 2 weeks or so). The second payment was going to pay for the GPU's and the extra infrastructure (storage is one thing we currently have plenty of... both SAN and NAS). Unfortunately, we hit two issues: 1) one of our more seasoned enterprise admins took a new job at Apple's new NC datacenter and 2) our cluster admin passed away from a heart attack about a week after the purchase was made. This put us into a bit of a holding pattern. We're in the process of replacing both of them, but in the meantime we A) have the equipment arriving soon and B) have the second round of the grant money in hand now. We're smart enough to know that we lost two very valuable resources and we decided to step back, pause, and re-evaluate. The servers are already bought. The infrastructure, interconnects, and GPU's are not. The old admin knew which GPU's he wanted; unfortunately we haven't found his research anywhere to know what and why. He had also planned to go with the latest release of Rocks, but only because he was very familiar with it. We know there are other options out there and we've no idea how well Rocks can scale. Additionally, I don't see an option for chargeback with Rocks (at least not from a Google search), plus we've heard they recently lost a core developer. Thus, we went to the Slashdot community for advice. So I've already seen some good info on the IB versus 10GbE question and it's much appreciated.
    We're still looking for info on which Linux distro and which GPU to go for. We want to make the best decision we can and use the money as wisely as possible. But we also realize that we know what we don't know and thought the Slashdot community could provide some experience to help us make the right decisions.

    1. Re:Yes, this is legit and no, we're not idiots by rish87 · · Score: 2

      Okay apparently you aren't trolling but you have to understand people's suspicions. I understand you've lost key people, but still, these sorts of decisions are important for initial phases of the design that everyone should be aware of. A few suggestions: If you are running a lot of smaller parallel jobs that do most of the computation within the same node (more of a SMP parallel vice mpi) then you may get away without using 10gbe unless you are also moving a lot of data through the network for storage. If you are doing a lot of cross-node computation among a lot of different jobs, or especially in very large cross node jobs, you are going to want IB. IB is very expensive, but there is a reason almost all of the top supercomputers use it. Depending on your application, you may be able to get away with 10 gbe, especially if IB is too expensive. If you are adding GPU's (go with NVIDIA. throw teslas in there if you have the money) you will most likely want IB as well. HPC code I help develop has CUDA ability, and once you start to feed huge datasets to the GPU's across the network, you are going to need IB level speed and throughput. If you are only doing GPU computation within the nodes, this won't be necessary. Basically if money isn't an issue, go with IB and NVIDIA teslas. If money is an issue, GTX 580's and 10gbe will probably be fine. I would be hesitant on using anything less on the networking front. As for OS, take a look at scientific linux.

    2. Re:Yes, this is legit and no, we're not idiots by enjar · · Score: 2

      So it seems you are still far adrift. I'd seriously not spend another penny until you understand what you are really doing. Otherwise you could dump a serious pile of money on hardware that won't solve your problem ... and I'll bet if you look at what was wrong or you didn't like with your old HPC setup, you'll get the answers to lots of your questions.

      It seems odd that you got a grant for ... something but you still are trying to "recoup costs". Also, I do understand you lost two key people, but you didn't need some sort of business case, schematic, problem statement, architecture diagram or grant proposal that you could use to figure out the answers to some of these questions? If someone granted you $M for doing research for something, then it seems that you should be concentrating on doing that research first -- and figuring out what to do with any slack time otherwise. Perhaps instead of trying to charge someone money you could find other groups that do similar research and give them time on your cluster as a gift?

      As for real-world advice, keep in mind the "customers" of your cluster. I'm going to take a wild guess that they aren't geeks who want to play with Linux distros, they are likely researchers working on their research. It's also highly likely that they already know how to get their submissions into the cluster, analyze results, and so on -- they have a "workflow". Take care with disruptions to this workflow. Also, it might make a lot of sense to actually talk to the people who do this work and ask them their opinions on what they like and don't like about the existing setup -- which can be turned into requirements that can drive the spec for the remaining equipment. It's certainly going to give you a higher chance of success than asking Slashdot .. and you also have a golden opportunity to step up to fill a leadership void. That kind of thing gains you enormous credibility if you do it well.

    3. Re:Yes, this is legit and no, we're not idiots by hackstraw · · Score: 2

      If you want to hire me send a mail to hpc.hackstraw@spamgourmet.com. Expert in the field.

    4. Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · · Score: 5, Funny

      "I've got 1200 servers shipping to me and my two best engineers are gone and we're not sure what to do with them when they get here."

      Best. IT horror story. Ever.

    5. Re:Yes, this is legit and no, we're not idiots by byteherder · · Score: 2

      If you are serious, go to the SuperComputing 2011 conference. Pretty much all the supercomputing geeks hang out there and you can get all your questions answered by experts.

      As for whether to go with IB or 10GbE, go with IB if you can afford it. IB has a bunch of advantages (higher bandwidth, lower latency), but you pay for it in price.

      Good Luck.

      byteherder

    6. Re:Yes, this is legit and no, we're not idiots by bill_mcgonigle · · Score: 2

      Steve Jobs gave you how much funding?!

      And then hired their sysadmin out from under them? No.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    7. Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · · Score: 2, Informative

      Frequently in academic settings this is not an option. Grant money for equipment is not transferrable to personnel.

    8. Re:Yes, this is legit and no, we're not idiots by KainX · · Score: 2

      The first thing you need to do is realize you all are in over your heads. If you're desperate enough to post to Slashdot for help, you're already there.

      The second thing you need to do is look for a consultant to help you out until you can hire permanent help to fill your vacant positions. I can strongly recommend R Systems (http://www.rsystemsinc.com/). It's run by former NCSA HPC gurus. I've worked with them many times; they have the know-how you need to salvage this mess in short order. You can't call them quickly enough; trust me on that.

      Third, to answer some questions. The IB vs. 10GbE debate has been pretty well covered, but just to emphasize: if you need low latency (for tightly-coupled massively parallel processing), you *need* IB. Preferably QDR or FDR. For your core switches, go for a blade-style chassis whose backplane can handle FDR even if you opt for QDR for now. If it can handle EDR, even better, but I'm not sure those are shipping yet. FDR IB data rate is 56Gb/s and latency is in the hundreds of nanoseconds. Ethernet can't touch that yet.
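
      A rough way to see why latency matters for tightly coupled jobs is the usual first-order model: transfer time = latency + size / bandwidth. The sketch below is only illustrative. The 56 Gb/s and 10 Gb/s rates are the nominal figures discussed in this thread, while both latency values are placeholder assumptions, not measurements of any particular hardware.

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bits_per_s):
    """First-order cost of one message: fixed latency plus serialization time."""
    return latency_s + (size_bytes * 8) / bandwidth_bits_per_s


# name -> (latency in seconds, bandwidth in bits per second)
# Bandwidths are nominal link rates; BOTH latencies are ASSUMED placeholders.
LINKS = {
    "FDR IB": (0.7e-6, 56e9),   # assumed ~0.7 microseconds
    "10GbE": (10e-6, 10e9),     # assumed ~10 microseconds
}


def compare(size_bytes):
    """Return {link name: seconds} for a single message of the given size."""
    return {name: transfer_time_s(size_bytes, lat, bw)
            for name, (lat, bw) in LINKS.items()}


if __name__ == "__main__":
    # A tiny MPI-style message, a medium one, and a bulk transfer.
    for size in (64, 64 * 1024, 64 * 1024 * 1024):
        for name, seconds in compare(size).items():
            print(f"{size:>10} bytes over {name:<6}: {seconds * 1e6:12.2f} us")
```

      Under these assumptions small messages are dominated by the latency term and large ones by the bandwidth term, which is the distinction other posters draw between MPI traffic and bulk storage I/O.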

      All the scientists working with GPUs here are using nVidia. We've got 2050s and 2070s, so the 2090s are probably the right choice at the moment.

      For management, xCat is by far the most scalable solution available right now, though we're working on an alternative. ROCKS does not scale well, largely due to its stateful nature. I'd caution you against using Scyld ClusterWare; it's based on BProc AFAIK, and as one of my friends is the former BProc maintainer, I can tell you that even *he* won't touch it with a ten-foot pole any more. It's too hairy and error-prone; it's also almost impossible to debug. Use something stateless and powerful but still relatively easy to maintain. Most of the large-scale shops (national labs and large academic sites) I know of use xCat or Perceus. Here at LBNL we use both xCat and Perceus with great success.

      For Linux distribution, use RHEL or a clone. I'd recommend Scientific Linux 6 at this point. It's the best-run and most professionally-maintained of all the clones.

      HTH. Good luck, and condolences on your recent loss(es).

      --
      Michael Jennings | HPC Systems Engineer, Lawrence Berkeley National Lab | Author, Eterm (eterm.org)
    9. Re:Yes, this is legit and no, we're not idiots by cherry-blossom · · Score: 2

      Mod this up. This sounds more like a personnel problem than a hardware/software problem. Get the right people into those vacant positions and let them make the decisions for you. Don't spend any money on hardware or software until those positions are filled.

    10. Re:Yes, this is legit and no, we're not idiots by DarwinSurvivor · · Score: 2

      Did you just ask for a job while posting as Anonymous Coward and THEN ask them to post their email as a public reply to it?!?

    11. Re:Yes, this is legit and no, we're not idiots by afidel · · Score: 2

      The 2050 is what HP uses in the SL390 cluster configuration because they can actually cool and power 8 of them in a 4U enclosure, since the M2070 has the same power draw it should be capable of the same density.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    12. Re:Yes, this is legit and no, we're not idiots by b30w0lf · · Score: 2

      Speaking as one of the people you will (temporarily) be supplanting, sounds like you have a tough spot to get through.

      I also admin life sciences clusters for a major university on the east coast. I'm going to assume that our workloads are going to be fairly similar (R, matlab, blast, HMMER, IDEA, maybe some mutual information codes, sequence alignment, etc.). If that's not the case, some of this advice may be off.

      So, a couple of things:

      - I think CentOS is a good idea for a cluster platform. I do not think Rocks will scale like you want it to at that size, and it's really not terribly flexible either. Let's put it this way, I often find that I could have just built from scratch by the time I get Rocks to do all the customization I need. We run Rocks on small clusters, but big ones we spin ourselves (e.g. CentOS, or sometimes Fedora + Kickstart + some utility scripts and a scheduler... we use SGE, now OGS). Finally, stay away from more fringe distributions. You'll find that commercial software vendors are pretty quick to let you know they just don't support running their software on XX distribution. There are other reasons too. I posted a bit of a rant on this a while ago at: http://slashdot.org/comments.pl?sid=2188634&cid=36255670
      - Infiniband vs. 10 Gbps. Well, InfiniBand is cool, and I've spent a lot of time working with it. I once had a project that involved writing some early stage block level storage protocols for InfiniBand... really, I like InfiniBand. That said, unless you plan to run a lot of MPI enabled MD simulations like Desmond, skip the IB and get 10 Gbps. There are a couple of exceptions to that rule, but most life sciences applications do not use MPI, and most of your traffic is going to be storage I/O. Depending on your storage solution, it's probably not InfiniBand enabled (in the front-end anyway, and you really don't want to be running IP over IB if you can help it). To say more I'd have to know a bit more about what you're going to be running.
      - GPUs. One thing sticks out to me a lot here. If you don't know which GPUs to get, that probably means no one has ported anything to GPU yet. If someone has done some porting, you should ask them what they ported to. If they ported to CUDA, you should probably be looking at 2050s or 2070s. If they haven't ported anything, and they don't have (good!) GPU ported applications... don't waste money on too many GPUs. We've run a couple of pilots where we tried to get people using GPUs, and here are a couple of observations: 1. most researchers can't/won't do the porting; 2. most pre-built applications, such as matlab and R _still_ require you to port the matlab, R, etc. code, which researchers will probably also not do; 3. some life sciences algorithms just don't work well on GPUs (e.g. they are branch-heavy or memory I/O heavy algorithms); 4. many of the pre-built GPU applications for life science are terrible (I know a particular sequence alignment tool, for instance, that is proud of its 4x speedup over a single CPU... do the math... which costs more, a quad core CPU or a tesla?). GPUs can be great, but buy them sparingly at the beginning and integrate them as they are actually being used. If you're buying now you should be buying CUDA (i.e. NVidia). It's the only actual mature development kit (though I don't like that it doesn't let you control the scheduling on the card... but I digress).
      - Chargeback: So the bottom line is nothing is going to give you chargeback without some effort. You're going to have to manage that on your own. The best way to do it is to set up some basic accounting scripts that will dig through your cluster logs (or database, depending on your configuration) and generate accounting reports. Note that it's the resource manager/policy manager (e.g. OGS, Torque/Maui, etc.) logs that you're going to do this with. You _could_ do it with Rocks as well as anything else (but again, I don't suggest Rocks for this project).
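
      As a minimal sketch of the kind of accounting script described in the chargeback point, the Python below sums core-hours per user and applies a flat rate. The CSV layout (user,cores,wallclock_seconds), the function names, and the rate are all hypothetical; a real script would have to parse whatever accounting format your resource manager actually writes.

```python
import csv
import io
from collections import defaultdict


def core_hours_by_user(records):
    """Sum cores * wallclock hours per user from an iterable of CSV lines.

    Assumed (hypothetical) record layout: user,cores,wallclock_seconds
    """
    totals = defaultdict(float)
    for row in csv.reader(records):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        user, cores, seconds = row[0], int(row[1]), float(row[2])
        totals[user] += cores * seconds / 3600.0
    return dict(totals)


def charges(totals, rate_per_core_hour):
    """Turn core-hour totals into charges at a flat, made-up rate."""
    return {user: round(hours * rate_per_core_hour, 2)
            for user, hours in totals.items()}


if __name__ == "__main__":
    # Stand-in for an accounting log; every value here is invented.
    sample = io.StringIO(
        "# user,cores,wallclock_seconds\n"
        "alice,12,7200\n"
        "bob,1,3600\n"
        "alice,24,1800\n"
    )
    usage = core_hours_by_user(sample)
    for user, cost in sorted(charges(usage, 0.05).items()):
        print(f"{user}: {usage[user]:.1f} core-hours, ${cost:.2f}")
```

      The per-user totals can then feed whatever invoicing process you already have; the point is only that the raw numbers come from the scheduler's job records, not from the distro.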

      Sounds like you have a fun project ahead of you... good luck!

    13. Re:Yes, this is legit and no, we're not idiots by Sgs-Cruz · · Score: 3, Interesting

      Are you at MIT and is your benefactor David Koch? Because in that case, we have some researchers up at the Plasma Science and Fusion Center that do simulation work that could definitely use access to a bigger cluster. As long as you can compile FORTRAN on it, the TRANSP runs and GYRO simulations that we do are already run on a (smaller) cluster. This falls under "energy research" and is way cool to boot.

      I'm not joking, if you are at MIT, please get in touch with Martin Greenwald (contact info on the PSFC staff page).

      --

      Karma: pi (Mostly due to circular reasoning in posts).

  21. Ditto On Redhat, w/PBS by cmholm · · Score: 2

    This is what the biggest USAF compute cluster uses (RH, PBS), the main difference being that it does include IB because MPI support was a requirement (and is used). Otherwise, you'd better hope your users' jobs are almost exclusively embarrassingly parallel. The cluster is based on Dell PowerEdge blades, which provided good mflop/$.

    They're playing with full size Tesla GPU cards in one of the blades. I'm not sure what will give you the best bang for the buck: Tesla/Fermi/FirePro cards in-blade, or the Nvidia 1u chassis that'll allow you to share the GPUs among several CPU blades/chassis. As of last year, there was a bit more overhead using OpenCL compared to CUDA on Nvidia h/w, but it does open up your h/w options Nvidia v. AMD.

    --
    Luke, help me take this mask off ... Just for once, let me butterfly kiss you with my own eyes.
  22. Use IB, CentOS or SciLinux, and xCAT (xcat.org) by datajerk · · Score: 2

    IB is faster and cheaper than 10GE. Unless you get 10GE from your IB vendor.

    All IB solutions support RH distros fairly well, so I'd stick with RH-like or RH-proper. CentOS has been our x86 Linux reference platform for xCAT development.

    Use xCAT for cluster management and use xCAT's stateless provisioning (no need for local HDs). With xCAT we were able to provision the fastest system in Canada (~4000 nodes) over 40:1 blocking GigE in 8 minutes (but we had 10 10GE-based service nodes). xCAT was also used for the first 1.0 and 1.1 Petaflop system (LANL Roadrunner).

    For billing and chargeback consider Moab with Gold. If you use Moab with xCAT and stateless provisioning, then you can power up nodes on demand and power them down automatically when not in use and track/bill on energy usage. You also have the ability to specify different OS loads on-demand so that your system can be more of an HPC cloud and not just a static homogeneous cluster. Lastly xCAT can support KVM if you want to throw a few VMs in there as well. Oh, and if you get the itch to use Windows, xCAT supports that too.

  23. Re:SETI ! by Jarik C-Bol · · Score: 3, Insightful

    screw SETI, run folding@home and find the cure for cancer. We need that a little more than we need to stare at the sky, wishing someone would call from alpha centauri or some such place.

    --
    I've decided to Diversify my Holdings. I've divided my cash between my left and right pockets, instead of all in one.
  24. OS, duh! by ThurstonMoore · · Score: 3, Funny

    The obvious answer is Windows Server 2008 HPC.

  25. What? by sycodon · · Score: 2, Insightful

    Isn't this shit you should have had all figured out before you even applied to whatever company, agency, government, etc, you got the money from?

    WTF is this? I can only hope you didn't get money from the feds.

    "Hey, look! The feds gave me a shit load of money to get this cool super computer...what should I do with it?"

    Seriously...if you got any government money for this then you are a first class tool for not having all of this known before you even applied.

    --
    When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
  26. BS on "largest cluster" by sl3xd · · Score: 2

    I have to wonder what you're on the east coast of. East coast of Madagascar? I work in HPC; a thousand nodes just isn't that much. We sold larger clusters than that four years ago.

    --
    -- Sometimes you have to turn the lights off in order to see.
  27. rhel 6 , xcat , slurm , IB by stenWolf · · Score: 2

    " So, what's the best Linux distro for something of this size and scale?"
    RHEL6
    I would have suggested SL6 but their guy just left for RHEL, and CentOS is still playing catch-up. Since you obviously have money, go with the best supported OS.
    "Additionally, due to cost contracts, we have to choose either InfiniBand or 10Gb Ethernet for the backend: which would Slashdot readers go with if they had to choose?"
    IB, hands down.
    The main issue isn't even the bandwidth (which is 40Gb/s compared to 10Gb/s) - it's about latency and RDMA for whatever MPI you'll use.
    "Any suggestions on the most powerful Linux friendly PCI-e GPU available?"
    Go with nvidia tesla. Every self respecting HW vendor has them as an option for either blades or rackmounts now-a-days.

    Manage the whole thing with xCAT, schedule your jobs with slurm (wouldn't touch moab now, can't justify the cost). Personally I'd focus on openmpi (intel compiled, not gcc) with blcr checkpointing.
    You can set the entire SW stack in 4-6 hours if you know what you're doing.

  28. Re:SETI ! by jprupp · · Score: 2

    Screw folding@home, give me that cluster so I can make some Bitcoin!