Ask Slashdot: Best Use For a New Supercomputing Cluster?

SETI ! by Anonymous Coward · 2011-09-13 09:35 · Score: 1

help SETI !!

Re:SETI ! by stox · 2011-09-13 10:25 · Score: 1

That is exactly what we used to burn in the first SGI Origin 2000, and later the first few thousand nodes of a Linux cluster at Fermilab.

--
"To those who are overly cautious, everything is impossible. "
Re:SETI ! by Jarik C-Bol · 2011-09-13 13:49 · Score: 3, Insightful

Screw SETI, run folding@home and find the cure for cancer. We need that a little more than we need to stare at the sky, wishing someone would call from Alpha Centauri or some such place.

--
I've decided to Diversify my Holdings. I've divided my cash between my left and right pockets, instead of all in one.
Re:SETI ! by thesh0ck · 2011-09-13 14:19 · Score: 1

40-100gb backend or you will be left in the dust.
Re:SETI ! by jprupp · 2011-09-14 01:02 · Score: 2

Screw folding@home, give me that cluster so I can make some Bitcoin!

So little detail by Anonymous Coward · 2011-09-13 09:35 · Score: 0

You say all this, but you don't even say what group you're associated with. Is this even real?

Re:So little detail by oobayly · 2011-09-13 09:41 · Score: 2, Funny

Indeed, it's a bit like somebody writing in to Dear Deirdre and saying "I've a 13 inch cock, how can I make girls aware of this, and what's the best way to make use of it?"
Re:So little detail by webmistressrachel · 2011-09-13 09:47 · Score: 3, Interesting

No it's not, some really ugly, nerdy guy out there has a big cock and nobody is interested in him - he can't just flop it out in public, so that might be a very real problem for him! Or maybe he does, and girls only want him for that?
Back on topic, it's not like that at all because the computer is probably real, and if not, it's just another hypothetical "Ask Slashdot" for us to fantasize over. "What would you do if you had...". What's wrong with that? Just my 2 pence!

--
This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
Re:So little detail by mSparks43 · 2011-09-13 13:28 · Score: 1

True or not, it's perfectly plausible, for one simple reason. There may well be better platforms (GPU/CUDA etc.) to get the flops needed. But they already have a platform that's working, and are simply extending it to meet demand, with a load of spare capacity built in (e.g. they only expect to use 800 units in the next twelve months, but they want room to grow and 1200 won't be much different in price from 800, so they might as well get rid of the cash now; it's all a tax write-off anyway and I'd guess they're cash rich). Switching to a new codebase would bring too much delay as the code is reworked (it could be twelve months before they get a CUDA system working, and they need it spitting out answers next month).
That said, it sounds like going with InfiniBand is the better choice: since the latency is lower, it will reduce the time spent waiting for results to propagate across nodes. Then again, 10G should "just work"; would InfiniBand require further development (assuming nodes are currently referencing each other by IP)?
Re:So little detail by DeathElk · 2011-09-13 15:09 · Score: 1

I do love a saucy webmistress ;)
Re:So little detail by webmistressrachel · 2011-09-13 22:50 · Score: 1

Why thank you! I do love an excuse to post reasonable, insightful yet saucy comments here...

--
This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen

Lost some funding? by turkeyfeathers · 2011-09-13 09:35 · Score: 5, Funny

Start with the cheapest backend that'll get the system up and running, then use your supercomputer to mine Bitcoins for a few days, then use all the money you'll make to buy the InfiniBand backend (you'll probably have enough money left over to buy Monster cables to hook everything up).

Re:Lost some funding? by webmistressrachel · 2011-09-13 09:52 · Score: 0, Troll

Monster cables are only worth the investment for speakers and line-level / mic stuff (i.e. analogue signals). Having a Monster-cabled computer network would be no better than having the generics of the same cables.
We all know this - and MP3 and the so called "average listeners" (people who buy Britney Spears) have ruined the hi-fi industry with their cable sarcasm. Yes, MP3 will sound crap on a Monster cable, too. But 44.1KHz 16-bit sound, converted to analogue in the transport and sent to the amp via line leads WILL benefit from Monster / premium cables, as will speaker cables of any kind.

--
This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
Re:Lost some funding? by sexconker · 2011-09-13 09:55 · Score: 1

He won't even have GPUs initially.
CPU mining is about as productive as mining for (physical) gold with a toothpick.
GPU mining is slowly becoming less viable since people have rolled out FPGA miners.
And if anyone ever decides to go full-blown ASIC, mining will be dead for 99.9999999% of people.
More on point: This seems like a retarded cluster if you haven't built it with GPU shit initially in mind. It's not like CUDA/STREAM/DirectCompute/OpenCL are hard if you've already got the highly parallel algorithms and workloads in the first place. Dude's got 1200 boxen that could be outdone by 40 GPUs in 10 boxen for the vast vast vast majority of workloads. The time saved in setup and management, as well as the cost with regards to the communication backplane, power delivery, etc., could easily pay for a few programmers to rewrite any legacy code to be GPU-ready.
And with AMD's 7000 series coming out, the energy cost / performance would be a trifle. A trifle!
http://lenzfire.com/wp-content/uploads/2011/09/AMD-Radeon-7000-22.jpg
Look at that shit! LOOK AT IT!
Re:Lost some funding? by Liambp · 2011-09-13 10:08 · Score: 1

So you are saying that anyone who cannot hear the difference between standard cables and expensive premium cables must be a Britney Spears fan?
I think the Emperor isn't wearing any clothes.
Re:Lost some funding? by KZigurs · 2011-09-13 10:23 · Score: 1

To be fair, some of Britney's recordings are (actually indeed) exceptionally well mastered. You might not like her music, but if you are a proper audiophile you will still enjoy it.
Re:Lost some funding? by Anonymous Coward · 2011-09-13 10:30 · Score: 3, Informative

Maybe the mods are a little more aware than you of the engineering and scientific FACTS about Monster Cable. Some things that you said:

Monster cables are only worth the investment for speakers and line-level / mic stuff (i.e. analogue signals). [...] But 44.1KHz 16-bit sound, converted to analogue in the transport and sent to the amp via line leads WILL benefit from Monster / premium cables, as will speaker cables of any kind.
are, I'm afraid, complete nonsense. Counterfactual, in fact. And yes, there's real science to support that. Let me gloss over it...
A 44.1 kHz sample rate before the DAC means the maximum frequency component the cables need to handle is 22 kHz. (This is due to the Nyquist limit, as in the Nyquist-Shannon Sampling Theorem.) 22 kHz is low. Really low. Practically any old piece of wire can carry audio frequencies with perceptually flat response across the audible range and nearly no loss as long as the cable lengths are as short as they are in a typical home stereo system. The only thing you need is large diameter wire for your speaker cables to ensure they're very low resistance so that the higher currents involved in powering a speaker don't cause resistive loss in the cable.
As for low-power line level signals (such as CD player to amp), the most likely source of problems is actually ground loops, where the source equipment has a different ground reference than the destination. (A lesser concern is interference.) The pros don't solve this with stupid Monster Cable, they solve it by using pro equipment with balanced (differential) signaling, which both eliminates the need for the source and destination to have a common ground and provides some noise immunity.
For home stereo systems, however, making sure that everything is grounded to the same point (3 prong plugs all plugged into a single grounded power strip) is generally good enough, and noise is rarely (if ever) a significant problem.
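To put rough numbers on the argument above, here is a minimal sketch that computes the Nyquist limit for a 44.1 kHz sample rate and estimates how much amplifier power is lost in a copper speaker cable. The cable length, the wire cross-sections, and the 8-ohm speaker load are illustrative assumptions, not figures from this thread, and the speaker is treated as a plain resistor.

```python
COPPER_RESISTIVITY = 1.68e-8  # ohm-metres, approximate value at room temperature


def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a sampled signal can represent (half the sample rate)."""
    return sample_rate_hz / 2.0


def cable_resistance_ohms(length_m, cross_section_mm2):
    """Round-trip resistance of a two-conductor copper cable."""
    area_m2 = cross_section_mm2 * 1e-6
    # Current flows out on one conductor and back on the other: 2x the length.
    return COPPER_RESISTIVITY * (2 * length_m) / area_m2


def power_loss_fraction(cable_ohms, speaker_ohms=8.0):
    """Share of delivered power dissipated in the cable (simple series circuit)."""
    return cable_ohms / (cable_ohms + speaker_ohms)


if __name__ == "__main__":
    print("Nyquist limit:", nyquist_limit_hz(44_100), "Hz")
    # Hypothetical 5 m run, thin wire versus thicker wire.
    for area in (0.2, 1.5, 4.0):
        r = cable_resistance_ohms(5, area)
        print(f"{area} mm^2: {r:.3f} ohm, loss {power_loss_fraction(r):.1%}")
```

Under these assumptions the only lever that matters is conductor cross-section, which is the point the comment makes about speaker cables.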
Re:Lost some funding? by chargersfan420 · 2011-09-13 10:41 · Score: 1

It appears with posts like this, that perhaps the opposite of your signature is also true.

Off-topic I know, but sorry, I couldn't resist.
Re:Lost some funding? by leenks · 2011-09-13 10:46 · Score: 1

Let me guess... you bought the green pens (or stick on rims) for the edges of your CDs too?
Re:Lost some funding? by Toonol · 2011-09-13 10:50 · Score: 1

Protestations notwithstanding, I still think you were trolling. That's a kindness, by the way; I'm generously assuming you don't actually believe what you're claiming.
Re:Lost some funding? by webmistressrachel · 2011-09-13 10:55 · Score: 1

No, I didn't. Even the best (Informative) reply to my post still concedes that decent speaker cable is necessary, and doesn't disagree about MP3...

--
This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
Re:Lost some funding? by Known Nutter · 2011-09-13 11:01 · Score: 1

You're right - should've been Offtopic.
Nobody wants another monster cable tennis-match.

--
Beware of the Leopard.
Re:Lost some funding? by webmistressrachel · 2011-09-13 11:04 · Score: 2

To be fair, some people make very good recordings of power tools! And if I'm a "proper" audiophile I'll still enjoy it, will I?

--
This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
Re:Lost some funding? by poity · 2011-09-13 11:04 · Score: 0

I would have modded you off-topic had you been modded up previously. This post of you assuming that truth-hating mods can only come from one country, I'd have modded troll. But since you're resting at threshold 1, I leave you alone. Next time, don't be pedantically off-topic and don't jump to trollish conclusions. Thanks and enjoy slashdot.

--
your thin skin doesn't make me a troll
Re:Lost some funding? by Radworker · 2011-09-13 12:58 · Score: 2

Not to mention balanced input power and line conditioners where appropriate. Audiophiles can go to extremes to get that last 2%.
Re:Lost some funding? by Doc Ruby · 2011-09-13 13:11 · Score: 1

You are at least as big an idiot as they are.

--
--
make install -not war
Re:Lost some funding? by Anonymous Coward · 2011-09-13 13:53 · Score: 0

Two whole percent? Some audiophiles would tear off their firstborn male child's limbs and beat their grandmothers to death with them for that sort of improvement. Actions as mild as "extremes" are the province of fractions of a percent.
Re:Lost some funding? by Anonymous Coward · 2011-09-13 13:56 · Score: 0

He said mastered, not recorded. There is a difference. A BIG difference. Anyone can throw a mic in front of something and record it. But not everyone can bring out the true nature of the music and have the listener really feel the feeling in the room at the time of recording. Britney is a bad example, but yes, I bet I can make a WAYYYY better "recording" of a power tool than you. The mix matters about as much as the performance. The mastering is where the magic happens as long as you are not afraid.
Re:Lost some funding? by Halo5 · 2011-09-13 15:02 · Score: 0

It's not a supercomputer without Monster cables... ;)

--
665: The mark on the forehead of Satan's slightly less evil brother, Stan.
Re:Lost some funding? by JanneM · 2011-09-13 16:51 · Score: 1

GPU-based machines' effectiveness depends a lot on the precise type of work you want to do. If each node can run largely independently, with little data exchange and with their working data set stored locally, they are really fast. If you need a lot of communication between nodes (simulating a neural network with large fan-in/fan-out for instance) then your savings are much more modest.
And you need to account for the time spent developing as well. Not much use saving a week of computer time using a GPU solution if you have to spend three extra weeks in development. It is somewhat telling that at the place where I do my large-scale computing work the regular clusters are always fully utilized whereas the GPU-based cluster sits idle much of the time.

--
Trust the Computer. The Computer is your friend.
Re:Lost some funding? by ls671 · 2011-09-13 17:40 · Score: 2

Maybe he is, I have always assumed that on Slashdot, nicknames like "webmistressrachel" could very well be owned by males ;-)

--
Everything I write is lies, read between the lines.
Re:Lost some funding? by tombeard · 2011-09-13 17:55 · Score: 1

I would be more interested in if my spectrum analyzer could tell the difference. I know how I would bet and it wouldn't be with the audiophile.

--
The reason we subjugate ourselves to law is to better procure justice. If law does not accomplish this purpose then it m
Re:Lost some funding? by Arrepiadd · 2011-09-13 20:40 · Score: 2

I work in computational chemistry and there are currently two or three codes out there using the GPU. Granted that number will only increase, but at this point having GPUs is almost useless (these codes don't do 10% of what other codes, or a combination of them, can do).
Your mileage may vary, but assuming someone is a moron just because he isn't doing what fits you perfectly is moronic itself.
Re:Lost some funding? by Outtascope · 2011-09-13 21:59 · Score: 2

I would be more interested in if my spectrum analyzer could tell the difference. I know how I would bet and it wouldn't be with the audiophile.
Why does everyone keep spelling alchemist incorrectly?
Re:Lost some funding? by webmistressrachel · 2011-09-13 22:58 · Score: 1

Since when did I suggest that I could make a better recording of a power tool than anybody? I was merely comparing, directly, the unpleasantness of a good recording of a power tool with the unpleasantness of having to listen to Britney "singing".
Now you're just waving your e-peen - nah, not even that, your wallet. "I bet I can make a WAYYYY better 'recording' of a power tool than you". Well, duh! Go right ahead, knock yourself out!

--
This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
Re:Lost some funding? by Skweetis · 2011-09-14 01:28 · Score: 1

'Decent speaker cable', according to the reply to your original post, is simply cable of sufficient diameter to lower resistance. This is correct -- any two conductors of sufficient size will work fine in this application. Induced noise isn't an issue at the voltage levels required to drive a loudspeaker, so no shielding is required (or desired -- shielded cable would introduce capacitance issues that would potentially cause your amplifier some distress). I do sound reinforcement for extra cash sometimes; I have personally used two sets of booster cables and 500 feet of barbed wire fence as a speaker 'cable' for an outdoor event. I mostly use bulk lamp cord in normal situations. Monster Cable is largely unnecessary and overpriced (and, frighteningly, is generally regarded as low-end among the cork-sniffing segment of the pro-audio world).
Re:Lost some funding? by webmistressrachel · 2011-09-14 03:25 · Score: 1

I'm sorry to drag you into all this; I was just trolling... the Monster / premium cable part should (And did!) give my game away. MP3 is still evil, though.
- Rachel

--
This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen
Re:Lost some funding? by Anonymous Coward · 2011-09-14 04:34 · Score: 0

Answering your other question, you'll want AMD GPUs for integer processing or Nvidia GPUs if you do a lot of floating point work. If you've got the budget, time, long-lived applications, and developers that can make them go, then FPGA array cards will outperform either of the above.
Re:Lost some funding? by pseudofengshui · 2011-09-14 07:09 · Score: 1

OMG I'M LOOKING
MY mind has been BLOWN, sir!

--
[Text goes here]

Best Use For a New Supercomputing Cluster? by Jodka · 2011-09-13 09:35 · Score: 1, Funny

Generating Bitcoins

--
Ceci n'est pas une signature.

Re:Best Use For a New Supercomputing Cluster? by Monkey-Man2000 · 2011-09-13 09:39 · Score: 1

LOL, where's my mod points when I need them. The bitcoins will help offset the energy consumption I'm almost sure.

--
This post was generated by a Cadre of Uber Monkeys for Monkey-Man2000 (603495).
Re:Best Use For a New Supercomputing Cluster? by turkeyfeathers · 2011-09-13 11:32 · Score: 1

The bitcoins will help offset the energy consumption I'm almost sure.
The poster's boss pays for the electricity, the poster keeps the bitcoins... it's the perfect victimless crime.
Re:Best Use For a New Supercomputing Cluster? by Anonymous Coward · 2011-09-13 12:01 · Score: 1

The bitcoins will help offset the energy consumption I'm almost sure.
The poster's boss pays for the electricity, the poster keeps the bitcoins... it's the perfect victimless crime.
Nope. About ten or twelve years back a sysadmin got arrested for running dnetc on the servers he managed. He was accused of causing about $600,000 in damages. This was back in the day when idle cycles used just as much power as processing an instruction. I can't find a link, sorry. The same sort of thing would apply today.
Re:Best Use For a New Supercomputing Cluster? by Anonymous Coward · 2011-09-13 13:24 · Score: 0

The bitcoins will help offset the energy consumption I'm almost sure.
The poster's boss pays for the electricity, the poster keeps the bitcoins... it's the perfect victimless crime.
It's worthless too, so I wouldn't call it "perfect".
Re:Best Use For a New Supercomputing Cluster? by BluBrick · 2011-09-13 13:59 · Score: 1

The bitcoins will help offset the energy consumption I'm almost sure.
The poster's boss pays for the electricity, the poster keeps the bitcoins... it's the perfect victimless crime.
Not quite, bitcoin mining is perfectly legal. I think what we have here is a crimeless victim.

--
Ahh - My eye!
The doctor said I'm not supposed to get Slashdot in it!
Re:Best Use For a New Supercomputing Cluster? by maxwell demon · 2011-09-13 17:26 · Score: 1

It depends. Mining bitcoins on your own equipment using energy you pay yourself is certainly not illegal (at least not that I know of), however mining bitcoins on computers someone else (like the university you work at) owns may be a bit different. Unless you have explicit permission by the owner to do that, of course.

--
The Tao of math: The numbers you can count are not the real numbers.
Re:Best Use For a New Supercomputing Cluster? by julesh · 2011-09-13 19:13 · Score: 1

About ten or twelve years back [...] when idle cycles used just as much power as processing an instruction
Bullshit. Intel processors have been using lower power when idle since at least the 486DX2 models (which switched down their clock multipliers when idle), and the optimizations have only been getting more aggressive with time.
Re:Best Use For a New Supercomputing Cluster? by Thud457 · 2011-09-14 02:56 · Score: 1

I'm pretty sure Superman took away Richard Pryor's red stapler for just that kind of misbehavior.

--
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff

I call Shenanigans!!! by sconeu · 2011-09-13 09:36 · Score: 5, Insightful

No way in hell a project that big gets approved without a rationale.

And no way in hell the administrator of such a project would ask Slashdot what to do with it.

--
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.

Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-13 09:44 · Score: 2, Informative

Truth!
Two weeks away and still at the “thinking of cool shit to use it for” and “picking out hardware” stages? How does that even happen? Is this some kind of tax scam to burn as much money as possible?
I get that the submitter already has a primary use... but I imagine if I was ever given that kind of budget I’d probably have to account for every CPU cycle every hour of the day (especially since I’m a programmer and should have no business with something like this ;p). I can’t imagine a budget for something like this comprised of “and hopefully we’ll be able to recoup the millions of dollars by leasing it out to some TBD people”.
Also, the first person to mention bitcoin as an option gets to have their teeth rotated. I’m not joking.. we will find you..
Re:I call Shenanigans!!! by Amouth · 2011-09-13 10:00 · Score: 1

agreed - was just about to ask who was stupid enough to let someone buy that much hardware without an existing project/plan in place. and how can i get them to fund me and my start-up (don't have one now but you bring the cash i'll figure out something to do with it)

--
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-13 10:06 · Score: 2, Funny

Yes, it probably is a tax scam. It is now the US Federal Year End. Someone wrote a really good funding proposal and got it approved to get money for a HPC cluster to do *something*. Doesn't really matter. The grant application will have focused on broad ideas like # of cores and what not and not the details. A bit surprising that the network wasn't spec'd because that is such a major cost item, but whatever, maybe the grant application's work loads are not network bound.
So, now that the money is approved the task to build the thing falls to the inexperienced IT group who make all kinds of dumb choices, then will claim they are massively over worked/underfunded trying to get the thing to run and end up with shitty performance and a lot of wasted time and money. Oops. Your Money At Work.
The system you spec'd should be around the 250TF range, if you set it up properly with QDR IB and do all the work to get MPI optimized. If you are good. The correct way to design the network is to match a 36 port IB switch with 18 servers, and then correctly spread the resulting 1200 uplink ports across a pair of 686 port core switches. The cost of IB cables alone will be shocking and you'll regret not using a HPC blade server arrangement for this.
Considering the questions you are asking, you should have gone with an HPC focused integrator that could provide the full system for you. 4 GIGE's on every box? Waste of money. The IB equipped blade designs from SGI/Bull, etc are very nice, space and power efficient and much more cost effective. They even come with a pre-integrated and tested OS ready to go, working boot over IB and other long term cost saving features.
Gosh I hope you bought a few PB of HPC focused storage as well, otherwise you won't find anyone who can even use your machine for their problems.
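The leaf/core arithmetic in the comment above can be sketched as follows. This is only a sizing calculation, assuming a two-tier layout where each leaf switch splits its ports evenly between servers and uplinks (which is what 18 servers on a 36-port switch implies); the function name and the returned field names are made up for illustration.

```python
import math


def size_two_tier_fabric(num_servers, leaf_ports=36, core_switches=2):
    """Count leaf switches and uplinks for a 1:1 (non-blocking) two-tier fabric."""
    servers_per_leaf = leaf_ports // 2              # half the ports face servers
    uplinks_per_leaf = leaf_ports - servers_per_leaf  # the rest face the core
    leaf_switches = math.ceil(num_servers / servers_per_leaf)
    total_uplinks = leaf_switches * uplinks_per_leaf
    return {
        "servers_per_leaf": servers_per_leaf,
        "leaf_switches": leaf_switches,
        "total_uplinks": total_uplinks,
        "uplinks_per_core": math.ceil(total_uplinks / core_switches),
    }


if __name__ == "__main__":
    # 1200 nodes, as in the submission being discussed.
    for key, value in size_two_tier_fabric(1200).items():
        print(f"{key}: {value}")
```

For 1200 servers this gives 67 leaf switches and 1206 uplinks, which is where the roughly 1200 uplink ports (and the cable bill) come from.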
Re:I call Shenanigans!!! by AdamHaun · 2011-09-13 10:08 · Score: 2

It did have one. Right there in the submission:

We primarily do life-science/health/biology related tasks on our existing (fairly small) HPC. We intend to continue this usage, but to also open it up for new uses (energy comes to mind). Additionally, we'd like to lease access to recoup some of our costs.

--
Visit the
Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-13 10:16 · Score: 0

If you mine bitcoins with CPUs, you will spend more money on electricity than you will earn in bitcoins. Only a select few AMD GPUs can profitably mine bitcoins these days. Every other CPU, GPU, or what-have-U will be a financial loss due to the higher bitcoin "difficulty factor" and the lower bitcoin dollar value.
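The profitability claim above comes down to comparing expected mining income with the electricity bill. Below is a minimal sketch of that comparison, using the standard estimate that finding a block takes about difficulty × 2^32 hashes on average. Every input in the example run (hash rate, wattage, difficulty, block reward, coin price, electricity price) is a hypothetical placeholder, not data from this thread or from the real network.

```python
SECONDS_PER_DAY = 86_400
HASHES_PER_DIFFICULTY_UNIT = 2 ** 32  # average hashes per block at difficulty 1


def expected_coins_per_day(hash_rate, difficulty, block_reward):
    """Expected coins mined per day at a given hash rate (hashes per second)."""
    blocks_per_day = SECONDS_PER_DAY * hash_rate / (difficulty * HASHES_PER_DIFFICULTY_UNIT)
    return blocks_per_day * block_reward


def daily_electricity_cost(power_watts, usd_per_kwh):
    """Cost of running a device flat out for 24 hours."""
    return power_watts / 1000 * 24 * usd_per_kwh


def daily_profit(hash_rate, power_watts, difficulty, block_reward,
                 usd_per_coin, usd_per_kwh):
    """Expected income minus electricity cost, in USD per day."""
    income = expected_coins_per_day(hash_rate, difficulty, block_reward) * usd_per_coin
    return income - daily_electricity_cost(power_watts, usd_per_kwh)


if __name__ == "__main__":
    # Hypothetical CPU: 5 Mhash/s at 100 W. All other numbers are placeholders too.
    profit = daily_profit(hash_rate=5e6, power_watts=100, difficulty=1.5e6,
                          block_reward=50, usd_per_coin=5.0, usd_per_kwh=0.10)
    print(f"Expected profit per day: ${profit:.2f}")
```

A negative result means the device costs more to power than it is expected to earn, which is the situation the comment describes for CPUs.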
Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-13 10:20 · Score: 0

Get the brooms!
And no way in hell someone buys 1200 nodes without buying (or knowing) the interconnect, the OS, or the scheduler. And you don't buy nodes with plans to someday put in GPGPUs (note - it's not GPUs, but GPGPUs for HPC computing). I architect these all the time and I can't believe that an entity has actually spent this type of money without knowing this much...
Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-13 10:21 · Score: 0

No way in hell a project that big gets approved without a rationale.
And no way in hell the administrator of such a project would ask Slashdot what to do with it.
My thoughts exactly!
Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-13 10:29 · Score: 0

Daddy. Some people have mighty big trust funds/rich parents. I'm this close to having built that kind of wealth myself :) . now.... I'll just have to figure out what to do with it. Ha. And I just got my first apartment. My first plan of action is to buy a place of my own. I'm 26 and my business just turned 3 years old. I am bringing in about $300,000 / yr now. Not that you would know it from my tax returns! :) yet... anyway. Six months jesus christ only knows what that number is going to turn into. We've only touched the tip of what is possible. We're hitting basically two continents and only one language. Next up! Multiple continents and multiple languages! Not to mention new products will be out to take advantages of our current markets.
Re:I call Shenanigans!!! by geekmux · 2011-09-13 11:10 · Score: 1

It did have one. Right there in the submission:

We primarily do life-science/health/biology related tasks on our existing (fairly small) HPC. We intend to continue this usage, but to also open it up for new uses (energy comes to mind). Additionally, we'd like to lease access to recoup some of our costs.
Ah, stating an existing purpose on "fairly small" hardware vs. the justification to spend for the "largest x86_64-based supercomputer on the east coast of the U.S." are several orders of magnitude away from each other (and common sense for that matter). Sorry, but I'm calling Shenanigans on this too.
And if this turns out to be true, then I don't give a shit what they do with the HPC. I want to meet the person who managed to get this expense approved with basically little or no justification behind it, for that is far more amazing to me than the hardware.
Re:I call Shenanigans!!! by kcitren · 2011-09-13 11:18 · Score: 1

Government year end money; use it or lose it. I've seen this happen before: there's a few hundred thousand lying around allocated to hardware acquisition. They need to spend it fast, so they find something related to what they do and get something newer, bigger, and better...
Re:I call Shenanigans!!! by corbettw · 2011-09-13 11:23 · Score: 1

Well, it's on the east coast, so there are a few possible culprits who come to mind who might do just that.

--
God invented whiskey so the Irish would not rule the world.
Re:I call Shenanigans!!! by DrgnDancer · 2011-09-13 11:23 · Score: 2

Also who the Hell buys hardware like this without vendor support? OS and backend choices should have been part of integration from the vendor. No one buys 3000 rack mount servers, a bunch of switches, some racks and some storage and builds "the largest x86_64-based supercomputer on the east coast of the U.S."
OP, if you are in any way serious about this stop now. You don't want the largest supercomputer on the East Coast, you want a computer that works. Call SGI, IBM, Cray, or even (ewww) Oracle/Sun and get them to sell you a smaller system with full integration support. Trust me, I've done HPC, been there, done that, literally got the t-shirt (several of them). Even with integration support you'll be lucky if the vendor gets the thing up and running on schedule and as advertised. There's a thousand blades, 200 switches, a million cables... get the experts to figure out integration and you'll be a much happier camper.

--
I don't need a million points of light, just two points of multi-mode fiber and a 10 Gig-E router.
Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-13 11:31 · Score: 1

Really?
http://www.itnews.com.au/Tools/Print.aspx?CIID=260608
"Victoria's auditor-general has ignited a storm over the state's $100 million life sciences computing facility... The Auditor-General's conclusion was based on an apparent lack of scoping and cost-benefit analysis undertaken by the University."
Re:I call Shenanigans!!! by Razed By TV · 2011-09-13 11:48 · Score: 1

No way in hell, indeed. Everything about this is stupid. Take the cost: 1200 servers * 2 cpus each * at least $200 and you're seeing $480,000, not including the 1200 servers themselves, which I'm going to lowball at $200 each because I don't feel like newegging it, and you get $720,000. If you really had a lot of money to spend in a short period of time, is this the first thing you would think of to squander it on? Do you already have 8-core desktops with dual 50" HDTV's as displays? How's all your lab equipment? I'm pretty sure you wouldn't half ass a supercomputer like this.

And if this really is supposed to be a serious submission, the time to ask all of those questions was *before* you started ordering parts.
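The back-of-envelope estimate above can be reproduced in a few lines. The prices are the commenter's own deliberate lowball guesses; the function name is just for illustration.

```python
def lowball_cluster_cost(servers=1200, cpus_per_server=2,
                         usd_per_cpu=200, usd_per_server=200):
    """Return (cpu_cost, server_cost, total) in USD for the lowball estimate."""
    cpu_cost = servers * cpus_per_server * usd_per_cpu
    server_cost = servers * usd_per_server
    return cpu_cost, server_cost, cpu_cost + server_cost


if __name__ == "__main__":
    cpu_cost, server_cost, total = lowball_cluster_cost()
    print(f"CPUs:    ${cpu_cost:,}")
    print(f"Servers: ${server_cost:,}")
    print(f"Total:   ${total:,}")
```

Interconnect, storage, racks, power, and cooling are left out, as they are in the comment, so the real figure would be higher.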
Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-13 12:34 · Score: 0

No way in hell a project that big gets approved without a rationale.
aww. :) hey everyone, look at this cute little slashdot post! so full of youthful optimism ;)
*ahem*
The project has a rationale, but you missed it. Look again:
"In about 2 weeks time I will be receiving everything necessary to build the largest x86_64-based supercomputer on the east coast of the U.S. (at least until someone takes the title away from us)."
That's it. 'The title' is the rationale. Supercomputing is mostly just an exercise in penis comparison these days, anyone who actually has real computing to do just gets the fastest SMP box they can afford and gets on with it.
Re:I call Shenanigans!!! by DrgnDancer · 2011-09-13 12:46 · Score: 1

My wife, with no HPC experience at all ever, just read this and said "Either this guy's an idiot or this is bullshit." I just thought the /. community should know :-)

--
I don't need a million points of light, just two points of multi-mode fiber and a 10 Gig-E router.
Re:I call Shenanigans!!! by ae1294 · 2011-09-13 13:11 · Score: 1

And if this turns out to be true, then I don't give a shit what they do with the HPC. I want to meet the person who managed to get this expense approved with basically little or no justification behind it, for that is far more amazing to me than the hardware.
Her name is Candy and she works really hard for that hardware.
Re:I call Shenanigans!!! by ae1294 · 2011-09-13 13:12 · Score: 1

My wife, with no HPC experience at all ever, just read this and said "Either this guy's an idiot or this is bullshit." I just thought the /. community should know :-)
Ask her the hotel room number for me...
Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-13 13:32 · Score: 0

Smart woman. Does she have any stock market tips?
Re:I call Shenanigans!!! by Doc Ruby · 2011-09-13 13:40 · Score: 2

They are buying a supercomputer because their lucrative medical research is too big for the smaller HPC, but not (yet) big enough for the biggest supercomputer of its type in the region. So they're also looking for some other apps to use the extra capacity instead of it going to waste.
That might not be true - this is just a Slashdot assertion. But there's nothing inconsistent in there to suggest it's false. It's perfectly plausible.
You are just one of the modern type of people who make up your mind on your preconceptions, say something out loud, then refuse to listen to any reason you could be wrong or might reconsider. Denial feels so powerful, who cares what's true, right?

--
--
make install -not war
Re:I call Shenanigans!!! by Doc Ruby · 2011-09-13 13:41 · Score: 1

You are lying. "GPGPU" is the technique of General Purpose computing on Graphic Processor Units. Nobody installs "GPGPUs"; there is no such hardware called that. People install GPUs to do GPGPU.
Sure, you architect the largest East Coast x86_64 supercomputers all the time. Bullshit.

--
--
make install -not war
Re:I call Shenanigans!!! by Doc+Ruby · 2011-09-13 13:43 · Score: 2

Whether or not this is a true story, or whether or not it's a government project, there is as much budget-reserving in private industry like what you described as there is in government. Probably more, since government is more transparent than private business, and so more people have access to exposing that little game, which tends to inhibit it some.

--
--
make install -not war
Re:I call Shenanigans!!! by Nimey · 2011-09-13 13:55 · Score: 2

Modern? Your faith in your elders is cute.

--
Hail Eris, full of mischief...

E pluribus sanguinem
Re:I call Shenanigans!!! by coobal · 2011-09-13 14:24 · Score: 1

Agree so totally much. You cannot sink this much money into a project without a project plan.
Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-13 14:52 · Score: 0

Truth!
Two weeks away and still at the “thinking of cool shit to use it for” and “picking out hardware” stages? How does that even happen? Is this some kind of tax scam to burn as much money as possible?
How does this happen? Easy. It is probably one of the "shovel ready" infrastructure projects the White House is so proud of.
Re:I call Shenanigans!!! by afidel · 2011-09-13 14:58 · Score: 1

NVidia's Tesla cards are GPGPUs; they have no graphics hardware.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:I call Shenanigans!!! by Archimboldo · 2011-09-13 16:04 · Score: 1

Whether or not this is a true story, or whether or not it's a government project, there is as much budget-reserving in private industry like what you described as there is in government. Probably more, since government is more transparent than private business, and so more people have access to exposing that little game, which tends to inhibit it some.
Really? I haven't seen it at any of the private companies I worked at. I admit some were big companies and I saw only a few subgroups; I also admit my sample isn't big, but I wonder if you have any better data than I have to make your claim.
Re:I call Shenanigans!!! by Dahamma · 2011-09-13 16:52 · Score: 1

Yeah, but that's part of the point - why would they get a budget that would give them so much computing power that they didn't know what to do with the excess? And even sketchier, what academic project has ever looked for commercial uses for a big funded setup like this to "recoup some of the costs"?!?
Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-13 21:42 · Score: 0

Nor would anyone have already ordered 1200 nodes without deciding what interconnect fabric they were using. Come to that, nor would anyone with the slightest clue ever "Ask Slashdot" about whether to use IB or 10GbE.
Re:I call Shenanigans!!! by ccguy · 2011-09-13 22:29 · Score: 1

And no way in hell the administrator of such a project would ask Slashdot what to do with it.
Why not? Believe it or not there's lots of clever people here (lazy as hell, but clever), and you might actually get good ideas. Just because you ask here doesn't mean you are going to actually do whatever is suggested. But getting feedback from a nerd community can't hurt...
Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-13 23:29 · Score: 0

yes you would not get funding to do this without knowing exactly what you are doing - when I looked into which distro to use (I am looking at ML for a big publisher) Scientific Linux is commonly used on big clusters.
Though ECL (Thor/Roxie), our parent company's open source hadoop replacement, runs on CentOS, SUSE and Ubuntu http://hpccsystems.com/
Re:I call Shenanigans!!! by tsa · 2011-09-13 23:37 · Score: 1

Any boss without knowledge of IT but with an interest in 'cool stuff' would do that.

--
-- Cheers!
Re:I call Shenanigans!!! by TheRaven64 · 2011-09-13 23:50 · Score: 1

No way in hell a project that big gets approved without a rationale.
I wish that were true. The Welsh Assembly Government recently approved £40m of funding for two supercomputer centres with the rationale basically being 'with big-fast computers we can do loads of science! And industry! And it will make loads of money!' The facility is meant to be shared between industry and academia, but no one involved has the slightest clue what the possible industrial uses are for a (not yet designed or deployed) supercomputing facility.
So, it's a depressing question but, sadly, quite a plausible one.

--
I am TheRaven on Soylent News
Re:I call Shenanigans!!! by Amouth · 2011-09-14 00:19 · Score: 1

looks like someone forgot to lock the marketing monkey back in his cage before they left the office.

--
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
Re:I call Shenanigans!!! by Amouth · 2011-09-14 00:33 · Score: 1

if the equipment is new then you are looking at ~2k per box (could be cheaper if they were using blades but as he is asking about interconnects it isn't)
1200 * 2k = $2.4m, not counting the space to house it. because they aren't blades, best case is 1U, so ~30 racks if you can handle an incredible heat and power load per sq ft. more than likely you need to double that, or at least multiply it by 1.5, so 45-60 racks to house it, plus the associated data-center cost.
assuming it's an HPC it shouldn't need external bandwidth - if they were to rent space in a data center - the prices range from $35-75 a month per U for the space and power, no bandwidth. so 1200*[35-75] = 42k-90k a month in hosting/operating costs.
"Any boss without knowledge of IT but with an interest in 'cool stuff'" who authorizes a $2.4m build and ~$0.5-1m a year in operating costs on something that has no plan has zero business being able to authorize purchases and should be removed from his job.
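
The arithmetic in this estimate can be checked with a few lines of Python. This is only a sketch: the ~$2k per node and the $35-75 per U per month are the figures quoted in the comment, while the 42U rack height and the names `estimate`, `RACK_UNITS` and friends are assumptions made for illustration.

```python
import math

NODES = 1200               # node count from the story
COST_PER_NODE = 2_000      # ~$2k per 1U box (the comment's estimate)
RACK_UNITS = 42            # assumption: standard 42U racks, one node per U
HOSTING_PER_U = (35, 75)   # $/U/month for space and power (the comment's range)


def estimate(nodes=NODES):
    """Return rough hardware, rack and hosting figures for a 1U-node cluster."""
    hardware = nodes * COST_PER_NODE
    racks_full = math.ceil(nodes / RACK_UNITS)  # racks packed solid
    # Derate for heat and power: 1.5x to 2x the fully packed rack count.
    racks_derated = (math.ceil(racks_full * 1.5), racks_full * 2)
    monthly = tuple(nodes * rate for rate in HOSTING_PER_U)
    yearly = tuple(12 * m for m in monthly)
    return {
        "hardware": hardware,
        "racks_full": racks_full,
        "racks_derated": racks_derated,
        "monthly": monthly,
        "yearly": yearly,
    }


if __name__ == "__main__":
    for key, value in estimate().items():
        print(key, value)
```

With these inputs the hardware comes to $2.4m, the fully packed rack count to 29, and hosting to $42k-90k a month, which matches the rounded figures in the comment.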

--
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-14 01:40 · Score: 0

I concur... an investment that large w/o major purpose, and on top, not sure what linux distro to use? bogus
Re:I call Shenanigans!!! by dkuntz · 2011-09-14 02:49 · Score: 1

if the equipment is new then you are looking at ~2k per box (could be cheaper if they were using blades but as he is asking about interconnects it isn't)
Still could be blades... if it's more than 1 chassis... 4-10 blades would not really be a super computing cluster, so there's still interconnects between the various units...

--
OMG... I have a sig?
Re:I call Shenanigans!!! by tsa · 2011-09-14 03:13 · Score: 1

You mean like a boss who starts a war in, let's say, Iraq? I can't agree more. Still, it happened and he got away with it. There must be many more bosses like him in all sorts of work.

--
-- Cheers!
Re:I call Shenanigans!!! by Amouth · 2011-09-14 03:20 · Score: 1

even if it was smaller blade chassis, the comment about dual gigE ports per node is a clue that these are more than likely 1U boxes rather than even small blade units.

--
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
Re:I call Shenanigans!!! by mapsjanhere · 2011-09-14 03:30 · Score: 2

Everyone assumes this is a government funded project. I see an administrator at a start-up, running a bunch of promising biochemical/medical simulation stuff on a 20 machine cluster using some Linux-based code. Now they got some serious venture capital investments, and venture capital wants fast scale-up for fast flipping. If the researchers say they can do their work in 3 years on the 20 machines or in 1 year with a couple million in new hardware, the couple million will not even cause a blink to a major investor. So the hardware gets ordered, and someone runs the numbers and finds out they don't have enough stuff to run on their machines, the machines not only cost a couple million but also serious dollars in maintenance/cooling etc, so let's find something commercial to do with them (also generating a nice non-project bound income stream). And now the admin realizes that while their code runs mostly on the local nodes, since it was written and optimized 15 years ago on Red Hat 5.0, that won't impress people who pay the big bucks for extra cycles. And since they told everyone "the OS is free, it's Linux" they don't have money and don't have time to get someone to explain all the little details of running an HPC for profit. Voila, your slashdot post of "Help, I'm in an airplane, how do you land this thing".

--
I'm aging rapidly, I bought a new game and had no idea if my machine was good for it.
Re:I call Shenanigans!!! by Anonymous Coward · 2011-09-14 06:00 · Score: 0

I completely agree with this. I have to jump through serious Purchasing and MO hoops to purchase even a little (96 core) HPC, and I work for one of the premier academic institutions in the US...
But, just in case he is being serious...
No OS has chargeback systems built in. We like RHEL and have had good luck with it. You can go CentOS or SL if you don't want to pay for support.
Install every queuing and message passing system on that sucker (Torque, MPI, SGE, etc). Definitely go for the QDR Infiniband. With that many machines, messaging becomes a huge bottleneck. The extremely low latency and high bandwidth (40Gbps) of QDR IB make it very appealing. Also, it's getting more reasonably priced since many companies are getting into the IB game. We've had good luck with Mellanox switches and cards. Beware the slightly faulty IB cable...it'll drive you up a wall until you find it.
As for GPU computing (mainly CUDA), we've had good luck with GTX 470 and GTX 460 cards, and we're testing out a GTX 550 as I write this. If you need double precision performance, you're better off with a TESLA card...
Good luck to you.
Re:I call Shenanigans!!! by Shotgun · 2011-09-14 06:32 · Score: 1

Heh, the Obama administration funded a solar cell company that couldn't be bothered to do the homework to see that China would be underselling them before they could get the plant built after the Bush administration wouldn't touch them, so I don't see this as being much of a stretch. Somebody must have made a few campaign contributions.

--
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
Re:I call Shenanigans!!! by Shotgun · 2011-09-14 06:34 · Score: 1

No, he means one that would give over $500million to a company that can't figure out that the Chinese can sell the product cheaper than they can make it.

--
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
Re:I call Shenanigans!!! by geekmux · 2011-09-14 09:59 · Score: 1

They are buying a supercomputer because their lucrative medical research is too big for the smaller HPC, but not (yet) big enough for the biggest supercomputer of its type in the region. So they're also looking for some other apps to use the extra capacity instead of it going to waste.
That might not be true - this is just a Slashdot assertion. But there's nothing inconsistent in there to suggest it's false. It's perfectly plausible.
You are just one of the modern type of people who make up your mind on your preconceptions, say something out loud, then refuse to listen to any reason you could be wrong or might reconsider. Denial feels so powerful, who cares what's true, right?
Well, to clarify, my preconceptions are based on personal experience, sound financial models, and common sense...you know the kind of decision making and math businesses USED to use back when the country was still healthy, instead of the steaming pile of financial shit we're in now, created by loophole justifications, "limitless" taxpayer funding, and "riders" like this. Sorry, but "not yet big enough" should have probably translated into the budget as "no, you're not going to build the largest one just so you can claim you have the largest one." We're not exactly riding the dot-com wave these days, and even that era should have been a lesson learned.
And the overall lack of prior planning for something as large as this(how do I hook it up??) was the real issue here, and I certainly wasn't alone here calling Shenanigans.

Ummm two things by Sycraft-fu · 2011-09-13 09:36 · Score: 3, Insightful

1) Something with 10gb really isn't a "supercomputer" it is a cluster. Fine, but call it what it is. I really wouldn't call a cluster with Infiniband a supercomputer either.

2) You really should maybe get someone who knows more about your project and someone who knows more about clusters/supercomputers. The questions you are asking are not ones I would want to see from the guy making the choices on a multimillion dollar project.

Re:Ummm two things by Anonymous Coward · 2011-09-13 10:15 · Score: 2, Interesting

You clearly have no idea what you're talking about. I was just part of a million-euro EU project consisting of a large partnership of universities and companies. Given the fact that none of them ever did anything, my professor gave up and defined the project on his own.
I coded the entire project on little more than minimum wage while I was also attending classes. I managed a couple of helpers who did web design and documentation, and dealt with the rest of the partners on my own, even interacting with fancypants EU higher-ups at some point. I was also in charge of administrative work such as financial reports. I dealt with the university accounting department directly as well as their administrative staff. I booked flights and physically walked over to the traveling agency. I represented the project at every single conference where it was demo'ed. As part of its end goal of meeting an audience target of a few thousand people, I took the initiative of aggressively promoting the project and was met with huge success.
The vast majority of the cash was spent on people who did absolutely nothing other than throwing one or two opinions in the 18 months the project lasted. Our university's share was used to buy new chairs and tables and repaint the walls etc.
Life in academia is serious research. Very serious. Investing in "science" will solve the world's problems.
Re:Ummm two things by Anubis350 · 2011-09-13 10:20 · Score: 2

1) You haven't been to any computer conference (like, say, SC) have you? or worked on a supercomputer? Most supercomputers these days are clusters, and hell, one of the most common interconnects is still gigE, not even 10gigE, though that's slowly changing (check the top500 stats if you don't believe me, but I've been at SC's top500 announcement every year for the past 4, and it's been mentioned each time. For that matter I run jobs on a gig based cluster every day, and for many types of work it's not necessarily a hangup).

2) I'm going with "article is fake", no-one commits the resources to spec, build, and power a cluster of that size without a projected use. You should see the hoops you have to go through to spec machines a fraction of the size ::shudders::

--
"goodbye and hello, as always" ~Prince Corwin, from Zelazny's Amber series
Re:Ummm two things by Sycraft-fu · 2011-09-13 10:42 · Score: 2

They may call them "supercomputers" but in my mind that is mislabeling things. They work for cluster operations, where there's not a ton of inter-node communication and no need for access to memory outside your node. Well, that is what supercomputers were made for. So in a real supercomputer, you have the ability to do that. That is also why real supercomputers cost more.
I think it is an important distinction for that reason. While a supercomputer can do all a cluster can, the reverse is not true. Same with distributed computing vs a cloud. If you have something that takes basically no inter node communication, just occasional communication with a server, then you can distribute it all over the net, using low bandwidth links, unreliable nodes, and so on. A cluster can do that stuff too, but there are things a cluster can do that distributed computing cannot.
Re:Ummm two things by blair1q · 2011-09-13 10:51 · Score: 1

I think 2) is not seeing the whole story there.
They do have a continual use for mass quantities of computation. But it looks like it's not a 24/7 workload. And with $/core dropping like a rock, this iteration of the "biggest" may be cheaper than the last, and therefore not the sort of budgetary lightning rod that building-sized supercomputers used to be.
Re:Ummm two things by bananaquackmoo · 2011-09-13 10:51 · Score: 1

I would mod this up if I could. All the other comments here seem to be about how this story is a fake, nobody would ask this question of Slashdot on a short timeframe with tons of funding and no idea where to go with it. Well I think you nailed it, 100%. It's likely someone's pet academic project.
Re:Ummm two things by wagnerrp · 2011-09-13 11:11 · Score: 1

The only real difference between clusters and shared memory "supercomputers" is that shared memory systems get a hardware assist to access remote data, while clusters have to do it all in software in the network stack and communications framework. When your infiniband backbone is running 5GB/s and latencies in the hundreds of nanoseconds between each node, where is the real cut off? It seems more like a gradual sliding scale to me.
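
One way to picture that sliding scale is the usual latency-plus-bandwidth cost model for a single message. The sketch below is illustrative only: the 5 GB/s figure comes from the comment, 500 ns is simply one value picked from "hundreds of nanoseconds", and the GigE numbers (125 MB/s wire rate, 50 microseconds latency) are assumptions, not measurements. The function names are made up for this example.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to move one message: fixed latency plus size over bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def half_bandwidth_size(latency_s, bandwidth_bytes_per_s):
    """Message size at which latency and wire time are equal.

    Below this size a transfer is latency-bound, above it bandwidth-bound.
    """
    return latency_s * bandwidth_bytes_per_s


# (latency in seconds, bandwidth in bytes/second); see the assumptions above.
LINKS = {
    "GigE (assumed)": (50e-6, 125e6),
    "InfiniBand (from the comment)": (500e-9, 5e9),
}

if __name__ == "__main__":
    for name, (latency, bandwidth) in LINKS.items():
        small = transfer_time(8, latency, bandwidth)          # one double
        large = transfer_time(1_000_000, latency, bandwidth)  # 1 MB block
        crossover = half_bandwidth_size(latency, bandwidth)
        print(f"{name}: 8 B in {small * 1e6:.2f} us, "
              f"1 MB in {large * 1e6:.0f} us, crossover ~{crossover:.0f} B")
```

Nothing in the model switches from "cluster" to "supercomputer" at some threshold; the two parameters just move, which is the point the comment is making.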
Re:Ummm two things by Anonymous Coward · 2011-09-13 12:05 · Score: 0

Uhh...the 90's called and they want your supercomputer back. Modern Infiniband allows scaling to at least 100's of nodes without internode memory latency being the limiting factor.
Re:Ummm two things by Anubis350 · 2011-09-13 13:32 · Score: 1

That's just not true at all, particularly when you're talking about DDR and QDR infiniband (why yes, I do use software that does heavy internode communication on clusters, and am familiar with how performance scales with internode communication, in fact for a while it was a focal point of my research), not to mention 3D networks like BG's torus interconnect.

Simply put, if you were involved in the HPC world you've been out of touch for a long time now in order to draw the distinction you're drawing. Yes, a gigE connected cluster is going to be outperformed by a shared memory system when nodal communication is an issue, but depending on the app that may go away at 10gigE and certainly fades completely as you get into higher speed infiniband. At that point it's a debate about implementation style, and will be based on specific targeted purpose of the machine, cost of equipment, and cost of operation of equipment available. An infiniband based cluster can be, and is, comparable to a similarly priced and equipped shared mem machine.

--
"goodbye and hello, as always" ~Prince Corwin, from Zelazny's Amber series
Re:Ummm two things by Anonymous Coward · 2011-09-13 16:04 · Score: 0

You have NO clue of what you talk of. I have worked at HP, IBM, Bell Labs (when it was bell labs), Watson Labs, US West AT, and NASA. And yes, I have been on many projects in which we went looking for ideas, customers, etc after we bought the equipment. For example, in 1985, our group bought a VERY expensive Teradata machine. THEN we ran around looking for customers. This is COMMON. All too common. Even now.
Re:Ummm two things by Dahamma · 2011-09-13 16:59 · Score: 1

You really should read the post you replied to again, he is completely correct. Most supercomputers today are relatively general purpose hardware with high speed interconnects.
To be blunt, it doesn't really matter what "supercomputer" is defined as in your mind, it matters how the majority of supercomputer developers and researchers define it in their minds...
Re:Ummm two things by Anonymous Coward · 2011-09-13 21:11 · Score: 0

Oh boy.

They work for cluster operations, where there's not a ton of inter-node communication and no need for access to memory outside your node.

Well, that is what MPI is made for.

So in a real supercomputer, you have the ability to do that. That is also why real supercomputers cost more.
How about you check the specs of a "real" supercomputer and see how similar they are to a "cluster"'s? Individual multi-core processors, tightly coupled with a low-latency, high-bandwidth interconnect. Whether x86 or PPC, fat-tree IB or torus NUMAlink, that's pretty much the same.

I think it is an important distinction for that reason. While a supercomputer can do all a cluster can, the reverse is not true.

A cluster can do inter-node communication, given the right interconnect. GigE won't get you that far, but QDR (and soon FDR) IB performs equally well. Heavens, they even do "supercomputers" over IB !
Re:Ummm two things by Phurge · 2011-09-13 23:27 · Score: 1

sigh, such a typical slashdot response
If the question was "what's the best gaming PC for $1,000?", the answer would be:
1) if you're not running Crysis on your overclocked Amiga, (on some weird-ass linux distro of course) then you have no right to be asking, and obviously don't know what you're talking about.
2) No way in hell your mom is gonna give you the $1,000.

--
I'll see your hokum and raise you a boondoggle.
Re:Ummm two things by Anonymous Coward · 2011-09-14 01:09 · Score: 0

Agreed on #2, and additionally, receiving "everything necessary" in 2 weeks and half of it isn't spec'd. What GPU, OS, and Interconnects? Systems of this cost and scale are completely spec'd before resources are committed. This is almost a joke, or maybe just the implementors are.
Re:Ummm two things by mjwalshe · 2011-09-14 06:53 · Score: 1

and without checking if the room they want to put this in has the required power available

Is this a joke? by Anonymous Coward · 2011-09-13 09:36 · Score: 1

You've got hardware for a supercomputer coming but you haven't thought out what OS you're going to use? Shouldn't this all be decided, designed and ready to go already?

Re:Is this a joke? by hedwards · 2011-09-13 12:09 · Score: 1

OS? I'm guessing he's planning to run Emacs on it. It should be just enough to get it working and a web browser at the same time.
Re:Is this a joke? by NeoMorphy · 2011-09-13 13:36 · Score: 1

OS? I'm guessing he's planning to run Emacs on it. It should be just enough to get it working and a web browser at the same time.
Unless emacs was your web browser :)

Uh oh.. by joib · 2011-09-13 09:36 · Score: 4, Insightful

Shouldn't you have figured out answers to all these (simple) questions before ordering several million $$$ worth of hardware? Sheesh.. As for your specific questions:
- IB vs. 10GbE: IB hands down. Much better latency and more mature RDMA software stacks (e.g. for MPI and Lustre). Cheaper and higher BW as well.
- GPU: NVidia Fermi 2090 cards. CUDA is far ahead of everything else at the moment.

Re:Uh oh.. by Anonymous Coward · 2011-09-13 10:13 · Score: 1

CUDA programmatically has far more limitations than MPI running on a cluster like this guy is describing... however, for things like gene sequencing, and various cancer research CUDA is the better choice. That is neither here nor there though. TACC uses CentOS on Ranger I believe... and they do lease cycles on it, and/or let people pay to run with batches. https://portal.tacc.utexas.edu/ You could also contact xsede at https://www.xsede.org/ TACC is part of XSEDE, but they operate somewhat autonomously and neither generally knows what the other is doing... TACC is also generally far more accommodating toward things like what you are asking...
Re:Uh oh.. by Savantissimo · 2011-09-13 10:54 · Score: 2

I'll assume you know more about this than me, but he did say that the nodes are going to be wired with 4x GigE. Might there be a penalty bridging from that to IB rather than 10GigE?
Anyway, to get low latency those GigE links to the nodes need to be optimized. I thought this was interesting:

High performance network technologies such as InfiniBand use a kernel by-pass method to improve performance. This capability is also available for Ethernet, but is not widely used outside of the HPC community. One such methodology is Intel® Direct Ethernet Transport (DET), which works by providing a User Direct Access Programming Library (uDAPL) interface like InfiniBand. uDAPL defines a single set of user APIs for all Remote Memory Direct Access (RDMA)-capable transports. DET includes a kernel module and an uDAPL library for Ethernet and will work on almost any Ethernet NIC. It can be linked with any software requiring a uDAPL library, such as an MPI version.
Another popular kernel by-pass effort is the Open-MX project. Open-MX is based on the Myrinet MX protocol. Essentially, any software that links to the Myricom MX library should be able to link with Open-MX. Currently, Open MPI, MPICH2, and the PVFS2 file system have all been shown to work with Open-MX. While Open-MX will work with almost all GigE and 10-GigE chip-sets without modifying drivers, it does require kernel 2.6.15 or higher to work. Depending on the chip-set, Open-MX latencies as low as 10 microseconds for GigE have been reported.
(From The Ethernet Cluster)
For 10GigE here's a recent low-latency benchmark:
Audited STAC-M2 Benchmark of IBM LLM on an IBM-BNT G8264 switch, using IBM x3550 servers and Mellanox MNPH29C-XTR ConnectX®-2 EN with RoCE
"Using standard Ethernet and RoCE protocols, at the base message rates set by the specs, the mean latency of the solution did not exceed 7 microseconds, while standard deviation of latency was measured at 1 microsecond. At the highest tested rate of 2.3 million messages/second, the mean latency of the solution was just 13 microseconds while the standard deviation of latency was measured at 2 microseconds."
Chelsio claims 3 microsecond latency using RDMA over 10G Ethernet on their "T4" model: "Chelsio T4 Unified Wire adapters can run iWARP RDMA, TCP, iSCSI and FCoE simultaneously with full offload and deliver full wire speed throughput and extremely low latency between the computing nodes, the storage resources, and the user and cluster management nodes in any HPC environment." Not sure how much that really costs compared to IB, though.
They also say:

"Since IB lacks congestion management and adaptive routing, it quickly hits hot spots even in clusters of moderate size. iWARP over Ethernet, in contrast, achieves reliability via TCP, which results in a lower effective latency for useful applications."
*"10Gb IB link is effectively 8Gb. Furthermore, InfiniBand cards, like Ethernet cards, are limited by PCIeGen2 x8. Independently of how many 10Gb or 40Gb ports an adapter exposes, the aggregate bandwidth is limited to about 26Gbps in each direction. Therefore, Chelsio’s T4 based adapters and the leading IB adapters offer the SAME bandwidth."
*"Ethernet switch port prices have reached parity. The same can be said about adapter prices. However, an IB cluster further requires an Ethernet switch for management, a gateway for routing, and expensive IB storage available from a limited set of suppliers, as well as specialized IT personnel."
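
The "10Gb is effectively 8Gb" and PCIe figures in the vendor quote follow from 8b/10b line encoding, which both these InfiniBand rates and PCIe Gen2 use: every 8 data bits travel as 10 bits on the wire. A small sketch of that arithmetic follows. The 5 GT/s per-lane PCIe Gen2 rate is standard; the 26 Gbps figure is the vendor's, and reading the gap between it and the encoded rate as packet/protocol overhead is an assumption here, not something the quote states. The function and variable names are made up for this example.

```python
def data_rate_8b10b(signal_gbps):
    """Usable data rate of an 8b/10b-encoded link, given its signalling rate."""
    return signal_gbps * 8 / 10


# A "10Gb" InfiniBand link (4 lanes at 2.5 Gbps signalling each).
ib_10g_data = data_rate_8b10b(10)

# A "40Gb" QDR InfiniBand link (4 lanes at 10 Gbps signalling each).
ib_qdr_data = data_rate_8b10b(40)

# PCIe Gen2 x8: 8 lanes at 5 GT/s each, also 8b/10b encoded.
pcie_gen2_x8_data = data_rate_8b10b(5 * 8)

# The vendor quotes about 26 Gbps per direction through the slot.
# Assumption: the remainder is PCIe packet and protocol overhead.
implied_efficiency = 26 / pcie_gen2_x8_data

if __name__ == "__main__":
    print(f"10Gb IB link carries    {ib_10g_data:.0f} Gbps of data")
    print(f"40Gb QDR IB link carries {ib_qdr_data:.0f} Gbps of data")
    print(f"PCIe Gen2 x8 carries    {pcie_gen2_x8_data:.0f} Gbps before protocol overhead")
    print(f"26 Gbps implies about {implied_efficiency:.0%} protocol efficiency")
```

On these numbers a single QDR port already asks for as much data bandwidth as the slot can carry before protocol overhead, which is the substance of the vendor's "SAME bandwidth" claim; whether that matters in practice depends on the workload.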

--
"Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
Re:Uh oh.. by LWATCDR · 2011-09-13 11:05 · Score: 1

This has got to be a troll. I mean really setting up a cluster and you have no idea about the interconnects or GPUs? Not to mention cooling or power. I picture this being put together in a spare back room and walls of plastic shelving and APC UPSs from Best Buy.
Who would fund such a thing.
Here is the best of all suggestions if this is not a troll. FIND A VENDOR. http://www.linuxclusters.com/vendors.html

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:Uh oh.. by LWATCDR · 2011-09-13 11:08 · Score: 1

I really want to believe that you are correct but I have dealt with government IT people before. This could be on the up and up, good lord help us all.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:Uh oh.. by Anonymous Coward · 2011-09-13 12:46 · Score: 0

My goodness - iWARP, TCP, iSCSI *and* FCoE? Amazing! I hear it will even open multiple TCP connections *at the same time!*
And what's that I also see? A long blurb from the vendor claiming that a competing technology is inferior to that used by their product? I'm sure it's an absolutely unbiased and accurate accounting of the facts that doesn't gloss over different use cases or relative advantages or disadvantages of each method at all.
On the other hand, anyone using vendor marketing as an acceptable means of comparison between competing technologies should find it a perfect fit to ordering millions of dollars of hardware without a clue of how or what to use it for.
I'll make no representations as to the actual merit of different transports or the accuracy of this particular vendor's claims, they may be spot on for all I know, however:
Here's a clue, kids - don't accept *anything* a vendor says at face value, including any "study" with "facts" they might have at hand. Oh, the number of "helpful studies" I've seen bandied about in meetings from different vendors trying to push their own (insert fabric here) switches, or SANs, or virtualisation technologies, all with very subtle or sometimes blatantly obvious deficiencies and biases in test methodology.
If you have to spend a sizeable amount of budget, do the research yourself, because you'll be the one holding the can when that amazing tech you bought on the vendor's say-so falls flat on its face. They'll be the ones holding your money, and will have any number of helpful reasons to hand as to why it's a tragic and regrettable situation that it's not working in your specific case only, and that they can offer you technical support or additional hardware at a very reasonable rate that will do everything except fix the bad assumption you're continuing to throw money at.
PS - anyone that uses "reliability via TCP" and "low latency" in the same sentence *deserves* to be shot.
Re:Uh oh.. by Savantissimo · 2011-09-13 13:40 · Score: 1

Well, maybe so. It is marketing stuff.
However this independent comparison by IBM shows that this vendor's 10G products are neck and neck with IB, even faster on some tasks. 10G is a couple of percent lower performance on many tasks, and substantially slower on ~10% of tasks. The prices of adapters are somewhat higher for 10G, but the switches and extras such as cabling seem to more than make up for it. Basically as far as performance and HW cost, there isn't much difference at all. But 10G will likely be easier to maintain and monitor, there are more vendors available for equipment and it will be easier to get support and expertise for 10G.

--
"Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
Re:Uh oh.. by Savantissimo · 2011-09-13 13:42 · Score: 2

That IBM whitepaper link was supposed to be: Performance of HPC Applications over InfiniBand, 10 Gb and 1 Gb Ethernet

--
"Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
Re:Uh oh.. by Anonymous Coward · 2011-09-13 13:51 · Score: 0

actually, in bioinformatics the vast majority of problems are embarrassingly parallel. IB is often unnecessary. We use 10GigE because we do tons of IO, but latency is not an issue (we rarely do message passing, most of our parallelism is a mixture of embarrassingly parallel and multi-threaded, sometimes combined in the same solution). We use 32 core nodes w/128GB RAM, 10GigE, and a big clustered filesystem. IB can be cheaper than 10GigE, but unfortunately our proprietary clustered filesystem is Ethernet on the front end (back end connecting storage nodes is IB though).
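
For readers unfamiliar with the term, "embarrassingly parallel" means the work splits into tasks that never talk to each other, so the interconnect only has to move input and output. A minimal Python sketch of the pattern, using GC content per sequence as a stand-in workload (the function names and the choice of workload are made up for illustration; they are not from the comment):

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(seq):
    """Fraction of G and C bases in one sequence; depends on nothing else."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_content_all(seqs, workers=4):
    """Map gc_content over independent sequences, preserving input order.

    The thread pool only stands in for a scheduler here. On a real cluster
    each task (or chunk of tasks) would be a separate batch job, and no
    message passing between tasks would be needed.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gc_content, seqs))


if __name__ == "__main__":
    reads = ["ATAT", "GGCC", "ATGC", "gattaca"]
    for read, gc in zip(reads, gc_content_all(reads)):
        print(f"{read}: {gc:.2f}")
```

Because no task waits on another, link latency barely matters; what matters is how fast the filesystem can feed the nodes, which fits the comment's emphasis on IO over message passing.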
Re:Uh oh.. by afidel · 2011-09-13 15:31 · Score: 1

The switches are now really, really cheap. A Force10 S4810 which can have 64 10Gb ports (in 1U!) can be had as cheaply as $15k without even really shopping. Then again a Mellanox IS5025 with 36 40Gb ports can be had for only $5k. There are definite advantages to higher port density though as it reduces the complexity of the graph.
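
Those two price points are easier to compare per port and per Gbps. A rough sketch using only the prices and port counts quoted above; the 1200-node count is from the story, and the edge-switch count deliberately ignores uplink ports, so treat it as a lower bound rather than a design. The names are hypothetical.

```python
import math

# Price, port count and per-port speed as quoted in the comment.
SWITCHES = {
    "Force10 S4810 (10GbE)": {"price": 15_000, "ports": 64, "gbps": 10},
    "Mellanox IS5025 (40Gb IB)": {"price": 5_000, "ports": 36, "gbps": 40},
}


def per_port(switch):
    """Dollars per switch port."""
    return switch["price"] / switch["ports"]


def per_gbps(switch):
    """Dollars per Gbps of aggregate port speed."""
    return switch["price"] / (switch["ports"] * switch["gbps"])


def min_edge_switches(nodes, ports):
    """Fewest switches that give every node a port, with no ports left for uplinks."""
    return math.ceil(nodes / ports)


if __name__ == "__main__":
    for name, sw in SWITCHES.items():
        print(f"{name}: ${per_port(sw):.0f}/port, ${per_gbps(sw):.2f}/Gbps, "
              f"at least {min_edge_switches(1200, sw['ports'])} switches for 1200 nodes")
```

The denser 10GbE switch needs fewer boxes at the edge (19 against 34 on this count), which is the "complexity of the graph" point, while the IB switch is cheaper per port and far cheaper per Gbps.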

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:Uh oh.. by joib · 2011-09-13 18:23 · Score: 1

I'll assume you know more about this than me, but he did say that the nodes are going to be wired with 4x GigE. Might there be a penalty bridging from that to IB rather than 10GigE?
The way I read it, it means that the nodes have 4 1 GbE ports built in on the MB. If you're going to use IB, you'll buy separate PCIe IB cards for each node. The 1GbE ports can then be used to run management traffic etc. Or left unused; there's no law saying you have to use them all, and since 1GbE ports are practically free it's not like you're leaving any money on the table either.
Wrt RDMA over ethernet, iWARP this and that, yes I know it exists. My point was that RDMA has been supported on IB since day 1, the software stack is mature and widely used, which can't be said for ethernet RDMA. Since IB infrastructure so far is cheaper there's really no reason to go with 10GbE.
That's not to say that 10GbE is useless. Of course it's useful, e.g. if you run high-bandwidth services over TCP/IP accessible from outside the cluster. But that's not what you're doing on a cluster. A cluster interconnect is typically used for MPI and storage, both of which can run over RDMA, avoiding the by comparison heavy-weight TCP/IP protocol. And of course, at some point 10GbE will replace 1GbE as the cheap builtin stuff on MB's.
Re:Uh oh.. by joib · 2011-09-13 18:52 · Score: 1

One major advantage of IB here is that it natively supports multipathing; there's no need to avoid loops in the graph either by topology or by using spanning trees. This allows one to build networks with decent bisection BW without needing big and expensive über-switches.
There are a few efforts to bring similar capability to ethernet as well, TRILL and 802.1aq, AFAIK neither of which is ratified at the time of writing this.
Re:Uh oh.. by Anonymous Coward · 2011-09-13 19:23 · Score: 0

You've got to factor in the cost of populating all the 10GbE SFP+ ports on the switch (and potentially also the HBAs), 10GbE SFP+ modules can still be expensive, and that cost mounts quickly. It seems cheaper to use CX4 10Gb switches and adapters, which brings the cabling cost and difficulty up to that of IB. I'd also be wary of the bandwidth and port to port latency with some of the cheaper high density switches, although generally you can't go wrong.
Re:Uh oh.. by Amouth · 2011-09-14 00:55 · Score: 1

one thing i see all too often on the cheaper high density switches is a very real and too low limit on the switch fabric. too many people forget to look at that closely. also, on some of them you have to look not just at the total switch fabric rate but also at the switch block rate and the interlock rate (different than port to port speeds).

--
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
Re:Uh oh.. by afidel · 2011-09-14 01:25 · Score: 1

SFP+ allows copper connections up to 15m, you should be able to keep most of your connections that close since that allows the cluster to be 30m x 30m with central switches.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:Uh oh.. by dkuntz · 2011-09-14 03:11 · Score: 1

But, Force10 switches actually are crap. They're just like Cisco in every way except everything...
At my last job, we had nothing but problems with Force10 equipment...including one which started smoking (literally...) at about 40gbit...

--
OMG... I have a sig?

Crysis 2 by Arnos · 2011-09-13 09:37 · Score: 2

Perhaps this can actually run (gasp) Crysis?

Re:Crysis 2 by blair1q · 2011-09-13 10:52 · Score: 1

Perhaps it can program Crysis...
Re:Crysis 2 by poity · 2011-09-13 11:08 · Score: 1

With 7000 cores, it can probably ray-trace Crysis 0.o

--
your thin skin doesn't make me a troll
Re:Crysis 2 by Anonymous Coward · 2011-09-13 11:28 · Score: 0

Crysis 2 could run on most machines. It's the original Crysis game that really pushes hardware and still does today.
Re:Crysis 2 by AsmCoder8088 · 2011-09-13 15:34 · Score: 1

FYI, 1200 machines * 2 sockets per machine * 6 cores per socket = 14400 cores, not 7000.
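
Spelled out as a quick check, using only the figures from the submission (the variable names are mine):

```python
machines = 1200
sockets_per_machine = 2
cores_per_socket = 6

total_cores = machines * sockets_per_machine * cores_per_socket
print(total_cores)

# What a 7000-core total would imply per socket on the same hardware count.
implied_cores_per_socket = 7000 / (machines * sockets_per_machine)
print(round(implied_cores_per_socket, 2))
```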

Nothing to see here by BennyB2k4 · 2011-09-13 09:37 · Score: 1, Offtopic

Hey look, elephants!

Re:Nothing to see here by BennyB2k4 · 2011-09-13 09:38 · Score: 1

---installs BTC miner---

I'm sorry...What? by jpedlow · 2011-09-13 09:40 · Score: 0

So you're a sysadmin for a Large Commercial Cluster and you've got hardware on the way and don't have answers to these questions already?

I apologize if I've misread, but something just doesn't seem to add up here. :\ I'd get it if you were saying you've got a stack of maybe dual quads and were like "hey i've got a half rack of computers, please hold my hand for HPC", but with something the magnitude you're speaking of, I--dont-even. Trollface.jpg?

Riiiiight by GrumpySteen · 2011-09-13 09:40 · Score: 1

We're supposed to believe that you've purchased 1200 servers, 2400 six core CPUs and all the associated hardware without deciding basic things like how you're going to connect it all or what distribution you're going to use?

Re:Riiiiight by PPH · 2011-09-13 09:43 · Score: 3, Funny

Happens to me when I visit Costco all the time.

--
Have gnu, will travel.
Re:Riiiiight by Jah-Wren+Ryel · 2011-09-13 10:05 · Score: 1

We're supposed to believe that you've purchased 1200 servers, 2400 six core CPUs and all the associated hardware without deciding basic things like how you're going to connect it all or what distribution you're going to use?
Sounds like they got some of that 75 billion dollars per year of anti-terrorism money.
Even though he's dead, Osama still knows how to make it rain!!

--
When information is power, privacy is freedom.
Re:Riiiiight by phil_aychio · 2011-09-14 08:46 · Score: 1

While all this talk about clustering and InfiniBand is making me all giddy, I think we are overlooking an obvious scenario. This isn't and can't be the actual person who is the sole lead engineer for this project. He's probably one of those (semi) technical writer types who wants to understand more about how all this stuff works under the hood so he can actually talk to the people who are planning and implementing it, or at least have something to break the ice. Maybe the people building this out are women. Extraordinary women. Can't blame him, I would too. "Ohhh..." he says as he walks by the voluptuous engineers, "I see you are wiring it up with 10GbE. The CPUs must be too slow for InfiniBand," he chuckles. An engineer turns to him, and in a sultry, yet raspy voice, responds "Maybe you can help us get some extra bandwidth?" as she stares with her full pouty lips. Or something like that.

--
obvious redundancy is obvious

How we do things by Anonymous Coward · 2011-09-13 09:40 · Score: 1

We have this exact same setup (20% the size, though). We use InfiniBand; we have not had good luck with 10GigE, though we do use it for Globus endpoints. 40Gig IB has better MPI performance and higher bandwidth at a lower cost. You can also get Ethernet->IB gateways to help with any issues if you need to use IP over IB.
Also make sure your MPI library supports OFED (OpenFabrics, http://www.rce-cast.com/Podcast/rce-34-ofed-openfabrics-enterprise-distribution.html) or you won't get the performance you want.

As for GPUs, look at the Dell C410x, http://www.dell.com/us/business/p/poweredge-c410x/pd ; it connects up to 6 hosts to up to 16 GPUs. Be ready with the 220V power.

Check out www.rce-cast.com for a bunch of podcasts on HPC type stuff.

Seriously? by Anonymous Coward · 2011-09-13 09:41 · Score: 0

If you have to ask, it's doubtful you're telling the truth. Any organization with enough resources to build a supercomputer has experts on staff who have already figured this out, because they'll have designed everything in advance.

professionals... by Anonymous Coward · 2011-09-13 09:42 · Score: 0

I think you need to hire a consultant or an expert. You have very specific, for-profit (even if it's cost-recoup) needs, with a chance of liability if things go wrong/ downtime. In short, requiring some warranty or insurance of some kind. Standard stuff that comes with commercialization.

There's friendly help, and then there's helping-you-do-your-work-for-free.

Pong by Vandilzer · 2011-09-13 09:43 · Score: 2

One really smooth and accurate game of Pong! Or Asteroids, if that suits your fancy... though it will require a bit more computing power :)

EPIC TROLLING by jpedlow · 2011-09-13 09:44 · Score: 4, Insightful

Wow, he just TROLLED THE CRAP out of slashdot. We mad, bros!

Use it to run Skein Hash... In Bash by Alain+Williams · 2011-09-13 09:46 · Score: 1

You are going to need something like that to get Skein Hash In Bash done in an acceptable time.

Mom & dad's new basement data center by macraig · 2011-09-13 09:46 · Score: 1

It would appear somebody got enough of a life to move out of mom and dad's basement and now wants to convert it into a Bitcoin mining hub....

What we do ... by Anonymous Coward · 2011-09-13 09:47 · Score: 4, Informative

Similar size setup in bio-informatics in Europe. We run redhat 6.1, was centos 5 and LSF. single 1gbit to each server (blades). No need for 10gb or IB unless huge mpi which no one uses. 32GB to 2TB per node - some people like enormous R datasets. All works well for our ~500 users.

Re:What we do ... by gknoy · 2011-09-13 10:00 · Score: 1

Thank you for posting the first informative post I saw, rather than mocking or trolling ones. :)
Re:What we do ... by Anonymous Coward · 2011-09-13 13:01 · Score: 1

Not sure what you mean by "huge MPI" but we do a lot of our work in OpenMPI. Sometimes this involves passing around more data than is probably sensible, because the time it takes to shoehorn existing code to fit the project is waaay less than the time it takes to "do it right" (if there's one thing I've learned in CS, it's that doing things the dumb way often makes more financial sense). I wish we had IB for our teensy (128 core) cluster. We do have about 1TB per node, mostly for things like confocal stacks, evolution strategy intermediate datasets, porn, stolen music, etc.
Re:What we do ... by Nimey · 2011-09-13 13:58 · Score: 1

I'm sure you mean RHEL 6.1 and not Red Hat 6.1.
That was an amusing picture of someone downgrading a very nice cluster to a 12-year-old version of Linux with a 2.2 kernel.

--
Hail Eris, full of mischief...

E pluribus sanguinem
Re:What we do ... by Anonymous Coward · 2011-09-14 01:58 · Score: 0

No you run redhat enterprise linux 6.1 (rhel 6.1) i hope
Redhat 6.1 came out late 1999

How about by Anonymous Coward · 2011-09-13 09:48 · Score: 1

LFTR modelling? It's one of the main things holding back the tech.

Re:How about by drwho · 2011-09-13 10:38 · Score: 1

OK, LFTR (Liquid Fluoride Thorium Reactor) development would be useful. Can you explain what modeling needs to be done? Is this merely a provisioning problem (you haven't got the computational resources), or it is also a programming problem, and perhaps even an algorithm problem (do you know what you want to compute)?
Another question is, who would own the results?
Re:How about by Anonymous Coward · 2011-09-13 12:32 · Score: 0

Modeling the reactions and daughter products, thermodynamic and fluid flow analysis, etc. on various reactor core designs and tweaks thereof.
Hopefully open sourcing the resulting data and information.
My main thoughts are anything which reduces long life isotopes in storage from current reactors, and which can be safer than what's currently out there, is worth researching.
As for me, I don't have the skills needed to do such things. I'm a sys admin and I hate software engineering. I've got a degree in physics and I've been following thorium development for the past 20 years, so that's some explanation of my interest.

At the top of his game by colinu · 2011-09-13 09:49 · Score: 1

Trolls don't get the respect they deserve. Supp0rtLinux is an artist.

Did someone say Bitcoin!? BUY! BUY! by recrudescence · 2011-09-13 09:50 · Score: 2

Holy crap! Someone mentioned the word "Bitcoins" on slashdot again! It's only a matter of time before its value hits the roof again! Quick! BUY! BUY!

Re:Did someone say Bitcoin!? BUY! BUY! by blair1q · 2011-09-13 11:32 · Score: 2

Fuck that. What's the ticker symbol for "Beowulf Cluster"?

Monkeys! by eljefe6a · 2011-09-13 09:50 · Score: 2

How about helping me out with some computing power for my monkeys project? http://www.jesse-anderson.com/2011/08/a-few-more-million-amazonian-monkeys/

Obligatory by Anonymous Coward · 2011-09-13 09:50 · Score: 0

shit coins.

Re:Obligatory by webmistressrachel · 2011-09-13 10:00 · Score: 1

Well, if he installs the bitcoin generator mentioned plenty of times above, I'm sure the computer will literally shit meta-coins!

--
This tagline was transcoded to result in at least one smirk. If you experience failure to smirk, please consult your Gen

Totally believable. by khasim · 2011-09-13 09:50 · Score: 3, Interesting

I totally believe the submitter's question.

Next up on Ask Slashdot:
I just got permission to buy the biggest fleet of trucks on the east coast ... and I was wondering if anyone on Slashdot had any ideas what I should do with them.

Followed by,
The company I work for just purchased 10,000 acres of land on the east coast and I was wondering if anyone on Slashdot had any idea what we should do with it.

Happens all the time!

Re:Totally believable. by Anonymous Coward · 2011-09-13 10:21 · Score: 1

And slashdot forum:

The barely-legal hot teen nymphomaniac convention asked me to be a judge at the tight pussy competition, but my WoW clan is going out to Taco Bell. What should I do?
Re:Totally believable. by ColdWetDog · 2011-09-13 10:24 · Score: 1

You're just trying to get us arrested. Better luck next time, Mr. DEA agent....

--
Faster! Faster! Faster would be better!
Re:Totally believable. by blair1q · 2011-09-13 10:47 · Score: 3, Interesting

Actually, it does.
I remember taking possession of a spanking-new Thinking Machines cluster some <mumble> years ago.
The principal investigator got it to do one particular calculation, and promised the excess would be put to good use.
We spent our time trying to figure out what "good use" meant in that context.
It hasn't got much easier.
I say if you run out of numbers to crunch of your own, these days, just hook it up to some lucky grid-computing project and let it swamp the stats.
Re:Totally believable. by Anonymous Coward · 2011-09-13 12:02 · Score: 1

Lease the land to the guy with the trucks for parking.
Re:Totally believable. by hedwards · 2011-09-13 12:08 · Score: 1

If you run out of numbers to crunch, wouldn't you then just shut the thing down? All that energy used isn't free. Or is there some way of writing off the cost of donating cycles to whatever charity project one chooses?
Re:Totally believable. by Anonymous Coward · 2011-09-13 12:08 · Score: 0

You have a supercomputer, why not have it make the decision?
Re:Totally believable. by blair1q · 2011-09-13 12:59 · Score: 2

Things like that generally cost more to shut down and power back up than the power you use letting them run the screensaver.
Re:Totally believable. by itamblyn · 2011-09-13 13:33 · Score: 2

Right, and it's bad to turn off a car even for a second, and you're better off running the AC with the window open.
Re:Totally believable. by Anonymous Coward · 2011-09-13 16:56 · Score: 0

Pot...both of them.
Re:Totally believable. by ls671 · 2011-09-13 17:58 · Score: 2

Well, shutting your car down and powering it up excessively will cause a car gas engine to wear faster, since it is generally accepted that an important part of a car engine's wear occurs when you power it up. For a short period, oil isn't evenly distributed and this causes excessive wear and stress compared to while it is running smoothly.
For the rest, things like:
-"not shutting your water heater when you leave for 3 months will save you money because it will cost more in the end to heat the water when you get back"
-"it costs less in fuel or electricity to leave an engine running because restarting it will burn so much more fuel/electricity"
etc. etc.
are mostly urban legend.

--
Everything I write is lies, read between the lines.
Re:Totally believable. by ls671 · 2011-09-13 18:05 · Score: 1

Maybe he is going to feed /. comments in his supercomputer and have it do just that...

--
Everything I write is lies, read between the lines.
Re:Totally believable. by TheRaven64 · 2011-09-13 23:54 · Score: 2

The boot time for an SGI Altix is about 6 hours (I was at a fun talk by the guy at SGI doing the Xen port - he'd boot half a dozen machines so that he had one to work on when he'd crashed the last one). If you power a machine like this down when it's idle, then you're basically making it unavailable for a large category of jobs. If you can do the work in 6 hours on your computer or 10 minutes on the supercomputer, it's faster to do it on your computer because the supercomputer will still be booting up when you're finishing.
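
A back-of-the-envelope version of that comparison, assuming the 6-hour boot and the 6-hour / 10-minute job times above (the function name is made up for illustration):

```python
def hours_to_result(job_hours, boot_hours=0.0):
    """Wall-clock hours until a job finishes, including any cold boot first."""
    return boot_hours + job_hours

BOOT_HOURS = 6.0  # Altix boot time quoted above

local = hours_to_result(6.0)                       # run it on your own computer
cold_super = hours_to_result(10 / 60, BOOT_HOURS)  # supercomputer was powered down
warm_super = hours_to_result(10 / 60)              # supercomputer was left running

print(f"local: {local:.2f} h, cold: {cold_super:.2f} h, warm: {warm_super:.2f} h")
```

The cold case only loses by the length of the job itself, but it loses for every job shorter than the boot time, which is the "large category" in question.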

--
I am TheRaven on Soylent News
Re:Totally believable. by AmiMoJo · 2011-09-14 00:10 · Score: 1

Well, shutting your car down and powering it up excessively will cause a car gas engine to wear faster since it is generally accepted that an important part of a car engine wear occurs when you power it up. For a short period, oil isn't evenly distributed and this causes excessive wear and stress compared to while it is running smoothly.
Not any more. My car has Stop & Go, basically when you stop at the lights or something the engine turns off. As soon as you put your foot down again it starts up. It is seamless and the starter motor and engine are designed to last the lifetime of the car (15+ years). We have been able to build stuff that reliable for decades, it's just that there was no demand before.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Re:Totally believable. by ls671 · 2011-09-14 00:49 · Score: 1

hmm..... price of fuel vs price to replace a worn engine ?
plus you definitely need a different battery I would assume...
This is just a patch to cope with fuel running out. I wouldn't believe that as easily as "not anymore" ;-)

--
Everything I write is lies, read between the lines.
Re:Totally believable. by godefroi · 2011-09-14 01:54 · Score: 1

-"not shutting your water heater when you leave for 3 months will save you money because it will cost more in the end to heat the water when you get back"
Modern water heaters are startlingly efficient. I looked into tankless heaters some time ago, assuming I'd win on the efficiency alone. I wouldn't.

--
Karma: Poor (Mostly affected by lame karma-joke sigs)
Re:Totally believable. by dkuntz · 2011-09-14 02:44 · Score: 1

Followed by,
The company I work for just purchased 10,000 acres of land on the east coast and I was wondering if anyone on Slashdot had any idea what we should do with it.
Happens all the time!
Housing development... or solid waste disposal...
or an illegal nuclear power plant, running off of stolen Soviet era uranium from their sub fleet...

--
OMG... I have a sig?
Re:Totally believable. by robthebloke · 2011-09-14 03:46 · Score: 1

at £6 a gallon for fuel (in UK), over the course of 100,000 miles (a conservative guess at the life of an engine) you'd be looking at fuel costs of £30,000 for averaging 20mpg, and £27,272 if you averaged 22mpg (assuming that this stop-and-go causes only a very minimal mpg saving - although that very much depends on your daily commute. i.e. someone living in london is likely to see a larger saving than someone living in the country). So over the lifespan of that engine you'd get a £2727 saving - assuming that the cost of petrol stays the same for the next decade or so. Given the somewhat crazy increases in the price of petrol in the last year or two, we can assume the savings will be much greater than that!

Last time I replaced an engine in my car (which was about 10 years old) the cost of a used engine (with 40k on the clock) was £800 from a breakers yard, and the cost of fitting was about £350 IIRC. Yes you could buy a brand new engine (which would cost mega bucks), but why would you buy an engine that's going to cost 3 times the value of the car at that point?

So yes. I'd go for fuel savings over the engine.
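
The sums above as a quick sketch, assuming a flat £6 per gallon and the 20 vs 22 mpg guess (names are just for illustration):

```python
def fuel_cost(miles, mpg, price_per_gallon):
    """Total fuel spend over a distance at a constant mpg and price."""
    return miles / mpg * price_per_gallon

MILES = 100_000
PRICE_GBP = 6.0

cost_20 = fuel_cost(MILES, 20, PRICE_GBP)  # without stop-and-go
cost_22 = fuel_cost(MILES, 22, PRICE_GBP)  # with stop-and-go, assumed 2 mpg better
saving = cost_20 - cost_22

engine_swap = 800 + 350  # used engine plus fitting, figures from above

print(f"saving: £{saving:.0f}, engine swap: £{engine_swap}")
```

Even with that small mpg gain, the saving comes out at more than twice the cost of the used-engine swap.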
Re:Totally believable. by Shotgun · 2011-09-14 06:28 · Score: 1

For a short period, oil isn't evenly distributed and this causes excessive wear and stress compared to while it is running smoothly.
This only happens if the car has been sitting a while and has had time for the oil to seep back into the pan. That generally takes at least several minutes. If it is happening faster than that, your car will be oil starved and you'll have massive blowby...ie, your engine won't work at all.

--
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
Re:Totally believable. by CagedApe · 2011-09-15 00:06 · Score: 1

LOL. Ask Slashdot something related to computers and you'll get 100 experts who can give you an in-depth analysis of what, how and why they are right. Ask them about something like gas engines, and you have a bunch of retards humping doorknobs. I love it. No, starting a car will not wear the engine. The parts inside your engine hold enough lubrication to protect it at start even when it has been sitting off for a long time. Anyone who tells you different is lying through their teeth. Next let's talk about framing a house, and see how much you nerds really know!
Re:Totally believable. by ls671 · 2011-09-15 00:24 · Score: 1

sure watson, and shutting down/turning on your tv/computer or just a light bulb a bazillion times a day won't make it wear any faster because no dilation of material occurs and dilation prevents wearing it out anyway...

--
Everything I write is lies, read between the lines.
Re:Totally believable. by ls671 · 2011-09-15 00:32 · Score: 1

even a more generic principle: changing the state of a system doesn't cause any wear ;-)

--
Everything I write is lies, read between the lines.
Re:Totally believable. by CagedApe · 2011-09-15 00:33 · Score: 1

We weren't talking about light bulbs, we were talking about engines. Have you ever rebuilt an engine? Guess what, the top end will still be covered in oil. Once the crank makes a single rotation it will become "evenly distributed" as you say. I beg of you to find ONE example of an engine that has been started to death. Good luck.
Re:Totally believable. by jc79 · 2011-09-15 19:04 · Score: 1

If you are only getting 20 mpg you are either a very poor driver or drive a tractor. My 11-year old Skoda (140000 miles on the clock) gets over 40mpg in town, and over 55mpg on the open road. Given the price of fuel, a less efficient car than that is just not worth it, not to mention the environmental cost.
As an aside, I still find it funny that USAians think 30mpg is efficient.

test by Anonymous Coward · 2011-09-13 09:51 · Score: 0

Your request timed out. Please retry the request.

hardly the biggest by zeldor · 2011-09-13 09:51 · Score: 2

Amazon's HPC cluster there in Virginia I suspect is way bigger than your little toy..
plus all the agencies.

--
If I could walk that way I wouldnt need cologne.

Re:hardly the biggest by Anonymous Coward · 2011-09-13 12:58 · Score: 0

Yeah, I was gonna say half the Intelligence Community is on the East coast. I gotta believe there are bigger systems than yours when I look at the size of the buildings at Fort Meade, with small parking lots, big fans and tall fences and lots of security.
Re:hardly the biggest by mjwalshe · 2011-09-14 06:57 · Score: 1

well the op doesn't seem to know what he's doing so probably has not heard of "black" systems before - Kids today eh.

the best use? by nimbius · 2011-09-13 09:53 · Score: 1

i dunno...someone's still working on cancer i think...and i know a guy who's still trying to find the higgs boson.
having solved all other problems, maybe dick around on jeopardy?

--
Good people go to bed earlier.

Successful troll... by jtownatpunk.net · 2011-09-13 09:55 · Score: 0

...is successful.

Multiple super-computers instead of a single one? by sisukapalli1 · 2011-09-13 09:55 · Score: 1

You need to specify additional information:

1) What about the data and storage? Many complex applications require vast amounts of data (e.g. climate change models, CFD models, GIS data sets that can complement or take advantage of modeling). Many end users may not be very adept at accessing these data.
2) What about the software? For example, CFD modeling software is very expensive. In some cases, open source software may not make the cut.
3) Does it have to be a single supercomputer? Why not split into multiple supercomputers and merge them as needed? That way, some groups have a more dedicated resource for themselves. The "biggest X ever" isn't as cool as it appears to be.
4) I presume the funds came in as a result of some proposal (using the word informally here, it could even be a one-pager that was sent to the university). The costs should be at least 5k per server (based on what I've seen recently), so it's 6 million [I'd say 10 million even, unless my weak math is catching up with me]? So, that proposal would have some intended uses already.
5) Leasing it internally (to other groups in the university) may be reasonable -- it may even be a sweeter deal if you allocate a set of 10 or 20 servers for a group, instead of having it as part of a broader account access. You can tell them it is their "own machine".
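
For what it's worth, the estimate in point 4 works out as follows. This is a sketch assuming the 1200 servers from the submission and the 5k-per-server floor; the 10 million figure is only my guess, and the names are illustrative:

```python
SERVERS = 1200  # server count from the submission

def hardware_cost(servers, usd_per_server):
    """Total hardware spend at a flat per-server price."""
    return servers * usd_per_server

def implied_per_server(total_usd, servers):
    """Per-server price implied by a total budget."""
    return total_usd / servers

floor_estimate = hardware_cost(SERVERS, 5_000)
per_server_at_10m = implied_per_server(10_000_000, SERVERS)

print(f"floor: ${floor_estimate:,}, 10M implies ${per_server_at_10m:,.0f} per server")
```

So the 10 million guess amounts to roughly 8.3k per server once interconnect, storage and racks are folded in.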

I say this with no offense meant... I've noticed way too many people for whom the tool or technology seems to be the primary purpose (e.g. I do it using *EJB* or *distributed cluster* or *high availability database*). I spoke to someone that was working on app infrastructure for first responders, and was focusing on IDEs and integration, and his killer app was a download link to a weather channel app! When I mentioned that he needs some apps that really differentiate the system from others, his response was that we can run a contest for the apps. So, please avoid going that route -- in general, the tools are there to solve problems and not the other way round [with all caveats, sometimes the tools have to come first before we even realize what we had before that was very bad].

Well, congratulations on getting to play with 10M. I think I was rambling a bit, but the bottom line is: (a) don't make it one computer unless you can find a reason, and (b) approach different groups and offer the tool/service -- you need to do that till you get some traction.

Need help too! by gtirloni · 2011-09-13 09:55 · Score: 1

I'm also receiving all the parts needed to build a nuclear weapon but I still haven't figured out which one. Any ideas? It must be capable of destroying all trolling in the universe (including the ones that /. accepts as news).

--
none

Re:Need help too! by bedouin · 2011-09-13 15:46 · Score: 1

I would find out which kind is the largest on the east coast, and make a bigger one.

break blu-ray encryption by roc97007 · 2011-09-13 09:56 · Score: 1

Assuming it hasn't already been done.

--
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.

Re:break blu-ray encryption by julesh · 2011-09-13 19:23 · Score: 1

The presence of BDRIPs on TPB suggests it may have been.
Re:break blu-ray encryption by roc97007 · 2011-09-14 06:12 · Score: 1

You're probably right. Ok, let's break... I'm drawing a blank here... Some other entertainment-based encryption scheme. Any ideas?

--
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
Re:break blu-ray encryption by badkarmadayaccount · 2011-09-17 10:38 · Score: 1

Cablebox FireWire DV.

--
I know tobacco is bad for you, so I smoke weed with crack.

excuse me... by Thud457 · 2011-09-13 09:59 · Score: 1

but destroying the market for bitcoins has a quantifiable societal benefit. Burn down bitcoin's house while you burn in your hardware!

--

the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff

Ethernet / Infiniband Tradeoff by Rhalin · 2011-09-13 10:00 · Score: 1

Pros and Cons for each link, and it depends on -which- speed of infiniband / how they are bonded. Infiniband can get quite fast in the right configuration, if you want to spend the money on it, and even then, you could do similar setups with 10GB bonded ethernet, that might be cheaper.

One advantage I think ethernet has over infiniband (and correct me if I'm wrong here, someone) is that infiniband requires a specialized network protocol to use, where ethernet can use standard TCP/IP sockets. This is perfectly fine, since many cluster libraries can use infiniband...but using ethernet would open up your cluster to situations and use-cases where things like MPI may not be appropriate architectures - for instance, some types of cognitive modeling can benefit from the CPU resources available on the cluster, but their architectures don't always bind well to MPI metaphors (and for some programming languages / cognitive architectures, getting MPI to work is non-trivial in a cluster environment).

Re:Ethernet / Infiniband Tradeoff by badkarmadayaccount · 2011-09-17 10:40 · Score: 1

IP runs just fine over IB.

--
I know tobacco is bad for you, so I smoke weed with crack.

security by Anonymous Coward · 2011-09-13 10:00 · Score: 0

Security with the GPU will be impossible.

They rarely have their own IOMMU due to the speed limit they put on DMA, and that opens up the system to major security disasters.

All of the systems I know of have no protection to prevent malicious/buggy GPU code from corrupting system memory and either taking over the node, or (better) just crashing the node.

While I find this highly doubtful.... by xzvf · 2011-09-13 10:01 · Score: 3, Interesting

I've seen government institutions have unallocated money at the end of some budget cycle, that was so micro-managed that it could only be spent on a certain type of widget. I can see a university get a late grant, that had to be spent in 30 days, could only be spent on technology, that can only come out of a pre-approved catalog, and some administrative type that just saw a Top 500 super-computer list with competing university names on it, bring up in a meeting that we should build a super computer, and some grad assistant saying how easy it would be. They found a room with a window in it and ordered a bunch of parts, and will walk prospective students and their parents by it saying "This is the largest super-computer on the east coast".

Re:While I find this highly doubtful.... by Anonymous Coward · 2011-09-13 10:03 · Score: 1

You are more correct than you realize!
Re:While I find this highly doubtful.... by geekmux · 2011-09-13 11:17 · Score: 2

I've seen government institutions have unallocated money at the end of some budget cycle, that was so micro-managed that it could only be spent on a certain type of widget. I can see a university get a late grant, that had to be spent in 30 days, could only be spent on technology, that can only come out of a pre-approved catalog, and some administrative type that just saw a Top 500 super-computer list with competing university names on it, bring up in a meeting that we should build a super computer, and some grad assistant saying how easy it would be. They found a room with a window in it and ordered a bunch of parts, and will walk prospective students and their parents by it saying "This is the largest super-computer on the east coast".
Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose? Like we have to wonder how the hell we ended up trillions of dollars in debt.
Sad to say, I've seen Government "last-minute" spending like this too, but not exactly to this level of magnitude. This is a shitload of money "left over". This may have come from somewhere, but "budget" obviously had nothing to do with it.
Re:While I find this highly doubtful.... by kcitren · 2011-09-13 11:22 · Score: 2

Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose?
Nope, I never wonder because the answer is obvious. If they don't spend it this year, they won't get it next year.
Re:While I find this highly doubtful.... by Zancarius · 2011-09-13 13:38 · Score: 2

Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose? Like we have to wonder how the hell we ended up trillions of dollars in debt.
In some parts of the DoD it's so bad that, due to the way the finances work, if there are unallocated parts of the budget they'll be removed for the following fiscal year, sending everyone into a scramble to spend whatever's left of their budget before the axe drops. It's no secret then that most divisions will spend exactly their share (or request more) simply so that they don't receive a budget shortfall in the case they actually need the money.
If you think about it, it's really just a symptom of a broken system. Government budgets should probably be based more on need than on historical performance; it makes sense that those divisions who don't really "need" the money this year would be willing to spend it all just on the offhand chance they have a bigger project next year and would otherwise become underfunded.
Also, there was an excellent article on Kuro5hin a number of years ago detailing why bureaucratic red tape in departments like the DoD often leads to spending more rather than less. I can't seem to find it anymore, but perhaps someone with a better memory than I could link it. I haven't any idea how truthful it was, but I recall that it didn't seem all that unusual.

--
He who has no .plan has small finger. ~ Confucius on UNIX
Re:While I find this highly doubtful.... by UnknownSoldier · 2011-09-13 14:12 · Score: 1

Which is total bullshit. The PROPER way to do it at the end of the year is if any money is left over you decrease the budget by 1/2 the difference for the next year.
How the hell is a department supposed to save money over a few years so that when they need that really BIG purchase that exceeds their budget, they actually have some spare cash for it?
Any junior computer scientist can tell you about amortization costs when a vector grows. 2x used to be popular, but 1.5x is a tighter fit so you waste less memory.
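The amortization remark can be checked with a small simulation. This is only a rough sketch: `simulate_appends` is a made-up helper, not any real library's vector, and it counts element copies rather than measuring time. It appends items one at a time, grows the buffer by a fixed factor when full, and reports both the copying cost and the worst fraction of allocated-but-unused slots.

```python
def simulate_appends(n_items, growth):
    """Append n_items to a growable array; return (element copies, worst unused fraction)."""
    capacity, size, copies = 1, 0, 0
    worst_unused = 0.0
    for _ in range(n_items):
        if size == capacity:
            # Full: allocate a bigger buffer and copy every existing element into it.
            copies += size
            capacity = max(capacity + 1, int(capacity * growth))
        size += 1
        if size >= 1000:  # ignore tiny arrays, where integer rounding dominates
            worst_unused = max(worst_unused, (capacity - size) / capacity)
    return copies, worst_unused


if __name__ == "__main__":
    n = 1_000_000
    for growth in (2.0, 1.5):
        copies, worst_unused = simulate_appends(n, growth)
        print(f"growth {growth}: {copies / n:.2f} copies per append, "
              f"worst unused fraction {worst_unused:.2f}")
```

The trade-off is the one the comment hints at: the smaller factor leaves less slack right after a resize (about a third of the buffer instead of about half), but it resizes more often and so copies more elements overall.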
Re:While I find this highly doubtful.... by jbengt · 2011-09-13 14:46 · Score: 1

In some parts of the DoD it's so bad that, due to the way the finances work, if there are unallocated parts of the budget they'll be removed for the following fiscal year, sending everyone into a scramble to spend whatever's left of their budget before the axe drops.
I've seen the exact same behaviour from departments of private companies who want to maintain their budget; even to the point of urging me to bill for quick phone calls, saying that if they didn't spend their budget for outside consultants they might lose it next year. Of course, the Department of Defense has a much larger budget than they did.
Re:While I find this highly doubtful.... by Anonymous Coward · 2011-09-13 17:08 · Score: 0

I've seen government institutions have unallocated money at the end of some budget cycle, that was so micro-managed that it could only be spent on a certain type of widget.
The reason for the "micro-management" rule in the first place was most likely to prevent spending excess funds for unapproved purchases at the end of the year...
Re:While I find this highly doubtful.... by maxwell+demon · 2011-09-13 17:46 · Score: 2

Which is total bullshit. The PROPER way to do it at the end of the year is if any money is left over you decrease the budget by 1/2 the difference for the next year.
Even that is a bad idea. If not spending all the money means a decrease in future budget, however tiny that decrease is, there will be efforts to spend that money, even if it doesn't make sense. OTOH, money not spent is money not spent, even if it had been allowed to be spent.
Indeed, it would make more sense to reward those who do not spend all the money, by increasing their next year budget. Of course that extra budget part should not be included in the determination if they were below budget (i.e. if they are above the normal budget, but below the increased budget, they don't get an increased budget next year).

--
The Tao of math: The numbers you can count are not the real numbers.
Re:While I find this highly doubtful.... by robotkid · 2011-09-13 18:00 · Score: 3, Insightful

Ever wonder why the option at the end of every damn Government spending cycle to NOT spend the money is never an option to choose? Like we have to wonder how the hell we ended up trillions of dollars in debt.
Sad to say, I've seen Government "last-minute" spending like this too, but not exactly to this level of magnitude. This is a shitload of money "left over". This may have come from somewhere, but "budget" obviously had nothing to do with it.
Yeah, I used to wonder that too. Then my wife got a job in state government. And the answer became painfully obvious judging by the maximum pace at which stuff gets done even when you have people willing to work hard and important problems sitting right in front of you. If you allowed unspent money to roll over indefinitely, that would create an irresistible incentive to do the cheapest job that won't get you in trouble and then hoard, hoard that money. Heck, you could stretch that 3-year project into a 5-year one by doing it very slowly. You could build up a war chest and use it on pet projects that no one approved. Or you could wait till no one even remembers the project existed anymore and then embezzle it.
So as inefficient as it is, the blanket rule that all money must be spent the year in which it is allocated is a simple way to increase transparency and accountability across the board. It may even be one of the driving forces anything gets done remotely on schedule in an environment where purchasing a USB cable requires 2 requisition forms, 3 vendor quotes, the signature of your boss (who is in an all-day meeting), your boss's boss (who is talking with legislators today and can't be disturbed), and pre-approval from someone in accounting (who just went on vacation yesterday).
Of course, it would be great if getting the job done on time and under-cost were somehow rewarded. But that's incentivizing success, that's the profit maximizing, the corporate bottom line, whereas the Gub'ment bottom line is minimizing "embarrassment" (be it from the media, the voting public, and especially legislators on the appropriations committee). You use a Gub'ment bureaucracy for things you can't trust the for-profit world to do on their own, so the service provided has to be somewhat divorced from the revenue stream if you want to ensure more reliable results than just contracting out to a private company. (I'm sure Ron Paul would beg to differ, but then again he also probably enjoys being able to drink water out of the tap without getting sick). You wouldn't pay a health inspector, for example, just based on the number of sites inspected per day because that encourages as cursory a job as possible on as many sites as possible. Instead, you set a minimum quota they have to fulfill, and then make it known you'll have their head on a platter if a restaurant shows up in the news for salmonella poisoning the week after you've signed off on it. That's the Gub'ment way.
Re:While I find this highly doubtful.... by Anonymous Coward · 2011-09-14 00:43 · Score: 0

Hey, I never said it would be *easy*... Just *very doable*.
Re:While I find this highly doubtful.... by Anonymous Coward · 2011-09-14 01:14 · Score: 0

This happens in Canada with the non-profit I work at. One year we just gave out bonuses because we couldn't find a more constructive use for the money that would allow us to spend it fast enough to not affect next year's bottom line.
Re:While I find this highly doubtful.... by Shotgun · 2011-09-14 06:42 · Score: 1

The way to fix the system is to pay an accountant on commission to find ways where the money was wasted. Give them full access to all the books, and 10% of what they find.
You don't just give ANYONE someone else's money and expect them to do the right thing. You include oversight from someone that is able to police it.

--
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba

Re:Wait a minute... by GREY_LENSMAN312 · 2011-09-13 10:01 · Score: 1

are you with the government?.......whose?

And they are going to let you administer it? by Anonymous Coward · 2011-09-13 10:04 · Score: 0

Anyone competent should be able to find answers to most of those questions.

As a cluster admin myself.... infiniband!!! by Fallen+Kell · 2011-09-13 10:06 · Score: 2

I can not stress this enough. As good as 10gb ethernet is, the latency is still horrible compared to infiniband.
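To see why latency rather than raw bandwidth tends to be the sticking point, here is a rough sketch using the usual "latency plus size over bandwidth" cost model. The two latency figures are placeholders I made up for illustration, not measurements of any Ethernet or InfiniBand product; only the conversion of 10 Gbit/s to bytes per second is a hard number.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple latency + size/bandwidth cost model for one message."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def latency_share(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the transfer time that is pure latency."""
    return latency_s / transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s)


# Placeholder figures for illustration only, NOT measurements of any product.
BANDWIDTH = 1.25e9     # 10 Gbit/s expressed in bytes per second
HIGH_LATENCY = 10e-6   # hypothetical higher-latency link, in seconds
LOW_LATENCY = 2e-6     # hypothetical lower-latency link, in seconds

for size in (64, 4096, 1_000_000):
    slow = transfer_time(size, HIGH_LATENCY, BANDWIDTH)
    fast = transfer_time(size, LOW_LATENCY, BANDWIDTH)
    print(f"{size:>9} bytes: higher-latency link takes {slow / fast:.2f}x as long")
```

With equal bandwidth, small messages are dominated by latency and large ones hardly notice it, so how much this matters depends on whether your jobs exchange many small messages between nodes.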

As for distributions, really, that depends on what you are doing and how your current applications are built/designed. Rocks cluster is fairly nice. Unfortunately we have not been able to deploy that due to our FOSS policies, which have really been hurting this project. So we have a mixed Red Hat and Solaris cluster using Grid Engine.

--
We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"

Re:As a cluster admin myself.... infiniband!!! by Anonymous Coward · 2011-09-13 16:50 · Score: 0

Not true anymore. The latency for usermode networking stacks like Solarflare's OpenOnload is competitive with InfiniBand (when combined with cut-through switching, i.e. Arista). It's a well-done implementation that works with an LD_PRELOAD to accelerate apps written with standard sockets networking.
check out the code: http://www.openonload.org/
and one of the founder's blogs
check out http://10gigabitethernet.typepad.com/network_stack/2010/05/openonload-for-mpi-.html
fwiw, not affiliated but a happy user of their products after working with InfiniBand (Mellanox products)

Several projects in mind! by Anonymous Coward · 2011-09-13 10:06 · Score: 0

- Check if the Ultimate answer is really 42. Will take some time.

- Simulate the evolution of the US economy with and without stimulus. Ideal for Dem and Rep (depending on the result)

- Simulate the evolution of the US if the military budget were used for public health and education.

- Simulate some earthquakes and damages. (for insurance companies)

- Simulate a black hole. (for some astrophysics research)

- Simulate the chances of a non-educated young teen becoming a new rich rapper.

- Crunch numbers for SETI like projects.

- Look for patterns in NYSE.

- Estimate the public debt in 2020 if the politicians keep spending exponentially with some never ending wars in between.

- Calculate the mandelbrot set with super precision.

- Program a chess game.

The choice is obvious by gujo-odori · 2011-09-13 10:09 · Score: 1

Pr0n.

Re:Wait a minute... by 93+Escort+Wagon · 2011-09-13 10:09 · Score: 1

Wait a moment here. You're this close to receiving your hardware and you don't even know what O/S you're planning to use, what interconnect to choose, or what problems you intend to solve with it? Where do you get funding like this?

Yeah, I think we need more specific info here. I can't see any way a group would attract funding without spelling out all these items... however the submitter doesn't actually refer to funding, he states "I will be receiving everything necessary to build ...". What does that mean, exactly? Did he just buy hundreds of 386-based machines off the scrap heap? And, more importantly, does this person's supervisor know he apparently seems to think this is his own personal playground rather than a professionally run system?

Or maybe we have it all wrong. Reading between the lines, I immediately assumed he worked for an educational institution or a pharma company - but he doesn't say anything like that. For all we know this guy works for one of those rich pseudo-scientists... the kind of dabbler who has an "institute" with his own name in the title, and the mention of whose name makes real scientists roll their eyes. We just don't know enough.

--
#DeleteChrome

Total BS by friedmud · 2011-09-13 10:10 · Score: 1

I work with some of the largest supercomputers in the world... and I can tell you that this is BS. There is no way this guy got someone to give him enough cash to put this together without:

1. A Plan of what to buy / build
2. A sound reasoning behind what would be done with the machine.

Beyond that... that isn't even that large of a cluster. There are numerous computers on the east coast larger than that... at universities and government research labs (i.e. http://www.nccs.gov/computing-resources/jaguar/ although maybe he doesn't consider Oak Ridge to be on the "East Coast").

Re:Total BS by Anonymous Coward · 2011-09-13 15:58 · Score: 0

Okie Mr. Attitude, let's talk about Jaguar, shall we?
There is no way ORNL got DOE Office of Science to give them enough cash to put Jaguar together because:
1. ORNL had no plan of what to buy / build; As in Spider didn't arrive *with* the machine for instance, wasn't even much more than a dream when the machine was installed. They had no clue what to do relative to a file system when delivery of Jaguar began.
2. A sound reasoning behind what would be done with the machine; Still doesn't, Jaguar spends significant amounts of time on codes from INCITE grants, no? The workload isn't and never was determined until *after* grant awards. Something not even started for that machine until after it was installed.
Hey, but wow, DOE/OASCR *did* give ORNL the money. Seems an indisputable fact, even. Maybe something is wrong with your thinking about his machine then? Could be? Maybe?
In reality, it turns out that more than a few of the DOE machines (both OASCR and NNSA sides of the house) are both ill-specified and incompletely specified. Many things are worked out on the floor, after they arrive.
Finally, Oak Ridge is in Tennessee. When I was there two weeks ago it was still a land-locked state. Perhaps North Carolina has fallen into the Atlantic since then and Tennessee now has ocean-front properties for sale? Otherwise, I sure as hell don't consider it to be on the "East Coast".

Advertisement or trolling? by Anonymous Coward · 2011-09-13 10:11 · Score: 0

Is there a difference nowadays?

Dear infiniband, stick it up you know where.

With articles like this by Anonymous Coward · 2011-09-13 10:11 · Score: 0

I can see why Taco left. Maybe he was on to something.....

obligatory by rust627 · 2011-09-13 10:11 · Score: 1

Running Windows 8 .......

--
da da da dum indeed.

Wow... by Anonymous Coward · 2011-09-13 10:12 · Score: 0

You're two weeks away from taking receipt of this gear and you haven't yet planned these things?

Either you're lying out your ass about the gear you're getting or one of us might have the chance to get a killer deal on some HPC gear at your bankruptcy liquidation....

Re:Wow... by GameboyRMH · 2011-09-13 10:33 · Score: 1

Makes me feel like an idiot for not applying to a job working on a small cluster used for climate research. I didn't apply because I didn't have any HPC knowledge.
If I knew I could get hired without knowing shit I would've given it a shot!

--
"When information is power, privacy is freedom" - Jah-Wren Ryel

Re:Wait a minute... by ArsonSmith · 2011-09-13 10:15 · Score: 1

Yea, I almost got caught by that super computer in the impulse buy section at the drugstore checkout too.

--
Paying taxes to buy civilization is like paying a hooker to buy love.

two weeks away, and you still haven't spec'ed... by capsteve · 2011-09-13 10:20 · Score: 1

two weeks away, and you still haven't spec'ed all your hardware?
c'mon, this is a put on!
if you're getting this monster installation, you would have spec'ed all aspects of the hardware, including 10gb and gpu's and OS months ago.

--
three can keep a secret, if two are dead - benjamin franklin

Maybe by arbulus · 2011-09-13 10:28 · Score: 1

Giving the benefit of doubt, I'm assuming that you mean that you have a purpose, but have spare processing power and would like to put it to use. In that case I would recommend maybe seeing if you could help out with Folding@home, SETI@home or CERN distributed computing.

Yeah 6 core is so 2007 by cheekyboy · 2011-09-13 10:29 · Score: 1

I agree, you gotta go for the big-ass servers with AMD's new 12-core CPUs x 4; 48 cores per box is hell nice. And it takes 1/8th the space and power.
You only need 150 boxes, not 1200 boxes.
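The arithmetic behind those numbers, as a quick sketch. It assumes the original 1200 boxes each hold a single 6-core CPU, which the comment implies but does not state; `boxes_needed` is just an illustrative helper.

```python
import math


def boxes_needed(total_cores, cores_per_cpu, cpus_per_box):
    """Number of boxes required to reach total_cores, rounding up."""
    return math.ceil(total_cores / (cores_per_cpu * cpus_per_box))


# Assumption: the original plan is 1200 boxes with one 6-core CPU each.
original_boxes = 1200
total_cores = original_boxes * 6

dense_boxes = boxes_needed(total_cores, cores_per_cpu=12, cpus_per_box=4)
print(f"{total_cores} cores fit in {dense_boxes} boxes of 48 cores, "
      f"{original_boxes / dense_boxes:.0f}x fewer boxes")
```

Under that assumption the core count works out to the 150 boxes quoted. The "1/8th" figure is then a ratio of box counts; actual rack space and power draw depend on the chassis and are not something this calculation can confirm.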

--
Liberty freedom are no1, not dicks in suits.

Imagine Beowulf of those! by porky_pig_jr · 2011-09-13 10:33 · Score: 1

Come on, folks. Is that Slashdot or what?

Re:Imagine Beowulf of those! by blair1q · 2011-09-13 10:40 · Score: 2

I was imagining partitioning it into an enormous brigade of heterogenous virtual machines, then hooking those up as a Beowulf cluster.
Re:Imagine Beowulf of those! by Anonymous Coward · 2011-09-13 12:43 · Score: 1

I was imagining Natalie Portman.
Re:Imagine Beowulf of those! by confused+one · 2011-09-13 13:10 · Score: 1

but it is a Beowulf...

Better than SETI by tomhudson · 2011-09-13 10:35 · Score: 1, Offtopic

Help everyone here on earth

Generate every single possible combination of software or business method patent, and break the patent office once and for all.

Teaching.. by Anonymous Coward · 2011-09-13 10:38 · Score: 0

.. my cats to paint like the Masters!

troll by rish87 · 2011-09-13 10:39 · Score: 1

As someone who works with supercomputers, I have serious suspicions about this. You do not get the funding for a supercomputer of this size without knowing these basic specifications. How can he be getting "everything necessary" in two weeks when he doesn't have a planned network, GPU's, OS or application? There is so much effort that goes into speccing these clusters, building them and then installing and configuring all of the administrative software such as queuing systems. Hell, if you're doing HPC work on supercomputers, you need an equally impressive storage solution to contain all of the data. It isn't a matter of sticking a sata hdd on each node and calling it a day.

Do us a favor by Anonymous Coward · 2011-09-13 10:43 · Score: 0

Some of us are considering H/W upgrades to run this new Windows 8 and we will be immensely thankful to you if you could tell us if this new Windows 8 can run OK on it. Be honest about the results.

Realistic Simulations... by Anonymous Coward · 2011-09-13 10:44 · Score: 0

By which I mean furthering the science of computer generated porn to lifelike qualities. Then hopefully using that power to create scenes with your coworkers that will leave them traumatized and cowering in a corner. And even more afraid of clowns.

Re:Wait a minute... by Anonymous Coward · 2011-09-13 10:50 · Score: 1

I would guess that he's a Somali Pirate, and they've just hijacked a container ship from China... Several hundred containers worth of computer components.

Sounds like the only reasonable conclusion to me...

I have an idea... by Anonymous Coward · 2011-09-13 10:55 · Score: 0

Why don't you sign over your salary to me, and I'll do all the work for you? Since, you know, that's kind of what you're asking Slashdot to do.

"Dear Slashdot, if you were to build a computing environment to support a 300-500 person business, how would you go about it? Please be specific, with commands and actual configuration options, too! Really show me that you know what you're talking about!"

Well...let's see, what would I do? by certain+death · 2011-09-13 10:55 · Score: 1

Play Crysis! :o)

--
"My immediate reaction is "WTF? What kind of moron doesn't make things 64-bit safe to begin with?" Linus

Game show answer... by lopaka1998 · 2011-09-13 10:56 · Score: 1

Trebek: The best use for a new supercomputing cluster.

(buzzer of Contestant #3 triggers)

Trebek: Yes, Contestant #3.......

Contestant #3: What is a mega porn torrent server?

Trebek: Correct for $1,000.00

Run two million ELIZAs by Pf0tzenpfritz · 2011-09-13 10:57 · Score: 1

and let them drive one instance of Cleverbot insane. That will teach him being a smartass at Turing tests.

--
Oh, the beautiful gloss of greality!

Have a look at Red Storm by Anonymous Coward · 2011-09-13 11:00 · Score: 0

http://en.wikipedia.org/wiki/Red_Storm_%28computing%29

Closing the barn door? by Have+Brain+Will+Rent · 2011-09-13 11:10 · Score: 1

Not quite the perfect analogy but close enough. Seems to me that these questions should all have been answered before a single piece of hardware was ordered.

--
The tyrant will always find a pretext for his tyranny - Aesop

Cluster software & GPU experence by PAPPP · 2011-09-13 11:11 · Score: 5, Informative

I assume this is an epic troll, but am going to give an honest answer anyway, because there are some legitimate questions buried in there.

I work with aggregate.org, a university research group which has a decent claim to having built the very first Linux PC Cluster, set some records with them (KLAT2 and KASY0 were both ours), and still operates a number of Linux clusters, including some containing GPUs, so I feel like I have some idea of the lay of cluster technology. It is *way* overdue for an update (and one is in progress, we swear!), but we also maintain TLDP's widely circulated Parallel Processing HOWTO, which was the go-to resource for this kind of question for some time.

In a cluster of any size, you do _not_ want to be handling nodes individually. There are several popular provisioning and administration systems for avoiding doing so, because every organization with a large number of machines needs such a tool. The clusters I deal with are mostly provisioned with Perceus with a few ROCKS holdovers, and I'm aware of a number of other solutions (xCAT is the most popular that I've never tinkered with). Perceus can pass out pretty much any correctly-configured Linux image to the machines, although it is specifically tailored to work with Caos NSA (Redhat-like), or GravityOS (a Debian derivative) payloads. Infiscale, the company that supports Perceus, releases the basic tools and some sample modifiable OS images for free, and makes their money off support and custom images, so it is a pretty flexible option in terms of required financial and/or personnel commitment. The various provisioning and administration tools are generally designed to interact with various monitoring tools (ex. Warewulf or Ganglia) and job management systems (see next paragraph).
Accounting and billing users is largely about your job management system. Our clusters aren't billed this way, so I can't claim to be closely familiar with the tools, but most of the established job management systems like Slurm and GridEngine (to name two of many) have accounting systems built in.
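For a feel of what chargeback boils down to once the job manager has recorded who ran what, here is a minimal sketch. The job records, field names and rate are all invented for illustration; they are not the output format of Slurm, GridEngine or any other scheduler, whose own accounting tools should be the real source of the data.

```python
from collections import defaultdict


def core_hours_by_user(jobs):
    """Sum cores * wall-clock hours per user from a list of job records."""
    totals = defaultdict(float)
    for job in jobs:
        totals[job["user"]] += job["cores"] * job["elapsed_seconds"] / 3600.0
    return dict(totals)


def invoice(jobs, rate_per_core_hour):
    """Turn per-user core-hours into a charge, rounded to cents."""
    return {user: round(hours * rate_per_core_hour, 2)
            for user, hours in core_hours_by_user(jobs).items()}


# Hypothetical accounting records; a real system would export these.
jobs = [
    {"user": "alice", "cores": 48, "elapsed_seconds": 7200},
    {"user": "bob", "cores": 12, "elapsed_seconds": 1800},
    {"user": "alice", "cores": 4, "elapsed_seconds": 3600},
]

print(core_hours_by_user(jobs))
print(invoice(jobs, rate_per_core_hour=0.05))
```

Billing by core-hour is only one possible policy; charging for reserved rather than used time, or weighting GPU nodes differently, would change the formula but not the overall shape.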
The "standard" images or image-building tools provided with the provisioning systems generally provide for a few nicely integrated combinations of tools, which make it remarkably easy to throw a functioning cluster stack together.

As for GPUs... be aware that the claimed performance for GPUs, especially in clusters, is virtually unattainable. You have to write code in their nasty domain-specific languages (CUDA or OpenCL for Nvidia, just OpenCL for AMD) and there isn't really any concept of IPC baked in to the tools to allow for distributed operations. Furthermore, GPUs are also generally extraordinarily memory and memory bandwidth starved (remember, the speed comes from there being hundreds of processing elements on the card, all sharing the same memory and interface), so simply keeping them fed with data is challenging. GPGPU is also an unstable area in both relevant senses: the GPGPU software itself has a nasty tendency to hang the host when something goes wrong (which is extra fun in clusters without BMCs), and the platforms are changing at an alarming clip. AMD is somewhat worse in the "moving target" regard - they recently deprecated all 4000 series cards from being supported by GPGPU tools, and have abandoned their CTM, CAL, and Brook+ environments before settling on OpenCL, and only OpenCL. Nvidia still supports both their C

Re:Cluster software & GPU experence by Savantissimo · 2011-09-13 12:36 · Score: 1

"Note that the difference between the special compute hardware ("Tesla" and "Firestream") and consumer cards tends to be that they have a little more memory, and are enormously more expensive , so the consumer cards are way ahead in terms of FLOPS per dollar. "
But if you need 64-bit performance the consumer cards are crippled - there is a factor of 2-4 difference compared to the Teslas and Quadros. Only the Teslas have ECC, which may be essential for some applications. The extra memory on the pro series cards makes a big difference for many applications, too. 6GB is a lot more than 2GB. Since accessing the main memory rather than the GPU memory is so much slower and splitting tasks across cards on different nodes is such a bitch, the extra expense for a card that can fit the whole problem in its own memory may be worth 5x the cost for 3x the memory and 2x the 64-bit performance. In a cluster where you plan to split applications across multiple graphics cards on different nodes despite all the hassle entailed, that may not be enough of a selling point, though. If 32bit FLOPS are the only metric, then consumer cards can't be beat.
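The trade-off can be put in numbers using only the relative figures from this comment (5x the cost, 3x the memory at 6GB versus 2GB, 2x the 64-bit performance). Everything here is a sketch: the two dictionaries are normalized stand-ins for "a consumer card" and "a pro card", not specs of any particular model.

```python
import math

# Relative figures taken from the comment above, normalized to the consumer card.
CONSUMER = {"cost": 1.0, "memory_gb": 2.0, "fp64": 1.0}
PRO = {"cost": 5.0, "memory_gb": 6.0, "fp64": 2.0}


def cards_to_fit(working_set_gb, card):
    """Minimum number of cards whose combined memory holds the working set."""
    return math.ceil(working_set_gb / card["memory_gb"])


def fp64_per_cost(card):
    """64-bit throughput per unit of cost, relative to the consumer card."""
    return card["fp64"] / card["cost"]


for working_set_gb in (1.5, 5.0):
    for name, card in (("consumer", CONSUMER), ("pro", PRO)):
        n = cards_to_fit(working_set_gb, card)
        print(f"{working_set_gb} GB on {name}: {n} card(s), "
              f"relative cost {n * card['cost']:.0f}")
```

On raw throughput per dollar the consumer card still wins even for 64-bit work, so the case for the pro card rests on the problem fitting on one card (plus ECC), which is exactly the "splitting is such a bitch" point.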

--
"Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
Re:Cluster software & GPU experence by Anonymous Coward · 2011-09-13 15:54 · Score: 0

I've much less experience building and administering a much smaller (18-node) cluster, but will repeat the advice of the parent. Perceus and Slurm together have proven to be a very reliable combination that's required very little maintenance. Slurm is overkill for us, but has been used to run huge clusters elsewhere. The choice of CaosNSA vs CentOS or RH seemed, at least 2 years ago when I built everything up, to come down to ease of OS installation vs. ease of cluster installation. CentOS appears more up-to-date and has nicer up-front installation utilities, while CaosNSA has better utilities to build the core (Perceus + MPI) clustering applications.

IMO by Osgeld · 2011-09-13 11:21 · Score: 1

It sounds like they chose the wrong person to handle this.

A) you don’t know what os to run on it
B) you don’t know what to use for the network
C) you don’t know what GPU's to use
D) you're asking Slashdot, where a significant portion of users will tell you Apple

Do you need help using a screwdriver as well?

Yes, this is legit and no, we're not idiots by Supp0rtLinux · 2011-09-13 11:28 · Score: 5, Informative

For everyone that thinks I trolled slashdot... here's the quick backstory behind my question(s): Our organization received a grant to pay for this from a private philanthropist that has a medical issue that is currently being researched by one of our labs (this happens to us not too infrequently). We have an existing HPC of roughly 300 nodes and 1200 cores that's all 1Gbps connected and running Rocks 5.1. The grant money came in in two different payments. We used the first payment to buy the nodes (which are en route to arrive in 2 weeks or so). The second payment was going to pay for the GPU's and the extra infrastructure (storage is one thing we currently have plenty of... both SAN and NAS). Unfortunately, we hit two issues: 1) one of our more seasoned enterprise admins took a new job at Apple's new NC datacenter and 2) our cluster admin passed away from a heart attack about a week after the purchase was made. This put us into a bit of a holding pattern. We're in the process of replacing both of them, but in the meantime we A) have the equipment arriving soon and B) have the second round of the grant money in hand now. We're smart enough to know that we lost two very valuable resources and we decided to step back, pause, and re-evaluate. The servers are already bought. The infrastructure, interconnects, and GPU's are not. The old admin knew which GPU's he wanted; unfortunately we haven't found his research anywhere to know what and why. He had also planned to go with the latest release of Rocks, but only because he was very familiar with it. We know there are other options out there and we've no idea how well Rocks can scale. Additionally, I don't see an option for chargeback with Rocks (at least not from a Google search), plus we've heard they recently lost a core developer. Thus, we went to the Slashdot community for advice. So I've already seen some good info on the IB versus 10GbE question and it's much appreciated.
We're still looking for info on which Linux distro and which GPU to go for. We want to make the best decision we can and use the money as wisely as possible. But we also realize that we know what we don't know and thought the Slashdot community could provide some experience to help us make the right decisions.

Re:Yes, this is legit and no, we're not idiots by rish87 · 2011-09-13 11:50 · Score: 2

Okay apparently you aren't trolling but you have to understand people's suspicions. I understand you've lost key people, but still, these sorts of decisions are important for initial phases of the design that everyone should be aware of. A few suggestions: If you are running a lot of smaller parallel jobs that do most of the computation within the same node (more of a SMP parallel vice mpi) then you may get away without using 10gbe unless you are also moving a lot of data through the network for storage. If you are doing a lot of cross-node computation among a lot of different jobs, or especially in very large cross node jobs, you are going to want IB. IB is very expensive, but there is a reason almost all of the top supercomputers use it. Depending on your application, you may be able to get away with 10 gbe, especially if IB is too expensive. If you are adding GPU's (go with NVIDIA. throw teslas in there if you have the money) you will most likely want IB as well. HPC code I help develop has CUDA ability, and once you start to feed huge datasets to the GPU's across the network, you are going to need IB level speed and throughput. If you are only doing GPU computation within the nodes, this won't be necessary. Basically if money isn't an issue, go with IB and NVIDIA teslas. If money is an issue, GTX 580's and 10gbe will probably be fine. I would be hesitant on using anything less on the networking front. As for OS, take a look at scientific linux.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 11:52 · Score: 1

First of all, you're screwed until you get some people who know the answers to these questions already. Don't do *anything* until you hire those people; you'll be wasting valuable time.
Second, if you already have infrastructure that works, use it. If not, use the most widely used solution for the particular problem you have. Don't dick around with specialized shit.
Third, Slashdot is a wasteland of uninformed dweebs. Ask a forum that actually has something to do with HPC.
(also, the guy who mentioned "redhat 6.1"? please do not go download redhat 6.1. it's like 15 years old. he meant RHEL 6.1)
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 11:59 · Score: 0

Rocks can scale ok, debian with MPI/PVM etc (DCC) is less of a pain to manage, but if you have rocks by all means use it. The GPU that works well with rocks is the NVIDIA Tesla M2090. if you need help and can fly me out from the west coast i can help with cluster setup etc. Have been doing em for the last 14 years. just reply back with a real email addy and i can email ya.
-Ys-
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 12:10 · Score: 1

Now post a link to job description so we can apply to fill in the hole!
Re:Yes, this is legit and no, we're not idiots by the_demiurge · 2011-09-13 12:11 · Score: 1

Contact other researchers who work in facilities similar to yours. I'm sure you collaborate with other labs that use HPC resources. Talk to them and get some information about how they do things there.
Personnel is really the key ingredient of HPC. Find someone, hire them quickly and give them the decision of what to buy.
Re:Yes, this is legit and no, we're not idiots by guruevi · 2011-09-13 12:12 · Score: 1

As far as distro goes, usually Red Hat is chosen if you want to pay the money for support, Debian because it's stable, Scientific Linux because it has a certain package you need, or Gentoo or something similar if you need to run a very specific application and need to scrape the bottom of the can for performance. Ask your users what they're using and go with the platform that has the widest support.
As far as platform, that's harder to answer. Are they batch jobs? Then SGE (Sun Grid Engine) is very good and very extensive, but I'm unsure what its status is under Oracle. Torque and Maui are also widely used. Does it need to look like a single machine? Then look at openMosix (or whatever it's called these days) or OpenSSI.
Networking is hard to answer. Both options have their ups and downs, and it really depends on the use cases. I would recommend InfiniBand if you can bear the costs, but a well-implemented 10GbE will do very well too, depending on the jobs (if there is a lot of node-to-node communication, go with IB; an existing or cheaper cluster can more easily be extended with Ethernet).
GPU - hands down nVidia. Sorry ATi/AMD but most researchers use CUDA and both of them do OpenCL. There are also 1U GPU cards from nVidia that are widely supported. ATi on Linux is still unstable, even for compute (although my experience is purely anecdotal with a small cluster). Do you have to buy Tesla cards? Not necessarily. Maybe you should have some for the amount of memory they have but the gaming gear will do just as well with properly programmed or small jobs - also, all GPU memory is mapped into RAM so make sure if you have 3GB on your card you have at least 3GB + however much your base configuration is.

--
Custom electronics and digital signage for your business: www.evcircuits.com
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 12:15 · Score: 0

I haven't used Rocks in a while; however, last I checked, they didn't have a reasonable way to do any kind of hardware management (IPMI, SoL, etc). Given this, I would highly suggest you look into xCAT (xcat.org). It takes a little bit to get familiarized with it, but once you do -- I think you'll find it's quite powerful. It originated as an AlphaWorks project from IBM. It is regularly maintained and I've found it to be excellent. You can read about the accolades, but suffice to say it's running some of the largest machines in the world. No disclaimer required -- I'm just a satisfied user. I would suggest CentOS 5/6 for your OS; again, you can read the specifics -- but it's fairly widely used in this arena.
As always, answers to your questions will highly depend on the goal of the machine -- and when I say 'goal', I mean -- what is the nature of the research and what kind of problems are you trying to solve? Are they: embarrassingly parallel, MPI-based, io-bound, etc? I've found that almost anything in this realm (particularly of this size) will require *lots* of customization. I don't know of anything 'off-the-shelf' that is going to meet all the needs you have (please note, this doesn't mean it doesn't exist -- it just means I'm not aware of it). Since you guys find yourself in such a pinch, bringing in a consultant, or frankly, a team of them from IBM or the likes might be the best option. Once you start the charge back model, I imagine that people will expect an SLA -- and things really change at that point. Best of luck.
Re:Yes, this is legit and no, we're not idiots by enjar · 2011-09-13 12:17 · Score: 2

So it seems you are still far adrift. I'd seriously not spend another penny until you understand what you are really doing. Otherwise you could dump a serious pile of money on hardware that won't solve your problem ... and I'll bet if you look at what was wrong or you didn't like with your old HPC setup, you'll get the answers to lots of your questions.
It seems odd that you got a grant for ... something but you still are trying to "recoup costs". Also, I do understand you lost two key people, but you didn't need some sort of business case, schematic, problem statement, architecture diagram or grant proposal that you could use to figure out the answers to some of these questions? If someone granted you $M for doing research for something, then it seems that you should be concentrating on doing that research first -- and figuring out what to do with any slack time otherwise. Perhaps instead of trying to charge someone money you could find other groups that do similar research and give them time on your cluster as a gift?
As for real-world advice, keep in mind the "customers" of your cluster. I'm going to take a wild guess that they aren't geeks who want to play with Linux distros, they are likely researchers working on their research. It's also highly likely that they already know how to get their submissions into the cluster, analyze results, and so on -- they have a "workflow". Take care with disruptions to this workflow. Also, it might make a lot of sense to actually talk to the people who do this work and ask them their opinions on what they like and don't like about the existing setup -- which can be turned into requirements that can drive the spec for the remaining equipment. It's certainly going to give you a higher chance of success than asking Slashdot .. and you also have a golden opportunity to step up to fill a leadership void. That kind of thing gains you enormous credibility if you do it well.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 12:20 · Score: 0

Given where you are, you definitely want InfiniBand. At 1200 nodes, you will get much, much better performance and probably break-even or better cost than 10GigE. To handle your management problem, you want to put all of those nodes into a batch pool. Users submit requests into the batch system and the batch system (e.g. PBS, Sun Grid Engine, etc) allocates the nodes to users. It will track usage by user for you - regardless of the Linux distribution.
But, you have a long, long road ahead of you. You need appropriate compilers and appropriate libraries and file systems and MPI and... you really, really need that admin replacement or your life is going to suck.
Re:Yes, this is legit and no, we're not idiots by hackstraw · 2011-09-13 12:25 · Score: 2

If you want to hire me send a mail to hpc.hackstraw@spamgourmet.com. Expert in the field.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 12:26 · Score: 0

Considering the scale of the project and the flexibility you're looking for, my advice would be to go with Red Hat Enterprise Linux and pick the brains of their people for the best way to go regarding software and setup. While a support contract costs money, it's probably the best way to go considering the loss of human assets at your organization. In other words, they have answers and you need someone professional to give you answers. Slashdot is not a good place to get answers for issues this big. ... Or any issues, really.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 12:26 · Score: 0

Regarding the GPU, I agree that NVIDIA is certainly the way to go as its software stack is far more mature and supported, both by NVIDIA and by a growing HPC community.
Another poster recommended the Tesla M2090s. Those are indeed the fastest HPC cards available. However, the "M" designation means that it has a passive heat sink rather than a fan and heat sink combo for cooling. The upshot is that your servers must include the necessary high-speed fans for cooling. Check with the node vendor to see if they are supported. Otherwise you may have to go with the slightly slower Tesla C2070s, but you should also confirm that these will work.
Note that the Tesla cards aren't particularly cheap: e.g. the C2070 currently goes for about $2500. If you are an .edu (or possibly a nonprofit), inquire about educational discounts. If these are too pricey to justify, gamer cards (e.g. GTX 580) are much cheaper and can provide similar performance for *some* types of problems, with caveats:
1) Tesla cards provide more GPU memory: 6 GB vs. 1.5 to 3 GB (important for some problems, like bioinformatics, less so for others, like molecular dynamics)
2) Tesla cards have ECC memory, which avoids random bit flips, but using ECC mode can yield a significant performance penalty.
3) Tesla cards provide much higher performance for double-precision floating point operations. Single-precision FLOPS are comparable.
4) Tesla cards are better tested, supported, and warranted. They should die less frequently. That said, the price differential can buy *a lot* of replacement cards.
5) The power draw and thermal output of the GeForce cards can be higher -- they weren't designed for servers.
6) Tesla cards can communicate with others on the same PCI bus without going through CPU memory.
In general, if you can afford them, I would recommend going with the Tesla series as you will have better support and fewer headaches.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 12:27 · Score: 0

You could instead try asking a handful of separate questions on serverfault.com, where (as long as you briefly explain your situation) people will take you seriously, and there won't be so much, shall we say, extraneous discussion.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 12:31 · Score: 5, Funny

"I've got 1200 servers shipping to me and my two best engineers are gone and we're not sure what to do with them when they get here."
Best. IT horror story. Ever.
Re:Yes, this is legit and no, we're not idiots by byteherder · 2011-09-13 12:31 · Score: 2

If you are serious, go to the SuperComputing 2011 conference. Pretty much all the supercomputing geeks hang out there and you can get all your questions answered by experts.

As for whether to go with IB or 10GbE, go with IB if you can afford it. IB has a bunch of advantages (higher bandwidth, lower latency), but you pay for them in price.

Good Luck.

byteherder
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 12:37 · Score: 1

Sorry, mate, but you still read like a student trying to answer hypotheticals on a midterm paper.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 12:40 · Score: 0

It depends a lot on what jobs you are running. How scalable are your codes? What do they use to express parallelism (MPI? OpenMP?)? Do you need a parallel filesystem? Do you have to fit with existing infrastructure?
If your codes are small scale (< 1000 cores) and not super bandwidth intensive, a 10 GigE network might work; if you have large parallel codes, go IB. If you are running molecular dynamics or other 3D physics codes, 3D mesh topologies tend to work nicely, but if you are not then a tapered fat tree might be best.
Generic Linux works fine for small clusters, but if you are trying to run a single job across the whole machine, you might want to consider a lightweight kernel like IBM uses on Blue Gene or Cray uses on their XT series.
Tell us more about your problem other than "I have a lot of money to spend" and you might get more precise results.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 12:45 · Score: 1

Due to the OFED stack required for InfiniBand, 10GbE is a tad easier to administrate, but as said: go for InfiniBand, it's faster and performs much better (especially in the 40Gb flavor).
As for the distro: choose whatever you feel familiar with. The real question is the batch scheduler. Oracle Grid Engine is what most clusters run today. It's not bleeding edge, but stable and allows for accounting, which is what you want if you wish to rent out your machine.
GPUs: in scientific computing NVIDIA hardware dominates. One reason is reliability (the Tesla cards come with ECC, which you'll really want on any larger installation). The other reason is that both NVIDIA hardware and software are well suited for most workloads (AMD GPUs only dominate when mining for Bitcoins). You should go for the Tesla C2050. They are much cheaper than the C2070 or M2090, but not much slower. The main difference is that they have less RAM, but at the current price it's always better to buy two C2050s than one C2070 or M2090.
What type of uplink to the internet will your system have?
Re:Yes, this is legit and no, we're not idiots by Savantissimo · 2011-09-13 12:49 · Score: 1

The Tesla 2090 is the best thing on the market, but the 2070 is 85% as good for 65% the $.
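
Taking the two percentages above at face value, the trade-off can be worked out in a few lines. This is only a sketch: the 85% and 65% figures are the poster's estimates, the variable names are made up for illustration, and the "equal throughput" line assumes the workload scales perfectly across cards, which many codes will not.

```python
# Relative figures quoted in the post (the 2090 is the baseline at 1.0).
perf_2070 = 0.85   # "85% as good"
cost_2070 = 0.65   # "65% the $"

# Performance per dollar of the 2070 relative to the 2090.
value_ratio = perf_2070 / cost_2070

# Relative spend on 2070s needed to match one 2090's throughput,
# ASSUMING perfect scaling across cards (an idealisation).
cost_to_match = (1.0 / perf_2070) * cost_2070

print(f"2070 performance per dollar vs 2090: {value_ratio:.2f}x")
print(f"relative cost for equal throughput:  {cost_to_match:.2f}")
```

On those numbers the 2070 delivers roughly 1.3 times the performance per dollar, so the 2090 only wins if per-card speed or density matters more than budget.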

--
"Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 12:54 · Score: 1

Our organization received a grant to pay for this from a private philanthropist that has a medical issue that is currently being researched by one of our labs
Steve Jobs gave you how much funding?!
Re:Yes, this is legit and no, we're not idiots by bill_mcgonigle · 2011-09-13 13:02 · Score: 2

Steve Jobs gave you how much funding?!
And then hired their sysadmin out from under them? No.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Re:Yes, this is legit and no, we're not idiots by Junta · 2011-09-13 13:04 · Score: 1

One more thing: while perusing slashdot comments is better than some places, you may want to repeat your inquiries in more tightly focused communities, like the ROCKS mailing lists and/or xcat-user@lists.sourceforge.net
Those are audiences that live and breathe this stuff, many of whom may be in your area and even open for employment opportunities if you are looking to backfill.

--
XML is like violence. If it doesn't solve the problem, use more.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 13:05 · Score: 0

Please provide shipping costs to my cousin Thatsa Fos Hur the African ambassador banker in order that we can assist your philanthropist.
Re:Yes, this is legit and no, we're not idiots by bill_mcgonigle · 2011-09-13 13:05 · Score: 1

The old admin knew which GPU's he wanted; unfortunately we haven't found his research anywhere
Did he wipe his browser history on a regular basis?
If not, you'll probably find him having looked at a large number of products, then as time approaches current, lots of page views on the details of operating one particular product.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 13:13 · Score: 0

You're going to HAVE to know two things:
1. What you're going to run in the cluster
2. What are your requirements for the O.S.
The OS vendor can help you with the possible hardware choices. Do *NOT* accept anything that is not stable. If you need MPI and RDMA, you will have to suffer IB, it is that simple: the rest is unstable, unproven stuff and the software stack will get you. If you have doubts about it, ask the OS vendor and netdev@kernel.org about the stability of the drivers for low-latency ethernet (ethernet RDMA), what you cannot trust is the marketing of the people behind the tech. And IB does a lot of stuff better than ethernet (and even if it is a driver issue, you're at least two years away from feature parity), are you going to benefit from FCoIB, for example?
Also, the NICs are *not* created equal, you actually get extra features (like DCA - direct cache access) in an Intel-based server with appropriate Intel 10GbE NICs, for example. Nor are the NIC drivers created equal: you need at least two queues per CPU core for best performance, for example... The onboard stuff is usually the cheaper Intel or Broadcom chipsets that are severely limited when compared to the best 10GbE NICs, but works well enough for standard file-server needs. However, you're not building a fileserver cluster, so you have to assess what you will need the network _for_: large dataset throughput? large packet throughput? low-latency packet exchange? iSCSI? FCoE?
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 13:15 · Score: 2, Informative

Frequently in academic settings this is not an option. Grant money for equipment is not transferrable to personnel.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 13:18 · Score: 0

I would stay far away from Rocks, we had nothing but trouble with it at my last company. I personally do not like Rocks, but it has its merits. I've been around rocks for the better part of the last decade. Too brittle. I know we probably talked it up quite a bit when you bought your first HPC from us years ago (if you are who I think you are.) Of course that company is no longer around :-)
I would use the latest release of CentOS, using the OSCAR suite or something similar. Really depends on your need. Rapid provisioning, alternate OS versions, etc.
For usage billing, LSF includes this functionality, but it's a commercial product. SGE and others can provide similar tools; however, LSF is a great tool for a system this size.
Infiniband vs 10 gigE isn't really comparable. They are for different purposes. IB is for low latency inter process communication. 10 gigE is for high bandwidth situations.
If you are running any sort of parallel software on this, any sort of MPI stack with heavy internode messaging, go with IB. Its latency for low bandwidth high speed MPI is very good, not so much with ethernet.
Run standard gigabit ethernet for the management network, and IB for MPI.
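
One common way to reason about the latency-versus-bandwidth distinction drawn above is a first-order cost model: time per message is a fixed latency plus message size divided by bandwidth. A minimal Python sketch follows. The link profile is a made-up placeholder, not a measurement of any InfiniBand or Ethernet product, and the function names are hypothetical.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order estimate: fixed per-message latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def latency_fraction(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the estimated transfer time spent on latency alone."""
    total = transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s)
    return latency_s / total


# Placeholder link profile (assumed values, NOT vendor figures).
LINK = {"latency_s": 2e-6, "bandwidth_bytes_per_s": 4e9}

for size in (64, 4096, 1_000_000, 100_000_000):
    frac = latency_fraction(size, **LINK)
    print(f"{size:>11} bytes: {frac:.1%} of the time is latency")
```

Under this model, small messages (typical of tightly coupled MPI exchanges) are almost entirely latency-bound, while bulk transfers such as storage I/O are bandwidth-bound, so it is worth measuring your own message sizes before choosing an interconnect.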
Slashdot is NOT the place to get these answers though. You are in a tiny niche world here with your HPC. The issues you face are not well known by very many people who don't build/use them. I have built many of the largest HPC's in existence. Hopefully you don't end up with a jack of all trades, master of none HPC as many home grown clusters turn into. A system this size really really needs good experienced help in making these decisions.
You really need some hooks into the HPC underworld for this kind of advice. It's not free though :-) (as you can see you got us HPC consultants crawling out of the woodwork. I won't pitch you though. Just offering some friendly advice.)
Re:Yes, this is legit and no, we're not idiots by Zebthepilot · 2011-09-13 13:27 · Score: 1

a bit of that would have been useful in the original question, I too, thought this was a troll :D

--
http://www.zebpalmer.com
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 13:33 · Score: 0

best comment i've read all night...
Re:Yes, this is legit and no, we're not idiots by codepunk · 2011-09-13 13:36 · Score: 1

The first thing you will need to fix is the admin problem, and a couple of experienced cluster admins are not going to be cheap.
As for the distro question, unless you need something really specific, if it were me it would be CentOS.

--

Got Code?
Re:Yes, this is legit and no, we're not idiots by KainX · 2011-09-13 13:44 · Score: 2

The first thing you need to do is realize you all are in over your heads. If you're desperate enough to post to Slashdot for help, you're already there.
The second thing you need to do is look for a consultant to help you out until you can hire permanent help to fill your vacant positions. I can strongly recommend R Systems (http://www.rsystemsinc.com/). It's run by former NCSA HPC gurus. I've worked with them many times; they have the know-how you need to salvage this mess in short order. You can't call them quickly enough; trust me on that.
Third, to answer some questions. The IB vs. 10GbE debate has been pretty well covered, but just to emphasize: if you need low latency (for tightly-coupled massive parallel processing), you *need* IB. Preferably QDR or FDR. For your core switches, go for a blade-style chassis whose backplane can handle FDR even if you opt for QDR for now. If it can handle EDR, even better, but I'm not sure those are shipping yet. FDR IB data rate is 56Gb and latency in the nanoseconds. Ethernet can't touch that yet.
All the scientists working with GPUs here are using nVidia. We've got 2050s and 2070s, so the 2090s are probably the right choice at the moment.
For management, xCat is by far the most scalable solution available right now, though we're working on an alternative. ROCKS does not scale well, largely due to its stateful nature. I'd caution you against using Scyld ClusterWare; it's based on BProc AFAIK, and as one of my friends is the former BProc maintainer, I can tell you that even *he* won't touch it with a ten-foot pole any more. It's too hairy and error-prone; it's also almost impossible to debug. Use something stateless and powerful but still relatively easy to maintain. Most of the large-scale shops (national labs and large academic sites) I know of use xCat or Perceus. Here at LBNL we use both xCat and Perceus with great success.
For Linux distribution, use RHEL or a clone. I'd recommend Scientific Linux 6 at this point. It's the best-run and most professionally-maintained of all the clones.
HTH. Good luck, and condolences on your recent loss(es).

--
Michael Jennings | HPC Systems Engineer, Lawrence Berkeley National Lab | Author, Eterm (eterm.org)
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 13:52 · Score: 0

op is a fag
Re:Yes, this is legit and no, we're not idiots by cherry-blossom · 2011-09-13 14:27 · Score: 2

Mod this up. This sounds more like a personnel problem than a hardware/software problem. Get the right people into those vacant positions and let them make the decisions for you. Don't spend any money on hardware or software until those positions are filled.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 14:35 · Score: 0

Craigslist it, buy a Cray. You're in way over your head if you think you can give your users something workable with a pile of PCs.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 14:40 · Score: 0

Charge-back is done via the resource manager, not the operating system, and is usually a pretty custom mechanism depending on the agreements made with the various parties involved. Any decent resource manager (PBS, Slurm w/ Maui, Torque w/ Maui, GridEngine, LSF) should provide enough accounting data to facilitate charge-back. You need to definitely hire some admins for this project, specifically, ones that know what they're doing. Where on the east coast are you and what's the pay look like? :)
When you say you have plenty of storage, define "plenty". Also, don't worry about how Rocks can scale. It's just Linux with fancy tools to assist in cluster management (CentOS/RHEL-like, in fact). IB is cheaper than 10Gb for this kind of work. Don't limit yourself by going w/ GigE. Some lower-end, top-of-rack, 48-port 1Gb switches and cat5e will handle the rest (for administration, job scheduling, etc.). Pray that your nodes have IPMI interfaces of some sort. Pray that you're offering enough compensation to hire reasonably good admins w/ HPC experience. They tend to not be cheap.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 15:01 · Score: 0

Why the fuck didn't you just say so in the first place?
Re:Yes, this is legit and no, we're not idiots by DarwinSurvivor · 2011-09-13 15:45 · Score: 2

Did you just ask for a job while posting as Anonymous Coward and THEN ask them to post their email as a public reply to it?!?
Re:Yes, this is legit and no, we're not idiots by afidel · 2011-09-13 15:50 · Score: 2

The 2050 is what HP uses in the SL390 cluster configuration because they can actually cool and power 8 of them in a 4U enclosure. Since the M2070 has the same power draw, it should be capable of the same density.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 15:53 · Score: 0

You should stop right now and spend your time hiring a good cluster admin. From your questions, it's obvious that you don't have much of a clue about what you're doing. If you make these decisions now without the input of an experienced cluster admin, you'll be shooting yourself in the foot. There are reasons why organizations choose IB over 10GigE and it has to do with using their money to get the type of cluster that will be the most productive for their users. You cannot make these decisions without asking what types of jobs will be run on this cluster. Will they need parallel processing power, job throughput, or large amounts of memory? You mentioned storage, but what about high speed parallel storage like Lustre or GPFS? Will the jobs run on this cluster require access to large data sets? Will you be doing any checkpointing? Are you going to run map reduce problems? DNA database searches? Molecular dynamic models? Gaussian? Although you obviously have money, trust me, I could easily spend whatever money you have on HPC hardware and still be bummed that I didn't have more money. Do the sane thing and hire staff to help you do the cost analysis and build you a cluster that will make the best use of the money you have.
Re:Yes, this is legit and no, we're not idiots by b30w0lf · 2011-09-13 16:02 · Score: 2

Speaking as one of the people you will (temporarily) be supplanting, sounds like you have a tough spot to get through.
I also admin life sciences clusters for a major university on the east coast. I'm going to assume that our workloads are going to be fairly similar (R, matlab, blast, HMMER, IDEA, maybe some mutual information codes, sequence alignment, etc.). If that's not the case, some of this advice may be off.
So, a couple of things:
- I think CentOS is a good idea for a cluster platform. I do not think Rocks will scale to that size like you want it to, and it's really not terribly flexible either. Let's put it this way, I often find that I could have just built from scratch by the time I get Rocks to do all the customization I need. We run Rocks on small clusters, but big ones we spin ourselves (e.g. CentOS, or sometimes Fedora + Kickstart + some utility scripts and a scheduler... we use SGE, now OGS). Finally, stay away from more fringe distributions. You'll find that commercial software vendors are pretty quick to let you know they just don't support running their software on XX distribution. There are other reasons too. I posted a bit of a rant on this a while ago at: http://slashdot.org/comments.pl?sid=2188634&cid=36255670
- Infiniband vs. 10 Gbps. Well, InfiniBand is cool, and I've spent a lot of time working with it. I once had a project that involved writing some early stage block level storage protocols for InfiniBand... really, I like InfiniBand. That said, unless you plan to run a lot of MPI enabled MD simulations like Desmond, skip the IB and get 10 Gbps. There are a couple of exceptions to that rule, but most life sciences applications do not use MPI, and most of your traffic is going to be storage I/O. Depending on your storage solution, it's probably not InfiniBand enabled (in the front-end anyway, and you really don't want to be running IP over IB if you can help it). To say more I'd have to know a bit more about what you're going to be running.
- GPUs. One thing sticks out to me a lot here. If you don't know which GPUs to get, that probably means no one has ported anything to GPU yet. If someone has done some porting, you should ask them what they ported to. If they ported to CUDA, you should probably be looking at 2050s or 2070s. If they haven't ported anything, and they don't have (good!) GPU ported applications... don't waste money on too many GPUs. We've run a couple of pilots where we tried to get people using GPUs, and here are a couple of observations: 1. most researchers can't/won't do the porting; 2. most pre-built applications, such as matlab and R _still_ require you to port the matlab, R, etc. code, which researchers will probably also not do; 3. some life sciences algorithms just don't work well on GPUs (e.g. they are branch-heavy or memory I/O heavy algorithms); 4. many of the pre-built GPU applications for life science are terrible (I know a particular sequence alignment tool, for instance, that is proud of its 4x speedup over a single CPU... do the math... which costs more, a quad core CPU or a tesla?). GPUs can be great, but buy them sparingly at the beginning and integrate them as they are actually being used. If you're buying now you should be buying CUDA (i.e. NVidia). It's the only actual mature development kit (though I don't like that it doesn't let you control the scheduling on the card... but I digress).
- Chargeback: So the bottom line is nothing is going to give you chargeback without some effort. You're going to have to manage that on your own. The best way to do it is to set up some basic accounting scripts that will dig through your cluster logs (or database, depending on your configuration) and generate accounting reports. Note that it's the resource manager/policy manager (e.g. OGS, Torque/Maui, etc.) logs that you're going to do this with. You _could_ do it with Rocks as well as anything else (but again, I don't suggest Rocks for this project).
Sounds like you have a fun project ahead of you... good luck!
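
To make the chargeback point above concrete, here is a rough Python sketch of the kind of accounting script meant. Everything in it is an assumption for illustration: the "user cores wallclock_seconds" record layout is a simplified, hypothetical format (real OGS or Torque accounting logs have their own, richer formats that you would need to parse instead), and the rate is a made-up number.

```python
from collections import defaultdict

# Hypothetical, simplified records: "user cores wallclock_seconds".
# Real scheduler accounting files look different; adapt the parsing.
SAMPLE_LOG = """\
alice 12 3600
bob 24 1800
alice 6 7200
"""

RATE_PER_CORE_HOUR = 0.05  # made-up rate, in whatever currency you bill in


def core_hours_by_user(log_text):
    """Sum core-hours per user from the simplified records."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        user, cores, seconds = line.split()
        totals[user] += int(cores) * int(seconds) / 3600.0
    return dict(totals)


def charges(totals, rate):
    """Convert core-hours into a charge per user, rounded to cents."""
    return {user: round(hours * rate, 2) for user, hours in totals.items()}


usage = core_hours_by_user(SAMPLE_LOG)
bill = charges(usage, RATE_PER_CORE_HOUR)
for user in sorted(usage):
    print(f"{user}: {usage[user]:.1f} core-hours, charge {bill[user]:.2f}")
```

The actual agreements (per core-hour, per node-hour, priority tiers, GPU surcharges) are a policy decision; the script only needs to turn scheduler records into whatever unit those agreements are written in.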
Re:Yes, this is legit and no, we're not idiots by Sgs-Cruz · 2011-09-13 16:08 · Score: 3, Interesting

Are you at MIT and is your benefactor David Koch? Because in that case, we have some researchers up at the Plasma Science and Fusion Center that do simulation work that could definitely use access to a bigger cluster. As long as you can compile FORTRAN on it, the TRANSP runs and GYRO simulations that we do are already run on a (smaller) cluster. This falls under "energy research" and is way cool to boot.
I'm not joking, if you are at MIT, please get in touch with Martin Greenwald (contact info on the PSFC staff page).

--
Karma: pi (Mostly due to circular reasoning in posts).
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 16:08 · Score: 0

A few random suggestions (going in the order in which you mention things in this post, not the importance level of each point)
1) Don't spend a lot of money on GPUs right now
Using GPUs effectively is still very much a research area and the bioinformatics / computational biophysics areas are a bit behind in this area except for 'packaged' applications (eg, GROMACS). If your users are writing their own code, I'd suggest starting with a few GPUs, not a lot, and creating research projects for grad students with a solid background in coding and computational biology to modify or, ideally, totally rewrite your applications to take advantage of them. In the meantime, use the cluster as a typical cluster for your existing applications. GPUs require a somewhat difficult programming model, have issues with memory card transfers, and the time investment necessary would be better spent on accomplishing research, of which computation is merely a small part.
2) You say you have plenty of storage now, but do you have plenty of *usable* storage? You need bandwidth and scalability, so a parallel file system is going to be all but essential. NFS won't hold up under these sorts of loads, and with 12 cores per node, you could potentially be doing some serious throughput.
3) I'm sorry to hear of the loss of your admins. I've sent you an e-mail with some ideas.
4) Definitely, as others have said, go with IB over 10GE.
5) For Linux distro, the best is to go with what you know. If you're starting from ground zero, my suggestion is to go with CentOS or Scientific Linux, both of which are based on Red Hat Enterprise Linux and thus are widely supported by commercial packages, compilers, libraries, and the academic community.
Re:Yes, this is legit and no, we're not idiots by morty_vikka · 2011-09-13 16:44 · Score: 1

Maybe pay a consultant for a day's worth of advice instead of trying to do it on the cheap by "asking Slashdot" -- in the long run, it might save you some of that rare philanthropy cash which seems in danger of being spent in a panic purchase.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 16:49 · Score: 0

Buy red hat enterprise and get the consulting support and training courses. Red hat makes products suited for this.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 17:15 · Score: 0

We pay extra and get support from our vendors (SGI). With that support, they set up the machine completely and it is ready for computation. When a node fails, we just log a job with them and they fix it up.
Forget the GPUs. They are really a waste of time and electricity. The only real code that makes any dent regarding utilisation is NAMD. And that is only 3 times faster (4 Tesla GPUs -vs- 1 Node of dual X5650's) which makes it more expensive from a cost of hardware and electricity consumption point of view. They are also a headache regarding system maintenance.
Maybe just get one or two nodes with 4 gpus each. That should satisfy those extremely rare codes that should run well.
We have: Infiniband (run a lot of MPI stuff), Suse11 and SGI's cluster manager.
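The NAMD comparison above (about 3 times faster on 4 Tesla GPUs versus one dual-X5650 node) can be turned into a rough work-per-dollar check. This is only a sketch: the 3x figure comes from the post, but every price, wattage, and lifetime below is a hypothetical placeholder to be replaced with real quotes and measured power draw, and the function names are made up for illustration.

```python
def value_per_dollar(relative_speed, hardware_cost, watts, hours, price_per_kwh):
    """Relative throughput divided by hardware cost plus electricity cost."""
    energy_cost = (watts / 1000.0) * hours * price_per_kwh
    return relative_speed / (hardware_cost + energy_cost)


def gpu_advantage(cpu, gpu, hours, price_per_kwh):
    """Ratio of GPU-node value to CPU-node value; above 1.0 favours the GPU node."""
    cpu_value = value_per_dollar(cpu["speed"], cpu["cost"], cpu["watts"],
                                 hours, price_per_kwh)
    gpu_value = value_per_dollar(gpu["speed"], gpu["cost"], gpu["watts"],
                                 hours, price_per_kwh)
    return gpu_value / cpu_value


if __name__ == "__main__":
    # All numbers except the 3x speedup are placeholders, not real prices.
    cpu_node = {"speed": 1.0, "cost": 5000.0, "watts": 350.0}
    gpu_node = {"speed": 3.0, "cost": 15000.0, "watts": 1200.0}
    three_years = 3 * 365 * 24  # hours of operation assumed for the comparison
    ratio = gpu_advantage(cpu_node, gpu_node, three_years, price_per_kwh=0.10)
    print(f"GPU node value relative to CPU node: {ratio:.2f}")
```

If the ratio comes out below 1.0 for your own numbers, the GPU node does less work per dollar than simply buying more CPU nodes for that code.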
Re:Yes, this is legit and no, we're not idiots by robotkid · 2011-09-13 18:50 · Score: 1

Coming from the bio-molecular simulation world, you'll find that the GPU performance is not only vendor specific (as has been mentioned) but even among essentially equivalent simulation software will be implementation specific (i.e. package X was written for CUDA and needs double precision so you have to buy the expensive Tesla stuff vs package Y in single-precision will work with a consumer NVIDIA card vs something else that works in an AMD card) and even problem-specific (i.e. there's a speedup if you simulate this many atoms but not if you simulate more than that many atoms arranged in such and such a way.). It may work in a GPU but not even provide any speedup.
In such a scenario there's no substitute for in-house benchmarking on evaluation hardware with real-world test cases before you plunk down for a large GPU order. The majority stakeholders may already be aware of strict hardware requirements for any existing GPU code, so start there. If no one has used GPUs in their applications before now and this is an attempt to "future-proof" the cluster, don't do it! Delay the purchase until you can establish that the user group will actually see a benefit from GPUs before you buy, otherwise it will easily become a white elephant.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 19:52 · Score: 0

As someone who works for a company that makes these kinds of systems, you have a couple of options:
1. If the vendor you bought the nodes from is known as an HPC vendor (2 or more systems on top500.org), wave the money you say is coming in the second installment at them and ask them to give you a proposal. Spend a little time answering their questions about what types of applications you're running and what options they support with respect to linux distros, job schedulers, and interconnects. (You
2. If the vendor you bought the nodes from is not known as an HPC vendor, wave the money you say is coming in the second installment at a variety of HPC companies who can augment your nodes with interconnect, operating system, and cost tracking system.
You can certainly do both. For the kind of money you're talking about, you should be able to get good proposals from several companies that can present you with good options. Based on those proposals, you can decide what you're looking for and who you want to go with.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 20:09 · Score: 0

Does the "second round of the grant money" expire in 2 weeks too or is there some other deadline constraint you have to meet (e.g. this private philanthropist with the medical issue is going to die soon if you can't get this cluster up and running by tomorrow)?
If "no", then wait until you hire your new cluster admin (who hopefully will have plenty of experience in building HPC clusters) before spending any more money.
If "yes", then seriously hire an experienced HPC consultant on a short-term contract to help you at least initially. The big PC vendors (Dell, HP, IBM) all have technical sales engineer specialists that can help you get started (although they'll be biased towards selling their specific products). Since you've already purchased the server compute nodes, maybe it's easiest to just go back to that vendor and ask for more help.
Regarding software for managing HPC clusters, I like to use Rocks (http://www.rocksclusters.org/) but that's because my researchers don't like to spend money on software, only on hardware. If you want more commercial support, there are a couple of vendors that use the Rocks software stack. StackIQ (http://www.stackiq.com/) is one. Two of the core Rocks developers recently left UCSD for new jobs at StackIQ. There are others.
I like to use the Sun Grid Engine (SGE) scheduler that comes with Rocks for everything.
You didn't mention anything about the more boring parts of the infrastructure. With a cluster this size, it's not the same as just slapping a bunch of computers into a rack and plugging them into the nearest power strip and flipping the ON switch. The boring parts are where it's less about the IT side and more about the Facilities side. It's rare to find someone who knows both equally well. If you get the facilities side wrong, you won't be able to run the IT side correctly.
I've had 2 separate researchers drop money on 2 "small" HPC clusters ($20K for a 10-node, $130K for a 20-node) some years back without telling me about them first because the sales guys told them there would be no problem running the cluster in their 120-sq-ft. offices. About 1 week after everything was installed and running, both of them come running to me complaining about all the noise being generated (can't think in silence in my office any longer), the extra heat generated (the office is now always too hot), the space occupied (where am I going to put all my books), and the electrical power problems (why does the breaker keep tripping every time I start a compute run).
1) Electricity - Did someone do a calculation of how much power your 300-node old HPC cluster and your new 1200-node HPC cluster are going to draw under full load? Do you even have enough electrical circuits available in the room to spread the load? Can the transformers you have even power that many circuits? You might want to have an electrical engineer at least write up a quick calculation to make sure. On most HPC clusters, only the frontend head node (plus separate user login node + separate storage server if you're not going to use your existing SAN/NAS) will be on UPS. All the other compute nodes are generally not. You might want to at least have K-rated transformers and possibly a line conditioner though. If you have to have an electrician install new circuits, you might want to look into the Starline Track Busway system (http://www.uecorp.com/busway/) to give you more flexibility for the future. Do you pay for your electricity as you go or is it just added into the building's general overhead costs? Does your facilities manager know how much it's going to cost your organization per month?
2) Cooling - That many computers will generate A LOT of heat. Did someone do a calculation of how many BTUs are going to be generated? Then did someone do a calculation of how many tons of air conditioning you're going to need to keep the temperature reasonable? Can your existing CRAC air handler units handle this or are you going to need more? Liebert CRAC units are great but
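The power and cooling questions above come down to a short calculation: total watts at full load, converted to BTU/hr (1 W is about 3.412 BTU/hr) and then to tons of air conditioning (1 ton = 12,000 BTU/hr). A rough sketch follows. The node counts are the ones mentioned in this thread, but the 350 W per node is a made-up placeholder; measure a real node under load instead, and treat the function name as illustrative only.

```python
WATTS_TO_BTU_PER_HOUR = 3.412   # 1 W of electrical load is about 3.412 BTU/hr of heat
BTU_PER_HOUR_PER_TON = 12_000   # 1 ton of refrigeration removes 12,000 BTU/hr


def cluster_heat_load(nodes, watts_per_node, overhead=0.0):
    """Return (total_watts, btu_per_hour, cooling_tons) at full load.

    overhead is an optional fraction added on top of the node draw,
    e.g. 0.10 for switches, storage, and other gear (an assumption).
    """
    total_watts = nodes * watts_per_node * (1.0 + overhead)
    btu_per_hour = total_watts * WATTS_TO_BTU_PER_HOUR
    cooling_tons = btu_per_hour / BTU_PER_HOUR_PER_TON
    return total_watts, btu_per_hour, cooling_tons


if __name__ == "__main__":
    # 350 W per node is a placeholder, not a measured figure.
    for name, nodes in (("old cluster", 300), ("new cluster", 1200)):
        watts, btu, tons = cluster_heat_load(nodes, watts_per_node=350)
        print(f"{name}: {watts / 1000:.0f} kW, {btu:,.0f} BTU/hr, {tons:.1f} tons")
```

This only sizes the heat the IT load itself produces. An electrical engineer would still need to check circuits, transformers, and the cooling plant's own power draw.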
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 20:25 · Score: 0

Maybe a DUH! comment but..perhaps...Check his web history on his computer(s) to see if you can get any hints as to what GPU he was preferring? Check his emails or email lists he participates in for any discussions? At very least perhaps you can find some helpful discussions and inquiries.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-13 21:01 · Score: 0

Another potentially DUH! comment... but...
If you're not ready for the infrastructure, interconnects, and GPUs, might it be at all possible to get the vendor of the servers to put your order on hold (and sell that hardware to someone else) while still holding onto your money? Then, in [insert time it takes to figure out the other stuff, and hire the right people] use that same money already paid to get more advanced servers?
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-14 00:29 · Score: 0

For everyone that thinks I trolled slashdot... here's the quick backstory behind my question(s):
1) Staff who purchased equipment no longer work here.
2) Need help finalising the design.
If that is the backstory, the title 'Ask Slashdot: Best Use For a New Supercomputing Cluster?' is completely OT....
Re:Yes, this is legit and no, we're not idiots by Sam+Nitzberg · 2011-09-14 01:02 · Score: 1

I like the other comment on this thread, to visit the supercomputing conference/s for ideas. Besides, a lot of useful contacts will likely be made, and it can help you or your team get familiar with a bunch of issues.
You indicated that you have a private philanthropist that has a medical issue being researched by your labs. I don't know if your organization is a non-profit, or what its charter is, but if your current labs project/s aren't enough to keep a supercomputer / cluster running at capacity right now, perhaps you can team up with other centers working on related problems. You have a great opportunity with this gear to help solve real problems.
I took a quick look for what the NIH is doing with supercomputing (just for ideas), and found this link:
http://cit.nih.gov/Science/SupercomputingResources/
I'd actually look for a list of projects, and prioritize them based on your organization's mission. Depending on your setup (I'm not a clustering / super-computing expert), you may wish to break your array down into more groups of fewer computers, or run it as one large system.
Good luck...
- Sam
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-14 02:13 · Score: 0

Good lord this is for Steve Jobs....
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-14 02:38 · Score: 0

Perhaps another productive area for spare cpu cycles is simulations in economics.
It would be nice to predict the consequences of the things that are done to 'help' the economy.
It appears that there might be some room for improvement in the current models.
Re:Yes, this is legit and no, we're not idiots by Anonymous Coward · 2011-09-14 03:02 · Score: 0

Instead of asking the "experts" at slashdot, I suggest you send an email to the admins/techs of various supercomputing centers, e.g., NCSA, TACC, PSC, ORNL, to name a few. Having worked on many thousands of nodes at once, the biggest benefit that you can provide is the absolute fastest network possible. Our software scales linearly on up to 200,000 cores (tested on Jaguar) and only because of an efficient network.
Re:Yes, this is legit and no, we're not idiots by Nite_Hawk · 2011-09-14 06:13 · Score: 1

Hi,
I work for a Supercomputing Institute as an HPC system administrator. I thought you were trolling too. I'm still not entirely convinced, but I'll give you the benefit of the doubt. ;)
First, let me say that you are in a painful situation on multiple levels. Buying the nodes separate from the network (and storage! and GPUs! and infrastructure!) is a recipe for vendor in-fighting. You will almost certainly have at least some issues with one of these things at scale and there's going to be finger pointing. Be ready for it. You may want to consider conservative solutions just to increase your chances that everything will work when you hook it up. You should check with your node vendor to see what warranty implications putting 3rd party GPUs in their nodes has. If you can get the rest of the equipment from your node vendor and have them put together a plan for how it will all work together with some kind of acceptance criteria it's going to dramatically increase your chances of success.
The interconnect isn't going to be as painful to deal with as something like Jaguar, but it's still something you need to think carefully about. You really need to figure out if the applications you are going to run need low latency and/or high throughput. If that doesn't matter just do bonded gigE for data and leave at least one GigE port for management. If you need high throughput and are considering 10GE anyway you should also consider IB. QDR is expensive but DDR is more reasonable and uses cheaper cables. You still get lower latencies and higher throughput than 10GE. Just make sure you hire or train someone capable of supporting it. Also be aware that to get full throughput to every node on the machine you need to buy a solution that really supports that. Don't expect full throughput to every node at once if you cheap out.
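To make the "full throughput to every node" point concrete, here is a small sizing sketch for a two-tier (leaf/spine) fabric built from identical switches. It assumes one link from every leaf to every spine and uses a 36-port switch purely as an example radix; neither assumption comes from the post above, so check both against what your vendor actually proposes.

```python
def two_tier_fabric(radix, down_ports):
    """Size a leaf/spine fabric built from identical switches.

    radix      -- ports per switch (example value only, check your vendor)
    down_ports -- ports on each leaf switch that face compute nodes
    Returns (max_nodes, oversubscription), where 1.0 means non-blocking.
    """
    if not 0 < down_ports < radix:
        raise ValueError("down_ports must be between 1 and radix - 1")
    up_ports = radix - down_ports   # uplinks per leaf, one to each spine
    max_leaves = radix              # every spine port feeds a different leaf
    max_nodes = max_leaves * down_ports
    oversubscription = down_ports / up_ports
    return max_nodes, oversubscription


if __name__ == "__main__":
    for down in (18, 24, 30):
        nodes, ratio = two_tier_fabric(radix=36, down_ports=down)
        print(f"{down} node ports per leaf: up to {nodes} nodes at {ratio:.1f}:1")
```

Under this simplified model, none of these layouts reaches 1200 nodes, and the larger ones get there only by giving up per-node bandwidth. That is the trade-off to ask a vendor about: a fabric this size at full bandwidth generally means a third switching tier or larger director-class switches.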
For GPUs: Why do you need GPUs? Do you have production code ready to run on them? Do you just want a couple of nodes to do development on and play with? If you really need them, you need to figure out the power and cooling your nodes and general infrastructure can handle and plan accordingly. Depending on what you are doing, QPI/hypertransport will be the bottleneck with 2 GPUs per IO hub (more like 1 if you do IB), so keep that in mind if you are doing a lot of main-memory intensive GPU computation. Again, find out what your programmers are using. It's probably CUDA unless they are cutting edge enough to have switched over to OpenCL. You'll probably want to stick with nVidia 2050+ cards for now unless you know that something else will meet your needs (consumer grade cards in bulk, ATI cards with OpenCL, etc).
For infrastructure, do you have water cooling? You should! A system of that size is going to pump out a ton of heat. Look into active water-cooled doors for your racks. This is especially true if you are going to buy a significant number of GPUs. Do you know that you have enough power? Our 1k node cluster uses 30% more power than the vendor speced when running HPL. You already have the nodes so do some stress testing.
As far as operating system goes, you may not be able to run rocks at that scale. We don't use rocks at all, but the people I know who do say it's not really appropriate for systems with more than a thousand nodes. xCat with RH/CentOS/Scientific Linux/SLES should work well. SLES licenses can be pretty expensive per year to maintain and I'm not sure it's really worth it over the others (We run SLES here). Scientific Linux may not be the best to run on IB connected machines as most of its bigger users use GigE.
I feel bad saying this, but you guys are in over your heads. Frankly it kind of sounds like you were in over your heads before your other staff departed, you just didn't know it yet. Go back to your node vendor and see if they can put together a plan for you. There's lots of HPC on the east coast, you should be able to find a qualified consultant that could really look at your situation in detail and help you out. The worst thing you could do at this point is try to buy a bunch of commodity parts and cobble something together.
Re:Yes, this is legit and no, we're not idiots by RogerWilco · 2011-09-14 06:36 · Score: 1

I don't have much to add. I think this is sound advice and about what I was going to write.

--
RogerWilco the Adventurous Janitor
Re:Yes, this is legit and no, we're not idiots by mcmonkey · 2011-09-14 09:27 · Score: 1

Dear Slashdot,
I never thought it would happen to me, but...
Re:Yes, this is legit and no, we're not idiots by unencode200x · 2011-09-15 08:01 · Score: 1

Get the guy that left for Apple's DC to do some on-the-side consulting for you. It will be well worth every penny and it's a win-win-win. At the very least, have him help you hire your next person. If you don't know what you're doing, it's going to be tough to hire someone who does. If you hire the wrong person you may end up squandering everything.

--

Chance favors the prepared mind.
Perfect is the enemy of good.

The best use ... switch it off while it's not used by Lazy+Jones · 2011-09-13 11:35 · Score: 1

Save some energy, switch it off until you find something useful to do with it. It's the Right Thing to do. ;-)

--
"I love my job, but I hate talking to people like you" (Freddie Mercury)

Bitcoins by Anonymous Coward · 2011-09-13 11:45 · Score: 0

Mine Bitcoins and send them to 1DVipMAKrZgtmVVcQfPG3qWkAEB2NPW3Nf .

Monitoring is always an afterthought by Zen · 2011-09-13 11:47 · Score: 1

I can't speak to many of your questions. However, I can provide small insight into your networking question. The industry I work in is application monitoring, and it's often an afterthought added only if there are problems. If you go with Infiniband, your choices for capturing and monitoring packets in order to help you analyze application issues will be extremely limited. However, if you go with the more widespread adoption of 10GbE you will have many vendors you can pick from with very advanced features to help monitor how your app is performing across your internal network. This entire supercomputer is nothing without its network or its application, so if it were me, I would spec in a very robust solution to monitor how the application is performing on the network. The most robust solutions are packet capturing appliances tapped or spanned in from the switches (taps are preferred). This is greatly superior to capturing traffic inside a server node itself because the OS and NIC will alter the speed and form of the packets when they are sent out onto the network.

Ditto On Redhat, w/PBS by cmholm · 2011-09-13 11:47 · Score: 2

This is what the biggest USAF compute cluster uses (RH, PBS), the main difference being that it does include IB because MPI support was a requirement (and is used). Otherwise, you'd better hope your users' jobs are almost exclusively embarrassingly parallel. The cluster is based on Dell PowerEdge blades, which provided good mflop/$.

They're playing with full size Tesla GPU cards in one of the blades. I'm not sure what will give you the best bang for the buck: Tesla/Fermi/FirePro cards in-blade, or the Nvidia 1u chassis that'll allow you to share the GPUs among several CPU blades/chassis. As of last year, there was a bit more overhead using OpenCL compared to CUDA on Nvidia h/w, but it does open up your h/w options Nvidia v. AMD.

--
Luke, help me take this mask off ... Just for once, let me butterfly kiss you with my own eyes.

Re:Ditto On Redhat, w/PBS by loftyhauser · 2011-09-13 13:52 · Score: 1

This is what the biggest USAF compute cluster uses (RH, PBS)
No, the largest USAF system (AFRL's Raptor, which is actually the largest DoD system) is a Cray XE6 and uses a custom built Linux environment, CLE.

Re:I call Shenanigans!!! - Perhaps not.... by Anonymous Coward · 2011-09-13 11:55 · Score: 0

Hmm. On the other hand, who says they paid for the systems? I'm aware of one institution that recently got several hundred AMD istanbul boxes for free, including enough motherboards to build out another couple of hundred systems.

If it is who I think it is, I recommend CentOS 5.5.

MOD PARENT UP by jamesh · 2011-09-13 12:00 · Score: 1

I wish you'd mentioned that in your original post, because it read like "in two weeks we are making an attempt to land on the moon. We are considering dusting off one of the old Saturn series rockets or maybe going with something newer... what does Slashdot think?"

Sorry to hear about your loss of staff. Hope it all works out for you.

Are you joking? by bryan1945 · 2011-09-13 12:18 · Score: 1

You are going to get one of the most powerful supercomputers and you haven't even figured out what the heck you are going to run on it? Let me guess, a government grant. Man, I wish I knew some senators I could get hookers to blow.

--
Vote monkeys into Congress. They are cheaper and more trustworthy.

Shouldn't you have this figured out already? by Vrtigo1 · 2011-09-13 12:20 · Score: 1

There's something wrong with your project process. In normal organizations, these questions would need to be answered well in advance of the "in two weeks we get to play with 1200 shiny new servers!" moment. It seems as though one or both of the following must be true. A) you're related to the project in some ancillary, not really important way and are just trying to help out the people really running the project, or B) your company has more money than they know what to do with and are dying to spend it on anything you ask for. If it's the latter, are you hiring?

Use IB, CentOS or SciLinux, and xCAT (xcat.org) by datajerk · 2011-09-13 12:23 · Score: 2

IB is faster and cheaper than 10GE. Unless you get 10GE from your IB vendor.

All IB solutions support RH distros fairly well, so I'd stick with RH-like or RH-proper. CentOS has been our x86 Linux reference platform for xCAT development.

Use xCAT for cluster management and use xCAT's stateless provisioning (no need for local HDs). With xCAT we were able to provision the fastest system in Canada (~4000 nodes) over 40:1 blocking GigE in 8 minutes (but we had 10 10GE-based service nodes). xCAT was also used for the first 1.0 and 1.1 Petaflop system (LANL Roadrunner).

For billing and chargeback consider Moab with Gold. If you use Moab with xCAT and stateless provisioning, then you can power up nodes on demand and power them down automatically when not in use and track/bill on energy usage. You also have the ability to specify different OS loads on-demand so that your system can be more of an HPC cloud and not just a static homogeneous cluster. Lastly xCAT can support KVM if you want to throw a few VMs in there as well. Oh, and if you get the itch to use Windows, xCAT supports that too.

I see your point(s) by KingAlanI · 2011-09-13 12:26 · Score: 1

I see a couple valid points here:
firstly, that some audiophiles are overly concerned about the technical quality of the equipment, compared to the artistic quality of the music (or lack thereof).
secondly, that a lot of modern mainstream stuff is at least slickly produced, even if there are issues with things like poor singing or lack of lyrical depth.

P.S.
I'm not well-versed on Britney's discography in particular, but in my experience there tend to be some relative gems amongst modern mainstream songs/artists.

--
I listen to both RIAA and non-RIAA stuff if I like the music, tangential business/politics notwithstanding.

Re:I see your point(s) by crutchy · 2011-09-13 15:01 · Score: 1

I find that the better the amplifier hardware the bigger range of music becomes tolerable. Britney on a cheap boombox sounds like crap (duh). Play the same thing on something half decent and there will be more music, less Britney so of course it will sound better :)

I'm not a fan of midrange in general (nor hissing or thud). I don't think that makes me in the same league as some audiophiles but I'm a little pickier than your average Britney fan. I like to be able to feel bass (particularly in movies) and be able to turn the music right up without my ears being blown out by midrange and higher frequency distortion. Anyone who cranks up their sound system enough to get bass distortion is obviously a dickhead but even cinemas and nightclubs have given me headaches with "noise".

btw, how did we get from supercomputing to Britney again?
Re:I see your point(s) by KingAlanI · 2011-09-14 01:54 · Score: 1

I haven't seen Britney, and I don't intend to - these are general comments:
Live sound systems do tend to have something going on that kinda washes out the vocals.
I don't hear such a problem in recordings, unless they're really rough bootlegs
cables in a computer network and cables in a sound system: that's how this tangent started

--
I listen to both RIAA and non-RIAA stuff if I like the music, tangential business/politics notwithstanding.

Easy answer! by __aailob1448 · 2011-09-13 12:41 · Score: 1

You're putting GPUs on those things so here is what you do:

1-Start mining bitcoins
2-Watch their value soar
3-Sell them
4-Profit!

You're welcome!

Re:Easy answer! by tehcyder · 2011-09-14 01:24 · Score: 1

mod parent -1 for mentioning fucking bitcoins

--
To have a right to do a thing is not at all the same as to be right in doing it

The question is why... by damn_registrars · 2011-09-13 12:47 · Score: 1

Why would you bother building something like this anymore? There are so many places that will rent you cluster time on an as-needed basis, and it's cheaper to use their storage for it at the same time anyways. We were looking at building a new cluster at my work, but have been leaning more towards paying for time on a compute cloud (Amazon EC2 or similar) as it just makes more sense.

The way the summary was written, it doesn't sound like a whole lot of advance thought was put into this; what is the plan in 1, 2, 3, and 5 years as each different group of components sees its warranty expire? Are you buying spares already to have on hand for when things break down? What about storage and data backup?

--
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.

Re:The question is why... by RogerWilco · 2011-09-14 06:39 · Score: 1

Cloud isn't really cheap. It only gets competitive with rolling your own if your problems are very CPU bound (high Amdahl number). If you're I/O and storage intensive, cloud services are actually very expensive.

--
RogerWilco the Adventurous Janitor

Penguin Computing Scyld ClusterWare by adamy · 2011-09-13 12:49 · Score: 1

I worked on Penguin's HPC offering a few years back. It PXE boots all of the compute nodes in a lightweight manner, and builds the entire cluster into a single process space. Distributed process space means that a signal sent to a parent process running on one node gets correctly forwarded to a child process running on another. You will not find an easier way to manage the cluster than Scyld.

http://www.penguincomputing.com/software

It can install on top of Red Hat Enterprise Linux or the comparable CentOS.

--
Open Source Identity Management: FreeIPA.org

Re:Penguin Computing Scyld ClusterWare by Anonymous Coward · 2011-09-13 23:14 · Score: 0

I have a Penguin cluster. Scyld has some "issues", but for the most part is pretty easy for our admins to manage

Get admin or get help by Junta · 2011-09-13 12:57 · Score: 1

Sounds like you were in a position to reasonably roll your own, but circumstances have changed. You may wish to consider talking to HP, IBM, or Dell sales reps (if your servers are already from one of those badges, you undoubtedly would have a strong relationship with their sales team). Balance that against community advice to give context. Basically, all those companies have experts and will gladly take your grant money to give you what you want. Alternatively, backfill with an expert in the field to fill the gap.

That said, my community advice:
Interconnect: Infiniband almost certainly at that scale. Cost per port, bandwidth, latency are all in favor of IB. Building a high scale IB fabric is child's play, doing the same in ethernet is possible, but more difficult technically and financially.

In terms of GPU, your server choice is critical to know. If the servers were not designed for GPU, you may be SOL for lack of room for heatsink or lack of power connector. Even amongst servers that do GPU, frequently the extra power harness or PSU is not provided unless the vendor knows ahead of time.

I cannot speak much to ROCKS, but xCAT does a pretty good job with RHEL/CentOS/SLES, and I think Ubuntu now. Generally, I see people go with RHEL/CentOS.

--
XML is like violence. If it doesn't solve the problem, use more.

But Does It Run Lotus Notes? by Doc+Ruby · 2011-09-13 13:07 · Score: 1

You should run Lotus Notes on it. At least that's what the Guardian (giant NYC insurance/finance corp) did with their two IBM SP2 supercomputers in the late 1990s when I was bringing OOP to their development division.

No, I'm not kidding. Lotus Notes on 2 SP2s. To get the results they could have gotten with two Sparc20s running SENDMAIL, Apache and Sybase.

I tried to get them to switch to that alternative platform, and bet them an SP2 it would work as fast, and better. They didn't take my bet.

--
make install -not war

Re:But Does It Run Lotus Notes? by ajlitt · 2011-09-13 14:37 · Score: 1

I hope somebody finally got fired for buying IBM.

The question to the answer by Anonymous Coward · 2011-09-13 13:12 · Score: 0

Use it to figure out what 42 means

Re:The question to the answer by maxwell+demon · 2011-09-13 18:05 · Score: 1

Use it to figure out what 42 means
Don't do that. You don't want to attract Vogon construction fleets.

--
The Tao of math: The numbers you can count are not the real numbers.

Better than SETI :::: by wideBlueSkies · 2011-09-13 13:14 · Score: 1

Protein Folding : folding@home

http://folding.stanford.edu/

--
Huh?

Nobody on Slashdot Can Read by Doc+Ruby · 2011-09-13 13:21 · Score: 1

The most interesting aspect of this discussion is that so many posters are whining about this supercomputer arriving without having any applications planned for it, so it's asking Slashdot for recommendations. It's hilarious not because it's probably fiction (though that's always possible), nor because Slashdot's reply is that it's fiction or some government or trust fund boondoggle. It's kinda hilarious because right there the poster says:

We primarily do life-science/health/biology related tasks on our existing (fairly small) HPC. We intend to continue this usage

It's perfectly clear that they're buying a bigger computer than they'll be able to use (at first), so they're looking for something else worthwhile as the machine starts to go stale (out of the oven, like bread). But they're doing the kind of computing already that makes big money from data crunching, though it's certainly possible they've bought bigger than they can deploy their current workload to for 100% capacity usage.

But what makes that so hilarious is that they've got a supercomputer burning a hole in their datacenter, and they think asking this gang of illiterates is the way to decide what to do with it. Of course they should mine bitcoins! It's idiot's delight in here.

--
make install -not war

Re:Nobody on Slashdot Can Read by tehcyder · 2011-09-14 01:22 · Score: 1

It's kinda hilarious because right there the poster says:

We primarily do life-science/health/biology related tasks on our existing (fairly small) HPC. We intend to continue this usage
It's perfectly clear that they're buying a bigger computer than they'll be able to use (at first), so they're looking for something else worthwhile
What's hilarious is that you think that story is completely reasonable. What sort of organisation deliberately spends a huge amount of extra money on something and then has to ask slashdot readers how to use up the slack? It makes no sense.

--
To have a right to do a thing is not at all the same as to be right in doing it

over your head by Anonymous Coward · 2011-09-13 13:44 · Score: 0

If you are getting the components in two weeks and you don't know these answers yet, you are WAY WAY WAY over your head. (Someone on the east coast involved in the HPC industry for the past 6 years.)

Cluster by Anonymous Coward · 2011-09-13 14:06 · Score: 0

fuck

Trade it to me... by pyneiii · 2011-09-13 14:15 · Score: 1

Trade it to me, I've got a bridge in Brooklyn you might like...but in all seriousness, can it run Crysis?

OS, duh! by ThurstonMoore · 2011-09-13 14:17 · Score: 3, Funny

The obvious answer is Windows Server 2008 HPC.

What? by sycodon · 2011-09-13 15:37 · Score: 2, Insightful

Isn't this shit you should have had all figured out before you even applied to whatever company, agency, government, etc, you got the money from?

WTF is this? I can only hope you didn't get money from the feds.

"Hey, look! The feds gave me a shit load of money to get this cool super computer...what should I do with it?"

Seriously...if you got any government money for this then you are a first-class tool for not having all of this known before you even applied.

--
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.

Re:What? by justsayin · 2011-09-14 02:41 · Score: 1

Totally agree, this is a scam article submission. Someone is playing with us.
Re:What? by HalWasRight · 2011-09-14 06:39 · Score: 1

I sure hope the funding body reconsiders giving money to these n00bs.

--
"This mission is too important to allow you to jeopardize it." -- HAL
Re:What? by HappyPsycho · 2011-09-14 07:44 · Score: 1

I see you didn't read the summary...
"We primarily do life-science/health/biology related tasks on our existing (fairly small) HPC. We intend to continue this usage, but to also open it up for new uses (energy comes to mind)."
They are upgrading and to help recoup the costs it would be nice if it could do some other stuff as well, what do you suggest...
Re:What? by sycodon · 2011-09-14 13:19 · Score: 1

It would seem they overbought then, eh? Gee. I need a bigger car to take my kids to school...so I get the feds to buy me a bus. Now what should I do with all those extra seats?

--
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
Re:What? by HappyPsycho · 2011-09-14 15:49 · Score: 1

It would be short-sighted not to build in extra capacity to handle bursts and natural growth; ever see anyone on the top500 drastically increase their capacity? If the poster's company has an HPC and is upgrading, then business must be good to the point that the current infrastructure is loaded (possibly overloaded). Rollouts like these are not exactly quick and easy, so whatever goes in now has to be able to do what they want for at least as long as it takes to demonstrate ROI.
Amazon is probably the poster child for this mindset. They built in enough capacity and redundancy to handle the seasonal spikes in traffic around the holidays, the result of the extra capacity? EC2, which sounds eerily similar to what the poster hopes to do.
A bit of an off-topic question: where did the assumption that the US federal government (or any government entity) is paying for this come from? Last news I heard on this matter is that quite a few government data centers were being shut down http://www.informationweek.com/news/government/leadership/231002215 so it seems unlikely that they would now be setting up such a large one (especially since that would create a large single point of failure in a hurricane-prone area). Also, if the government is paying for it, why would the poster care about recouping some of the costs?

Private philanthropist? by SigmoidCurve · 2011-09-13 16:55 · Score: 1

Our organization received a grant to pay for this from a private philanthropist that has a medical issue that is currently being researched by one of our labs (this happens to us not too infrequently).

Dude, that's not a philanthropist, that's a sociopath. Let's see, I am super rich but I have a rare terminal disease, but maybe I can cheat death if I purchase all the world's best scientists to work on *my* health problem. Never mind if they were previously occupied trying to save sick children. Fuck those kids, I'm rich and therefore I'm a higher priority.

I hope your benefactor enjoys his or her remaining time on Earth, because Hell is going to be a real bitch.

--
Dictionaries are for loosers.

Re:Private philanthropist? by Anonymous Coward · 2011-09-13 17:27 · Score: 0

Yes, because nothing says pure evil like funding research that might help countless people. Get a grip.
Re:Private philanthropist? by SigmoidCurve · 2011-09-13 17:48 · Score: 1

The point is it distorts the market by diverting attention toward a disease suffered by old rich people. Given the choice, scientists would allocate their time toward more deserving patients, not simply those with a bankroll. Look what's happening with the Gates foundation and malaria research: other areas of research are being abandoned because everyone's flocking to the malaria grants. It wasn't their intent to discourage scientists away from other areas of research, but that is what happens when you flood the market.
If you're going to be a philanthropist, you don't decide what science should do with the money, you let the experts decide.
To truly qualify as philanthropy, your donation should not come with strings attached. Despite AC's mockery, what this Koch guy is doing is not really motivated by any sense of altruism, he's just trying to save his own ass (yeah, it's intended). Here is a major research lab being required to study what would most benefit him, and you call that charitable? Seriously, you don't think it's evil that this twisted fuck can yank money and resources away from sick children just so he can have a few more miserable years on the planet?

--
Dictionaries are for loosers.
Re:Private philanthropist? by Anonymous Coward · 2011-09-13 23:21 · Score: 0

One would expect a "rich philanthropist" businessperson (unless they inherited all their money) to have a good enough sense of realism to know that giving money to basic medical research isn't going to help unless their illness is chronic and not immediately life-threatening, as it'll take at least several years to produce even the roughest experimental treatment.

BS on "largest cluster" by sl3xd · 2011-09-13 17:23 · Score: 2

I have to wonder what you're on the east coast of. East coast of Madagascar? I work in HPC; a thousand nodes just isn't that much. We sold larger clusters than that four years ago.

--
-- Sometimes you have to turn the lights off in order to see.

Re:BS on "largest cluster" by phantomfive · 2011-09-13 19:41 · Score: 1

Was it x86_64? Because maybe no one's had a need for 64 bit processors before in that kind of cluster.

--
"First they came for the slanderers and i said nothing."
Re:BS on "largest cluster" by sl3xd · 2011-09-14 03:23 · Score: 1

We were selling x86_64 clusters at least six years ago.
Just look at the top 500 list - you'll note plenty of x86_64 clusters of well over a thousand nodes.

--
-- Sometimes you have to turn the lights off in order to see.
Re:BS on "largest cluster" by Anonymous Coward · 2011-09-14 06:13 · Score: 0

There are greatly diminishing returns on greater node counts when you can shove 48 cores and 256GB ram into a pizza box for ~$12k each.
Re:BS on "largest cluster" by Anonymous Coward · 2011-09-15 17:07 · Score: 0

We have an existing HPC of roughly 300 nodes and 1200 cores.
Reading comprehension helps.
I wonder how large the new one is going to be.
Re:BS on "largest cluster" by Anonymous Coward · 2011-09-16 03:20 · Score: 0

4 years ago you didn't have 10 core CPUs.

why not mosix? by Anonymous Coward · 2011-09-13 18:51 · Score: 0

OS: With Mosix there's no need for MPI or openMP. Simple administration and configuration. Google for mosix in e.g. biosciences for success stories that claim a fair amount of productivity increase. Create groups of clusters across the campus etc.

GPUs: use scripting languages for performance increase. Just google for gpu+your favorite scripting lang. More complicated problems require new CUDA/OpenCL code, though.

Infiniband by Anonymous Coward · 2011-09-13 18:56 · Score: 0

Infiniband is the choice !! Just one word: RDMA

quote:
remote direct memory access (RDMA) is a direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters.

I've lived this in corp... by Anonymous Coward · 2011-09-13 19:10 · Score: 0

I've been there, done that. I think the use/lose budget practice is too strict. It should have the option of delay and review without having to go through the annual budget process.

The hard part is when you are planning on spending the money in winter and the project is delayed a quarter. You need to buy now because otherwise the budget is lost, instead of waiting and seeing if the requirements change and/or if you can get better pricing a quarter later.

I once needed 6 servers (5K each) and asked for the exact amount needed. Being a new manager, I didn't know that, as was routinely the case, the budget amount would be cut in half and then approved. Well, that sucked since I *needed* 6 machines. Next project I needed 2 servers (60K each) but, knowing the budget process, I requested 6 again. This time it was approved *as is*: 360K. Sweet 1.5TB of RAM and more hardware than really needed, but useful to have for support-role servers.

The process was total bullshit but you learn to work within it.

Compete with amazon by Anonymous Coward · 2011-09-13 19:15 · Score: 0

That'll get your money back ;-)

Or figure out how to share computation resources with other universities without bureaucracy, and be a champion of mankind!

rhel 6 , xcat , slurm , IB by stenWolf · 2011-09-13 19:49 · Score: 2

" So, what's the best Linux distro for something of this size and scale?"
RHEL6
I would have suggested SL6 but their guy just left for RHEL, and CentOS is still playing catch-up. Since you obviously have money, go with the best supported OS.
"Additionally, due to cost contracts, we have to choose either InfiniBand or 10Gb Ethernet for the backend: which would Slashdot readers go with if they had to choose?"
IB, hands down.
The main issue isn't even the bandwidth (which is 40Gb/s compared to 10Gb/s) - it's about latency and RDMA for whatever MPI you'll use.
"Any suggestions on the most powerful Linux friendly PCI-e GPU available?"
Go with nvidia tesla. Every self respecting HW vendor has them as an option for either blades or rackmounts now-a-days.

Manage the whole thing with xCAT, schedule your jobs with slurm (wouldn't touch moab now, can't justify the cost). Personally I'd focus on openmpi (intel compiled, not gcc) with blcr checkpointing.
You can set the entire SW stack in 4-6 hours if you know what you're doing.
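
To put the latency point in numbers, the usual first-order model for one point-to-point message is time = latency + size / bandwidth. A minimal sketch, assuming made-up per-message latencies (the 2 us and 10 us figures are placeholders, not measurements; only the 40Gb/s and 10Gb/s link rates come from this comment, and `transfer_time_us` is a hypothetical helper name):

```python
def transfer_time_us(size_bytes, latency_us, bandwidth_gbps):
    """First-order cost of one message: fixed latency plus serialization time."""
    bits = size_bytes * 8
    # 1 Gb/s moves 1000 bits per microsecond
    return latency_us + bits / (bandwidth_gbps * 1000.0)

# Link rates are from the comment; the latencies are ASSUMED placeholders.
LINKS = {
    "InfiniBand (40Gb/s)": {"latency_us": 2.0, "bandwidth_gbps": 40.0},
    "10Gb Ethernet": {"latency_us": 10.0, "bandwidth_gbps": 10.0},
}

for size in (64, 4096, 1048576):
    for name, link in LINKS.items():
        t = transfer_time_us(size, **link)
        print(f"{size:>8} bytes over {name}: {t:.2f} us")
```

For small messages the fixed latency term dominates the total, which is why MPI codes that exchange many small messages care more about latency than about raw link rate.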

you don' t by Anonymous Coward · 2011-09-13 19:59 · Score: 0

If you have to ask the question to slashdot what to do with a HPC cluster and which distro to use, then don't install such a system. Pretty please!

This guy is clueless by Douglas+Goodall · 2011-09-13 20:46 · Score: 1

These days all supercomputers only stay viable until either someone with more money or someone with a higher power cpu comes along. So you pay a fortune to set all this up and you have a window of opportunity during which it is exciting and likely to draw some serious users its way. If it is fast enough you might even get the NSA to just lease all the time. But (and this should be obvious) you have to hit the ground running. By the time the hardware is ready, you have to have all the software architecture well understood and the business structure that goes with it in place so that when you turn it on, there are customers lined up ready to go, otherwise you are pissing in the wind. If you do everything right, you may even get enough clients to pay for the thing before some other nerd has one better and they don't ring your phone any more. If you are just now wondering about a distro, and not sure what GPU you are buying in bulk, then you are a rank amateur and bound to lose.

east coast HPC guy here by oudzeeman · 2011-09-13 23:18 · Score: 1

I work at a genetics research lab in New England and specialize in High Performance Computing. Let me know if you want to talk privately. I would be willing to give a little advice for free, or more as a consultant.

Put a party in my pants by hesaigo999ca · 2011-09-14 01:09 · Score: 1

Download an obscene amount of p0rn..... or try to replicate the sun's life cycle including solar flares and eventual collapse...either way, it would be awesome.

Hold on a sec by tehcyder · 2011-09-14 01:16 · Score: 1

So this guy has spent hundreds of thousands of dollars on a supercomputer and he doesn't know which OS to choose, nor what to use it for? Sounds fucking unlikely to me.

--
To have a right to do a thing is not at all the same as to be right in doing it

notes on GPUs by Anonymous Coward · 2011-09-14 01:22 · Score: 0

I have an Nvidia GPU and I've done some programming on it and I have to agree with previous poster here: one must bear in mind when using them that the main bottleneck is often memory bandwidth, especially over the PCI bus. (The programming guides even say that -- that the cards are for computation-intensive workloads as opposed to bandwidth-intensive loads.) But then again when you're saturating your north bridge you're probably doing pretty well on the computation end. The performance payoff of a GPU can range anywhere from 4 times to 50 times that of a CPU, largely dependent on the memory access pattern of the algorithm, as again, they're bandwidth-limited. (But compute capability 3 and up are better at this when given multiple different computing tasks due to an improved threading engine.) Also, yes, the consumer cards (GeForce) are way cheaper and the performance is comparable to the "workstation" cards (Tesla). As far as CPU/GPU balance goes, I wouldn't go over one GPU per CPU core. Even on intensive GPU loads you still need the CPU to do some pushes and some bookkeeping.
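
A back-of-the-envelope way to see the bus bottleneck: the end-to-end speedup is the CPU time divided by (kernel time + transfer time). A rough sketch, assuming the work and the transfers do not overlap; `offload_speedup` is a hypothetical helper, and the 8e9 bytes/s bus rate and 10-second job are illustrative placeholders, not measurements:

```python
def offload_speedup(cpu_seconds, gpu_kernel_speedup, bytes_moved, bus_bytes_per_s):
    """Estimated end-to-end speedup when data must cross the bus (no overlap assumed)."""
    kernel_s = cpu_seconds / gpu_kernel_speedup   # time spent computing on the GPU
    transfer_s = bytes_moved / bus_bytes_per_s    # time spent copying over the bus
    return cpu_seconds / (kernel_s + transfer_s)

# ASSUMED numbers: a 10 s CPU job whose kernel alone runs 50x faster on the GPU.
for gigabytes in (0, 1, 8):
    s = offload_speedup(10.0, 50.0, gigabytes * 1e9, 8e9)
    print(f"{gigabytes} GB moved: about {s:.1f}x overall")
```

Under these assumptions the kernel's 50x shrinks quickly once the copies take longer than the kernel itself, which matches the "computation-intensive, not bandwidth-intensive" advice.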

Debian by Anonymous Coward · 2011-09-14 01:50 · Score: 0

There are big clusters running Debian:

http://www.debian.org/News/2011/20110729

In addition, Debian is looking for friendly cluster/cloud providers to help us run rebuild testing and static analysis tools. Please contact the Debian project leader about this:

http://wiki.debian.org/Teams/DPL

Easy!! Windows 8!!! by daboochmeister · 2011-09-14 02:05 · Score: 1

Everything else is SO day-before-yesterday!

--
"Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh ... never mind." Dave Bucci

Minecraft by Sylak · 2011-09-14 02:57 · Score: 1

Minecraft SMP Server, duh~

I smell a rat by Anonymous Coward · 2011-09-14 03:46 · Score: 0

Maybe I'm just being dense, but aren't these questions exactly the sort of thing any good engineering team would have already spec'ed out ages ago, and that any good management team would insist on seeing before handing over the cash to "build the largest x86_64-based supercomputer on the east coast of the U.S." ???

If it's really the largest x86_64-based supercomputer on the east coast, you're already talking several tens of millions of dollars (that's what the existing ones cost anyway). And you're just now getting around to asking which software set to support, the network architecture of the backbone, and how to support GPU computation?

No, I smell a rat. This isn't a system about to go into production. This is a design proposal and someone's too lazy (or incompetent) to do their own homework so they're asking on, of all places, slashdot because everyone knows slashdot posters are all experts on high-performance computing.

-JS

Bull fucking shit. by binford2k · 2011-09-14 03:56 · Score: 1

What the fuck is a question like this doing on here? If it's for real, then for fucks sake, why the hell isn't it spec'ed?

Sorry, not the biggest by whitroth · 2011-09-14 04:51 · Score: 1

From :
The NIH Biowulf cluster is a GNU/Linux parallel processing system designed and built at the National Institutes of Health and managed by the Helix Systems Staff. Biowulf consists of a main login node and 2300 compute nodes with a combined processor core count of over 12000. The computational nodes are connected to high-speed networks and have access to high-performance fileservers.

And it's been running here for years.

Someone asked "why do this, and not rent cloud space?" We'll skip my rant about cloud space, and cut to the chase: in our division, I know of at least one person who runs jobs on one of our clusters, between 10 and 48 servers, ranging from old 4 core to newer 48 core machines... and his jobs can run, literally, for weeks. And they use a *lot* of the full power of the cluster. There's more folks who run jobs on the same clusters (things like protein folding modelling) that "only" run for 3-5 days; again, eating most of the CPU on the clusters.

That's why. Oh, and let's not forget funding....

mark, who speaks neither for the US Federal Gov't, nor my employer; I speak for me (got a problem with that?)

Finally, a way to make Windows... by gestalt_n_pepper · 2011-09-14 05:25 · Score: 1

responsive enough to use in real time.

--
Please do not read this sig. Thank you.

It's simple. by caseyweederman · 2011-09-14 09:50 · Score: 1

Low Orbit Ion Cannon.

The obvious response by Anonymous Coward · 2011-09-14 10:30 · Score: 0

Pics or it didn't happen.

Solve a 100-year old graph theory puzzle by Anonymous Coward · 2011-09-15 01:30 · Score: 0

Is a Hamiltonian cycle of 5040 vertices possible according to these rules:

Vertices are permutations of 7 items.

Edges are permutations

Allowed edge permutations:
(2,3)(4,5)(6,7) == 1 (as the item in position 1 is fixed, the other adjacent pairs swap)
(1,2)(4,5)(6,7) == 3
(1,2)(3,4)(6,7) == 5
(1,2)(3,4)(5,6) == 7

according to these rules:
Edges must be traversed in lists of 6.

First choice of list of 6 permutations
'plain six'
3.1.3.1.3.7
Second choice of list of 6 permutations
'bobbed six'
3.1.3.1.3.5

With 5040/6 = 840 choices of sixes, can you find a list of plain and bobbed sixes to traverse all 5040 vertices without repetition, arriving back at the starting point?

It's a CPU intensive problem which parallelises trivially.

Look up bobs-only Erin Triples for more information.
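
The rules above can be sketched as a checker for one candidate answer. This is only a verifier under my reading of the rules, not a search: the names `change` and `is_hamiltonian_cycle` and the 'p'/'b' encoding of plain and bobbed sixes are my own, and starting from 1234567 is an assumption.

```python
from math import factorial

ROUNDS = "1234567"
PLAIN = (3, 1, 3, 1, 3, 7)    # 'plain six' from the comment
BOBBED = (3, 1, 3, 1, 3, 5)   # 'bobbed six' from the comment

def change(row, place):
    """Hold the item at 1-indexed odd `place`; swap the adjacent pairs around it."""
    r = list(row)
    fixed = place - 1
    i = 0
    while i < len(r) - 1:
        if i == fixed:
            i += 1                      # skip the held position
            continue
        r[i], r[i + 1] = r[i + 1], r[i]
        i += 2
    return "".join(r)

def is_hamiltonian_cycle(calls, start=ROUNDS):
    """calls: one 'p' or 'b' per six. True if every row is visited exactly once
    and the walk ends back at `start`."""
    seen = set()
    row = start
    for call in calls:
        for place in (PLAIN if call == "p" else BOBBED):
            row = change(row, place)
            if row in seen:
                return False            # repeated vertex
            seen.add(row)
    return row == start and len(seen) == factorial(len(start))

print(factorial(7), factorial(7) // 6)  # 5040 vertices, 840 sixes
print(change(ROUNDS, 1))                # 1325476
```

A full answer is a string of 840 calls, so a naive search faces 2**840 candidates; any real attempt needs heavy pruning, and the checker above only confirms a proposed list.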

Is Rome NY not east coast? by Anonymous Coward · 2011-09-15 02:22 · Score: 0

Something tells me that a 14,400 core system will not beat the 42,712 core system installed at AFRL in Rome NY. It currently ranks #19 in the top500.org list. Yes, it is "Cray", but it is an Opteron system, x86_64 based.
http://top500.org/site/systems/78

Infiniband vs 10GbE by zpiro · 2011-09-20 09:38 · Score: 1

I doubt this is a serious question; if you even glance at the price difference here, it's quite clear what to do.
Especially considering that you can run IP over Infiniband.
10GbE simply does not belong in HPC solutions.

Best use supercomputer - Software & Users by Anonymous Coward · 2011-09-23 22:31 · Score: 0

The question of best use for a supercomputer has several aspects:
hardware, OS, user/application software and users.
In my view the most important one today is the application software and the associated users.

Thus, whatever you do, think about the intended use.

Also work incrementally: your cluster should have a small development subset to allow testing of hardware configurations and software installs.

Ideally, using a hardware/OS/application combination that has a user
community to support you is a good choice.

If you want to try some software application, a BOINC project is a good choice.
Schiebi

Slashdot Mirror

387 comments