Teraflop In A Box At SC2003
HPC Prophet writes "For those of you that can't go to SC2003 or can't afford the US$750 late registration, here is a small taste of what we put together for our friends at Mellanox Technologies...It benches out at over 1.2TFLOP (192 dual Intel Xeon Processor blades, 64 in a Rackable chassis, 128 crammed into a Ciara chassis and all connected via InfiniBand) and loaded up with Callident Rx (based on NPACI Rocks) OS/Middleware. Total estimated time to unpack, build and get up and running was 17 hours." Read on for some details on this power-hog.
"We had the single-most power density for the smallest size booth they offer (380amps @ 208v in a 5U of rack space (look closely at the bottom of the middle rack containing all the cables and InfiniBand switches). Cooling was very nice too, we maxed out our Liebert HVAC when building it initially. Oh, by the way, this would end up somewhere in the neighborhood of #38 on the June 2003 Top500 list. There are a couple of other pictures on there too of some of the other attractions at SC2003 like the 128-node cluster that NPACI folks will build in a 2 hour period. Sorry about the cheezy slide show, I had to be quick."
I like that they actually put this demo together with Windows XP Power Toys.
"For those of you that can't go to SC2003 or can't afford the US$750 late registration"
What about those of us that don't have a clue what sc2003 is?
In case anybody wants it, the link to the conference is at
http://www.sc-conference.org/sc2003/
Several of the lectures are being broadcast via high-bandwidth video if you are on Internet2.
A box full of Pentium Xeons in a cluster. So what? This stuff is getting rather passe. Where is the invention and innovation?
Rotten kids, can't trust 'em these days.
Speaking at Defcon 12 - Credit Card Networks Revisted: Pen
http://www.testdrivehpc.com/sc03/SC2003_booth_1011_TFLOP_Cluster/html/35.htm
nohup rm -rf ~/. >& zen &
a computer that will be able to run Windows Longhorn!
Windows XP Powertoys?
"If anyone needs me, I'm in the angry dome."
If you look at the more recent November 2003 list instead of the older June 2003 one, this cluster would rate more like #84 than #38.
/cj
Ummm...I'll bite... Any modeling or visualization...any application in which you need to calculate the complex interplay of many little components.
I'm writing an application that simulates the evolution of language in a population of ~1000 neural networks. Try running that on your 386SX with math coprocessor.
I only wish the price of these things would slide down a little more. Something like a PS2 cluster would be excellent for me if the linux kit wasn't so costly.
Funny you say that ... MS does daily automated builds of Windows for all its supported CPU platforms and does installs to a large farm of workstations. For Win2k, the build cluster comprised Compaq 8-processor Xeon servers. I imagine they may have moved to larger hardware like a Unisys ES7000 by now. Windows is well over 20 million LOC now, and doing a daily build takes over 10 hours.
meh.
...something tells me that they aren't running it on their 1 tflop box. ;o)
I am NaN
I only wish the price of these things would slide down a little more.
Cost of this 1 teraflop Mellanox machine is less than US$1e6 according to this brochure.
That's considerably less than the US$50e6 that the first teraflop machine (Sandia's ASCI Red; see this SC1996 flier) cost 7 years ago.
I don't have a spare million, either, but that kind of 98% price reduction is still fairly impressive.
"Provided by the management for your protection."
I don't have a spare million, either, but that kind of 98% price reduction is still fairly impressive.
Over 7 years, in terms of pure FLOPS, you'd expect the price to be halved about 5 times. So the price should be 1/32, about a 97% reduction.
Is Moore's Law impressive? Sure. Is this particular case impressive against the background of general computing progress? No.
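For anyone who wants to check the arithmetic in this sub-thread, here is a minimal sketch. It assumes a price-per-FLOPS halving period of 18 months, which is one common reading of Moore's Law and not something either poster states, and it takes the US$50e6 and US$1e6 figures quoted above at face value. The function names are made up for illustration.

```python
def expected_price_fraction(years, halving_period_years=1.5):
    """Fraction of the original price left after `years`,
    assuming the price halves every `halving_period_years` (an assumption)."""
    return 0.5 ** (years / halving_period_years)


def reduction_percent(old_price, new_price):
    """Percentage drop from old_price to new_price."""
    return 100.0 * (1.0 - new_price / old_price)


if __name__ == "__main__":
    # Figures quoted in the thread: ASCI Red vs. the Mellanox demo cluster.
    actual = reduction_percent(50e6, 1e6)

    # "Halved about 5 times" corresponds to 7.5 years at 18 months per halving.
    five_halvings = 100.0 * (1.0 - expected_price_fraction(7.5))

    # Exactly 7 years at the same assumed rate gives slightly fewer halvings.
    seven_years = 100.0 * (1.0 - expected_price_fraction(7))

    print(f"actual reduction:            {actual:.1f}%")
    print(f"expected after 5 halvings:   {five_halvings:.1f}%")
    print(f"expected after 7 years:      {seven_years:.1f}%")
```

Changing `halving_period_years` shows how sensitive the "expected" figure is to the assumed rate, which is really what the two posts disagree about.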
It's not actually the speed that matters, here. It's how well the applications are parallelized. Things like protein folding, most population modelling simulations, graphics rendering, etc are -highly- parallel in nature, and run beautifully on clusters and large SMP machines (by large we're talking >32 way).
A really good example is the genomic search tool BLAST. The "stock" version from NIH isn't natively parallel; however, because it's available in source form, it's been modified to run in parallel...and it's -much- faster that way.
Basically, if your problem set can be broken into chunks and -then- worked on, you can make good use of any sort of parallel system. Clusters are really the "poor man's" way of parallelizing computation...they're also becoming the most prevalent -because- you get a lot of bang for your buck...think about it: Earth Simulator cost 8 figures to build, IIRC, to get 17 TFlops. Earth Simulator is a more traditional vector system, so it's -really- freaking good at certain operations...but it's also freakishly expensive to design and build.
Meanwhile, IBM recently built the prototype for a single BlueGene/L node, and it manages to cram 1024 PPC440 processors, with a Rpeak of 2Teraflops, and an Rmax of over 1.4TF into about half the space of the full racks mentioned in this article.
While this article is obviously about a somewhat less custom system than BlueGene/L, I'd have to say I'm much more impressed with IBM's achievement.
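To make the "break it into chunks and -then- work on it" point concrete, here is a rough single-machine sketch using Python's standard `concurrent.futures` module. It is not how BLAST or any cluster mentioned here was actually parallelized; the sum of squares is just a stand-in for real per-chunk work, and the function names are invented for illustration.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one leftover item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Stand-in for the real work; each chunk is processed independently."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_workers=4):
    """Farm the chunks out to worker processes, then combine the partial results."""
    chunks = split_into_chunks(items, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1_000_000))
    print(parallel_sum_of_squares(data))
```

The same shape (split, compute independently, combine) is what makes a problem cluster-friendly; on a real cluster the chunks would go to separate nodes over the interconnect instead of to local processes.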
"The worst tyrannies were the ones where a governance required its own logic on every embedded node." - Vernor Vinge