Japan's Petaflop Supercomputer
slashthedot writes "Japan has built the fastest supercomputer in the world. While the BlueGene/L contains 130,000 processors, Japan has managed to create the first Petaflop supercomputer, called MDGrape-3, with just 4808 chips, and it cost just $9 million to develop."
Making that computer must have been harder than getting a story from MSN posted on the main page of Slashdot!
It now costs 15 dollars per gigaflop. In the early 90s, a million dollars per gigaflop was normal.
Religion for nerds. Stuff that really matters
The original article seems to be unreachable, so I can't read it, but the precis has the wrong chip count: It does have 4808 LSI chips, but it also has 19,122 Xeon processors.
Will this run Vista at a decent speed, or should I wait for the Rev B and SP1?
If this petaflop supercomputer really only costs $9 million and only occupies the space of a large walk-in closet, why don't they mass-produce it and sell it? No, not to individuals but to corporations and governments. Folding@Home and Seti@Home could suddenly be like, sorry guys, we don't need you anymore - we got something better. Having hundreds of copies of this supercomputer could quickly solve problems across the globe that much slower supercomputers are currently having trouble with!
... and in the DRM, bind them.
NOT what the VP of Marketing wants to hear:
"Not just a flop, but a flop a million billion times over."
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Having a computer do something very very fast is only of any use if you have the software to do what you want done very very fast. As far as I know, the hard part of what you suggest is writing such capable software, not running it.
The revolution will not be televised... but it will have a page on Wikipedia
Great. 9 million dollars to build the thing, 15 million dollars to build the infrastructure to power and cool it, probably.
The Cell processor can do ~200 GFLOPS - not IEEE quality FLOPS, however; they're 'good enough single precision FLOPs' for its target uses. This is probably why this new supercomputer won't get into the Top500 list, because it's very specialised and thus probably nowhere near as good at IEEE conformant calculations.
The Cell processor is not running at 200GHz. There's this concept called 'parallelisation', it's how your graphics card can do dozens, if not hundreds, of operations per clock cycle. In Cell's case it can do 8 (number of SPUs) * 4 (128-bit registers, SIMD) * 2 (units) = 64 SP FLOPS per clock cycle, and that's not including the PPU which has VMX128 and an FPU itself.
However, make the Cell processor calculate IEEE conformant FLOPS, and it gets a double precision score of around 20 GFLOPS. Still good though.
The above was from memory, details may vary, figures are roughly correct, YMMV, etc.
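For anyone who wants to check the per-cycle arithmetic above, here is a rough sketch. The 3.2 GHz clock is my assumption (the comment doesn't state one), and the function and constant names are made up for illustration.

```python
# Back-of-the-envelope peak single-precision throughput for Cell,
# using the per-cycle breakdown quoted in the comment above.
SPUS = 8            # number of SPUs
SIMD_LANES = 4      # a 128-bit register holds four 32-bit floats
UNITS = 2           # the "2 (units)" factor from the comment
CLOCK_HZ = 3.2e9    # ASSUMPTION: 3.2 GHz clock, not given in the comment


def peak_sp_flops(spus=SPUS, lanes=SIMD_LANES, units=UNITS, clock_hz=CLOCK_HZ):
    """Return (flops per cycle, theoretical peak flops per second)."""
    per_cycle = spus * lanes * units
    return per_cycle, per_cycle * clock_hz


per_cycle, peak = peak_sp_flops()
print(f"{per_cycle} SP flops/cycle, about {peak / 1e9:.1f} GFLOPS peak")
```

This is theoretical peak only, and it leaves out the PPU, same as the figure in the comment.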
"Show me the MFlops/Watt rating of this?"
No problemo!
The number of flops per chip: (10 ^ 15) / 4808 = about 207,986,688,852 flops per chip - from a previous poster.
The number of watts per chip: 300,000 (from the manufacturer's site) / 4808 = about 62 watts per chip.
207,986,688,852 / 62 = about 3,354,624,000 flops per watt, i.e. roughly 3,350 MFlops (3.35 GFlops) per watt.
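If you'd rather redo the division yourself, here is a small sketch that uses only the figures quoted in this thread (1 petaflop theoretical peak, 4808 chips, 300 kW). The names are made up for illustration. It divides the totals directly, so the result differs slightly from a calculation that first rounds watts per chip to 62.

```python
PEAK_FLOPS = 10**15     # 1 petaflop, theoretical peak quoted in the summary
CHIPS = 4808            # MDGrape-3 chip count quoted in the summary
POWER_WATTS = 300_000   # 300 kW figure quoted in this thread


def per_chip_and_efficiency(peak_flops, chips, power_watts):
    """Return (flops per chip, watts per chip, flops per watt)."""
    flops_per_chip = peak_flops / chips
    watts_per_chip = power_watts / chips
    flops_per_watt = peak_flops / power_watts
    return flops_per_chip, watts_per_chip, flops_per_watt


fpc, wpc, fpw = per_chip_and_efficiency(PEAK_FLOPS, CHIPS, POWER_WATTS)
print(f"{fpc / 1e9:.1f} GFlops per chip")
print(f"{wpc:.1f} watts per chip")
print(f"{fpw / 1e6:.0f} MFlops per watt")
```

Keep in mind these are peak numbers from the spec sheet, not measured ones.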
Well, the examples that you mention are not really the same as "attempting to break software and search for problems long before release." If I understand these issues correctly: (1) (with apologies to crypto specialists) RC5 cracking required lots of CPU time to brute-force a big-ass key space, (2) projects like Folding@Home aren't "looking for a cure for cancer," they're running (I think) molecular dynamics simulations to find out how certain molecules can act in certain situations, and (3) SETI@Home is looking for specific patterns in signal data. In all three of these cases, there are a few common (maybe not so simple) operations that need to be applied to a large set of data or initial conditions, and that's why they need lots of machines, or fast machines.
Figuring out how clever people will take advantage of a particular implementation of a web browser or TCP/IP stack is a completely different class of problem IMHO. Yeah, maybe there are some clever AI techniques that may simulate attack attempts, and maybe they could come up with attacks that nobody has thought of yet, but a really fast computer will not somehow magically solve these kinds of problems for us. There's a lot of hard science and software engineering that needs to be done first.
[b.belong('us') for b in bases if b.owner() == 'you']
Oh, please. This machine only uses 300kW - that's maybe the equivalent of 150 American homes. These folks are building a specialized (as in not "more of the same") machine to support a particular bit of science (molecular dynamics simulations) that isn't gonna make for flashy headlines, and I say more power to them. I'd rather there were more scientists out there doing basic research that may actually be useful, than have them chasing after stuff for headlines that will make you happy.
And if you're trolling, yeah, you got me, so congratulations.
[b.belong('us') for b in bases if b.owner() == 'you']
Quoting another link you can see how they reached these numbers (which I take issue with):
- http://mdgrape.gsc.riken.jp/modules/tinyd0/index.
With that answered, I'm confused. Another poster sent along that link, which explains what Riken will do. Reading the page, based on the verb usage, either someone didn't understand future and past tense (possible, but unlikely), or they haven't built the entire box yet. Perhaps I'm reading a bit too much into it... it's quite possible that someone simply hasn't updated the website.
Based on the webpage, all of the calculations to reach 1 petaflop are based on theoretical peak performance measurements, extrapolated from the theoretical peak of a single special-purpose ASIC which has been built, but may or may not have been actually placed into a fully configured system. Nothing talks about measured benchmarks, and the OP's article contains the same theoretical extrapolated numbers.
Anyone know if they've actually built it?
~ Mike
Michael C. Hollinger
but I thought Japan already had a lot of studies on protein?
I've seen the videos of it a few times and stumbled across entire collections of them! they call it something like bukkake.
The problem with that is that this computer is very specialised for molecular simulations. It can't very easily do other things, like SETI or folding (okay, well, maybe that it can do). It was easy to design and cheap because it didn't have to be general purpose and adaptable, like BlueGene/L is.
I love deadlines. I like the whooshing sound they make as they fly by. - Douglas Adams
(10^15)/4808 = 207 986 688 852, i.e. ~208 billion flops, i.e. if the chip executed only 1 instruction per clock, it would be 208GHz (not THz as you imply). Except of course the chip does more than 1 instruction per clock. Modern x86 chips do multiple flops per cycle. A Cell should be able to do at least 9 per cycle. I imagine that a dedicated vector processor, of the sort that NEC used to make, can do tens of flops per cycle.
Furthermore, many processor architectures have instructions that do several basic floating point operations in one step. For instance, PowerPC has a one-cycle multiply-accumulate instruction (multiply and add in one step), so for marketing purposes, a PowerPC has twice the flops. Now, imagine if you have a vector processor that has a highly-optimized instruction for taking square roots or doing trig in one cycle. A square root operation will translate into dozens of basic flops (add, multiply, subtract). Such a processor might therefore be rated at 208 gigaflops even though its operating frequency is <1GHz.
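To put rough numbers on that, here is a minimal sketch of the clock a chip would need to hit the per-chip rating at different flops-per-cycle counts. The per-cycle values are hypothetical picks for illustration, not MDGrape-3 specs, and the function name is made up.

```python
def implied_clock_hz(flops_per_second, flops_per_cycle):
    """Clock frequency needed to reach a flops rating at a given flops/cycle."""
    return flops_per_second / flops_per_cycle


# Per-chip rating from the figures in this thread: 1 petaflop over 4808 chips.
per_chip_flops = 10**15 / 4808

# HYPOTHETICAL flops-per-cycle values, chosen only to show the trend.
for flops_per_cycle in (1, 9, 32, 256):
    ghz = implied_clock_hz(per_chip_flops, flops_per_cycle) / 1e9
    print(f"{flops_per_cycle:>3} flops/cycle -> {ghz:7.2f} GHz")
```

At 1 flop per cycle you need the silly ~208GHz clock; once a chip is counted as doing a couple hundred flops per cycle, the required clock drops below 1GHz.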
>Imagine a Beowulf cluster of these!
With a side order of hot grits!
A tip: if you can fit your message in the subject line, then do it, particularly when you /know/ that you're going to get modded down.
I remember back when that comment would have gotten +5 "Whoa duuuuude" mods.
Yet you can still get good mods if you say:
"A petaflop that fits in a closet for just $9M for the first one? You could make more for a couple million, at least by the time you got your [impressive knowledgeable-sounding ultra-tech adjectives] cluster interconnect together - why not spend a quarter of a billion and push the limits of computing out another couple orders of magnitude? This thing can do protein folding, so it can likely do bomb physics and a bunch of other big-money problems that can be represented in similar math."
Which translates to:
"Imagine a Beowulf cluster of these!"
"Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
Of course, if you compare USEFUL results, it's Folding@home: lots (over 50 papers), SETI: 0
The Japan box will be faster for a little while than Folding@home, but will also likely produce RESULTS instead of just a lot of global warming.
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/