Slashdot Mirror


Japan's Petaflop Supercomputer

slashthedot writes "Japan has built the fastest supercomputer in the world. While the BlueGene/L contains 130,000 processors, Japan has managed to create the first Petaflop supercomputer, called MDGrape-3, with just 4808 chips, and it cost just $9 million to develop."

19 of 161 comments

  1. Wow by 9x320 · · Score: 4, Funny

    Making that computer must have been harder than getting a story from MSN posted on the main page of Slashdot!

  2. Progress by Eightyford · · Score: 4, Informative

    It now costs 15 dollars per gigaflop. In the early 90s, a million dollars per gigaflop was normal.

  3. Incorrect chip count by Bushcat · · Score: 4, Informative

    The original article seems to be unreachable, so I can't read it, but the precis has the wrong chip count: It does have 4808 LSI chips, but it also has 19,122 Xeon processors.

    1. Re:Incorrect chip count by rgravina · · Score: 5, Informative

      This article here from Riken themselves has some more technical details:

      http://mdgrape.gsc.riken.jp/modules/tinyd0/index.php

  4. Purchasing Advice by ZachPruckowski · · Score: 4, Funny

    Will this run Vista at a decent speed, or should I wait for the Rev B and SP1?

  5. Uses a large walk-in closet? by StarWreck · · Score: 5, Interesting

    If this petaflop supercomputer really only costs $9 million and only occupies the space of a large walk-in closet, why don't they mass-produce it and sell it? Not to individuals, but to corporations and governments. Folding@Home and Seti@Home could suddenly be like, sorry guys, we don't need you anymore - we got something better. Having hundreds of copies of this supercomputer could quickly solve problems across the globe that much slower supercomputers are currently having trouble with!

    --
    ... and in the DRM, bind them.
  6. Not just a flop by davidwr · · Score: 4, Funny

    NOT what the VP of Marketing wants to hear:

    "Not just a flop, but a flop a million billion times over."

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  7. Re:machines like this by x2A · · Score: 4, Insightful

    Having a computer do something very very fast is only of any use if you have the software to do what you want done very very fast. As far as I know, the hard part of what you suggest is writing such capable software, not running it.

    --
    The revolution will not be televised... but it will have a page on Wikipedia
  8. 9 million? by jacklebot · · Score: 4, Insightful

    Great. 9 million dollars to build the thing, 15 million dollars to build the infrastructure to power and cool it, probably.

  9. Re:Say what?!? by hattig · · Score: 5, Informative

    The Cell processor can do ~200 GFLOPS - not IEEE quality FLOPS, however; they're 'good enough single precision FLOPs' for its target uses. This is probably why this new supercomputer won't get into the Top500 list, because it's very specialised and thus probably nowhere near as good at IEEE conformant calculations.

    The Cell processor is not running at 200GHz. There's this concept called 'parallelisation', it's how your graphics card can do dozens, if not hundreds, of operations per clock cycle. In Cell's case it can do 8 (number of SPUs) * 4 (128-bit registers, SIMD) * 2 (units) = 64 SP FLOPS per clock cycle, and that's not including the PPU which has VMX128 and an FPU itself.

    However, make the Cell processor calculate IEEE conformant FLOPS, and it gets a double precision score of around 20GFLOPS. Still good though.

    The above was from memory, details may vary, figures are roughly correct, YMMV, etc.
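
A rough sketch of the per-cycle arithmetic in the comment above, in Python. The three factors (8 SPUs, 4 SIMD lanes, 2 units) come from the comment; the 3.2 GHz clock is an assumption added for illustration, since the comment does not state a clock speed.

```python
# Peak single-precision throughput for Cell, following the breakdown
# given in the comment: SPUs * SIMD lanes * units = FLOPS per cycle.
SPUS = 8            # number of SPUs (from the comment)
SIMD_LANES = 4      # 128-bit registers hold four 32-bit floats
UNITS = 2           # the comment's "2 (units)" factor
CLOCK_HZ = 3.2e9    # ASSUMPTION: 3.2 GHz, not stated in the comment


def peak_flops(flops_per_cycle: int, clock_hz: float) -> float:
    """Theoretical peak: operations per cycle times cycles per second."""
    return flops_per_cycle * clock_hz


flops_per_cycle = SPUS * SIMD_LANES * UNITS
sp_peak_gflops = peak_flops(flops_per_cycle, CLOCK_HZ) / 1e9

print(f"{flops_per_cycle} SP FLOPS per cycle")
print(f"{sp_peak_gflops:.1f} GFLOPS peak at {CLOCK_HZ / 1e9:.1f} GHz")
```

Under that assumed clock the result lands close to the "~200 GFLOPS" quoted at the top of the comment, without needing anything like a 200GHz clock.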

  10. Re:Our penis so small, your american penis so larg by tomhudson · · Score: 4, Informative

    "Show me the MFlops/Watt rating of this?"

    No problemo!

    The number of flops: (10 ^ 15) / 4808 = about 207,986,688,852 flops per chip - from a previous poster.
    The number of watts: 300,000 - from the manufacturers' site = about 62 watts/chip
    207,986,688,852 / 62 = about 3,354,624,014 flops (about 3,355 MFlops) / watt.
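
The division above is easy to recheck with a few lines of Python. This is only a sketch that takes the thread's figures at face value (a 10^15 flops peak, 4808 chips, 300,000 watts); none of them are measured here. Dividing the totals directly gives the per-watt figure without the intermediate rounding to 62 watts per chip.

```python
# Recheck of the flops-per-watt arithmetic, using the thread's figures.
PEAK_FLOPS = 10**15      # 1 petaflop theoretical peak (from the story)
CHIPS = 4808             # MDGRAPE-3 chip count (from the story)
POWER_WATTS = 300_000    # power draw quoted in the comment

flops_per_chip = PEAK_FLOPS / CHIPS
watts_per_chip = POWER_WATTS / CHIPS
flops_per_watt = PEAK_FLOPS / POWER_WATTS   # same as per-chip / per-chip
mflops_per_watt = flops_per_watt / 1e6

print(f"{flops_per_chip:,.0f} flops per chip")
print(f"{watts_per_chip:.1f} watts per chip")
print(f"{mflops_per_watt:,.0f} MFlops per watt")
```

Note that this counts only the figures quoted in the thread; the host processors mentioned elsewhere in the discussion are not included.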

  11. Re:machines like this by NewbieProgrammerMan · · Score: 3, Informative

    "If the resources are available to crack rc5, to do distributed based work on a cure for cancer, and crunch data captured from radio antennas in search of little green men from mars, then I think we have the know-how necessary to get something like this up and running."

    Well the examples that you mention are not really the same as "attempting to break software and search for problems long before release." If I understand these issues correctly: (1) (with apologies to crypto specialists) RC5 cracking required lots of CPU time to factor a big-ass number, (2) projects like Folding@Home aren't "looking for a cure for cancer," they're running (I think) quantum chemistry simulations to find out how certain molecules can act in certain situations, and (3) SETI@Home is looking for specific patterns in signal data. In all three of these cases, there's a few common (maybe not so simple) operations that need to be applied to a large set of data or initial conditions, and that's why they need lots of machines, or fast machines.

    Figuring out how clever people will take advantage of a particular implementation of a web browser or TCP/IP stack is a completely different class of problem IMHO. Yeah, maybe there's some clever AI techniques that may simulate attack attempts, and maybe they could come up with attacks that nobody has thought of yet, but a really fast computer will not somehow magically solve these kinds of problems for us. There's a lot of hard science and software engineering that needs to be done first.

    --
    [b.belong('us') for b in bases if b.owner() == 'you']
  12. Re:Our penis so small, your american penis so larg by NewbieProgrammerMan · · Score: 4, Insightful

    Oh, please. This machine only uses 300kW - that's maybe the equivalent of 150 American homes. These folks are building a specialized (as in not "more of the same") machine to support a particular bit of science (molecular dynamics simulations) that isn't gonna make for flashy headlines, and I say more power to them. I'd rather there were more scientists out there doing basic research that may actually be useful, than have them chasing after stuff for headlines that will make you happy.

    And if you're trolling, yeah, you got me, so congratulations.

    --
    [b.belong('us') for b in bases if b.owner() == 'you']
  13. Re:Say what?!? by Hollinger · · Score: 3, Informative
    Yeah, it's a bit obvious that you didn't.

    Quoting another link you can see how they reached these numbers (which I take issue with):
    The following figure shows the block diagram of the MDGRAPE-3 chip. It consists of 20 force calculation pipelines, a j-particle memory unit, a cell-index controller, a master controller, and a force summation unit. The force calculation pipeline is the most important part of the chip which performs calculations of two-body forces such as Coulomb and van der Waals forces. Each pipeline performs 33 equivalent floating point operations per cycle when it calculates Coulomb force. Thus, when it operates at 250 MHz its performance will reach 165 Gflops with 20 pipelines. The chip also has the j-particle memory unit, which corresponds to the main memory of the CPU. Therefore, no extra memory is needed to attached with the chip.

    - http://mdgrape.gsc.riken.jp/modules/tinyd0/index.php

    With that answered, I'm confused. Another poster sent along that link, which explains what Riken will do. Reading the page, based on the verb usage, either someone didn't understand future and past tense (possible, but unlikely), or they haven't built the entire box yet. Perhaps I'm reading a bit too much into it... it's quite possible that someone simply hasn't updated the website.

    Based on the webpage, all of the calculations to reach 1 petaflop are based on theoretical peak performance measurements, extrapolated from the theoretical peak of a single special-purpose ASIC which has been built, but may or may not have been actually placed into a fully configured system. Nothing talks about measured benchmarks, and the OP's article contains the same theoretical extrapolated numbers.

    Anyone know if they've actually built it?

    ~ Mike
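
The per-chip figure in the passage quoted from Riken can be reproduced from its three numbers (20 pipelines, 33 equivalent floating point operations per cycle, 250 MHz). The Python sketch below does that, then multiplies by the 4808 chips from the story summary to show what a theoretical-peak extrapolation of this kind looks like. It is an illustration of the arithmetic only and says nothing about measured performance.

```python
# Theoretical peak of one MDGRAPE-3 chip, from the quoted description.
PIPELINES = 20                  # force calculation pipelines per chip
FLOPS_PER_PIPELINE_CYCLE = 33   # equivalent flops per cycle (Coulomb force)
CLOCK_HZ = 250e6                # 250 MHz, as quoted
CHIPS = 4808                    # chip count from the story summary

chip_peak_flops = PIPELINES * FLOPS_PER_PIPELINE_CYCLE * CLOCK_HZ
system_peak_flops = chip_peak_flops * CHIPS

print(f"{chip_peak_flops / 1e9:.0f} Gflops per chip")
print(f"{system_peak_flops / 1e12:.2f} Tflops for {CHIPS} chips")
```

At the quoted 250 MHz the chips alone extrapolate to somewhat under a petaflop, so the headline number presumably relies on something the quoted passage does not cover, such as a different clock speed or additional hardware. The thread does not settle which.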
  14. Re:Apparent source page for device data by Traiklin · · Score: 3, Funny

    but I thought Japan already had a lot of studies on protein?

    I've seen the videos of it a few times and stumbled across entire collections of them! they call it something like bukkake.

  15. Specialised by SamAdam3d · · Score: 3, Informative

    The problem with that is that this computer is very specialised to molecular simulations. It can't very easily do other things, like seti or folding (okay, well, maybe that it can do). It was easy to design and cheap because it didn't have to be general purpose and adaptable, like BlueGene/L is.

    --
    I love deadlines. I like the whooshing sound they make as they fly by. - Douglas Adams
  16. giga not tera by tetromino · · Score: 3, Insightful

    (10^15)/4808 = 207 986 688 852, i.e. ~208 billion flops, i.e. if the chip executed only 1 instruction per clock, it would be 208GHz (not THz as you imply). Except of course the chip does more than 1 instruction per clock. Modern x86 chips do multiple flops per cycle. A Cell should be able to do at least 9 per cycle. I imagine that a dedicated vector processor, of the sort that NEC used to make, can do tens of flops per cycle.

    Furthermore, many processor architectures have instructions to do several basic floating point operations in one step. For instance, PowerPC has a one-cycle multiply-accumulate instruction (multiply and add in one step), so for marketing purposes, a PowerPC has twice the flops. Now, imagine if you have a vector processor that has a highly-optimized instruction for taking square roots or doing trig in one cycle. A square root operation will translate into dozens of basic flops (add, multiply, subtract). Such a processor might therefore be rated at 208 gigaflops even though its operating frequency is <1GHz.
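
To make the relationship concrete, here is a small Python sketch of the clock speed implied by a per-chip rating at different flops-per-cycle counts. The values 1 and 9 come from the comment above; 256 is a hypothetical figure chosen only to illustrate the sub-1GHz case, not a property of any real chip.

```python
# Implied clock speed = rated flops per chip / flops executed per cycle.
PER_CHIP_FLOPS = 10**15 / 4808   # ~208 billion flops, as in the comment


def implied_clock_hz(flops_per_cycle: float) -> float:
    """Clock needed to reach PER_CHIP_FLOPS at a given flops-per-cycle."""
    return PER_CHIP_FLOPS / flops_per_cycle


# 1 and 9 are from the comment; 256 is a HYPOTHETICAL wide design.
for flops_per_cycle in (1, 9, 256):
    ghz = implied_clock_hz(flops_per_cycle) / 1e9
    print(f"{flops_per_cycle:>3} flops/cycle -> {ghz:.2f} GHz")
```

The more work counted per cycle, the lower the clock needed for the same rating, which is the point the comment makes about chips rated at 208 gigaflops while running below 1GHz.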

  17. Re:Imagine... by Savantissimo · · Score: 4, Funny

    >Imagine a Beowulf cluster of these!

    With a side order of hot grits!
    A tip: if you can fit your message in the subject line, then do it, particularly when you /know/ that you're going to get modded down.

    I remember back when that comment would have gotten +5 "Whoa duuuuude" mods.

    Yet you can still get good mods if you say:
      "A petaflop that fits in a closet for just $9M for the first one? You could make more for a couple million, at least by the time you got your [impressive knowledgeable-sounding ultra-tech adjectives] cluster interconnect together - why not spend a quarter of a billion and push the limits of computing out another couple orders of magnitude? This thing can do protein folding, so it can likely do bomb physics and a bunch of other big-money problems that can be represented in similar math."

    Which translates to:
    "Imagine a Beowulf cluster of these!"

    --
    "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
  18. Re:Efficiency by Duncan3 · · Score: 3, Insightful

    "To put that into perspective, consider that the Blue Gene/L has 65536 processors. seti@home has over a million hosts and folding@home has a couple hundred thousand more."

    Try comparing active hosts to active hosts. SETI "active" means anyone they have ever seen, and always has. Just compare TFLOPS. Folding@home has been larger for a very long time, tho SETI may be catching up, depending on how much you bend their stats.

    Of course, if you compare USEFUL results, it's Folding@home: lots (over 50 papers), SETI: 0

    The Japan box will be faster for a little while than Folding@home, but will also likely produce RESULTS instead of just a lot of global warming.
    --
    - Adam L. Beberg - The Cosm Project - http://www.mithral.com/