Slashdot Mirror


Japan's Petaflop Supercomputer

slashthedot writes "Japan has built the fastest supercomputer in the world. While the BlueGene/L contains 130,000 processors, Japan has managed to create the first Petaflop supercomputer, called MDGrape-3, with just 4808 chips, and it cost just $9 million to develop."

39 of 161 comments

  1. Wow by 9x320 · · Score: 4, Funny

    Making that computer must have been harder than getting a story from MSN posted on the main page of Slashdot!

  2. Progress by Eightyford · · Score: 4, Informative

    It now costs 15 dollars per gigaflop. In the early 90s, a million dollars per gigaflop was normal.

  3. machines like this by Neuropol · · Score: 2, Interesting

    should be used in conjunction with the topic from the previous article: creating countless means not only to find vulnerabilities in things like Javascript but, equally, to construct fixes for those vulnerabilities. Once it creates an open door, it generates the fix for closing it and keeping it closed. Machines like this can think thousands of times faster than your average black-hat-crackah, so why not use them as a fight-fire-with-fire tool?

    Everyone is so concerned with internet safety, one would think that at some point massive resources will be set forth in order to effectively deal with the flaw-finding few out there making it difficult for the rest of us to simply enjoy the benefits of the internet.

    1. Re:machines like this by x2A · · Score: 4, Insightful

      Having a computer do something very very fast is only of any use if you have the software to do what you want done very very fast. As far as I know, the hard part of what you suggest is writing such capable software, not running it.

      --
      The revolution will not be televised... but it will have a page on Wikipedia
    2. Re:machines like this by NewbieProgrammerMan · · Score: 3, Informative
      If the resources are available to crack rc5, to do distributed based work on a cure for cancer, and crunch data captured from radio antennas in search of little green men from mars, then I think we have the know-how necessary to get something like this up and running.

      Well the examples that you mention are not really the same as "attempting to break software and search for problems long before release." If I understand these issues correctly: (1) (with apologies to crypto specialists) RC5 cracking required lots of CPU time to factor a big-ass number, (2) projects like Folding@Home aren't "looking for a cure for cancer," they're running (I think) quantum chemistry simulations to find out how certain molecules can act in certain situations, and (3) SETI@Home is looking for specific patterns in signal data. In all three of these cases, there's a few common (maybe not so simple) operations that need to be applied to a large set of data or initial conditions, and that's why they need lots of machines, or fast machines.

      Figuring out how clever people will take advantage of a particular implementation of a web browser or TCP/IP stack is a completely different class of problem IMHO. Yeah, maybe there's some clever AI techniques that may simulate attack attempts, and maybe they could come up with attacks that nobody has thought of yet, but a really fast computer will not somehow magically solve these kinds of problems for us. There's a lot of hard science and software engineering that needs to be done first.

      --
      [b.belong('us') for b in bases if b.owner() == 'you']
  4. Efficiency by Eightyford · · Score: 2, Interesting

    The article says that this machine is much more efficient than other supercomputers. Is it actually cheaper to run large programs like SETI@HOME on a supercomputer? Electricity isn't cheap.

    1. Re:Efficiency by Jerry+Coffin · · Score: 2, Interesting
      Is it actually cheaper to run large programs like SETI@HOME on a supercomputer?
      This computer is efficient at what it does largely because it's extremely specialized. It's built specifically for working on molecular dynamics, but from the looks of things, it's probably close to useless for nearly anything else.

      As such, it would probably work quite nicely for Stanford's folding@home project (which studies protein folding, i.e. molecular dynamics). It probably would not work very well for seti@home, because SETI isn't studying molecular dynamics, and it would probably be difficult to cast the problems they're working on into a form that would "look" enough like molecular dynamics to work well on this machine (this, BTW, is why this machine probably shouldn't go onto the top500 list or anything like that -- it's really not a general purpose computer at all).

      As far as using other supercomputers for these kinds of jobs, here's what the folding@home FAQ has to say about it (from the F@H FAQ):

      Why not just use a supercomputer? Modern supercomputers are essentially clusters of hundreds of processors linked by fast networking. The speed of these processors is comparable to (and often slower than) those found in PCs! Thus, if an algorithm (like ours) does not need the fast networking, it will run just as fast on a supercluster as a supercomputer. However, our application needs not the hundreds of processors found in modern supercomputers, but hundreds of thousands of processors. Hence, the calculations performed on Folding@Home would not be possible by any other means! Moreover, even if we were given exclusive access to all of the supercomputers in the world, we would still have fewer cycles than we do with the Folding@Home cluster! This is possible since PC processors are now very fast and there are hundreds of millions of PCs sitting idle in the world.

      To put that into perspective, consider that the Blue Gene/L has 65536 processors. seti@home has over a million hosts and folding@home has a couple hundred thousand more. As the quote above notes, most supercomputers aren't drastically faster on a per-processor basis than PCs -- not nearly enough to make up this deficiency in sheer number of processors.

      My guess is that the Blue Gene/L is probably somewhat more power efficient than the average contributor to seti@home or folding@home -- but mostly because the majority of the latter are probably Pentium 4's, which are notoriously inefficient in terms of power usage. As the world transitions away from the Netbust architecture, it's nearly certain that the efficiency of seti@home, folding@home, etc., will go up (considerably).

      That brings up another point worth considering: the way things are right now, the computers used for seti@home, folding@home, BOINC, etc., get updated on quite a regular basis. If they spent millions of dollars on a single fast machine, it might be more efficient right now, but in a few years it would fall behind the curve, and most budget committees (and such) would be reluctant to spend millions of dollars to replace it simply because something better was available.

      --
      The universe is a figment of its own imagination.
    2. Re:Efficiency by Duncan3 · · Score: 3, Insightful
      To put that into perspective, consider that the Blue Gene/L has 65536 processors. seti@home has over a million hosts and folding@home has a couple hundred thousand more.
      Try comparing active hosts to active hosts. SETI "active" means anyone they have ever seen, and always has. Just compare TFLOPS. Folding@home has been larger for a very long time, though SETI may be catching up, depending on how much you bend their stats.

      Of course, if you compare USEFUL results, it's Folding@home: lots (over 50 papers), SETI: 0

      The Japan box will be faster than Folding@home for a little while, but will also likely produce RESULTS instead of just a lot of global warming.
      --
      - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
  5. Re:Yeah by paganizer · · Score: 2, Funny

    Not unless that is what they are going to use to render the tentacle porn; it IS a Japanese Supercomputer, after all.
    Y'know, I have a feeling I should really post this as anonymous coward.

    --
    Why, yes, I AM a Pagan Libertarian.
  6. Incorrect chip count by Bushcat · · Score: 4, Informative

    The original article seems to be unreachable, so I can't read it, but the precis has the wrong chip count: It does have 4808 LSI chips, but it also has 19,122 Xeon processors.

    1. Re:Incorrect chip count by Savantissimo · · Score: 2, Informative

      I read the article - don't waste your time. No doubt it's a cool machine, but the article was the flimsiest puff-piece I've ever seen linked on Slashdot. Complete lack of technical detail, moron-level explanations of common terms - I feel stupider having read it.

      Are there any good articles on this machine that anyone would care to share?

      --
      "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
    2. Re:Incorrect chip count by rgravina · · Score: 5, Informative

      This article here from Riken themselves has some more technical details:

      http://mdgrape.gsc.riken.jp/modules/tinyd0/index.php

  7. Purchasing Advice by ZachPruckowski · · Score: 4, Funny

    Will this run Vista at a decent speed, or should I wait for the Rev B and SP1?

  8. Uses a large walk-in closet? by StarWreck · · Score: 5, Interesting

    If this petaflop supercomputer really only costs $9 million and only occupies the space of a large walk-in closet, why don't they mass-produce it and sell it? No, not to individuals but to corporations and governments. Folding@Home and Seti@Home could suddenly be like, sorry guys we don't need you anymore - we got something better. Having hundreds of copies of this supercomputer could quickly solve problems across the globe that much slower supercomputers are currently having trouble with!

    --
    ... and in the DRM, bind them.
  9. Not just a flop by davidwr · · Score: 4, Funny

    NOT what the VP of Marketing wants to hear:

    "Not just a flop, but a flop a million billion times over."

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  10. cheaper and more efficient by john_uy · · Score: 2, Insightful

    the supercomputer is quite cheap. they can probably sell a lot of these machines and will sweep the top500 list. however, the article mentioned that the processor is specialized in doing astrophysics calculations. i am not sure if this will be useful for other fields.

    but the good thing about it is that it is more energy efficient. it seems the energy-efficiency trend in desktops/servers right now is also reaching the supercomputers. maybe they could include a performance per watt ratio in the top500 list as well.

    --
    Live your life each day as if it was your last.
  11. Say what?!? by mosel-saar-ruwer · · Score: 2, Informative


    Japan has managed to create the first Petaflop supercomputer, called MDGrape-3, with just 4808 chips...

    FLOP = floating point operation [per second].

    PETA = 10 ^ 15, or "a quadrillion".

    (10 ^ 15) / 4808 = about 207,986,688,852, which would indicate that each chip is running at several hundred TERA-hertz [and, even then, the machine would have to possess an operating system so efficient that it could consistently perform one floating point operation per clock increment, which seems extraordinarily unlikely].

    Or is this an "analog" computer and are these "analog" FLOPS?

    And no, I did not RTFA.

    1. Re:Say what?!? by hattig · · Score: 5, Informative

      The Cell processor can do ~200 GFLOPS - not IEEE quality FLOPS, however; they're 'good enough single precision FLOPs' for its target uses. This is probably why this new supercomputer won't get into the Top500 list, because it's very specialised and thus probably nowhere near as good at IEEE conformant calculations.

      The Cell processor is not running at 200GHz. There's this concept called 'parallelisation', it's how your graphics card can do dozens, if not hundreds, of operations per clock cycle. In Cell's case it can do 8 (number of SPUs) * 4 (128-bit registers, SIMD) * 2 (units) = 64 SP FLOPS per clock cycle, and that's not including the PPU which has VMX128 and an FPU itself.

      However make the Cell processor calculate IEEE conformant FLOPS, and it gets a double precision score of around 20GFLOPS. Still good though.

      The above was from memory, details may vary, figures are roughly correct, YMMV, etc.
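
      The per-cycle arithmetic above can be checked in a few lines. This is only a rough sketch: the 8 x 4 x 2 breakdown is taken from the comment, while the 3.2 GHz clock is an assumption added here (the comment does not state a clock speed).

```python
# Peak single-precision throughput = FLOPs per cycle x clock rate.
SPUS = 8            # number of SPUs, as given in the comment
SIMD_LANES = 4      # 128-bit registers hold four 32-bit floats
UNITS = 2           # the comment's factor of 2 ("units")
CLOCK_HZ = 3.2e9    # ASSUMPTION: clock speed, not stated in the comment

flops_per_cycle = SPUS * SIMD_LANES * UNITS
peak_gflops = flops_per_cycle * CLOCK_HZ / 1e9

print(f"{flops_per_cycle} SP FLOPs per cycle")
print(f"{peak_gflops:.1f} GFLOPS peak at {CLOCK_HZ / 1e9:.1f} GHz")
```

      Under that assumed clock the product lands near the ~200 GFLOPS figure quoted, without the chip running anywhere near 200 GHz.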

    2. Re:Say what?!? by bloosqr · · Score: 2, Interesting

      Yeah, it's specialized hardware; the mdgrape basically calculates Newton's law in the hardware, so it does the inverse ^2 calculation really super fast. There used to be an md-grape equivalent which did the same thing for Coulomb's law (as you would think there is more money in doing biosims than astrosims), but i think that died as the market was too small.

      I think this was an ibm/fujitsu collaboration and ibm had md-grape and dropped it because of the market and fujitsu is still making the grape..

      FYI, the reason this is cool even though it is specialized is that any simulation you want to do classically (i.e. gravity, coulomb) basically goes as N^2, where N is the number of things (i.e. you have to calculate the interaction between each thing and every other thing), so there are lots of tricks to make approximations (clever versions of "stuff far away doesn't matter so much"). This goes up fast as simulations get bigger, hence the GRAPE tricks, which let people do monster simulations as if they had terahertz machines!

      (On the other hand some people will object that the "approximations" make real simulations go as N log N, so it's not like we were all twiddling our thumbs waiting around for GRAPE)
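
      To make the N^2 point concrete, here is a minimal direct-sum sketch in plain Python. It is not how GRAPE hardware is programmed; the function name, the unit constant `g`, and the small softening term `eps` are all hypothetical choices for illustration.

```python
import math

def pairwise_forces(positions, masses, g=1.0, eps=1e-9):
    """Direct-sum inverse-square forces: every body against every other body.

    Returns the force vectors and the number of pair interactions evaluated,
    which is n * (n - 1) and therefore grows as N^2.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    interactions = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + eps  # eps avoids division by zero
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                forces[i][k] += g * masses[i] * masses[j] * d[k] * inv_r3
            interactions += 1
    return forces, interactions

for n in (10, 100, 1000):
    print(n, "bodies ->", n * (n - 1), "pair interactions per time step")
```

      The inner loop body is the part a force pipeline would evaluate in hardware; the N log N tree methods mentioned above cut down how many pairs are visited rather than making each pair cheaper.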

    3. Re:Say what?!? by Hollinger · · Score: 3, Informative
      Yeah, it's a bit obvious that you didn't.

      Quoting another link you can see how they reached these numbers (which I take issue with):
      The following figure shows the block diagram of the MDGRAPE-3 chip. It consists of 20 force calculation pipelines, a j-particle memory unit, a cell-index controller, a master controller, and a force summation unit. The force calculation pipeline is the most important part of the chip which performs calculations of two-body forces such as Coulomb and van der Waals forces. Each pipeline performs 33 equivalent floating point operations per cycle when it calculates Coulomb force. Thus, when it operates at 250 MHz its performance will reach 165 Gflops with 20 pipelines. The chip also has the j-particle memory unit, which corresponds to the main memory of the CPU. Therefore, no extra memory is needed to attached with the chip.

      - http://mdgrape.gsc.riken.jp/modules/tinyd0/index.php
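
      The quoted per-chip figure is easy to reproduce. A quick sketch, using only the numbers in the quote plus the 4808 chip count from the story summary; the variable names are made up here, and the system total is simply chips times per-chip peak, not a measured result.

```python
FLOPS_PER_PIPELINE_PER_CYCLE = 33   # "33 equivalent floating point operations per cycle"
PIPELINES = 20                      # "20 force calculation pipelines"
CLOCK_HZ = 250_000_000              # "when it operates at 250 MHz"
CHIPS = 4808                        # chip count from the story summary

chip_gflops = FLOPS_PER_PIPELINE_PER_CYCLE * PIPELINES * CLOCK_HZ / 1e9
system_tflops = chip_gflops * CHIPS / 1e3

print(f"{chip_gflops:.0f} Gflops per chip")
print(f"{system_tflops:.2f} Tflops theoretical peak for {CHIPS} chips at this clock")
```

      Either way this is a theoretical peak built up from a single chip's rating, which is the point being made about the lack of measured benchmarks.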

      With that answered, I'm confused. Another poster sent along that link which explains what Riken will do. I'm confused about that actually. Reading the page, based on the verb usage, either someone didn't understand future and past tense (possible, but unlikely), or they haven't built the entire box yet. Perhaps I'm reading a bit too much into it... it's quite possible that someone simply hasn't updated the website.

      Based on the webpage, all of the calculations to reach 1 petaflop are based on theoretical peak performance measurements, extrapolated from the theoretical peak of a single special-purpose ASIC which has been built, but may or may not have been actually placed into a fully configured system. Nothing talks about measured benchmarks, and the OP's article contains the same theoretical extrapolated numbers.

      Anyone know if they've actually built it?

      ~ Mike
  12. Re:1,500 $ by Eightyford · · Score: 2, Insightful
    But, you still can't get 100 gigaflops for 1,500 dollars. :(
    I'm sure Sony's PS3 will be advertised as having 1000 gigaflops for a few hundred dollars.
  13. 9 million? by jacklebot · · Score: 4, Insightful

    Great. 9 million dollars to build the thing, 15 million dollars to build the infrastructure to power and cool it, probably.

    1. Re:9 million? by AC-x · · Score: 2, Insightful

      "Riken's machine occupies the space of a large walk-in closet and is an energy-sipper"

      Remember the green cross code: Stop, Read, then Post.

  14. Our penis so small, your american penis so large.. by tomstdenis · · Score: 2, Insightful

    Nuff said.

    Where are the really neato results we should be getting from these? I'm tired of "Country X builds massive TeraWatt computer system." I want to read about "Country X mapped the cancer genome" or some such.

    Besides, these are relatively not impressive. Sure in the 50s, 60s, 70s, 80s we were maturing the technology. Inventing new technology, analyzing it, etc. Now it's more of the same. Huge budget, lots of space and infiniband connections...

    Show me the MFlops/Watt rating of this? Are they improving it? Are we wasting less resources? The irony of this is they pollute by wasting tons of energy, all so we can predict global warming or whatever.

    Tom

    --
    Someday, I'll have a real sig.
  15. 4808 chips -- Alas, it is still bottlenecked by... by nethneta · · Score: 2, Funny

    ...its Geforce MX 420.

  16. For once the subtitle is right on by John+Muir · · Score: 2, Informative

    ROFL at the "From the renders-a-million-tentacles-a-minute dept" ... nice choice!

  17. Re:Our penis so small, your american penis so larg by tomhudson · · Score: 4, Informative

    "Show me the MFlops/Watt rating of this?"

    No problemo!

    The number of flops: (10 ^ 15) / 4808 = about 207,986,688,852 flops per chip, - from a previous poster.
    The number of watts: 300,000 - from the manufacturers' site = 62 watts/chip
    207,986,688,852 / 62 = about 3,354,624,014 flops (roughly 3,355 MFlops) / watt.
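
    The same division can be redone without rounding the per-chip wattage to 62. A small sketch using only the figures quoted above (the petaflop peak is the theoretical claim, so the result is a best case):

```python
PEAK_FLOPS = 10**15      # claimed peak, 1 petaflop
CHIPS = 4808
TOTAL_WATTS = 300_000    # power figure quoted in the comment

flops_per_chip = PEAK_FLOPS / CHIPS
watts_per_chip = TOTAL_WATTS / CHIPS
mflops_per_watt = PEAK_FLOPS / TOTAL_WATTS / 1e6

print(f"{flops_per_chip:,.0f} flops per chip")
print(f"{watts_per_chip:.1f} watts per chip")
print(f"{mflops_per_watt:,.0f} MFlops per watt")
```

    Dividing the totals directly gives a little over 3,300 MFlops per watt; the chip count cancels out, so it only matters for the per-chip numbers.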

  18. Re:Our penis so small, your american penis so larg by NewbieProgrammerMan · · Score: 4, Insightful

    Oh, please. This machine only uses 300kW - that's maybe the equivalent of 150 American homes. These folks are building a specialized (as in not "more of the same") machine to support a particular bit of science (molecular dynamics simulations) that isn't gonna make for flashy headlines, and I say more power to them. I'd rather there were more scientists out there doing basic research that may actually be useful, than have them chasing after stuff for headlines that will make you happy.

    And if you're trolling, yeah, you got me, so congratulations.

    --
    [b.belong('us') for b in bases if b.owner() == 'you']
  19. Re:Apparent source page for device data by Traiklin · · Score: 3, Funny

    but I thought Japan already had a lot of studies on protein?

    I've seen the videos of it a few times and stumbled across entire collections of them! they call it something like bukkake.

  20. glxgears by jonathansizz · · Score: 2, Funny
    Japan has managed to create the first Petaflop supercomputer, called MDGrape-3, with just 4808 chips, and it cost just $9 million to develop.
    Wow! I bet it gets loads of fps in glxgears!
  21. Not even close! by bockelboy · · Score: 2, Insightful

    You've all been had by a reporter with an overactive imagination talking to a researcher selling his own shit. The MDGrape is a specialized processor (you can actually buy it commercially as a separate board for your computer) that does exactly one thing: particle simulation using traditional laws of physics. This will allow it to do computational molecular dynamics on the small scale or universe modeling on the large scale. All it understands is data input in the form of particle positions and will output the new positions in the next time step. Can you place two numbers in a register and ask it to add the results? No. Can it do any piece of the HPL benchmark required to get on the supercomputing list? No. It does one thing, but it does it well. This whole article is like comparing the rendering capabilities of your new Nvidia GPU and the latest AMD CPU, then concluding AMD is full of idiots who can't engineer because the Nvidia chip renders more polygons.

  22. Specialised by SamAdam3d · · Score: 3, Informative

    The problem with that is that this computer is very specialised to molecular simulations. It can't very easily do other things, like seti or folding (okay, well, maybe that it can do). It was easy to design and cheap because it didn't have to be general purpose and adaptable, like BlueGene/L is.

    --
    I love deadlines. I like the whooshing sound they make as they fly by. - Douglas Adams
  23. giga not tera by tetromino · · Score: 3, Insightful

    (10^15)/4808 = 207 986 688 852, i.e. ~208 billion flops, i.e. if the chip executed only 1 instruction per clock, it would be 208GHz (not THz as you imply). Except of course the chip does more than 1 instruction per clock. Modern x86 chips do multiple flops per cycle. A Cell should be able to do at least 9 per cycle. I imagine that a dedicated vector processor, of the sort that NEC used to make, can do tens of flops per cycle.

    Furthermore, many processor architectures have instructions that do several basic floating point operations in one step. For instance, PowerPC has a one-cycle multiply-accumulate instruction (multiply and add in one step), so for marketing purposes, a PowerPC has twice the flops. Now, imagine if you have a vector processor that has a highly-optimized instruction for taking square roots or doing trig in one cycle. A square root operation will translate into dozens of basic flops (add, multiply, subtract). Such a processor might therefore be rated at 208 gigaflops even though its operating frequency is <1GHz.
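
    Put as a quick calculation: the clock a chip needs is its rated flops divided by the flops it completes per cycle. A rough sketch, where the per-cycle values in the loop are arbitrary illustrations rather than specs of any real chip:

```python
def required_clock_ghz(chip_flops, flops_per_cycle):
    """Clock rate (GHz) needed to hit chip_flops at a given flops-per-cycle."""
    return chip_flops / flops_per_cycle / 1e9

CHIP_FLOPS = 10**15 / 4808   # ~208 billion flops per chip

for per_cycle in (1, 10, 100, 1000):   # hypothetical flops-per-cycle values
    ghz = required_clock_ghz(CHIP_FLOPS, per_cycle)
    print(f"{per_cycle:>4} flops/cycle -> {ghz:8.3f} GHz")
```

    Only the one-flop-per-cycle case needs ~208 GHz; a few hundred flops per cycle brings the required clock below 1 GHz.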

  24. Not Quite Progress Yet by TubeSteak · · Score: 2, Informative
    From Page 2 of TFA
    No other supercomputer at the top of the rankings can muster so much calculating brawn on such a tiny budget. That's partly because MDGrape-3 relies on fewer chips and less circuitry than rivals. It's also because the chief scientist, Dr. Makoto Taiji, working with only two other researchers, had plenty of help from Hitachi, Intel, and NEC subsidiary SGI Japan.

    Those companies supplied the hardware -- Hitachi made the central processing unit, or CPU -- and absorbed part of the cost of building the machine. One measure of the MDGrape-3's ultra-efficient computing muscle is its cost per gigaflop (1 billion floating-point calculations per second), which Riken puts at $15.
    Only if you're getting subsidised by 3 global corporations.

    If it costs $15/gigaflop, then they would have paid... $15 million
    A $6 million subsidy (40%) isn't small change.
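
    For anyone checking the sums, a sketch with the figures from the summary and TFA. Reading the $6 million gap as a subsidy is this comment's inference; the code only reproduces the arithmetic.

```python
COST_PER_GFLOP = 15          # dollars, Riken's figure per TFA
PEAK_GFLOPS = 10**6          # 1 petaflop = 1,000,000 gigaflops
REPORTED_COST = 9_000_000    # dollars, development cost from the summary

implied_cost = COST_PER_GFLOP * PEAK_GFLOPS
gap = implied_cost - REPORTED_COST
gap_share = gap / implied_cost

print(f"implied cost: ${implied_cost:,}")
print(f"gap: ${gap:,} ({gap_share:.0%} of implied cost)")
```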
    --
    [Fuck Beta]
    o0t!
  25. Re:Imagine... by Savantissimo · · Score: 4, Funny

    >Imagine a Beowulf cluster of these!

    With a side order of hot grits!
    A tip: if you can fit your message in the subject line, then do it, particularly when you /know/ that you're going to get modded down.

    I remember back when that comment would have gotten +5 "Whoa duuuuude" mods.

    Yet you can still get good mods if you say:
      "A petaflop that fits in a closet for just $9M for the first one? You could make more for a couple million, at least by the time you got your [impressive knowledgeable-sounding ultra-tech adjectives] cluster interconnect together - why not spend a quarter of a billion and push the limits of computing out another couple orders of magnitude? This thing can do protein folding, so it can likely do bomb physics and a bunch of other big-money problems that can be represented in similar math."

    Which translates to:
    "Imagine a Beowulf cluster of these!"

    --
    "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
  26. Not comparable by News+for+nerds · · Score: 2, Informative

    Though the theoretical performance of this computer is higher than that of BlueGene, and it may have higher real-world performance too, you can't compare this supercomputer with BlueGene and other TOP500 supercomputers since it can't run LINPACK. It's just too specialized for its use.

  27. Comparison MDGrape-3, BlueGene/L & Earth Simul by yalla · · Score: 2, Informative

    I compiled some quick facts which compare those three supercomputers and added pointers to other resources for your convenience:
    http://www.bloglines.com/blog/ITnomad?id=126

    Cheers, Alex.

    --
    You look like a million dollars. All green and wrinkled.
  28. Idiotic summary. by imsabbel · · Score: 2, Informative

    This computer, like all the previous (md)grape generations, is a central force potential calculation accelerator.

    it does nothing but calculate 1/sqrt(dx^2+dy^2+dz^2)*variable, but really really often.

    Grape 6, 5 years or so ago, was already running at 200MHz, had a throughput of one force calculation per pipeline per cycle and 6 pipelines on one chip. So it counts as 1.2 billion force calculations per second, each being (1 inverse, 1 sqrt, 3 adds, 3 squares, 2 fmul, etc.).
    A lot of flops, but totally useless as general purpose computers.
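
    In software terms, the operation described above looks roughly like the sketch below. The function name is hypothetical and `variable` stands in for whatever coefficient (mass, charge) the pipeline multiplies by; the rate calculation assumes one result per pipeline per clock cycle.

```python
import math

def central_potential_term(dx, dy, dz, variable):
    """One pipeline step as described: variable / sqrt(dx^2 + dy^2 + dz^2)."""
    return variable / math.sqrt(dx * dx + dy * dy + dz * dz)

CLOCK_HZ = 200_000_000   # 200 MHz
PIPELINES = 6            # pipelines per chip

calcs_per_second = CLOCK_HZ * PIPELINES   # assumes one result per pipeline per cycle

print(central_potential_term(3.0, 4.0, 0.0, 10.0))
print(f"{calcs_per_second:,} force calculations per second per chip")
```

    Counting each of those calculations as a dozen or so flops is how the headline numbers get so large.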

    --
    HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
  29. Re:Singularity by DeXOR · · Score: 2, Funny

    Yes, we must avoid a singularity gap!