Slashdot Mirror


Japan's Petaflop Supercomputer

slashthedot writes "Japan has built the fastest supercomputer in the world. While the BlueGene/L contains 130,000 processors, Japan has managed to create the first Petaflop supercomputer, called MDGrape-3, with just 4808 chips, and it cost just $9 million to develop."

161 comments

  1. Imagine... by Anonymous Coward · · Score: 0, Funny

    Imagine a Beowulf cluster of these!

    1. Re:Imagine... by Savantissimo · · Score: 4, Funny

      >Imagine a Beowulf cluster of these!

      With a side order of hot grits!
      A tip: if you can fit your message in the subject line, then do it, particularly when you /know/ that you're going to get modded down.

      I remember back when that comment would have gotten +5 "Whoa duuuuude" mods.

      Yet you can still get good mods if you say:
        "A petaflop that fits in a closet for just $9M for the first one? You could make more for a couple million, at least by the time you got your [impressive knowledgeable-sounding ultra-tech adjectives] cluster interconnect together - why not spend a quarter of a billion and push the limits of computing out another couple orders of magnitude? This thing can do protein folding, so it can likely do bomb physics and a bunch of other big-money problems that can be represented in similar math."

      Which translates to:
      "Imagine a Beowulf cluster of these!"

      --
      "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
    2. Re:Imagine... by Schraegstrichpunkt · · Score: 1

      But does it run Linux?

    3. Re:Imagine... by Anonymous Coward · · Score: 0

      In Soviet Russia, Linux runs you! /got nothin'

    4. Re:Imagine... by packeteer · · Score: 1

      Usually the Japanese are not too keen on working to make nuclear bombs.

      --
      unzip; strip; touch; finger; mount; fsck; more; yes; unmount; sleep
  2. Wow by 9x320 · · Score: 4, Funny

    Making that computer must have been harder than getting a story from MSN posted on the main page of Slashdot!

    1. Re:Wow by rk87 · · Score: 0, Flamebait

      Almost as hard as the dupe that's about to appear in ... 7 minutes

      --
      I'M NOT ANGRY!
    2. Re:Wow by Jeff+DeMaagd · · Score: 1

      Better question: What's eating Gilbert MDGrape-3?

  3. Yeah by Eightyford · · Score: 0, Redundant

    Yeah but does it run Linux?

    1. Re:Yeah by paganizer · · Score: 2, Funny

      Not unless that is what they are going to use to render the tentacle porn; it IS a Japanese Supercomputer, after all.
      Y'know, I have a feeling I should really post this as anonymous coward.

      --
      Why, yes, I AM a Pagan Libertarian.
    2. Re:Yeah by Anonymous Coward · · Score: 0

      Actually they have been rendering it just fine all these years in countless sweatshops. I fear they are now capable of running the physics engine for it. Hopefully they use it for figuring out how to make genetic alterations for producing cat girls instead.

    3. Re:Yeah by Anonymous Coward · · Score: 0
      Hopefully they use it for figuring out how to make genetic alterations for producing cat girls instead.

      I'll take one Mithra, please. With pink, long, curly hairs. You know who I'm talking about.
  4. Progress by Eightyford · · Score: 4, Informative

    It now costs 15 dollars per gigaflop. In the early 90s, a million dollars per gigaflop was normal.

    1. Re:Progress by ToasterofDOOM · · Score: 1

      Most of that power is in the GPU, and GPUs are extremely specialized and at present are not very good at much of anything but graphics processing.

      --
      I am Spartacus
    2. Re:Progress by default+luser · · Score: 1

      I have a terrible feeling that I'm wrong, but will anyone be able to correct me?

      Sure.

      RSX: You could put a more powerful GPU into a PC and get better performance numbers, so why count GPU performance power? Also, you cannot do 64-bit floating-point math with ANY GPU at the moment, and GPU math has non-IEEE-standard accuracy, so remove it from the equation.

      Each SPE can do 25.6 Gflop/s theoretical (about 180 Gflop/s for all 7), but only for 32-bit (non-IEEE-standard) values. For 64-bit accuracy, tests have shown the throughput is only about 1/10 the normal speed, or about 18 Gflop/s for all 7 SPEs.

      Theoretically, you could get that kind of throughput from a Core2Duo processor:

      (2x64-bit operations per 128-bit SSE pipeline) x (2 128-bit SSE pipes per processor) x (2 processors per die) x (approximately 2.6 GHz clock speed) = 20.8 Gflop/s for 64-bit values.

      This, of course, makes Cell even less spectacular. As always happens, Sony announces some "incredible" new processor, but in the three years it takes them to build it, work out the bugs and ramp up the yields, the x86 world quietly releases something just as good. In another year, a computer using one of these Core2Duo 2.6 GHz chips will be quite affordable, once the quad-core wars start.
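
The peak-throughput arithmetic in the comment above can be checked with a short Python sketch. Only the figures quoted in the comment are used; the function name `peak_gflops` is made up for illustration, and these are theoretical peaks, not measured results.

```python
# Theoretical peak throughput: flops issued per cycle, times cores, times clock.
def peak_gflops(flops_per_pipe, pipes_per_core, cores, clock_ghz):
    """Return peak Gflop/s; with the clock given in GHz the product is already in Gflop/s."""
    return flops_per_pipe * pipes_per_core * cores * clock_ghz

# Core 2 Duo figures as quoted in the comment (not measured here).
core2duo_dp = peak_gflops(flops_per_pipe=2, pipes_per_core=2, cores=2, clock_ghz=2.6)

# Cell figures as quoted in the comment.
spe_sp_gflops = 25.6          # per-SPE single-precision peak
cell_sp = spe_sp_gflops * 7   # seven SPEs
cell_dp = cell_sp / 10        # "about 1/10 the normal speed" for 64-bit values

print(f"Core 2 Duo, 64-bit peak: {core2duo_dp:.1f} Gflop/s")
print(f"Cell, 32-bit peak:       {cell_sp:.1f} Gflop/s")
print(f"Cell, 64-bit estimate:   {cell_dp:.1f} Gflop/s")
```

Seven SPEs at 25.6 Gflop/s come to 179.2 Gflop/s, which the comment rounds to 180.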

      --

      Man is the animal that laughs.
      And occasionally whores for Karma.

  5. machines like this by Neuropol · · Score: 2, Interesting

    should be used in conjunction with the topic from the previous article: creating countless means by which to not only find vulnerabilities in things like JavaScript, but equally to construct fixes for those vulnerabilities. Once it creates an open door, it generates the fix for closing it and keeping it closed. Machines like this can think thousands of times faster than your average black-hat-crackah, so why not use them as a fight-fire-with-fire tool?

    Everyone is so concerned with internet safety, one would think that at some point massive resources will be set forth in order to effectively deal with the flaw-finding few out there making it difficult for the rest of us to simply enjoy the benefits of the internet.

    1. Re:machines like this by x2A · · Score: 4, Insightful

      Having a computer do something very very fast is only of any use if you have the software to do what you want done very very fast. As far as I know, the hard part of what you suggest is writing such capable software, not running it.

      --
      The revolution will not be televised... but it will have a page on Wikipedia
    2. Re:machines like this by Neuropol · · Score: 1

      agreed. as i think about it more, i feel that a computer like this would need to have that 'AI-like' temperament that would allow for the active 'thought' process to be learning what to check for vulnerabilities. A whole lot of if-then.

      If the resources are available to crack rc5, to do distributed based work on a cure for cancer, and crunch data captured from radio antennas in search of little green men from mars, then I think we have the know-how necessary to get something like this up and running.

      It makes me wonder why there aren't more things in place that attempt to break software and search for problems long before the program/function/feature is released. It seems that once code is written, and it functions well enough for release, it's out in the wild causing (and attempting to deal with) unforeseen issues. So why not take the extra steps to totally complete the process and make sure the i's are dotted and the stinking t's are crossed?

    3. Re:machines like this by NewbieProgrammerMan · · Score: 3, Informative
      If the resources are available to crack rc5, to do distributed based work on a cure for cancer, and crunch data captured from radio antennas in search of little green men from mars, then I think we have the know-how necessary to get something like this up and running.

      Well the examples that you mention are not really the same as "attempting to break software and search for problems long before release." If I understand these issues correctly: (1) (with apologies to crypto specialists) RC5 cracking required lots of CPU time to brute-force a huge key space, (2) projects like Folding@Home aren't "looking for a cure for cancer," they're running (I think) molecular dynamics simulations to find out how certain molecules can act in certain situations, and (3) SETI@Home is looking for specific patterns in signal data. In all three of these cases, there's a few common (maybe not so simple) operations that need to be applied to a large set of data or initial conditions, and that's why they need lots of machines, or fast machines.

      Figuring out how clever people will take advantage of a particular implementation of a web browser or TCP/IP stack is a completely different class of problem IMHO. Yeah, maybe there's some clever AI techniques that may simulate attack attempts, and maybe they could come up with attacks that nobody has thought of yet, but a really fast computer will not somehow magically solve these kinds of problems for us. There's a lot of hard science and software engineering that needs to be done first.

      --
      [b.belong('us') for b in bases if b.owner() == 'you']
    4. Re:machines like this by x2A · · Score: 1

      There's a huge difference between searching for unknown problems in unknown places, and searching for solutions to set problems. The latter involves running a set function with different input values, and checking the result. The former involves several layer deep multi-property pattern recognition, and nothing comes anywhere near the brain at doing this.
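
As a rough illustration of "running a set function with different input values, and checking the result", here is a minimal Python sketch of a brute-force search split into independent chunks. Everything in it is an assumption made for the example: the 16-bit key space, the SHA-256 check, and the names `check`, `search`, and `split_keyspace` are hypothetical and are not taken from any of the projects mentioned in this thread.

```python
import hashlib

def check(candidate: int, target_digest: str) -> bool:
    """The 'set function': hash one candidate key and compare it to the target."""
    return hashlib.sha256(candidate.to_bytes(4, "big")).hexdigest() == target_digest

def search(chunk: range, target_digest: str):
    """Try every key in one chunk; chunks share nothing, so each could go to a different machine."""
    for key in chunk:
        if check(key, target_digest):
            return key
    return None

def split_keyspace(size: int, workers: int):
    """Cut the key space [0, size) into roughly equal, non-overlapping ranges."""
    step = -(-size // workers)  # ceiling division
    return [range(start, min(start + step, size)) for start in range(0, size, step)]

SECRET = 48_081  # hypothetical key, used only to build a target to search for
target = hashlib.sha256(SECRET.to_bytes(4, "big")).hexdigest()

chunks = split_keyspace(2**16, workers=8)
# Run sequentially here; the point is that no chunk depends on any other.
found = [hit for hit in (search(chunk, target) for chunk in chunks) if hit is not None]
print(found)
```

The search parallelizes trivially because the check is fixed and known in advance. Nothing in the sketch helps with the other case described above, where neither the problem nor its location is known.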

      --
      The revolution will not be televised... but it will have a page on Wikipedia
    5. Re:machines like this by Anonymous Coward · · Score: 0

      What you suggest wouldn't really be a tractable computational problem.

      However, it is possible to use the power of computers to ensure software safety. It's called formal methods. Programming languages such as OCaml, SML, Alice, Haskell, and some others (Scala, Epigram, Omega, etc.) rely on certain mathematical foundations in such a way that when a program is compiled (or interpreted) in one of these languages, the compiler or interpreter basically performs proof checking to the effect that a runtime error will not occur. This occurs during the type checking phase of compilation/interpretation. You may have heard of the fact that strongly statically typed programs "do not go wrong" (assuming the type system is sound) -- well this is what that means. Of course, there are some cases in which they can go wrong, such as when they are not purely functional (this usually happens when doing IO -- Haskell is the only one of the above languages which is purely functional).

      So the solution to security problems of the sort you speak of is not to write programs in insecure languages and then waste horrendous amounts of cpu cycles trying to determine security flaws, but rather to write programs in secure languages from the get go, and then perform relatively inexpensive checks (but nonetheless, checks which would be tedious to do by hand) at compile time.

    6. Re:machines like this by Anonymous Coward · · Score: 0

      Hey !
      what's happening to the /. crowd ? nobody's called this guy an idiot yet.
      people actually replied and explained in serious tones why it could not be done to a question that once, if the Big Guy Above took pity on the poster, would have received a "troll" rating and no replies at all due to... well it's obvious really isn't it ?

      next thing you know people will start helping out with commands in xterms, downloading from anon cvs and building for those apps &&/!! distro combos that have no binary packages yet !

      that's awful, that's like.... like Anarchy, worse, Communism !

      Jeezuz... i'm outta here.

    7. Re:machines like this by Grismar · · Score: 1

      > [b.belong('us') for b in bases if b.owner() == 'you']

      There's a bug in your sig. I think it should be:

      b.belong('us') for b in base if b.owner() == 'you'

  6. Efficiency by Eightyford · · Score: 2, Interesting

    The article says that this machine is much more efficient than other supercomputers. Is it actually cheaper to run large programs like SETI@HOME on a supercomputer? Electricity isn't cheap.

    1. Re:Efficiency by TA · · Score: 1

      From what I've heard about this particular petaflop supercomputer in the past, it isn't a general purpose computer even in principle. It's built for a special purpose and that single purpose is what it can do at petaflop speeds. Nothing else. BlueGene and those in the same range are a bit more general purpose, if you could call it that.

    2. Re:Efficiency by NewbieProgrammerMan · · Score: 0, Offtopic

      Well, in the case of SETI@Home, it wouldn't be cheaper to run on the supercomputer - SETI isn't paying for the power to run all those CPUs out there in people's homes and offices.

      --
      [b.belong('us') for b in bases if b.owner() == 'you']
    3. Re:Efficiency by chrisb33 · · Score: 1

      Right - it splits the cost among a large group of people, rather than having one organization pay for supercomputer time. Of course, they could try to fund this through donations, but I think that people are more likely to run a program for SETI than to start sending them checks.

    4. Re:Efficiency by Jerry+Coffin · · Score: 2, Interesting
      Is it actually cheaper to run large programs like SETI@HOME on a supercomputer?
      This computer is efficient at what it does largely because it's extremely specialized. It's built specifically for working on molecular dynamics, but from the looks of things, it's probably close to useless for nearly anything else.

      As such, it would probably work quite nicely for Stanford's folding@home project (which studies protein folding, i.e. molecular dynamics). It probably would not work very well for seti@home, because SETI isn't studying molecular dynamics, and it would probably be difficult to cast the problems they're working on into a form that would "look" enough like molecular dynamics to work well on this machine (this, BTW, is why this machine probably shouldn't go onto the top500 list or anything like that -- it's really not a general purpose computer at all).

      As far as using other supercomputers for these kinds of jobs, here's what the folding@home FAQ has to say about it (from the F@H FAQ):

      Why not just use a supercomputer? Modern supercomputers are essentially clusters of hundreds of processors linked by fast networking. The speed of these processors is comparable to (and often slower than) those found in PCs! Thus, if an algorithm (like ours) does not need the fast networking, it will run just as fast on a supercluster as a supercomputer. However, our application needs not the hundreds of processors found in modern supercomputers, but hundreds of thousands of processors. Hence, the calculations performed on Folding@Home would not be possible by any other means! Moreover, even if we were given exclusive access to all of the supercomputers in the world, we would still have fewer cycles than we do with the Folding@Home cluster! This is possible since PC processors are now very fast and there are hundreds of millions of PCs sitting idle in the world.

      To put that into perspective, consider that the Blue Gene/L has 65536 processors. seti@home has over a million hosts and folding@home has a couple hundred thousand more. As the quote above notes, most supercomputers aren't drastically faster on a per-processor basis than PCs -- not nearly enough to make up this deficiency in sheer number of processors.

      My guess is that the Blue Gene/L is probably somewhat more power efficient than the average contributor to seti@home or folding@home -- but mostly because the majority of the latter are probably Pentium 4's, which are notoriously inefficient in terms of power usage. As the world transitions away from the Netbust architecture, it's nearly certain that the efficiency of seti@home, folding@home, etc., will go up (considerably).

      That brings up another point worth considering: the way things are right now, the computers used for seti@home, folding@home, BOINC, etc., get updated on quite a regular basis. If they spent millions of dollars for a single fast machine, it might be more efficient right now, but in a few years it would fall behind the curve, and most budget committees (and such) would be reluctant to spend millions of dollars to replace it simply because something better was available.

      --
      The universe is a figment of its own imagination.
    5. Re:Efficiency by Duncan3 · · Score: 3, Insightful
      To put that into perspective, consider that the Blue Gene/L has 65536 processors. seti@home has over a million hosts and folding@home has a couple hundred thousand more.
      Try comparing active hosts to active hosts. SETI "active" means anyone they have ever seen, and always has. Just compare TFLOPS. Folding@home has been larger for a very long time, tho SETI may be catching up, depending on how much you bend their stats.

      Of course, if you compare USEFUL results, it's Folding@home: lots (over 50 papers), SETI: 0

      The Japan box will be faster for a little while than Folding@home, but will also likely produce RESULTS instead of just a lot of global warming.
      --
      - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
    6. Re:Efficiency by Jerry+Coffin · · Score: 1
      Try comparing active hosts to active hosts. SETI "active" means anyone they have ever seen, and always has.

      Ah, I wasn't aware of that -- I mentioned SETI primarily because the OP did. My own spare cycles all go to F@H...

      Of course, if you compare USEFUL results, it's Folding@home: lots (over 50 papers), SETI: 0

      Quite true -- and IMO, likely to remain that way (and thus, my decision about where to contribute...)

      --
      The universe is a figment of its own imagination.
  7. Incorrect chip count by Bushcat · · Score: 4, Informative

    The original article seems to be unreachable, so I can't read it, but the precis has the wrong chip count: It does have 4808 LSI chips, but it also has 19,122 Xeon processors.

    1. Re:Incorrect chip count by Savantissimo · · Score: 2, Informative

      I read the article - don't waste your time. No doubt it's a cool machine, but the article was the flimsiest puff-piece I've ever seen linked on Slashdot. Complete lack of technical detail, moron-level explanations of common terms - I feel stupider having read it.

      Are there any good articles on this machine that anyone would care to share?

      --
      "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
    2. Re:Incorrect chip count by rgravina · · Score: 5, Informative

      This article here from Riken themselves has some more technical details:

      http://mdgrape.gsc.riken.jp/modules/tinyd0/index.php

    3. Re:Incorrect chip count by TubeSteak · · Score: 1
      Their custom chip draws 19 W at 350 MHz (fastest) or 16 W at 250 MHz (typical) using 130 nanometer tech.

      I wonder how much lower they could have pushed the power draw by using a 90nm or 65nm fab?

      note: The system "cost" $9 mil because... that's what their budget was. The chip builders ate some of the cost.

      --
      [Fuck Beta]
      o0t!
    4. Re:Incorrect chip count by Aeomer · · Score: 1

      No Xeons on the official list - are you an Intel Marketing Stooge trying to take the limelight?

    5. Re:Incorrect chip count by Aeomer · · Score: 1

      No Xeons - MDGrape-3 chips. We have developed the MDGRAPE-3 chip. It was fabricated by Hitachi Device Development Center HDL4N 0.13 um technology. It has 20 pipelines for force calculations which operate at 300 MHz at the typical case. The chip performs 660 equivalent-operations per cycle and has the peak performance of 198 Gflops. The power dissipation is 19 W at 350 MHz(fastest) or 16 W at 250 MHz(typical). Bet it scales better than Conroe

    6. Re:Incorrect chip count by Bushcat · · Score: 1
      In your two-post rebuttal you state "No Xeons on the official list - are you an Intel Marketing Stooge trying to take the limelight?" and "No Xeons - MDGrape-3 chips. We have developed the MDGRAPE-3 chip."

      OK, since your use of "we" suggests you are somehow involved (which I doubt), I checked the Riken site (http://www.rikenresearch.riken.jp/roundup/31/) which states

      "MDGRAPE-3 is a large system that consists of 201 units of 24 MDGRAPE-3 chips, 64 parallel servers each containing 256 of Intel's newest Xeon 5000-series processors (codename Dempsey), and 37 parallel servers each having 74 Intel Xeon 3.2GHz processors with 2MB L2 caches. Developed by RIKEN, the MDGRAPE-3 chip is the world's fastest LSI chip for simulation of molecular dynamics."

      Looks like there are Xeons in there, whether you like it or not.

  8. Purchasing Advice by ZachPruckowski · · Score: 4, Funny

    Will this run Vista at a decent speed, or should I wait for the Rev B and SP1?

  9. Uses a large walk-in closet? by StarWreck · · Score: 5, Interesting

    If this petaflop supercomputer really only costs $9 million and only occupies the space of a large walk-in closet, why don't they mass-produce it and sell it? No, not to individuals but to corporations and governments. Folding@Home and Seti@Home could suddenly be like, sorry guys we don't need you anymore - we got something better. Having hundreds of copies of this super computer could quickly solve problems across the globe that much slower supercomputers are currently having trouble with!

    --
    ... and in the DRM, bind them.
    1. Re:Uses a large walk-in closet? by ObsessiveMathsFreak · · Score: 1

      Having hundreds of copies of this super computer could quickly solve problems across the globe that much slower supercomputers are currently having trouble with!

      Because nobody is writing parallelisable code, or if you like, computer languages don't readily support multi-threaded code. It's always a construct verging on a hack that frequently goes horribly, horribly wrong. Until multi-threading in languages is as seamless and usable as calling a subroutine, parallel computing will never take off.

      --
      May the Maths Be with you!
    2. Re:Uses a large walk-in closet? by JDevers · · Score: 1

      This is far from a general purpose supercomputer. If you read the more technical article at http://mdgrape.gsc.riken.jp/modules/tinyd0/index.php you will see that this thing is designed from the ground up to do molecular dynamics. So while folding@home might be able to make some use out of it, none of the other distributed projects would.

    3. Re:Uses a large walk-in closet? by cmorriss · · Score: 1
      why don't they mass-produce it and sell it?

      The cost of this computer is actually much higher than $9 million. If you rtfa, you'll see that much of the computer was effectively donated by outside companies. The CPU design was done by Hitachi. Intel and SGI Japan supplied other hardware. None of this is factored into the $9 mil. It's likely that the actual cost was many multiples of that.

      --
      10 minutes working on a sig. What a waste.
    4. Re:Uses a large walk-in closet? by Greg+Lindahl · · Score: 1

      Er, all the computers on the Top500 run parallelized code.

      Typically, they use libraries (not built-in language features) to do it.

      And it's not done using multi-threading.

      What isn't that common yet is consumer apps that are parallelized. Scientific apps got there a decade ago.

    5. Re:Uses a large walk-in closet? by Sithgunner · · Score: 1

      not that everyone can buy that kind of equipment when they need a lot of computational power...
      so i don't know why seti and the rest would want to buy one with some unknown budget, when they'd never want to shell out to replace what's already running.

  10. Not just a flop by davidwr · · Score: 4, Funny

    NOT what the VP of Marketing wants to hear:

    "Not just a flop, but a flop a million billion times over."

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  11. cheaper and more efficient by john_uy · · Score: 2, Insightful

    the supercomputer is quite cheap. they can probably sell a lot of these machines and will sweep the top500 list. however, it mentioned that the processor is specialized in doing astrophysics calculations. i am not sure if this will be useful for other fields.

    but the good thing about it is that it is more energy efficient. it seems the trend in desktops/servers right now is also reaching the supercomputers. maybe they could include a performance per watt ratio in the top500 list as well.

    --
    Live your life each day as if it was your last.
    1. Re:cheaper and more efficient by HeroreV · · Score: 1

      According to Wikipedia it doesn't qualify for the TOP500 list because it is not capable of running the LINPACK benchmark.

  12. 1,500 $ by G3ckoG33k · · Score: 1

    But, you still can't get 100 gigaflops for 1,500 dollars. :(

    1. Re:1,500 $ by Eightyford · · Score: 2, Insightful
      But, you still can't get 100 gigaflops for 1,500 dollars. :(
      I'm sure Sony's PS3 will be advertised as having 1000 gigaflops for a few hundred dollars.
    2. Re:1,500 $ by smallfries · · Score: 1, Interesting

      No? 'cos a GTX-7800 does 320Gflop/s and you could buy a few of those for $1500...

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    3. Re:1,500 $ by adam31 · · Score: 1
      It's actually 2000 GFlops for a few hundred dollars.


      And calling this a Petaflop supercomputer is similarly misleading, for roughly the same reason. The PS3 gets its 2 TF from the GPU, which can process 384 flops per cycle in an architecture built specifically to shade pixels. Likewise this MDGrape-3 is built at the hardware level to solve the n-Body problem, and that's it.

    4. Re:1,500 $ by Whiney+Mac+Fanboy · · Score: 1

      No? 'cos a GTX-7800 does 320Gflop/s and you could buy a few of those for $1500...

      True - but we're talking general purpose operations here.

      --
      There are shills on slashdot. Apparently, I'm one of them.
    5. Re:1,500 $ by smallfries · · Score: 1

      Well sure we could be. The actual supercomputer in the article isn't a general purpose machine although it does run the Linpack. They managed to get such a high performance by limiting the operations, much like a GPU. In more general terms a processor capable of 8Gflop/s can be had for about $100 - so general purpose flops would be about 120Gflop/s for the $1500. Not quite as impressive but still quite high...

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
  13. Say what?!? by mosel-saar-ruwer · · Score: 2, Informative


    Japan has managed to create the first Petaflop supercomputer, called MDGrape-3, with just 4808 chips...

    FLOP = floating point operation [per second].

    PETA = 10 ^ 15, or "a quadrillion".

    (10 ^ 15) / 4808 = about 207,986,688,852, which would indicate that each chip is running at several hundred TERA-hertz [and, even then, the machine would have to possess an operating system so efficient that it could consistently perform one floating point operation per clock increment, which seems extraordinarily unlikely].

    Or is this an "analog" computer and are these "analog" FLOPS?

    And no, I did not RTFA.

    1. Re:Say what?!? by hattig · · Score: 5, Informative

      The Cell processor can do ~200 GFLOPS - not IEEE quality FLOPS, however; they're 'good enough single precision FLOPs' for its target uses. This is probably why this new supercomputer won't get into the Top500 list, because it's very specialised and thus probably nowhere near as good at IEEE conformant calculations.

      The Cell processor is not running at 200GHz. There's this concept called 'parallelisation', it's how your graphics card can do dozens, if not hundreds, of operations per clock cycle. In Cell's case it can do 8 (number of SPUs) * 4 (128-bit registers, SIMD) * 2 (units) = 64 SP FLOPS per clock cycle, and that's not including the PPU which has VMX128 and an FPU itself.

      However make the Cell processor calculate IEEE conformant FLOPS, and it gets a double precision score of around 20GFLOPS. Still good though.

      The above was from memory, details may vary, figures are roughly correct, YMMV, etc.
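
The per-cycle arithmetic in this comment can be sketched in Python as below. The 3.2 GHz clock is an assumption that the comment does not state; it was picked because it matches the 25.6 Gflop/s per SPE quoted elsewhere in this thread (8 flops per cycle per SPE). The variable names are made up for the example.

```python
# Factors as listed in the comment.
spus = 8          # number of SPUs
simd_lanes = 4    # four 32-bit values per 128-bit register
units = 2         # the comment's "2 (units)" factor

flops_per_cycle = spus * simd_lanes * units

# Assumed clock (not in the comment): 3.2 GHz.
clock_ghz = 3.2
peak_sp_gflops = flops_per_cycle * clock_ghz

# Clock that ~200 GFLOPS would need at this width: a few GHz, not 200 GHz.
implied_clock_ghz = 200 / flops_per_cycle

print(f"{flops_per_cycle} flops/cycle at {clock_ghz} GHz = {peak_sp_gflops:.1f} GFLOPS peak")
print(f"200 GFLOPS needs only about {implied_clock_ghz:.3f} GHz at that width")
```

The same reasoning applies to any wide or heavily pipelined chip: flops per second is flops per cycle times clock rate, so a high FLOPS figure does not by itself imply a high clock.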

    2. Re:Say what?!? by twiddlingbits · · Score: 1

      You assume 1 core per chip, it's quite possible that they have several cores per chip. Chips with 4 cores are now common, 8's on the horizon and 16's in the lab. Each CPU was special built for astrophysics calculations (not sure what that means..seems to me just to be lots of floating point) by Hitachi which absorbed the cost of the CPU development. Also, the chips may be able to work somewhat in parallel if the software is written that way which obviously will increase performance. So, I don't doubt the figures, I think they have just come up with a novel approach. Which, if you RTFA is exactly what the Architect of BlueGene had to say.

    3. Re:Say what?!? by Anonymous Coward · · Score: 0

      Your math is wrong - it assumes that 1 clock cycle = 1 second. A previous post mentioned 20 000 Xeons, but the article doesn't say anything about that (just dual-core Xeons used for something). But let's just assume there's 4808 CPUs for now and only they can do floating point operations.

      10^15 / 4808 = 207 986 688 852 FLOPS/chip.
                Assume a conservative 1 GHz speed for each of the chips.
      (207 986 688 852 FLOPS/chip) / (10^6 Hz/s) = 207 987 FLOPS/chip, or 207 GFLOPS

      I'm not up on the current FLOP measurements of the new CPUs, but it doesn't seem a too far-fetched measurement. It's accomplished by having several FPUs in parallel. Also, unless the OS is doing heavy floating point intensive calculations itself (which it shouldn't be), there's nothing against the FPU getting used quite efficiently (dependent on how the fetch and dispatch units are designed).
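
Keeping the units explicit makes the division above easier to follow. This is a minimal Python sketch under the same assumptions as the comment (one petaflop spread evenly over 4808 chips); the 1 GHz clock is the comment's own guess, the 250 MHz clock is the "typical" figure quoted elsewhere in this thread, and the helper name `flops_per_cycle` is hypothetical.

```python
TARGET_FLOPS = 10**15   # one petaflop: floating point operations per second
CHIPS = 4808            # chip count from the story summary

per_chip_flops = TARGET_FLOPS / CHIPS   # operations per second per chip

def flops_per_cycle(flops_per_second: float, clock_hz: float) -> float:
    """(operations / second) divided by (cycles / second) gives operations per cycle."""
    return flops_per_second / clock_hz

at_1ghz = flops_per_cycle(per_chip_flops, 1e9)       # the comment's assumed clock
at_250mhz = flops_per_cycle(per_chip_flops, 250e6)   # clock quoted elsewhere in the thread

print(f"per chip: {per_chip_flops / 1e9:.1f} GFLOPS")
print(f"needed at 1 GHz:   {at_1ghz:.0f} operations per cycle")
print(f"needed at 250 MHz: {at_250mhz:.0f} operations per cycle")
```

The per-chip figure is roughly 208 GFLOPS, which at 1 GHz means about 208 operations per cycle rather than a chip clocked at hundreds of gigahertz.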

    4. Re:Say what?!? by Anonymous Coward · · Score: 0
      Perhaps a bit of RTFA/RTFM is in order, then.

      The GRAPE processors have been around for some time, used (to date) in N-Body and other astrophysics simulations.

      You may want to peruse the following:
      Official GRAPE page

      It seems a bit of parallel pipelined execution when applied to dedicated math co-processors will take you much further per clock cycle in a specialized processor than in a 'generic' processor.


      So, no, they don't run at Terahertz per chip.

    5. Re:Say what?!? by Anonymous Coward · · Score: 0

      Sorry, the above should be about 208 FLOP/cycle per chip (assuming 1 GHz). Maybe it is a bit far-fetched. I guess that's where the Xeons come into play.

    6. Re:Say what?!? by bloosqr · · Score: 2, Interesting

      Yeah, it's specialized hardware: the mdgrape basically calculates Newton's law in hardware, so it does the inverse-square calculation really super fast. There used to be an md-grape equivalent which did the same thing for Coulomb's law (as you would think there is more money in doing biosims than astrosims), but i think that died as the market was too small.

      I think this was an ibm/fujitsu collaboration and ibm had md-grape and dropped it because of the market and fujitsu is still making the grape..

      FYI the reason this is cool, even though it is specialized, is that any simulation you want to do classically (i.e. gravity, coulomb) basically goes as N^2, where N is the number of things (i.e. you have to calculate the interaction between each thing and every other thing), so there are lots of tricks to make approximations (clever versions of "stuff far away doesn't matter so much"). This goes up fast as simulations get bigger, hence the GRAPE tricks, which let people do monster simulations as if they had terahertz machines!

      (On the other hand some people will object the "approximations" make real simulations go as N log N, so its not like we were all twiddling our thumbs waiting around for GRAPE)
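
      For anyone who wants to see the N^2 cost concretely, here is a minimal Python sketch of the brute-force "every thing against every other thing" calculation described above. It is only an illustration: the function name `direct_sum_accelerations`, the choice of G = 1, and the softening term `eps` are assumptions of this example, not anything taken from the GRAPE hardware or its software.

```python
import math

def direct_sum_accelerations(positions, masses, eps=1e-3):
    """Brute-force inverse-square accelerations (G = 1, hypothetical helper).

    Returns (accelerations, pair_evaluations). Every particle interacts
    with every other particle, so the work grows as N * (N - 1).
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    evaluations = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Vector from particle i to particle j.
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            # Softened squared distance avoids division by zero.
            r2 = sum(c * c for c in d) + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += masses[j] * d[k] * inv_r3
            evaluations += 1
    return acc, evaluations
```

      Doubling the particle count roughly quadruples `evaluations`, which is the growth that both the special-purpose pipelines and the N log N approximations are trying to beat.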

    7. Re:Say what?!? by Hollinger · · Score: 3, Informative
      Yeah, it's a bit obvious that you didn't.

      Quoting another link you can see how they reached these numbers (which I take issue with):
      The following figure shows the block diagram of the MDGRAPE-3 chip. It consists of 20 force calculation pipelines, a j-particle memory unit, a cell-index controller, a master controller, and a force summation unit. The force calculation pipeline is the most important part of the chip which performs calculations of two-body forces such as Coulomb and van der Waals forces. Each pipeline performs 33 equivalent floating point operations per cycle when it calculates Coulomb force. Thus, when it operates at 250 MHz its performance will reach 165 Gflops with 20 pipelines. The chip also has the j-particle memory unit, which corresponds to the main memory of the CPU. Therefore, no extra memory is needed to attached with the chip.

      - http://mdgrape.gsc.riken.jp/modules/tinyd0/index.php
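
      The per-chip figure in that quote is just pipelines times operations per cycle times clock rate. A quick Python sketch of the arithmetic, using only the numbers quoted above (the function name `peak_flops` is made up for this example, and the result is a theoretical peak, not a measured benchmark):

```python
def peak_flops(pipelines, ops_per_cycle, clock_hz):
    """Theoretical peak: every pipeline retires its full op count each cycle."""
    return pipelines * ops_per_cycle * clock_hz

# Figures from the quoted Riken page: 20 pipelines, 33 equivalent
# floating point operations per cycle each, 250 MHz clock.
chip_peak = peak_flops(pipelines=20, ops_per_cycle=33, clock_hz=250e6)
print(f"{chip_peak / 1e9:.0f} Gflops per chip")
```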

      With that answered, I'm still confused about something. Another poster sent along that link, which explains what Riken will do. Reading the page, based on the verb usage, either someone didn't understand future and past tense (possible, but unlikely), or they haven't built the entire box yet. Perhaps I'm reading a bit too much into it... it's quite possible that someone simply hasn't updated the website.

      Based on the webpage, all of the calculations to reach 1 petaflop are based on theoretical peak performance measurements, extrapolated from the theoretical peak of a single special-purpose ASIC which has been built, but may or may not have been actually placed into a fully configured system. Nothing talks about measured benchmarks, and the OP's article contains the same theoretical extrapolated numbers.

      Anyone know if they've actually built it?

      ~ Mike
    8. Re:Say what?!? by InfiniteWisdom · · Score: 1

      (10 ^ 15) / 4808 = about 207,986,688,852, which would indicate that each chip is running at several hundred TERA-hertz

      It implies nothing of the sort. A single chip could have several floating point pipelines.

    9. Re:Say what?!? by Anonymous Coward · · Score: 0

      I'm thrilled that future processors always have zero overhead, as if by magic. The algorithms are always perfect and memory is always unified with zero latency.

      I'm starting to distrust anything that Sony is involved with (the Playstation 2's performance reality, and the "Playstation 9" massively stupid marketing), compounded by the reports that most Cell units are rolling off the assembly line with only 7 functional processing units out of 8.

      It would be interesting to see some actual benchmarks of sustained "real-world" hardware performance - but I'm betting that they're going to be significantly worse than the paper benchmarks.

    10. Re:Say what?!? by Anonymous Coward · · Score: 0

      then go and fucking read it you lazy cunt.

    11. Re:Say what?!? by aminorex · · Score: 1

      Each of 5832 300-MHz execution units does 660 parallel
      floating-point operations per cycle, for 1.15 e 15 flops/sec.
      The Xeons do not contribute to the total; they essentially
      act as the microcode program that tells the vector units
      what to do next.

      While optimized for moldyn, it would be readily repurposed
      for a wide range of large-scale computations, including
      solving massive ensembles of linear systems. Indeed, I
      would be quite pleased to write a Fortran-2005 compiler or
      a Matlab compiler for this beast, if anyone wanted to fund
      such an endeavor. I was the tech lead for the CM-5
      C* and Fortran compilers about 8 years ago.

      I speak a little Japanese and would enjoy the opportunity
      to gain full mastery.

      Send offers to parallelcompilers (at) southoftheclouds (dot) net.

      --
      -I like my women like I like my tea: green-
  14. Petaflop? by Timesprout · · Score: 1

    Does that mean its a giant cluster of unwanted aibos?

    --
    Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
    What truth?
    There is no dupe
  15. 9 million? by jacklebot · · Score: 4, Insightful

    Great. 9 million dollars to build the thing, 15 million dollars to build the infrastructure to power and cool it, probably.

    1. Re:9 million? by AC-x · · Score: 2, Insightful

      "Riken's machine occupies the space of a large walk-in closet and is an energy-sipper"

      Remember the green cross code: Stop, Read, then Post.

  16. Mathematics for experts by Anonymous Coward · · Score: 0
    But even if the U.S. loses the title, it still dominates the field, with 298 machines on the Top 500 list -- more than any other nation.

    Read carefully, and you'll discover some A-level mathematics in this sentence. Wow.
  17. It's not about speed.. by DarKlajid · · Score: 1

    You're wrong here.

    It's not about being fast. It's about creative ways to do things that interfaces weren't intended for.
    Your idea would work out as soon as you have a way to replace artists with computers.

  18. Our penis so small, your american penis so large.. by tomstdenis · · Score: 2, Insightful

    Nuff said.

    Where are the really neato results we should be getting from these? I'm tired of "Country X builds massive TeraWatt computer system." I want to read about "Country X mapped the cancer genome" or some such.

    Besides, these are relatively not impressive. Sure in the 50s, 60s, 70s, 80s we were maturing the technology. Inventing new technology, analyzing it, etc. Now it's more of the same. Huge budget, lots of space and infiniband connections...

    Show me the MFlops/Watt rating of this? Are they improving it? Are we wasting less resources? The irony of this is they pollute by wasting tons of energy, all so we can predict global warming or whatever.

    Tom

    --
    Someday, I'll have a real sig.
  19. 4808 chips -- Alas, it is still bottlenecked by... by nethneta · · Score: 2, Funny

    ...its Geforce MX 420.

  20. Re:4808 chips -- Alas, it is still bottlenecked by by cheese-cube · · Score: 0

    I have a GeForce MX440 so trust me, that's not funny.

  21. For once the subtitle is right on by John+Muir · · Score: 2, Informative

    ROFL at the "From the renders-a-million-tentacles-a-minute dept" ... nice choice!

  22. Re:Our penis so small, your american penis so larg by tomhudson · · Score: 4, Informative

    "Show me the MFlops/Watt rating of this?"

    No problemo!

    The number of flops: (10 ^ 15) / 4808 = about 207,986,688,852 flops per chip, - from a previous poster.
    The number of watts: 300,000 - from the manufacturers' site = 62 watts/chip
    207,986,688,852 / 62 = 3,354,624,013 flops (about 3,355 MFlops) / watt.

  23. MOD PARENT UP by pimpimpim · · Score: 1
    thank you for this one!

    As someone else already said, and as mentioned in Parent's link, this is a very specific machine for Molecular Dynamics simulations; everything from memory handling to processing is optimized only for handling particles and doing force calculations on them. Therefore, it'll serve a relatively small market.

    That said, I'm very curious to see how fast it'll run gromacs, the MD program I use. This is pretty optimized for parallel simulations already, and I'm able to do the calculations I need on a small opteron cluster in no time.

    The biggest problem might be now to find useful research questions to simulate on it! Actually that is the main problem why computational medicine didn't really take over yet. The good thing is that this machine will give researchers time to think about this instead of spending their time thinking how to get enough computing power.

    --
    molmod.com - computing tips from a molecular modeling
  24. Vector Processing? by NousCS · · Score: 1

    "the machine may be ineligible because of its specialized hardware"
    What specialized hardware? I would really like to read a more technical article about this machine. I would guess that the Japanese focused on vector processing like they did in the design of the Earth-Simulator.

    The best supporting evidence I have for this conclusion is the comparison of Japan's last two supercomputers:
    Sun Fire X64 Cluster
    Earth-Simulator

    Sun Fire has 10,368 processors with a Rmax(GFlops) of 38,180.
    Earth-Simulator has 5,120 processors with a Rmax(GFlops) of 35,860.
    That's about 51% fewer processors with 94% of the processing power*.

    Here's the original article link:
    http://www.businessweek.com/globalbiz/content/jul2006/gb20060726_150659.htm?chan=topStories_ssi_5/

    *Only comparing one aspect of performance.

    1. Re:Vector Processing? by Anonymous Coward · · Score: 0

      I think the GRAPE machines use custom ASICs to do the crunching. This makes them specialized
      for whatever problem the machine was designed for. It also means that the number of
      floating point operations per clock tick can be very large.

    2. Re:Vector Processing? by Anonymous Coward · · Score: 0

      This isn't an all-purpose petaflop computer (ie. can't be used for protein folding calculations, thermonuclear explosion simulations, weather and climate prediction, etc...).

      The first real petaflop computer will be built by Cray and up and running in about 2 years.

    3. Re:Vector Processing? by Jerry+Coffin · · Score: 1
      This isn't an all-purpose petaflop computer (ie. can't be used for protein folding calculations, thermonuclear explosion simulations, weather and climate prediction, etc...).

      Actually, it probably could be used in protein folding -- but not the others.

      The first real petaflop computer will be built by Cray and up and running in about 2 years.
      Maybe -- but IBM has at least talked about a Blue Gene/P (P for petaFLOP). I haven't seen much about it recently, so it may be open to some question. OTOH, IBM now has enough presence in supercomputers that I'd have to call it a credible possibility.

      OTOH, Cray's use of reconfigurable computing could make theirs applicable to a wider variety of problems -- if IBM builds a BG/P, it'll probably be pretty similar to a larger version of the BG/L, which means a HUGE number of processors. Nobody gets this kind of speed from a single processor, but Cray does it with far fewer than most (though Hitachi clearly wins in this respect). The real challenge with most of these huge machines is figuring out how to distribute the job across lots of CPUs. Fewer, faster, CPUs makes that easier and (somewhat) less necessary.

      For those who aren't aware of it, reconfigurable computing means Cray has boxes that include some big Xilinx FPGAs, which it can use to create custom hardware. For bits and pieces that can benefit from massive parallelization, this can provide considerable improvement. It gives more or less the kind of special-purpose capability discussed in TFA, BUT can be reconfigured to have more or less that kind of capability for a lot of different purposes.

      This may sound like we're back to a general-purpose computer, with the same shortcomings as usual -- and to an extent that's exactly true. The difference is that you're reconfiguring the hardware instead of doing things in software. The result is a compromise between the two extremes. Slower and less power efficient than an ASIC, but faster and more power efficient than software. Developing a particular capability for an FPGA is generally slower and more expensive than equivalent software, but faster and cheaper than developing an equivalent ASIC.

      --
      The universe is a figment of its own imagination.
    4. Re:Vector Processing? by Anonymous Coward · · Score: 0

      Bah, most of the computers in the top500 are heavily tuned towards running linpack. They aren't anywhere near that fast for general processing. As for possibly being ineligible, these processors sound to be very heavily geared towards running a few specific algorithms, and if it can't run linpack, it doesn't get on the top500.

      (Note : I was a supercomputer engineer)

  25. Re:Our penis so small, your american penis so larg by Waffle+Iron · · Score: 1

    Your "Grumpy Old Man" impression is passable, but it's nowhere near as funny as Dana Carvey's was.

  26. Weather Predictions Explained! by Hercules+Peanut · · Score: 1

    From the article: Meteorologists use supercomputers to predict climate patterns decades into the future by analyzing huge databases of statistics.

    It all makes sense now. When they predict 90% chance of rain three days in a row and we don't see a drop, they really meant that it will rain sometime between now and thirty or forty years from now.

  27. Re:Our penis so small, your american penis so larg by NewbieProgrammerMan · · Score: 4, Insightful

    Oh, please. This machine only uses 300kW - that's maybe the equivalent of 150 American homes. These folks are building a specialized (as in not "more of the same") machine to support a particular bit of science (molecular dynamics simulations) that isn't gonna make for flashy headlines, and I say more power to them. I'd rather there were more scientists out there doing basic research that may actually be useful, than have them chasing after stuff for headlines that will make you happy.

    And if you're trolling, yeah, you got me, so congratulations.

    --
    [b.belong('us') for b in bases if b.owner() == 'you']
  28. "Computer" ? by AC-x · · Score: 1

    From the article it sounds like the whole thing is based on a large collection of specialised processors designed only for protein folding calculations, so while it may be able to do those at a petaflop rate, it probably can't do anything else at nearly that rate (just as the WWII Colossus computer could beat a 486 at cracking the Lorenz cipher, yet certainly wasn't faster in terms of actual computing speed).

    1. Re:"Computer" ? by aminorex · · Score: 1

      The MDGRAPE-3 VPUs are optimized for moldyn, but could be easily applied to a wide range of problems. They do have a robust set of floating-point instructions. I have experience in adapting a wide range of scientific problems to similar architectures (in a previous generation of capability), and can assure you that while peak hardware utilization rates would be nigh impossible to achieve for the majority of applications, a respectable percentage of the theoretical capacity could be brought to bear on a wide range of problems.

      --
      -I like my women like I like my tea: green-
  29. Oh just... by Awod · · Score: 1

    9 million, sign me up, where can I get one?

    1. Re:Oh just... by Anonymous Coward · · Score: 0

      Just wait till the end of this year. Sony has promised this is the hardware that will run their new EMOTION ENGINE for the PS3.

  30. Re:Apparent source page for device data by Traiklin · · Score: 3, Funny

    but I thought Japan already had a lot of studies on protein?

    I've seen the videos of it a few times and stumbled across entire collections of them! they call it something like bukkake.

  31. glxgears by jonathansizz · · Score: 2, Funny
    Japan has managed to create the first Petaflop supercomputer, called MDGrape-3, with just 4808 chips, and it cost just $9 million to develop.
    Wow! I bet it gets loads of fps in glxgears!
    1. Re:glxgears by Anonymous Coward · · Score: 0

      9943246578995686462542356478875753653 fps

  32. Not even close! by bockelboy · · Score: 2, Insightful

    You've all been had by a reporter with an overactive imagination talking to a researcher selling his own shit. The MDGrape is a specialized processor (you can actually buy it commercially as a separate board for your computer) that does exactly one thing: particle simulation using traditional laws of physics. This will allow it to do computational molecular dynamics on the small scale or universe modeling on the large scale. All it understands is data input in the form of particle positions and will output the new positions in the next time step. Can you place two numbers in a register and ask it to add the results? No. Can it do any piece of the HPL benchmark required to get on the supercomputing list? No. It does one thing, but it does it well. This whole article is like comparing the rendering capabilities of your new Nvidia GPU and the latest AMD CPU, then concluding AMD is full of idiots who can't engineer because the Nvidia chip renders more polygons.

    1. Re:Not even close! by Jerry+Coffin · · Score: 1
      This whole article is like comparing the rendering capabilities of your new Nvidia GPU and the latest AMD CPU, then concluding AMD is full of idiots who can't engineer because the Nvidia chip renders more polygons.

      ...and now we know the real reason AMD decided to buy ATI! :-)

      --
      The universe is a figment of its own imagination.
    2. Re:Not even close! by Anonymous Coward · · Score: 0

      Mod parent up. Not all petaflops are created equal ... the md-grape machines are designed specifically to do molecular dynamics and other particle-particle interaction calculations. (That's what "md" stands for.) They are really co-processors. So, according to this article, you could build a "petaop" machine by just slapping together a cluster with 1,000 video cards in it. (Modern GPU have around a teraop ... in the latest cards that op might even qualify as a flop.)

    3. Re:Not even close! by cr0sh · · Score: 1
      particle simulation using traditional laws of physics. This will allow it to do computational molecular dynamics on the small scale or universe modeling on the large scale.


      Hmm - this is interesting in and of itself. What I mean by this is that here is a very specialized (and I assume, Turing complete) computer, doing one particular job, and doing it amazingly well. Now, let us suppose the simulation of particles it does according to known physics is complete (I know it isn't). If it were, then in theory it could model a very small subset of the Universe at the atomic level. Let us suppose you could scale this simulation up to an area the size of the computer (elsewhere in this discussion on /., someone noted you could buy a board for your computer that had a version of this machine - a specialized PPU add-in card, if you will) - let's say 1 cubic meter for sake of the discussion. Now, supposing all of this was true - could this specialized computer simulate (at a slower rate, to be sure) an entire generalized computer (a full scale UTM)?

      I argue that it could, easily. In theory, it could even simulate a human (albeit one bunched up into a ball, or one under a meter tall). How close are we to this? Hard to say, but we (humans) did recently simulate an entire virus (though only for a few femtoseconds IIRC) modelled from atomic particle interactions (and learning a heck of a lot in the process).

      Now, if a specialized processor, simpler in scope than a more generalized processor (like comparing a DSP to a full CPU, although things are blurring, so the analogy might not fit), can simulate the more generalized processor, how much simpler can those instructions be on the generalized processor?

      Stephen Wolfram has a clue - and none of this would surprise him. It certainly doesn't surprise me...

      --
      Reason is the Path to God - Anon
    4. Re:Not even close! by bockelboy · · Score: 1

      There are a lot of little problems with this approach, unfortunately. We don't know enough of the details of how a human body - or a human cell - works. A large-scale simulation of atomic particles may work for up to hundreds of thousands of particles, but our modelling breaks down at some point.

      I hate to quote the concept of chaos, but a lot of dynamic systems in the world *are* chaotic. That is, an infinitely small error in the input conditions leads to arbitrarily large differences in results. This is why, even with the largest computers, we can't predict the weather for more than a few days in the future. We would have some results, but they'd be totally meaningless.

      (As an offtopic note, the first time chaos was discovered, if memory serves, was by Lorenz while doing computer weather models. He got an interesting output from one of his models and wanted to repeat it. He rounded off the input parameters to about 7 digits, reran the simulation, and got a completely different outcome. Hard to predict things when they are chaotic!)

      This is the danger of a large scale computer simulation when the model is just a guess. At least the physicists are lucky where the rules of physics are pretty well defined. There are no "laws" of biology.
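
      To make the sensitivity point concrete, here is a small Python sketch. Note the substitution: it uses the logistic map, a textbook chaotic system, as a stand-in, and is not Lorenz's weather model. The function names and the starting values (0.4 and a 1e-10 perturbation) are assumptions picked for illustration only.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), which is chaotic for r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def divergence(x0, delta, steps):
    """Gap at each step between two runs whose inputs differ by delta."""
    a = logistic_trajectory(x0, steps)
    b = logistic_trajectory(x0 + delta, steps)
    return [abs(p - q) for p, q in zip(a, b)]

# Two runs that differ only in the tenth decimal place of the input.
gaps = divergence(x0=0.4, delta=1e-10, steps=100)
print(f"initial gap: {gaps[0]:.1e}, largest gap: {max(gaps):.2f}")
```

      The two runs start out indistinguishable and end up unrelated, which is the same effect as rounding off the input parameters and getting a completely different outcome.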

    5. Re:Not even close! by cr0sh · · Score: 1
      This is why I noted:


      Now, let us suppose the simulation of particles it does according to known physics is complete (I know it isn't).

      When I said "complete", I really meant "complete", and I also know that this isn't (currently) attainable, if it ever is (maybe with quantum computing, but maybe not). I have known about chaos since I first encountered it studying fractal algorithms as a high school student in 1989 or so (played around with them on an Apple IIe and a Tandy Color Computer 3).

      I am not trying to discount you here, everything you said is true and complete. I just wanted to make it clear that I am aware of these effects, and that I was simply proposing a thought experiment, given a set of suppositions assumed to be true for the sake of the argument...

      --
      Reason is the Path to God - Anon
  33. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  34. 500Gflop with one computer chip for cheap... by nairb774 · · Score: 1

    Check out the company Mathstar (http://www.mathstar.com/). They just taped out a chip the other day that, when it comes to market, will do about 500 Gflops per chip. The technology is quite incredible, and although it is not specifically a general-purpose chip, it can be programmed to work in any way you like, allowing you to get maximum performance for the applications you need to run. Honestly, I would like to get hold of, say, 50 of these and see what I could make them do in parallel (they are made to be hooked up in parallel, too). From what I have heard, these would be competitive in price with today's processors and therefore likely less than $1,000 apiece. That makes it under $2 a Gflop!

  35. Specialised by SamAdam3d · · Score: 3, Informative

    The problem with that is that this computer is very specialised to molecular simulations. It can't very easily do other things, like seti or folding (okay, well, maybe that it can do). It was easy to design and cheap because it didn't have to be general purpose and adaptable, like BlueGene/L is.

    --
    I love deadlines. I like the whooshing sound they make as they fly by. - Douglas Adams
  36. Actually by Frightening · · Score: 1

    Cancer research sounds a little better than preventing-your-browser-from-misbehaving research. But at only 9 mil a piece, why not both?

    In fact, you could put thousands of these machines together for less than 10 billion. For 10 billion dollars you could crack any reversible cryptographic algorithm in the universe on a weekend. I call that world domination.

    Maybe Gates still has interesting things to do with his life after all.

  37. giga not tera by tetromino · · Score: 3, Insightful

    (10^15)/4808 = 207 986 688 852, i.e. ~208 billion flops, i.e. if the chip executed only 1 instruction per clock, it would be 208GHz (not THz as you imply). Except of course the chip does more than 1 instruction per clock. Modern x86 chips do multiple flops per cycle. A Cell should be able to do at least 9 per cycle. I imagine that a dedicated vector processor, of the sort that NEC used to make, can do tens of flops per cycle.

    Furthermore, many processor architectures have instructions to do several basic floating point instruction in one step. For instance, PowerPC has a one-cycle multiply-accumulate instruction (multiply and add in one step), so for marketing purposes, a PowerPC has twice the flops. Now, imagine if you have a vector processor that has a highly-optimized instruction for taking square roots or doing trig in one cycle. A square root operation will translate into dozens of basic flops (add, multiply, subtract). Such a processor might therefore be rated at 208 gigaflops even though its operating frequency is <1GHz.
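
    To put numbers on that, here is a rough Python sketch of how the required clock rate falls as the flops-per-cycle count rises. The 10^15 and 4808 figures come from the story summary; the function names and the sample flops-per-cycle values are illustrative assumptions, not specs of any real chip.

```python
def flops_per_chip(total_flops, chips):
    """Share of the machine's flops that each chip has to deliver."""
    return total_flops / chips

def required_clock_hz(chip_flops, flops_per_cycle):
    """Clock needed if the chip retires flops_per_cycle every cycle."""
    return chip_flops / flops_per_cycle

per_chip = flops_per_chip(1e15, 4808)  # roughly 208 billion flops
for fpc in (1, 10, 100, 1000):  # hypothetical flops-per-cycle values
    ghz = required_clock_hz(per_chip, fpc) / 1e9
    print(f"{fpc:>4} flops/cycle -> {ghz:8.2f} GHz")
```

    At one flop per cycle you would need about 208 GHz; at a thousand per cycle the clock drops well under 1 GHz.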

    1. Re:giga not tera by maraist · · Score: 1

      Modern x86 chips do multiple flops per cycle
      I love acronyms that are explained incorrectly..
      Floating Point Operations Per Second per cycle

      If you assume that the reader doesn't know the meaning, then just write it out to begin with. :)

      --
      -Michael
    2. Re:giga not tera by Anonymous Coward · · Score: 0

      FlOps (Floating point Ops) and FLOPS are both used in these types of discussions. It is assumed that you can understand which term is being used at any given time without having to resort to the technically correct case for each letter for the given use.

  38. Not Quite Progress Yet by TubeSteak · · Score: 2, Informative
    From Page 2 of TFA
    No other supercomputer at the top of the rankings can muster so much calculating brawn on such a tiny budget. That's partly because MDGrape-3 relies on fewer chips and less circuitry than rivals. It's also because the chief scientist, Dr. Makoto Taiji, working with only two other researchers, had plenty of help from Hitachi, Intel, and NEC subsidiary SGI Japan.

    Those companies supplied the hardware -- Hitachi made the central processing unit, or CPU -- and absorbed part of the cost of building the machine. One measure of the MDGrape-3's ultra-efficient computing muscle is its cost per gigaflop (1 billion floating-point calculations per second), which Riken puts at $15.
    Only if you're getting subsidised by 3 global corporations.

    If it costs $15/gigaflop, then they would have paid... $15 million
    A $6 million subsidy (40%) isn't small change.
    --
    [Fuck Beta]
    o0t!
  39. PETA flop by aapold · · Score: 1

    No, a Petaflop is when an animal rights activist throws themselves in the path of a fishing trawler, cattle car or some other vehicle used in the meat or fur industry. It is similar to, but not quite the same as the terraflop which is more used in anti-logging activities.

    --
    "Waste not one watt!" - CZ
  40. MadDog Grape is my favorite flavor too! by FatSean · · Score: 1

    Of all the MD 20/20 varieties...grape stands out as the best.

    --
    Blar.
  41. Finally... by boeserjavamann · · Score: 0, Troll

    Something for the Java Swing Developers and Users out there :-)

    1. Re:Finally... by Anonymous Coward · · Score: 0

      nah! it's still gonna be slow

  42. Does this deserve Top 500? by Quila · · Score: 1

    I guess it would depend on the definition, whether it has to be capable of general purpose or only specialized. Technically, it should be possible to easily get petaflop performance by putting a few million into a computer using chips designed only to run LINPACK.

    Personally, I don't think it should qualify. Otherwise the EFF's $250,000 Deep Crack, which could only crack DES (although faster than tens of thousands of regular computers at that time), would qualify too.

  43. New Blue? by Doc+Ruby · · Score: 1

    How many petaFLOPS will IBM get out of a new Blue Gene made from Cell processors?

    --
    make install -not war

  44. Yes but........ by Lissajous · · Score: 1

    ......will it run linux??

    1. Re:Yes but........ by Anonymous Coward · · Score: 0

      and can we build a Beowulf cluster of them?

  45. Re:Our penis so small, your american penis so larg by tomstdenis · · Score: 0

    It wasn't a troll. I honestly believe we don't step back and say "should we do this". Just because you CAN do something doesn't mean you SHOULD. Of the computers on top500.org, how many of them have led to new discoveries or tested hypotheses?

    It's not just computers though. Look at the number of people who subscribe to the notion that they need their own personal vehicle, bottled water, blah blah blah.

    So a group built an overgrown home computer. Big deal. Let's wait and see what they accomplish with it.

    Tom

    --
    Someday, I'll have a real sig.
  46. Not comparable by News+for+nerds · · Score: 2, Informative

    Though the theoretical performance of this computer is higher than that of BlueGene, and it may have higher real-world performance too, you can't compare this supercomputer with BlueGene and other TOP500 supercomputers since it can't run LINPACK. It's just too specialized for its use.

  47. Screw petaflops... by Anonymous Coward · · Score: 0

    ...how many bogomips does it do?

  48. Darn algorithms! by Duncan3 · · Score: 1

    If you can just take their n^3 algorithm (with quantum it's more like n^8), and make it n^2, you can do all that on your desktop :)

    Not all progress needs to be brute force. But brute force is much more fun to brag about.

    -

    --
    - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
  49. flops don't replace skill by backwardMechanic · · Score: 1

    From TFA:

    Experts believe that the nation with the most machines near the top of the ranking generally has the most competitive economy.

    Oh come on - were these American experts by chance? How about flops/head? But let's think for a moment. Do raw flops count, or is it what you do with them? Once you have a big computer, it's easy to generate lots of numbers. The art of science, though, is to abstract your question, so you can make some useful predictions. Otherwise you might as well just measure the world that's out there, in all its complexity.

  50. Re:Weather Predictions not explained by bussdriver · · Score: 1

    Measuring in yards/meters is easier than measuring in nanometers.
    Predicting long term weather trends is easier than daily weather conditions in your area.

    When fluid dynamics and computers are to a level to handle compressible fluids at the scale needed, the predictions will still be off to places that aren't the focus. Frequently the predictions for my city only come true to part of the city.

  51. More tech specs by OfNoAccount · · Score: 1
  52. The real problem is... by StarkinProgram · · Score: 0

    ...is it willing to come out of the closet?

  53. Re:Our penis so small, your american penis so larg by cyber-vandal · · Score: 1

    The trouble with science is that the value of research is very hard to measure. However the more we know about the world the better we understand it. Just because supercomputer research hasn't produced a certain amount of value yet doesn't mean it never will. I'm all for "wasting" money on learning more, because the more humans learn the more likely we are to discover stuff that is useful.

  54. This is the future for supercomputing. by jozmala · · Score: 1

    From the article, it looks like they built a special-purpose ASIC that solves their problem, with the ASICs controlled by standard CPUs. Depending on the algorithm, you can get multiple orders of magnitude of performance advantage from a special-purpose chip instead of a general-purpose computing chip.
    Let's do an order-of-magnitude computation here. A pair of general-purpose CPU cores uses about 100M transistors, not counting cache. An adder takes about 1,000 transistors. So with a CPU's transistor budget you get 100,000 adders running in parallel. Overall the performance difference would be 1000x for the ASIC design over the general-purpose solution. Not counting cache is important, since you probably want on-die storage for the temporary values, and cache transistor density is far higher than that of logic. And that's neither the best-case nor the worst-case scenario, but more or less what to expect as a general rule if you don't saturate the memory, in which case you should add more or faster memory channels, or change to a less bandwidth-limited algorithm; you can still make trade-offs that no off-the-shelf CPU could reasonably make. Overall you still get at least a 10x performance increase over going with standard CPUs. So expect 10x to 1000x over code that runs EXTREMELY optimally on a general-purpose chip. Of course you CAN construct a case where a general-purpose computer beats the special-purpose one. But more often than not that case cannot use lots of processors, and once you can parallelize, the special-purpose chip wins.
    The problem with special purpose is that you cannot do everything, you can do one thing and that thing VERY WELL.
    You just change the control logic to a logic solving the problem.
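
    The order-of-magnitude estimate above can be sketched in a few lines of Python. This is only an illustration using the figures quoted in the comment; the names are hypothetical, and the equal-clock, one-op-per-adder-per-cycle assumption is not stated in the original.

```python
# Back-of-envelope sketch of the transistor-budget argument.
# Figures come from the comment; all names here are illustrative.

CPU_LOGIC_TRANSISTORS = 100_000_000  # pair of general-purpose cores, cache excluded
ADDER_TRANSISTORS = 1_000            # rough cost of one adder


def parallel_adders(budget: int = CPU_LOGIC_TRANSISTORS,
                    per_adder: int = ADDER_TRANSISTORS) -> int:
    """Number of adders that fit in the same logic transistor budget."""
    return budget // per_adder


def implied_cpu_ops_per_cycle(adders: int, claimed_speedup: int) -> float:
    """CPU throughput implied by a claimed speedup.

    Assumes (not stated in the comment) that both designs run at the same
    clock and that every adder completes one operation per cycle.
    """
    return adders / claimed_speedup


adders = parallel_adders()
print(adders)                                   # 100000
print(implied_cpu_ops_per_cycle(adders, 1000))  # 100.0
```

    Under those assumptions, the 1000x figure corresponds to crediting the CPU pair with about 100 operations per cycle; a lower ASIC clock combined with a lower CPU throughput would give the same ratio.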

    --
    ©God :Copyright is exclusive right for creator to determine the use of his creation.
  55. AI by taxtropel · · Score: 1

    hmm...here's what I see with this...
    With such great power from so few processors, this will cause other (but not all) computing technology to migrate in that direction.
    I can see the average PC doing 15 teraflops within the next 5 years. This, if I am accurate, would put the home PC in the processing realm of the human brain. Is it possible that an AI which could pass the Turing test with nearly 100% of the subjects is not far behind? Humanoid robots and robotic transportation? ...And then...there's the military...

    Should we put a "Three Laws Treaty" on the international table?

    1. Re:AI by scottyokim · · Score: 1

      The three laws are hopelessly inadequate ... see the Friendly AI research at http://www.singinst.org/ for details ...

  56. Better Anime? by LifesABeach · · Score: 1

    Does this mean that the animation of Anime will be better? If not, so what.

  57. Re:Our penis so small, your american penis so larg by RuiFerreira · · Score: 1

    207,986,688,852 / 62 = 3,354,624,013 flops/watt = 3199 MFlops/watt (taking a mega as 2^20).
    My AMD64 3000+ has around 42 MFlops/watt.

  58. Re:Our penis so small, your american penis so larg by tomstdenis · · Score: 1

    What new technology was developed to produce this machine?

    Or was it a case of having loads of money, room and a friendly merchant at Fry's?

    That's my complaint. It was different with the first Crays. Nothing like them existed before. They had to invent new technology to accomplish it. This is more a case of networking via GigE and optical, then stacking box upon box.

    Tom

    --
    Someday, I'll have a real sig.
  59. Precision? by Junta · · Score: 1

    The article already suggests it may not be capable of running Linpack. The other question is whether these are 32-bit or 64-bit precision operations. Linpack explicitly measures 64-bit precision. This is one reason why, despite the clustered deployments that are inevitable with the Cell processor, those won't be impressive Top500-wise, despite the cries of 'OMFG, cell has uber gigaflops'. Cell brags about its gigaflops, but Cell as announced today is only interesting for 32-bit precision. Its 64-bit precision won't blow away the conventional Power/PPC chips, which are impressive Linpack-wise.

    --
    XML is like violence. If it doesn't solve the problem, use more.
  60. bullshit alert!! by linuxghoul · · Score: 0

    Exhibit A. 1 peta flops is 10 to the 15th power floating point operations per second.
    Exhibit B. This computer has ~5000 chips.

    This means each chip should be capable of 200 giga floating point ops per second.
    I know of no technology which can allow any floating point unit to be clocked at 200 GHz.
    Even if it were possible, the kind of power it would consume would make P4s look like mere tiny fuzzy little animals.

    This means that each of these chips has to have multiple FPUs running in parallel. For low-power apps, going over a 1GHz clock (at today's chip process technologies) is generally not viable. Assuming that to be the case, this would need 200 FPUs in each chip, amounting to the equivalent of 1 million nodes (just distributed over 5000 chips). Why does this matter? The larger the number of nodes, the greater the complexity of splitting the application into so many threads of execution, and the larger the communication bottleneck. Yes, integrating 200 FPUs on a single chip would certainly ease the design of the communication system, but that also means that going off chip will in general carry with it a large, large, large "communication penalty".
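
    The arithmetic behind those figures, as a rough Python sketch. The chip count is the rounded ~5000 from Exhibit B, and the 1 GHz clock with one operation per FPU per cycle is the comment's own assumption, not a measured property of the chip; the variable names are illustrative.

```python
# Per-chip throughput and FPU count implied by the comment's assumptions.

PETAFLOPS = 10**15      # 1 petaflops, in floating point ops per second
CHIPS = 5_000           # rounded chip count used in the comment
FPU_CLOCK_HZ = 10**9    # assumed 1 GHz ceiling, one op per FPU per cycle

flops_per_chip = PETAFLOPS // CHIPS            # ops per second each chip must deliver
fpus_per_chip = flops_per_chip // FPU_CLOCK_HZ  # parallel FPUs needed at that clock
total_fpus = fpus_per_chip * CHIPS              # "nodes" across the whole machine

print(flops_per_chip)  # 200000000000
print(fpus_per_chip)   # 200
print(total_fpus)      # 1000000
```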

    Also, in that case, I would consider the article deliberately misleading, as they make it a point to mention the lower number of chips being used in this design as evidence of it being better than the other super comps.

    As to having so many FPUs on a chip, there are dozens of companies out there making massively parallel chips...1024 and 2048 fpus per chip has already been done...

    There's more to this than meets the eye...
    If anyone here has more info, care to share?

    -ghoul2

    --
    Sigura Non Grata
    1. Re:bullshit alert!! by slashthedot · · Score: 1

      Some info about the chip. (Note: the article is old)
      http://news.com.com/Japan+designers+shoot+for+supercomputer+on+a+chip/2100-1008_3-5322558.html/

      From the original posting:

      "How do experts rate the MDGrape-3? Alan Gara, chief architect for BlueGene/L at IBM's T.J. Watson Research Center in Yorktown Heights, N.Y., had this to say: "It's an unusual architecture. In BlueGene/L all chips can communicate with each other. In our largest BlueGene we have 65,000 nodes, with 130,000 processors. They didn't need to do that. [MDGrape-3 has 4,808 chips.]

      "They also built a processor that did only the type of calculations they need to do in astrophysics. So they built a specialized processor and a specialized network. It's a good example. It shows how cost- and power-efficient you can be if you build for a specific applications. We can learn from it. They've set a benchmark of power performance."

      While Horst Simon, associate laboratory director for computing sciences at Berkeley Lab and editor of the Top500 Supercomputer Sites, weighed in with this: "When we say 1 petaflop, it's just a number. It's the same as if you were to run 100 meters in less than 10 seconds. But it does mean something because it's a barrier to break through. The fact is we've reached the petaflop threshold. Others will follow. In computing, a matter of three to four years can change things.""

      Although specialized, this supercomputer deserves the credit.

    2. Re:bullshit alert!! by TheRaven64 · · Score: 1
      I take it you haven't been keeping up with CPU developments in the last few decades.

      Firstly, it has been a long time since processors only managed to do one instruction per clock. Modern chips do about 8. That alone means that 200GFLOPS equates to about 25GHz.

      Next, you get SIMD instructions. This lets a single instruction work on multiple data elements in parallel. Most modern CPUs have 4-way SIMD, but 8-way is not unheard of. This brings it down to 3.125GHz.

      Now, factor in the fact that you can get 2-4 cores in a single chip. This brings it to between roughly 800MHz and 1.5GHz. That is hardly a spectacular clock speed. If the chip is optimised for a particular operation (as these are) then it is hardly beyond the realms of possibility. Oh, and by the way, the NVIDIA 7800 GTX gets 200GFLOPS, so it's not even that unusual.
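
      The three steps above reduce to one division. A minimal Python sketch, using the comment's figures (8 instructions per clock, 8-way SIMD, 2 to 4 cores); the function name is hypothetical, and the sketch assumes every issue slot and SIMD lane does useful floating point work on every cycle.

```python
# Clock rate needed to reach a target FLOPS figure, given the per-cycle
# parallelism described in the comment. Names are illustrative.

def required_clock_hz(target_flops: float, instr_per_cycle: int,
                      simd_width: int, cores: int) -> float:
    """Clock needed if every issue slot and SIMD lane is busy each cycle."""
    return target_flops / (instr_per_cycle * simd_width * cores)


TARGET_FLOPS = 200e9  # 200 GFLOPS per chip

for cores in (1, 2, 4):
    ghz = required_clock_hz(TARGET_FLOPS, 8, 8, cores) / 1e9
    print(f"{cores} core(s): {ghz} GHz")
```

      With one core this gives 3.125 GHz, and with 2 and 4 cores about 1.56 GHz and 0.78 GHz, matching the range quoted above.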

      --
      I am TheRaven on Soylent News
  61. MDGrape 3? by Anonymous Coward · · Score: 0

    so does that mean in the future we'll have the MD-Grape 20/20???

  62. Comparison MDGrape-3, BlueGene/L & Earth Simul by yalla · · Score: 2, Informative

    I compiled some quick facts which compare those three supercomputers and added pointers to other resources for your convenience:
    http://www.bloglines.com/blog/ITnomad?id=126

    Cheers, Alex.

    --
    You look like a million dollars. All green and wrinkled.
  63. It's the topology, sillypants! by Slugster · · Score: 1

    To triple previous speeds with so few processors some radical engineering took place; strangely enough, the bus tolopogy closely resembles that of a four-dimensional domo-kun.

    It is theorized that a complex tolopogy resembling a four-dimensional Hello Kitty will run roughly twenty times as fast.
    ~

    1. Re:It's the topology, sillypants! by Anonymous Coward · · Score: 0

      Well, Domo-kun is mostly a cube. What else do we know that is mostly shaped like a 4-dimensional cube? A timecube!

  64. incorrect chip count... by YesIAmAScript · · Score: 1

    http://slashdot.org/comments.pl?sid=06/07/30/138234&threshold=1&commentsort=0&mode=nested&cid=15810814

    19,122 Xeons.

    (1 * 10 ^ 15) / (2 * 10 ^ 4 ) = 5 * 10^10.

    That's 50 billion floating-point operations per second per Xeon. If each Xeon is dual-core, it's 25 billion ops per core per second. If they're 4GHz processors, then it's about 6.25 ops/cycle. I'm not sure how it achieves that. Even fused multiply-add instructions only do 2 ops per cycle.

    I still have to ask if this is achievable.
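
    The same estimate as a short Python sketch. The dual-core and 4 GHz values are the guesses made in the comment, not confirmed specifications, and the function name is illustrative.

```python
# Ops per cycle each core would need if the Xeons alone had to deliver
# the full petaflops figure. Core count and clock are the comment's guesses.

def ops_per_cycle(total_flops: float, chips: int,
                  cores_per_chip: int, clock_hz: float) -> float:
    return total_flops / (chips * cores_per_chip * clock_hz)


# Rounded chip count, as in the calculation above.
print(ops_per_cycle(1e15, 20_000, 2, 4e9))  # 6.25

# Exact chip count quoted above; comes out slightly higher, about 6.5.
print(ops_per_cycle(1e15, 19_122, 2, 4e9))
```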

    --
    http://lkml.org/lkml/2005/8/20/95
    1. Re:incorrect chip count... by aminorex · · Score: 1

      The Xeons do not contribute to the flop rate. They act as instruction sequencers and I/O processors.

      --
      -I like my women like I like my tea: green-
  65. Idiotic summary. by imsabbel · · Score: 2, Informative

    This computer, like all the previous (MD)Grape generations, is a central-force potential calculation accelerator.

    It does nothing but calculate 1/sqrt(dx^2+dy^2+dz^2)*variable, but really, really often.

    Grape 6, 5 years or so ago, was already running at 200MHz, had a throughput of one force calculation per cycle per pipeline, and had 6 pipelines on one chip. So it counts as 1.2 billion force calculations per second, each being (1 inverse, 1 sqrt, 3 adds, 3 squares, 2 fmul, etc.).
    A lot of flops, but totally useless as a general-purpose computer.
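
    For illustration, here is the expression above written out in plain Python, together with the throughput arithmetic. This is only a software sketch of the formula as stated in the comment, not a description of the chip's actual pipeline; the function names are hypothetical.

```python
import math

# Software sketch of the pairwise term named in the comment:
# coeff / sqrt(dx^2 + dy^2 + dz^2). Names are illustrative.

def pair_term(dx: float, dy: float, dz: float, coeff: float) -> float:
    """One pairwise evaluation: 3 squares, adds, 1 sqrt, 1 inverse, 1 multiply."""
    return coeff / math.sqrt(dx * dx + dy * dy + dz * dz)


def summed_terms(target, others, coeffs) -> float:
    """Sum the pairwise term over every other particle for one target."""
    tx, ty, tz = target
    return sum(pair_term(x - tx, y - ty, z - tz, c)
               for (x, y, z), c in zip(others, coeffs))


# Throughput figure from the comment: 200 MHz, 6 pipelines,
# one pairwise calculation per pipeline per cycle.
GRAPE6_CLOCK_HZ = 200_000_000
GRAPE6_PIPELINES = 6
calcs_per_second = GRAPE6_CLOCK_HZ * GRAPE6_PIPELINES

print(pair_term(3.0, 4.0, 0.0, 10.0))  # 2.0
print(calcs_per_second)                # 1200000000
```

    A general-purpose CPU runs this loop one pair at a time; the point of the dedicated hardware is that each pipeline is hard-wired to do exactly this and nothing else.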

    --
    HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
  66. Singularity by pontifier · · Score: 1

    For the first time, I have become worried about an unbalanced singularity. If one country reaches the singularity first, the power they would gain might allow them to prevent a singularity in other countries. The US should invest in technology to speed and guide the development of singularity technology here at home. We can't afford to let the singularity happen somewhere else first.

    --
    -John Fenley
    1. Re:Singularity by Anonymous Coward · · Score: 0
      For the first time, I have become worried about an unbalanced singularity. If one country reaches the singularity first, the power they would gain might allow them to prevent a singularity in other countries. The US should invest in technology to speed and guide the development of singularity technology here at home. We can't afford to let the singularity happen somewhere else first.
      So you're the reason all the Kurzweil books at the library are sticky. Yuuuuuck...
    2. Re:Singularity by DeXOR · · Score: 2, Funny

      Yes, we must avoid a singularity gap!

    3. Re:Singularity by joto · · Score: 1
      For the first time, I have become worried about an unbalanced singularity. If one country reaches the singularity first, the power they would gain might allow them to prevent a singularity in other countries. The US should invest in technology to speed and guide the development of singularity technology here at home. We can't afford to let the singularity happen somewhere else first.

      What do you mean? That some super-intelligent AI created somewhere in the world, would want to have an active part in human geopolitics?

      I'm sorry, but that just sounds a bit far-fetched to me. Once (if) AI intelligence surpasses human intelligence, their (non-)interest in human geopolitics would be comparable to our own (non-)interest in the territorial urine-marking of other mammals.

      There are plenty of things that worry me about a (hypothetical) singularity. Whether the breakthrough happens in the US, Japan, North Korea, Iran, Israel, Sudan, China, or somewhere else is not one of them.

  67. Point by Mark_MF-WN · · Score: 1

    Very good explanation. You could even compare this to the Human brain, which only operates at about 50Hz (if I remember my AI class properly) but can have every single one of the trillions of Neurons doing its own little threshold calculation. Granted, it's difficult to compare Neural nets to non-linear circuit systems in a meaningful way, but it does demonstrate the ridiculous extreme of parallelisation.

  68. Google's facility qualify as a super computer? by bismark.a · · Score: 1

    I wonder if the Googleplex machines and their distributed systems have a throughput near this, and if so, do they qualify as a supercomputer?

  69. It did not cost $9m to develop. by stonecypher · · Score: 1

    The article is badly written. It cost Riken $9m, because NEC (as SGI Japan) paid for most of the hardware, and because Hitachi and Intel provided all but three of the workers.

    In short, Riken had almost nothing to do with the process, except for the design of the single custom chip involved, and even then, most of the work was done by outside firms who wanted the press. And even then, it still cost the host organization $9 million!

    --
    StoneCypher is Full of BS
  70. It is petaflops not petaflop by bommai · · Score: 1

    FLOPS is not the plural of FLOP. FLOPS is FLoating point Operations Per Second. Man, it drives me nuts when clueless journalists think they can just call it one petaflop. I know it sounds funny to say one petaflops, but that is exactly what it is. Quit propagating erroneous acronyms - please.

    1. Re:It is petaflops not petaflop by Anonymous Coward · · Score: 0
      FLOPS is FLoating point Operations Per Second.

      No FlOp/s is more accurate! I prefer flop to flops..it stops journalists from saying we're iliterate : )

    2. Re:It is petaflops not petaflop by aminorex · · Score: 1

      Everyone loves a pedant. Notice that many people use FlOp to mean Floating-point Operation, that the English language does not conform to your personal preferences, and that the Cray-2 was referred to as "the gigaflop" 'way back in '91 -- so there's about 15 years of tradition behind the flexible use of this terminology.

      --
      -I like my women like I like my tea: green-
  71. Morons by Anonymous Coward · · Score: 0

    What can I say to the morons who commented on this (really astonishing) design in the article?
    Two words.
    Sour grapes.
    Maybe more....yes, it's a better design than yours....you can't HELP but learn from it, as it just made your idiotic computer science ideas so much rubbish.

  72. From the article.. by RichiH · · Score: 1
    Should the U.S. government or researchers be worried that a Japanese supercomputer will soon be crowned the world's fastest computer?

    No, but they should be worried when a 'technology magazine' sees the need to explain that 298 is a larger number than 250. Yes, this might be shocking, but after you subtract 298 from 500, you are only left with 202. And no, 202 is not larger than 298, even if you take the whole of it. So, yes, if you have 298 apples of a total of 500, no one will be able to have more than you. Next, we will have a closer look at the letter 'G'.
  73. cluster by mattbrundage · · Score: 1

    Hmm, and with a cluster of these, local news stations may now be able to accurately predict the weather six days in advance.

    --
    Matthew Brundage
    Silver Spring, MD
  74. If you think this... by cr0sh · · Score: 1
    ...what makes you think that we (here on Earth) will be the first (in the Universe)? How would you (or any one of us) know whether this hasn't already occurred (i.e., a technological singularity has already happened elsewhere in the Universe, and we have been "isolated" in some manner to prevent us from doing the same, or at least to limit our spread should we get lucky)?


    The honest answer is "we don't know", and that we should continue on (for whatever that means) doing what we do...

    --
    Reason is the Path to God - Anon
  75. Re:Our penis so small, your american penis so larg by aminorex · · Score: 1

    No, this is nothing like a beowulf cluster. While the basic architectural outline is classical, using a general-purpose computer to feed instructions and manage I/O for a whopping big array processor, there are numerous small, critical innovations which contribute to the enormous flop count. BlueGene/L you might consider just a big flipping stack of workstations, but there is an order of magnitude difference in flops between that kind of commodity system and MDGRAPE-3.

    Gig-E is a pretty sad sort of MPP interconnect, BTW. Infiniband is a big step up, and HyperTransport 3 is another hop skip jump beyond that. When the VPUs are talking over a direct interconnect, magic can happen.

    --
    -I like my women like I like my tea: green-