Slashdot Mirror


BlueGene/L Puts the Hammer Down

OnePragmatist writes "Cyberinfrastructure Technology Watch is reporting that BlueGene/L has nearly doubled its performance to 135.3 teraflops by doubling its processors. That seems likely to keep it at No. 1 on the Top500 when the next round comes out in June. But it will be interesting to see how it does when they finally get around to testing it against the HPC Challenge benchmark, which has gained adherents as being more indicative of how an HPC system will perform with various types of applications."

8 of 152 comments

  1. Finally... by Nos. · · Score: 5, Funny

    Maybe this thing can keep the WoW service running.

  2. Re:similarities by TetryonX · · Score: 5, Funny

    If the BlueGene/L can grant me any wish I want for collecting 7 of them, sign me up.

    --
    [!] No, I can't see my comments. They are not worthy of +3 moderation.
  3. Math Error? by mothlos · · Score: 5, Insightful

    "Roughly as expected, BlueGene/L can now crank away at 135.3 trillion floating point operations per second (teraflops), up from the 70.72 teraflops it was doing at the end of 2004. BlueGene/L now has half of its planned processors and is more than half way to achieving its design goal of 360 teraflops."

    Is it just me, or is 135.3 < 360 / 2?

    1. Re:Math Error? by Anonymous Coward · · Score: 5, Informative

      Let's see if I can get this right. I'm going to talk a little bit out of my ass now, but here goes:

      Every 512-node backplane has a peak of 1.4 tflop/s according to their design doc.

      The 32k system benched in at 70. The theoretical peak was 91, so the actual performance is about 77 percent of the peak, which is pretty normal.

      The 64k system benched in at 135. The peak should be around 182, so that is 74 percent of the peak.

      The design goal is for 360 at 64k. I'm guessing that 360 is the peak, because my rough estimate calculations put the peak of 64k nodes at about 364. Let's be nice and assume 70% of peak in the actual machine. That would indicate around 255 tflop/s of actual performance at 64k nodes, assuming the thing scales at about the same rate.

      So, they got their math right as long as they are claiming a peak of 360. That's a theoretical max, and so never actually reached. The actual is notably less. My numbers are estimates, and so 364 not equalling 360 doesn't bother me much in the end.
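
      For anyone who wants to redo the arithmetic, here is a rough sketch. It uses only the rounded figures quoted in this comment; the function name is made up for illustration, and the 70% figure is the same guess as above, not a measurement.

```python
def efficiency(measured_tflops, peak_tflops):
    """Fraction of theoretical peak that was actually sustained."""
    return measured_tflops / peak_tflops

# Rounded figures from the comment above, all in teraflops.
eff_32k = efficiency(70.0, 91.0)     # 32k system
eff_64k = efficiency(135.0, 182.0)   # 64k system

# Assumption from the comment: the full machine sustains 70% of a 364 peak.
projected_full = 0.70 * 364.0

print(f"32k system: {eff_32k:.0%} of peak")
print(f"64k system: {eff_64k:.0%} of peak")
print(f"full machine at 70% of peak: about {projected_full:.0f} tflop/s")
```

      The point is just that "peak" and "benchmarked" are two different numbers, and the ratio between them has been sitting in the 70s so far.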

      Anyone care to correct anything?

      Also, can someone explain to me why Cray's Red Storm won't kick this thing's ass performance-wise? Red Storm should have 10k processors, but they are 64-bit 2.4 GHz Opterons with 4x the RAM per node. These, I believe, are 700 MHz processors.

      I'm confused because Red Storm only has a theoretical peak of about 40 tflops off the 10k nodes. IBM's system at 10k should have a peak of around 30 tflops. I'm wondering how 64-bit Opterons at more than 3x the clock speed could be only 10 tflops faster at 10k nodes than 700 MHz 32-bit PPCs with 1/4 the RAM. Can someone please explain?
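
      One way to poke at that question with nothing but the numbers already in this comment: theoretical peak is roughly nodes x clock x floating point operations per cycle, so clock speed alone doesn't settle it, and RAM size or 64-bit addressing don't enter a peak-flops figure at all. A rough sketch, where the function name is invented for illustration and the inputs are the estimates quoted here, not official specs:

```python
def flops_per_cycle(peak_tflops, nodes, clock_ghz):
    """Peak floating point operations per clock cycle for one node."""
    per_node_gflops = peak_tflops * 1000.0 / nodes
    return per_node_gflops / clock_ghz

# Red Storm: about 40 tflops peak from 10k nodes at 2.4 GHz (from the comment).
opteron = flops_per_cycle(40.0, 10_000, 2.4)

# BlueGene/L: 1.4 tflop/s peak per 512-node backplane at 700 MHz (from the comment).
ppc = flops_per_cycle(1.4, 512, 0.7)

print(f"Opteron node: {opteron:.1f} flops per cycle")
print(f"PPC node:     {ppc:.1f} flops per cycle")
```

      If those inputs are anywhere near right, each PPC node is doing more than twice as much floating point work per clock tick, which would eat most of the clock speed gap.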

      Also, does anyone know why IBM isn't using HPCC? Cray has been using it for the XD1s. I'm guessing the reason IBM hasn't posted results is that they can't even come close in sustained memory bandwidth, MPI latency, and other tests. That's just a guess though. I'd love to hear from someone who actually knows about this stuff.

  4. Wait another year... by Anonymous Coward · · Score: 5, Interesting
    That's like, what, 527 Cell processors?

    Obviously that number's based on an unrealistic, 100% efficient scaling factor. But still. The 135 teraflops is coming from 64,000 processors.
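
    The 527 presumably comes from a division like the one sketched below. The per-chip figure is an assumption on my part (the 256 gigaflops single-precision peak that was widely quoted when Cell was announced); it isn't stated anywhere in this thread.

```python
# ASSUMPTION: 256 gigaflops peak (single precision) per Cell processor.
# This figure is not from the thread; swap in your own if you disagree.
CELL_PEAK_GFLOPS = 256.0

# BlueGene/L's benchmark result, rounded to 135 teraflops.
bluegene_tflops = 135.0

# Perfect (100% efficient) scaling: just divide one by the other.
cells_needed = bluegene_tflops * 1000.0 / CELL_PEAK_GFLOPS

print(f"about {cells_needed:.0f} Cell processors at 100% scaling")
```

    Keep in mind this compares a peak number for one chip against a measured Linpack number for the other machine, so it flatters the Cell.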

    It's fun to think about what's just around the corner.

  5. Cell vs HPC by adam31 · · Score: 5, Insightful
    The HPC Challenge benchmark is especially interesting and I think sheds some light on the design goals IBM had in coming up with the Cell.

    1) Solving linear equations. SIMD Matrix math, check.
    2) DP Matrix-Matrix multiplies. IBM added DP support to their VMX set for Cell (though at 10% the execution rate), check.
    3) Processor/Memory bandwidth. XDR interface at 25.6 GB/s, check.
    4) Processor/Processor bandwidth. FlexIO interface at 76.8 GB/s, check.
    5) "measures rate of integer random updates of memory", hmmmm... not sure.
    6) Complex, DP FFT. Again, DP support at a price. check.
    7) Communication latency & bandwidth. 100 GB/s total memory bandwidth, check (though this could be heavily influenced by how IBM handles its SPE threading interface)
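
    On item 5, a toy sketch of what "integer random updates of memory" means in practice: each update is a read-modify-write at an effectively random address, so caches and prefetching help very little. This is only an illustration. The function name and table size are made up, and Python's random module stands in for whatever generator the real benchmark uses.

```python
import random

def random_updates(table_bits=10, n_updates=4096, seed=0):
    """XOR 64-bit pseudo-random values into pseudo-random table slots."""
    size = 1 << table_bits        # power of two, so a bit mask picks the slot
    table = list(range(size))
    rng = random.Random(seed)     # stand-in generator, NOT the benchmark's own
    for _ in range(n_updates):
        value = rng.getrandbits(64)
        # Read-modify-write at an address chosen by the low bits of the value.
        table[value & (size - 1)] ^= value
    return table

table = random_updates()
changed = sum(1 for i, v in enumerate(table) if v != i)
print(changed, "of", len(table), "slots changed")
```

    Scale the table up past the cache size and the update rate becomes a test of the memory system rather than the arithmetic units.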

    Obviously, I'm not saying they used the HPC Challenge as a design document, but clearly Cell is meant as a supercomputer first and a PS3 second.

  6. Re:Yes but what .. by OneDeeTenTee · · Score: 5, Funny

    and what type of frame rate do you get with Quake?

    It speculatively pre-renders every possible frame for the next 90 seconds.

    --
    Stop the world; I need to get off.
  7. Re:Top 500 and Auto Racing by ShadowFlyP · · Score: 5, Informative

    I think your comparison here is quite unfair to the technological accomplishments of BlueGene/L. This is not simply a case of IBM "throwing more processors" at the problem; BlueGene is a technological leap over other supercomputers. Not only is BlueGene faster than, for instance, the Earth Simulator, but it also consumes FAR LESS power (which in turn minimizes the energy wasted cooling the thing) and takes up much less space. From an article published when BlueGene first overtook the Earth Simulator: "Blue Gene/L's footprint is one per cent that of the Earth Simulator, and its power demands are just 3.6 per cent of the NEC supercomputer." http://www.theregister.co.uk/2004/09/29/supercomputer_ibm/ So, I say to you, NO! The Top500 race is not simply big companies throwing money at a problem (well, it sort of is), but there is quite a lot of technical accomplishment going on here. You could argue that the people involved may not have the brilliance of Seymour, but they sure do have real talent.