Slashdot Mirror


BlueGene/L Puts the Hammer Down

OnePragmatist writes "Cyberinfrastructure Technology Watch is reporting that BlueGene/L has nearly doubled its performance to 135.3 Teraflops by doubling its processors. That seems likely to keep it at no. 1 on the Top500 when the next round comes out in June. But it will be interesting to see how it does when they finally get around to testing it against the HPC Challenge benchmark, which has gained adherents as being more indicative of how a HPC system will peform with various different types of applicatoins."

41 of 152 comments

  1. Finally... by Nos. · · Score: 5, Funny

    Maybe this thing can keep the WoW service running.

  2. The real question is.... by bryan986 · · Score: 2, Funny

    ...how do we slashdot it?

    --
    There is no sig
  3. Avoiding the obvious memes... by yuriismaster · · Score: 3, Funny

    How much processing power does one need for any certain application? I know that projects like World Community Grid need massive amounts of computing power, but seriously, 135 TFlops?

    ...ok I couldn't resist

    Imagine a beowulf cluster of these....

  4. similarities by teh_mykel · · Score: 4, Insightful

    does anyone else find the similarities between the computer hardware world and DragonballZ irritating? right when you think it's finally over, the best is exposed and found worthy, yet another difficulty comes up - along with the standard unfathomed power increases and bizarre advances. then it all happens again :/

    --
    this sig no verb
    1. Re:similarities by TetryonX · · Score: 5, Funny

      If the BlueGene/L can grant me any wish I want for collecting 7 of them, sign me up.

      --
      [!] No, I can't see my comments. They are not worthy of +3 moderation.
  5. Yes but what .. by Anonymous Coward · · Score: 2, Funny

    ..about overclocking it?

    and what type of frame rate do you get with Quake?

    1. Re:Yes but what .. by OneDeeTenTee · · Score: 5, Funny

      > and what type of frame rate do you get with Quake?

      It speculatively pre-renders every possible frame for the next 90 seconds.

      --
      Stop the world; I need to get off.
    2. Re:Yes but what .. by InfiniteWisdom · · Score: 2

      It prints a running total like:
      1. Based on optimal actions your frags would have been: 1348
      2. Your actual frags: 2
      3. You suck

  6. Math Error? by mothlos · · Score: 5, Insightful
    "Roughly as expected, BlueGene/L can now crank away at 135.3 trillion floating point operations per second (teraflops), up from the 70.72 teraflops it was doing at the end of 2004. BlueGene/L now has half of its planned processors and is more than half way to achieving its design goal of 360 teraflops."

    Is it just me, or is 135.3 < 360 / 2?

    1. Re:Math Error? by Anonymous Coward · · Score: 5, Informative

      Let's see if I can get this right. I'm going to talk a little bit out of my ass now, but here goes:

      Every 512 node backplane has a peak of 1.4tflop/s according to their design doc.

      The 32k system benched in at 70. The theoretical peak was 91, so the actual performance is about 77 percent of the peak, which is pretty normal.

      The 64k system benched in at 135. The peak should be around 182, so that is 74 percent of the peak.

      The design goal is for 360 at 64k. I'm guessing that 360 is the peak, because my rough estimate calculations put the peak of 64k nodes at about 364. Let's be nice and assume 70% of peak in the actual machine. That would indicate around 255 tflop/s of actual performance at 64k nodes, assuming the thing scales at about the same rate.

      So, they got their math right as long as they are claiming a peak of 360. That's a theoretical max, and so never actually reached. The actual is notably less. My numbers are estimates, and so 364 not equalling 360 doesn't bother me much in the end.
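
      The peak-versus-measured arithmetic above can be checked with a short Python sketch. The inputs are just the rough figures quoted in this comment (70 of 91, 135 of 182, and an assumed 70% of a 364 tflop/s peak); the helper names are made up for illustration.

```python
# Rough check of the peak-vs-measured numbers quoted in the comment.
# All figures are the commenter's estimates in tflop/s, not official data.

def efficiency(measured, peak):
    """Fraction of theoretical peak actually achieved on the benchmark."""
    return measured / peak

def projected(peak, assumed_efficiency):
    """Expected benchmark result if the machine keeps the given efficiency."""
    return peak * assumed_efficiency

small = efficiency(70, 91)     # the smaller configuration
large = efficiency(135, 182)   # the doubled configuration
full = projected(364, 0.70)    # full machine at an assumed 70% of peak

print(f"smaller system: {small:.0%} of peak")
print(f"doubled system: {large:.0%} of peak")
print(f"full system at 70% of peak: about {full:.0f} tflop/s")
```

      The point is only that a 360 figure makes sense as a theoretical peak; the measured number would be expected to land well below it.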

      Anyone care to correct anything?

      Also, can someone explain to me why Cray's Redstorm won't kick this thing's ass performance-wise? Redstorm should have 10k processors, but they are 64-bit Opteron 2.4 GHz processors with 4x the RAM per node. These, I believe, are 700 MHz processors.

      I'm confused because Redstorm only has a theoretical peak of about 40 tflops off the 10k nodes. IBM's system at 10k should have a peak at around 30 tflops. I'm wondering how 64-bit Opterons at more than 3x the speed could only be 10 tflops faster at 10k nodes than 700 MHz 32-bit PPCs with 1/4 the RAM. Can someone please explain?

      Also, anyone know why IBM isn't using HPCC? Cray has been using it for the XD1s. I'm guessing the reason IBM hasn't posted results is because they can't even come close in sustained memory bandwidth, MPI latency, and other tests. That's just a guess though. I'd love to hear from someone that actually knows about this stuff.

    2. Re:Math Error? by RaffiRai · · Score: 2, Informative

      I might fathom that the layout/grid computing data-flow arrangement might have as much, if not more, effect than the sheer number of processors when you're working on something like that.

      It seems to me that since the device isn't complete the data management isn't working under optimal conditions.

  7. Wait another year... by Anonymous Coward · · Score: 5, Interesting
    That's like, what, 527 Cell processors?

    Obviously that number's based on an unrealistic, 100% efficient scaling factor. But still. The 135 TFlops is coming from 64,000 processors.

    It's fun to think about what's just around the corner.

    1. Re:Wait another year... by Shag · · Score: 2, Insightful

      Well yeah, it's a lot of processors. But that's part of the point - these are very low-power, practically embedded-spec, PowerPC chips, so IBM can throw N+1 of them into a system and wind up with something that uses less power than one Big Complex Chip from a competing supplier, yet computes faster, or something like that.

      Given the size and complexity of the Cell, 527 of them might present some cooling problems. (Or cogeneration opportunities, if you hook a good liquid cooling system to a steam turbine...)

      --
      Village idiot in some extremely smart villages.
    2. Re:Wait another year... by imsabbel · · Score: 2, Informative

      Well, to be fair, Cell uses some fairly stoned tricks to get to that kind of peak power. (The massive memory bandwidth is only to small local memories (everywhere else you would call them cache), and the main memory bandwidth is laughable compared to the computing resources.)
      Although Linpack is very nice to parallelize, I don't think it would be possible to even get 10% of the theoretical rate on a Cell.

      --
      HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
  8. Maybe it will be able to... by EvanED · · Score: 4, Funny

    ...host a spell check for Slashdot! ...as being more indicative of how a HPC system will peform with various different types of applicatoins."

    1. Re:Maybe it will be able to... by CoolGopher · · Score: 4, Funny

      I'm sure you mean a spelling checker. It's WoW that needs the spell check... combat resurrection for priests, anyone?!

  9. Windows HPC by Cruithne · · Score: 4, Funny

    Oh man, I *so* wanna put Windows HPC on this thing!

    1. Re:Windows HPC by ozmanjusri · · Score: 2, Funny

      You'll probably need it if you want to turn on the eye candy in Longhorn...

      --
      "I've got more toys than Teruhisa Kitahara."
    2. Re:Windows HPC by Jules+Labrie · · Score: 2, Interesting

      Well, if you had Windows on this machine (but be serious, please!)... it would only be on one node in every 64. Let me explain why.

      Blue Gene is known to run Linux. True, but... in fact, there are two types of nodes in Blue Gene: the computing nodes and the IO nodes. There is 1 IO node for every 63 computing nodes. So for a 64000-node cluster, there are in fact only 1000 processors that run Linux. The other 63000 are running an ultra-light runtime environment (with MPI and other essential things) to maximize the speed. Even Linux is too heavy for that! So Windows would maybe not make the performance so bad... But I don't believe IBM ever considered this option!
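
      For what it's worth, the node split described above is easy to sanity-check. A minimal Python sketch, assuming the 1-in-64 ratio stated in this comment; `split_nodes` is a hypothetical helper, not anything from IBM's tooling.

```python
# Split a BlueGene-style machine into I/O nodes and compute nodes,
# assuming (as the comment states) 1 I/O node per 63 compute nodes.

def split_nodes(total_nodes, compute_per_io=63):
    group = compute_per_io + 1          # one I/O node plus its compute nodes
    io_nodes = total_nodes // group
    compute_nodes = total_nodes - io_nodes
    return io_nodes, compute_nodes

io_nodes, compute_nodes = split_nodes(64000)
print(f"{io_nodes} I/O nodes running Linux, {compute_nodes} compute nodes")
```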

  10. Cell vs HPC by adam31 · · Score: 5, Insightful
    The HPC Challenge benchmark is especially interesting and I think sheds some light on the design goals IBM had in coming up with the Cell.

    1) Solving linear equations. SIMD Matrix math, check.
    2) DP Matrix-Matrix multiplies. IBM added DP support to their VMX set for Cell (though at 10% the execution rate), check.
    3) Processor/Memory bandwidth. XDR interface at 25.6 GB/s, check.
    4) Processor/Processor bandwidth. FlexIO interface at 76.8 GB/s, check.
    5) "measures rate of integer random updates of memory", hmmmm... not sure.
    6) Complex, DP FFT. Again, DP support at a price. check.
    7) Communication latency & bandwidth. 100 GB/s total memory bandwidth, check (though this could be heavily influenced by how IBM handles its SPE threading interface)

    Obviously, I'm not saying they used the HPC Challenge as a design document, but clearly Cell is meant as a supercomputer first and a PS3 second.

    1. Re:Cell vs HPC by shizzle · · Score: 3, Interesting
      > 2) DP Matrix-Matrix multiplies. IBM added DP support to their VMX set for Cell (though at 10% the execution rate), check.
      > [...]
      > ...clearly Cell is meant as a supercomputer first and a PS3 second.

      I think you've refuted your own argument there: double precision floating point performance is critical for true supercomputing. (In supercomputing circles DP and SP are often referred to as "full precision" and "half precision", respectively, which should give you a better idea of how they view things.)

      In contrast, SP is plenty of accuracy for things like rendering and game physics, since (very loosely speaking) as long as you're within a fraction of a pixel of the right answer you don't need any more accuracy.
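
      The gap between the two precisions can be seen with a small Python sketch. Python floats are double precision, so single precision is emulated here by round-tripping through the standard `struct` module; `to_single` and the sample values are illustrative choices only.

```python
import struct

def to_single(x):
    """Round a double-precision Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# A screen coordinate: the single-precision rounding error is a tiny
# fraction of a pixel, which is why SP is fine for rendering.
pixel = 1234.5678
sp_error = abs(to_single(pixel) - pixel)

# Adding a small increment to a large value: single precision has only a
# 24-bit significand, so the +1 is lost when the result is rounded to SP.
big = 16_777_216.0              # 2**24
sp_sum = to_single(big + 1.0)   # rounds back to 2**24
dp_sum = big + 1.0              # double precision keeps the increment

print(f"SP error on a pixel coordinate: {sp_error:.2e}")
print(f"SP keeps the +1: {sp_sum != big}, DP keeps the +1: {dp_sum != big}")
```

      Long-running scientific codes hit the second case constantly, which is roughly why DP throughput matters so much more there than in games.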

      I'd say the Cell architecture is very well suited for supercomputing as well as gaming, but the announced Cell implementation appears to me to be clearly targeted at the PS3. They'll have to come out with a "Cell HPC Edition" that has much better DP performance before they take over supercomputing. Not that I don't expect that they're working on that as we speak...

    2. Re:Cell vs HPC by tarpitcod · · Score: 2, Interesting

      I don't think they thought that at all (Let's build a supercomputer). I think it follows naturally from the problem they were trying to solve.

      This is because when you have the following conditions:

      -- Lots of memory bandwidth needed
      -- Fast floating point
      -- Parallelizable code
      -- Hand tuned kernels OK

      You end up with something that looks a lot like a supercomputer. You just turned your compute bound problem into an IO bound problem. We may want to revise that saying -- and say 'You turned your compute bound problem into a coding problem'. Supercomputer performance seems more bound by the feasibility of extracting decent performance from the iron than it used to be -- judging by the stuff I have read by the old hands.

  11. Pics by identity0 · · Score: 4, Informative

    I found it odd that there aren't any pics of the machine on those sites, so I looked around... Here are some pics of the prototype at top, and the finished version at bottom. It looks like it's going to be in classic "IBM black", like the 2001 monolith : )

    Some more pics of the prototype.

    For comparison, the Earth simulator and big mac.

    Anyone know what kind of facilities blue gene will be housed at? The one for the earth simulator looks like something out of a movie, IBM better be able to compete on the 'cool factor'. : )

    And does anyone else get the warm and fuzzy feelings from looking at these pics, even though there's nothing you could possibly use that much power for? Ahhh, power...

    1. Re:Pics by mankei · · Score: 2, Insightful

      > And does anyone else get the warm and fuzzy feelings from looking at these pics, even though there's nothing you could possibly use that much power for?

      Someone told me that it took the Earth simulator about a week to simulate air flow past a truck. I get warm and fuzzy feelings by simply looking at stuff around myself and appreciating the mind-boggling complexity of Mother Nature.

    2. Re:Pics by Anonymous Coward · · Score: 2, Informative

      I work at IBM in Rochester, Minnesota, where the machines are built and housed, and I have seen the machines that are being shipped around and installed at Lawrence Livermore and other places... it is an awesome sight. VERY LOUD with the huge fans above it, and the floor in the building had to be dug down 4 feet to allow for the cabling and air ducts to run underneath everything. What most surprised me is not how fast it is, but how well they were able to get it to scale by using fairly low power processors. The power is not in the processors in it, which are low power to conserve electricity and heat, but in how amazingly well it scales.

  12. Re:But What about the Crays? by yennieb · · Score: 2, Interesting

    You're confused and lost. According to the top 500 rankings referenced by the article, the highest ranking Cray (an X1) puts out less than 6 TFLOPS.

    So try... a cluster of 25+ X1s and then we'll talk =)!

  13. Re:But What about the Crays? by Hungry+Admin · · Score: 4, Informative

    Not all problems are going to be solved faster by parallel computation. Some problems will be better solved on the 6 Tflop machine than with 10,000 slower CPUs.
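
    One common way to model this is Amdahl's law (not something the parent post cites, just a standard rule of thumb): the serial part of a job runs at the speed of a single CPU no matter how many CPUs you have. A rough Python sketch, with machine sizes and per-CPU speeds invented purely for illustration:

```python
# Amdahl-style run time: the serial fraction runs on one CPU,
# the parallel fraction is spread evenly over all CPUs.
# CPU counts and speeds below are made-up illustrative values.

def run_time(parallel_fraction, cpus, cpu_speed, work=1.0):
    serial = (1.0 - parallel_fraction) * work / cpu_speed
    parallel = parallel_fraction * work / (cpus * cpu_speed)
    return serial + parallel

few_fast = dict(cpus=500, cpu_speed=12.0)     # fewer, faster processors
many_slow = dict(cpus=10_000, cpu_speed=2.8)  # many slower processors

for p in (1.0, 0.999):
    fast = run_time(p, **few_fast)
    slow = run_time(p, **many_slow)
    winner = "few fast CPUs" if fast < slow else "many slow CPUs"
    print(f"parallel fraction {p}: {winner} finish first")
```

    With a perfectly parallel job the big machine wins easily; leave just 0.1% of the work serial and, with these numbers, the smaller machine with faster CPUs comes out ahead.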

    --
    Be who you are and say what you feel, because the people who mind don't matter, and the people who matter don't mind.
  14. More than Teraflops by gtsili · · Score: 2, Interesting

    What would also be interesting is the power consumption and heat production figures of those systems when idle and under heavy load, and also the load statistics.

    In other words what is the cost in the quest for performance?

  15. Can anyone please... by camcorder · · Score: 2

    ...explain why genetic research needs that much CPU power? What calculations take so long to process that they need to build the fastest computers? And also, are they sure that the programmers working at research labs are optimizing their code effectively? Maybe the work done on those computers could be done with 1/4 of the current power.

    1. Re:Can anyone please... by kromozone · · Score: 3, Informative

      Think of all the charges in a protein composed of hundreds of amino acids, each composed of dozens of atoms. Now imagine those charges interacting during protein folding, in a solution. Let's say that process takes a few milliseconds. Now imagine modeling this process at the femtosecond resolution. This system is severely underpowered.

    2. Re:Can anyone please... by overunderunderdone · · Score: 2, Informative

      Because the mechanism by which the DNA blueprint is actually used to create proteins (in a mechanism that itself uses proteins) is *spectacularly* complex.

      "IBM estimates that the folding model for a 300-residue protein will encompass more than one billion forces acting over one trillion time steps. Even for Blue Gene, modeling such a folding process is expected to take about a year of around-the-clock processing."
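
      To put those numbers in perspective, here is a back-of-the-envelope Python sketch. The step and force counts are the ones quoted above; using the 360 teraflops design goal as the available rate is an optimistic assumption for illustration, not a measured figure.

```python
# Scale of the quoted folding simulation, using integer arithmetic.
# Counts come from the IBM quote; the 360 tflop/s rate is the design-goal
# peak mentioned in the article, assumed here as a best case.

time_steps = 10**12        # "one trillion time steps" (about 1 ms in femtoseconds)
forces_per_step = 10**9    # "more than one billion forces"
force_evaluations = time_steps * forces_per_step

seconds_per_year = 365 * 24 * 3600
peak_flops = 360 * 10**12
flops_in_a_year = peak_flops * seconds_per_year

# How many floating point operations the machine can spend on each force
# evaluation if the whole run has to fit into one year at peak speed.
budget_per_force = flops_in_a_year / force_evaluations

print(f"force evaluations: {force_evaluations:.1e}")
print(f"flops available in a year at peak: {flops_in_a_year:.2e}")
print(f"budget per force evaluation: about {budget_per_force:.0f} flops")
```

      Even at the design-goal peak that leaves only a handful of operations per force evaluation, which fits the "about a year" estimate in the quote.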

    3. Re:Can anyone please... by Obfiscator · · Score: 2, Informative

      I prefer Monte Carlo techniques with special folding moves, personally, but I don't think either method will be solving this problem anytime soon.

      What kind of force field do you want to use? CHARMM or AMBER? Well, it might work. From an unfolded protein, though, some of your atoms will undergo fairly drastic changes in environment. Better use a polarizable model. Oh, crap, there's another order of magnitude in expense, and you still may not get the right answer unless your force field is parameterized in a clever way that may only work properly for one class of proteins. What's the functional form? Lennard-Jones? Exp-6? It better be able to reproduce hydrogen bonding accurately. Do you have the correct bond stretching, bond bending, and torsional potentials? A lot of interactions in those systems, and each set has a very complicated potential energy surface with a lot of minima. It's gonna be tough to find the correct one.

      Let's get rid of parameters and use quantum mechanical energy calculations, you say? That cuts down your system size to 64 molecules and maybe 20 ps of trajectory if you use a method like CPMD and density functional theory with no Hartree-Fock exchange...and there is no protein in the system, just water. How long does it take a protein to fold? Nanoseconds? Microseconds? So you can forget about doing a protein like this, and besides, current density functionals give a worse answer for most physical properties of water than cheap empirical models. So create a new functional to reproduce the properties of water? It's being tried, and it could work very nicely. Now you just have to worry about the protein. DFT is nice in that you won't have to worry about assigning charges or intramolecular parameters to the protein, but if it struggles this much with water...well, let's say I don't have much faith in it to nail a protein on the first try. On top of that, I don't think I've even seen a single point calculation done on a full protein with DFT (QM/MM, yes, but not just DFT), much less the thousands that you'd need for a simulation.

      Nope, protein folding isn't going to give us any answers for years. Tell you what, though: you can keep working on proteins, and I'll try and get water correct. Water is a difficult enough molecule. Maybe by the time we finally get water right, we'll have the resources we need to do a full protein.

      I'm not trying to dissuade you from continuing your research. I'm just trying to be realistic about what we can expect and when we can expect it. Good luck to you.

      --
      "Nothing shocks me. I'm a scientist." -Indiana Jones
  16. Could encourage poor products? by mcraig · · Score: 2, Insightful

    So what do people think: assuming speeds continue to leap ahead in the desktop arena, will it simply encourage further sloppy programming? After all, if the choice is to optimise your product for a month to save a few gigaflops or get it out into the market, and so what if it's a bit resource hungry, I imagine many teams will get pushed to release sooner rather than later.

  17. One in every home by BeerCat · · Score: 2, Interesting

    Several decades ago, a computer filled an entire room, and "I think there is a world market for maybe five computers"

    A few decades ago, people thought Bill Gates was wrong when he reckoned there would soon be a time when there was a computer in every home.

    Now, a supercomputer fills an entire room. So how long before someone reckons that there will come a time when there will be a supercomputer in every home?

    --
    "She's furniture with a pulse"
    1. Re:One in every home by TheRealFoxFire · · Score: 2, Informative

      The definition of a supercomputer is a moving line. At any given time, a supercomputer is usually just a machine with an order of magnitude more CPU throughput than a PC. This neglects things supercomputers have that desktops don't, like massive I/O capabilities, but in terms of CPU performance, today's desktops are usually as fast as the past's supercomputers.

  18. Top 500 and Auto Racing by ch-chuck · · Score: 4, Insightful

    It could be that the competition for the top of the 500 slot is becoming less of technological achievement and more of just who has the most $$$ to spend. Just like auto racing used to be about improvements in engines and transmissions etc but after a point everybody could make a faster car just by buying more commonly available, well known technology than the other guys. So they put in limitations for the races, only so big a venturi, displacement, etc.

    Anyway, my point is - it's becoming just "I can afford more processors than you can so I win" instead of the heyday of Seymour Cray when you really had to be talented to capture the #1 spot from IBM.

    --
    try { do() || do_not(); } catch (JediException err) { yoda(err); }
    1. Re:Top 500 and Auto Racing by ShadowFlyP · · Score: 5, Informative

      I think your comparison here is quite unfair to the technological accomplishments of BlueGene/L. This is not simply a case of IBM "throwing more processors" at the problem; BlueGene is a technological leap over other supercomputers. Not only is BlueGene faster than, for instance, the Earth Simulator, but it also consumes FAR LESS power (which in turn minimizes the energy wasted cooling the thing) and takes up much less space. From an article published when BlueGene first overcame the Earth Simulator: "Blue Gene/L's footprint is one per cent that of the Earth Simulator, and its power demands are just 3.6 per cent of the NEC supercomputer." http://www.theregister.co.uk/2004/09/29/supercomputer_ibm/ So, I say to you, NO! The top 500 race is not simply big companies throwing money at a problem (well, it sort of is), but there is quite a lot of technical accomplishment going on here. You could argue that the people involved may not have the brilliance of Seymour, but they sure do have real talent.

  19. Scalar performance -- Unimpressed! by tarpitcod · · Score: 3, Interesting

    What's the scalar performance of one of these beasties?

    Can an Athlon 64 / P4 beat it on scalar code? The whole HPC world has gotten boring since Cray died. Here's why I say that:

    The Cray 1 had the best SCALAR and VECTOR performance in the world.

    The Cray 2 was an ass kicker, the Cray 3 was a real ass kicker (if only they could build them reliably).

    Cray pushed the boundaries, he pushed them too far at some points -- designing and trying to build machines that they couldn't make reliable.

    So it'll be a cold day in hell before I get all fired up over the fact that someone else managed to glue together a bazillion 'killer micros' and win at Linpack...
    Now if someone would bring back the idea of transputers, or we saw some *real* efforts at Dataflow and FP then I'd be excited. I'd love a PC with 8 small, simple, fast, in-order tightly bound cpus. Don't say CELL, all indications are that they will be a *real* PITA to program to get any decent performance out of.

  20. Why BlueGene kicks RedStorm's ass on Linpack by RalphBNumbers · · Score: 4, Informative

    Well, it comes down to a few different things.

    First off, Opterons are pretty mediocre at double precision floating point benchmarks, it just isn't what they were designed for. Opterons effectively have only a single FPU (technically they have two, but one only does addition, while the other handles all multiplies), while most competing chips in the HPC arena have two full FPUs. They tend to get spanked by PPCs and Itanium2s, and even Xenons can do better.

    Also, you should note that the modified PPC440s in BlueGene have a disproportionate amount of floating point resources, making them about equivalent to the 970 in that area MHz for MHz, despite being massively outclassed in integer and vector ops. And the floating point units on those 440s are full 64-bit units (as FPUs are on many other ostensibly 32-bit chips, since the bit width of an FPU has nothing to do with the integer units and MMUs being 32-bit). Plus the PPC has a fused multiply-add instruction, allowing it to theoretically finish 2 FLOPs/unit/cycle, instead of just one.

    And finally, you should know that individual nodes' ram sizes matter very little for Linpack.

    When you take all that together, it's not too surprising that 700 MHz PPC440s with 2 64-bit FPUs each finishing up to 2 FLOPs/cycle (at least 2 of which must be adds) would perform on par with 2.x GHz Opterons finishing a total of 2 FLOPs/cycle (at least one of which has to be an add).
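
    As a rough illustration of that last point, peak rate is just clock x FPUs x FLOPs per FPU per cycle. The Python sketch below uses the per-cycle figures from this post; the Opteron clocks are assumptions (the post only says "2.x"), with 2.4 GHz taken from the question upthread and 2.0 GHz added for comparison.

```python
# Peak floating point rate in MFLOPS = clock (MHz) x FPUs x FLOPs per FPU per cycle.
# Per-cycle figures follow the post; the Opteron clock speeds are assumptions.

def peak_mflops(clock_mhz, fpus, flops_per_fpu_per_cycle):
    return clock_mhz * fpus * flops_per_fpu_per_cycle

bluegene_chip = peak_mflops(700, fpus=2, flops_per_fpu_per_cycle=2)  # fused multiply-add
opteron_2000 = peak_mflops(2000, fpus=2, flops_per_fpu_per_cycle=1)  # one add + one multiply
opteron_2400 = peak_mflops(2400, fpus=2, flops_per_fpu_per_cycle=1)

chips = 10_000
for name, mflops in [("PPC440 @ 700 MHz", bluegene_chip),
                     ("Opteron @ 2.0 GHz", opteron_2000),
                     ("Opteron @ 2.4 GHz", opteron_2400)]:
    tflops = mflops * chips / 1_000_000
    print(f"{name}: {mflops} MFLOPS per chip, {tflops:.0f} tflops for {chips} chips")
```

    So a clock roughly three times higher buys well under twice the peak per chip, in the same ballpark as the "about 40" versus "around 30" tflops figures quoted upthread.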

    --
    "The worst tyrannies were the ones where a governance required its own logic on every embedded node." - Vernor Vinge
  21. Blinking Lights by DumbSwede · · Score: 2, Funny
    Slashdot is still using the 1976 Cray-1 as the icon for supercomputing, and I think it's safe to say supercomputing styling has gone downhill since. Not that these things should be like cars, though here at Slashdot we tend to salivate over them like they were. Don't get me started on people who are into case mods.

    I remember seeing a news article on TV recently about NASA and their upgrades to computer horse power for doing flight simulations and design work. The picture they showed? A late 80's connection machine. You know the beast, 8 black cubes glued together to make one big cube with hundreds of blinking LEDs over the faces, one for each of the 65536 simple processors. Sort of a Borg at Christmas time affair. Stock footage to be sure, and the news outlets trot it out every time the word supercomputer is used. At least they've quit showing IBM Model 726 Tape Units spinning reel-to-reel tapes back and forth as a show of awesome computing power.

  22. Does anyone realize... by suitepotato · · Score: 2, Funny

    ...that by the time Duke Nukem Forever launches, this will be the level of computing power on every desktop? I can hardly wait for Windows mean-time-to-failure to be measured in femtoseconds.

    --
    If my grammar and spelling are off, I am [distracted/tired/careless] (take your pick)