BlueGene/L Puts the Hammer Down
OnePragmatist writes "Cyberinfrastructure Technology Watch is reporting that BlueGene/L has nearly doubled its performance to 135.3 teraflops by doubling its processors. That seems likely to keep it at No. 1 on the Top500 when the next round comes out in June. But it will be interesting to see how it does when they finally get around to testing it against the HPC Challenge benchmark, which has gained adherents as being more indicative of how an HPC system will perform with different types of applications."
Let's see if I can get this right. I'm going to talk a little bit out of my ass now, but here goes:
Every 512-node backplane has a peak of 1.4 tflop/s according to their design doc.
The 32k system benched in at 70. The theoretical peak was 91, so the actual performance is about 77 percent of the peak, which is pretty normal.
The 64k system benched in at 135. The peak should be around 182, so that is 74 percent of the peak.
The design goal is for 360 at 64k. I'm guessing that 360 is the peak, because my rough estimate calculations put the peak of 64k nodes at about 364. Let's be nice and assume 70% of peak in the actual machine. That would indicate around 255 tflop/s of actual performance at 64k nodes, assuming the thing scales at about the same rate.
So, they got their math right as long as they are claiming a peak of 360. That's a theoretical max, and so never actually reached. The actual is notably less. My numbers are estimates, and so 364 not equalling 360 doesn't bother me much in the end.
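For anyone who wants to check the arithmetic above, here is a quick sketch in Python. It uses only the figures quoted in this post, which are rough estimates rather than official specs; the names (`systems`, `efficiency`, `projected`) are made up for illustration.

```python
# Figures quoted in the post, in Tflop/s (the poster's estimates, not official specs).
systems = {
    "32k": {"measured": 70.0, "peak": 91.0},
    "64k": {"measured": 135.0, "peak": 182.0},
}


def efficiency(measured, peak):
    """Fraction of the theoretical peak actually sustained on the benchmark."""
    return measured / peak


for name, s in systems.items():
    print(f"{name}: {efficiency(s['measured'], s['peak']):.0%} of peak")

# Projection: assume, as the post does, 70% efficiency at an estimated peak of 364.
estimated_peak = 364.0
assumed_efficiency = 0.70
projected = estimated_peak * assumed_efficiency
print(f"projected: about {projected:.0f} Tflop/s")
```

Changing `assumed_efficiency` shows how sensitive the 255 figure is to the scaling assumption.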
Anyone care to correct anything?
Also, can someone explain to me why Cray's Red Storm won't kick this thing's ass performance-wise? Red Storm should have 10k processors, but they are 64-bit Opteron 2.4 GHz processors with 4x the RAM per node. BlueGene's, I believe, are 700 MHz processors.
I'm confused because Red Storm only has a theoretical peak of about 40 tflops off the 10k nodes. IBM's system at 10k should have a peak of around 30 tflops. I'm wondering how 64-bit Opterons at more than 3x the clock speed could only be 10 tflops faster at 10k nodes than 700 MHz 32-bit PPCs with 1/4 the RAM. Can someone please explain?
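One way to frame the question above is to divide the quoted peaks by node count and clock speed. This is a rough sketch using only the numbers in this post (all of them estimates); the function names are hypothetical, and treating each system as 10k identical nodes is an assumption.

```python
def per_node_gflops(peak_tflops, nodes):
    """Peak Gflop/s per node, given a system peak in Tflop/s."""
    return peak_tflops * 1000.0 / nodes


def flops_per_cycle(peak_tflops, nodes, clock_ghz):
    """Implied peak floating-point operations per clock cycle per node."""
    return per_node_gflops(peak_tflops, nodes) / clock_ghz


# Numbers as quoted in the post: ~40 and ~30 Tflop/s peak at 10k nodes.
red_storm = flops_per_cycle(40.0, 10_000, 2.4)
blue_gene = flops_per_cycle(30.0, 10_000, 0.7)

print(f"Red Storm: {red_storm:.2f} flops/cycle/node")
print(f"BlueGene:  {blue_gene:.2f} flops/cycle/node")
print(f"ratio:     {blue_gene / red_storm:.2f}x")
```

On these figures a BlueGene node would have to issue roughly 2.5x as many floating-point operations per clock as a Red Storm node, so the peak depends on how much work each node does per cycle and not just on clock speed.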
Also, anyone know why IBM isn't using HPCC? Cray has been using it for the XD1s. I'm guessing the reason IBM hasn't posted results is that they can't even come close in sustained memory bandwidth, MPI latency, and other tests. That's just a guess though. I'd love to hear from someone who actually knows about this stuff.
I think your comparison here is quite unfair to the technological accomplishments of BlueGene/L. This is not simply a case of IBM "throwing more processors" at the problem; BlueGene is a technological leap over other supercomputers. Not only is BlueGene faster than, for instance, the Earth Simulator, but it also consumes FAR LESS power (which in turn minimizes the energy wasted cooling the thing) and takes up much less space. From an article published when BlueGene first overtook the Earth Simulator: "Blue Gene/L's footprint is one per cent that of the Earth Simulator, and its power demands are just 3.6 per cent of the NEC supercomputer." http://www.theregister.co.uk/2004/09/29/supercomputer_ibm/
So, I say to you, NO! The Top500 race is not simply big companies throwing money at a problem (well, it sort of is); there is quite a lot of technical accomplishment going on here. You could argue that the people involved may not have the brilliance of Seymour, but they sure do have real talent.