IBM To Build 3-Petaflop Supercomputer
angry tapir writes "The global race for supercomputing power continues unabated: Germany's Bavarian Academy of Science has announced that it has contracted IBM to build a supercomputer that, when completed in 2012, will be able to execute up to 3 petaflops, potentially making it the world's most powerful supercomputer. To be called SuperMUC, the computer, which will be run by the Academy's Leibniz Supercomputing Centre in Garching, Germany, will be available for European researchers to use to probe the frontiers of medicine, astrophysics and other scientific disciplines."
Neither the Chinese machine nor the German machine is a cutting-edge design. They represent what you can do with near-commodity hardware and good but not fully custom packaging. They may look like top-end machines today, but by 2012 they will not be in the top ten.
Why is Snark Required?
Once upon a time, supercomputers were bunches of general-purpose CPUs, and you made them faster by connecting up more of them.
Now people have realized that massively parallel special-purpose chips (like Cell and, even more so, GPUs) can be used to do general-purpose computing, and have started to add those to clusters. But those chips have a lower bandwidth:flops ratio than the x86 and similar CPUs that have historically been used. The gap between a computer's "peak" FLOPS (on an ideal job with no communication requirements to either other nodes or to memory) and the performance it actually achieves is wider using something like CUDA than on a standard supercomputer. CUDA machines are so bandwidth-limited that people use rather harebrained data compression schemes to move data from place to place, just because all the nodes have extra compute power lying around anyway, and the bottleneck is in communication. (The example that comes to mind is sending the coefficients of the eight generators of an SU(3) matrix rather than just sending the eighteen floats that make up the damn matrix. It's a lot of work to reassemble, relatively speaking, but it's worth it to avoid sending a few bits down the wire.)
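To make the SU(3) trick concrete, here is a rough sketch of the reassembly step only, in plain Python. It assumes the eight numbers on the wire are coefficients a_k in U = exp(i * sum_k a_k * lambda_k / 2), with lambda_k the Gell-Mann matrices. That convention, the function names, the sample coefficients, and the truncated Taylor series are all my own choices for illustration and not taken from any particular lattice code. The sender's side (taking the matrix logarithm) is left out.

```python
import math

S3 = math.sqrt(3.0)

# The eight Gell-Mann matrices, a standard basis for the SU(3) generators.
GELL_MANN = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 0, -1j], [0, 0, 0], [1j, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]],
    [[1 / S3, 0, 0], [0, 1 / S3, 0], [0, 0, -2 / S3]],
]


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def reassemble(coeffs, terms=40):
    """Rebuild U = exp(i * sum_k coeffs[k] * lambda_k / 2) from 8 real numbers.

    Uses a truncated Taylor series, chosen here for brevity; it is adequate
    for coefficients of order one.
    """
    # x = i * H, where H is the Hermitian, traceless combination of generators.
    x = [[0.5j * sum(c * g[i][j] for c, g in zip(coeffs, GELL_MANN))
          for j in range(3)] for i in range(3)]
    u = [[complex(i == j) for j in range(3)] for i in range(3)]  # identity
    term = [row[:] for row in u]
    for n in range(1, terms):
        # term becomes x^n / n!
        term = [[v / n for v in row] for row in matmul(term, x)]
        u = [[u[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return u


# Hypothetical payload: 8 floats on the wire instead of 18.
coeffs = [0.3, -1.1, 0.7, 0.2, -0.5, 0.9, 0.4, -0.8]
u = reassemble(coeffs)
floats_sent = len(coeffs)
floats_rebuilt = 2 * sum(len(row) for row in u)  # real and imaginary parts
```

The receiver burns a few dozen 3x3 complex multiplies to recover ten floats it never had to receive, which is only a good trade when the arithmetic is effectively free compared to the link.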
CUDA is wonderful, and my field at least (lattice QCD) is falling over itself trying to port stuff to it. Even though it falls far short of its theoretical FLOPS, it's still a hell of a lot faster than a supercomputer made of Opterons. But we shouldn't fool ourselves into thinking that you can accurately measure computer speed now by looking at peak FLOPS. It makes the CUDA/Cell machines look better than they really are.
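A back-of-the-envelope way to see the peak-versus-achieved gap is a roofline-style estimate: a kernel can go no faster than the smaller of the peak compute rate and the memory bandwidth times the flops it performs per byte moved. The roofline model is my framing here, and every number below is made up for illustration, not a measurement of any real chip.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline estimate: the lower of the compute roof and the bandwidth roof."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical machines (illustrative numbers only).
cpu = {"peak_gflops": 40.0, "bandwidth_gb_s": 20.0}
gpu = {"peak_gflops": 1000.0, "bandwidth_gb_s": 150.0}

# Hypothetical bandwidth-hungry kernel: one flop per byte moved.
intensity = 1.0

cpu_rate = attainable_gflops(cpu["peak_gflops"], cpu["bandwidth_gb_s"], intensity)
gpu_rate = attainable_gflops(gpu["peak_gflops"], gpu["bandwidth_gb_s"], intensity)

cpu_fraction = cpu_rate / cpu["peak_gflops"]  # share of peak actually reached
gpu_fraction = gpu_rate / gpu["peak_gflops"]
speedup = gpu_rate / cpu_rate
```

With these invented figures the GPU reaches a much smaller fraction of its peak than the CPU does, yet still comes out several times faster in absolute terms. Ranking the two by peak FLOPS alone would overstate the GPU's advantage.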
I believe May's Law is the one you're referring to; a corollary to Moore's Law, stating that software efficiency halves every 18 months (or two years).