The Problem With the Top500 Supercomputer List
angry tapir writes "The Top500 list of supercomputers is dutifully watched by high-performance computing participants and observers, even as they vocally doubt its fidelity to excellence. Many question the use of a single metric — Linpack — to rank the performance of something as mind-bogglingly complex as a supercomputer. During a panel at the SC2010 conference this week in New Orleans, one high-performance-computing vendor executive joked about stringing together 100,000 Android smartphones to get the largest Linpack number, thereby revealing the 'stupidity' of Linpack. While grumbling about Linpack is nothing new, the discontent was pronounced this year as more systems, such as the Tianhe-1A, used GPUs to boost Linpack ratings, in effect gaming the Top500 list."
Fortunately, Sandia National Laboratories is heading an effort to develop a new set of benchmarks. In other supercomputer news, it turns out the Windows-based cluster that lost out to Linux stumbled because of a bug in Microsoft's software package. Several readers have also pointed out that IBM's Blue Gene/Q has taken the top spot in the Green500 for energy-efficient supercomputing, while a team of students built the third-place system.
Now that the Chinese are ahead, there's suddenly a problem with the list/benchmark.
As the article alludes to, the big problem with ranking supercomputers via Linpack is that it doesn't advance supercomputer design. The net result is a pissing match over scalability, where winning depends on who can cram the most cores into a single room. The real innovators should be recognized for their efforts to reduce space, power and cost, or to find new algorithms that crunch the numbers in more efficient or useful ways.
When you have nothing left to burn you must set yourself on fire
The guide has this to say about supercomputers: "Supercomputers," it says, "are big. Really big. You just won't believe how vastly, hugely, mindbogglingly big they are. I mean, you may think your SGI Challenge DM is big, but that's just peanuts to supercomputers, listen..."
The Top500 has the problem that many of the systems on there aren't supercomputers, they are clusters. Now clusters are all well and good. There's lots of shit clusters do well, and if your application is one of them then by all means build and use a cluster. However they aren't supercomputers. What makes supercomputers "super" is their unified memory. A real supercomputer has high speed interconnects that allow direct memory access (non-uniform with respect to time but still) by CPUs to all the memory in the system. This is needed in situations where you have calculations that are highly interdependent, like particle physics simulations.
So while you might find a $10,000,000 cluster gives you similar performance to a $50,000,000 supercomputer on Linpack, or on another benchmark that is very distributed and doesn't rely on a lot of inter-node communication, you would find it falls flat when given certain tasks.
If we want to have a cluster rating as well that's cool, but a supercomputer benchmark should be better focused on the tasks that make owning an actual supercomputer worth it. They are out there, that's why people continue to buy them.
Also, I have seen cases where compiler optimization is smart enough to remove the entire loop if there are no side effects to incrementing i, and it's not used outside the loop.
Most compilers should be doing this. Hell, even IE9 is supposed to do it for JavaScript now. It gets great scores on SunSpider because of it (the JIT can throw away entire tests).
In other supercomputer news, it turns out the Windows-based cluster that lost out to Linux stumbled because of a bug in Microsoft's software package.
As it should. That's not news; that's how the game is played. If your software is buggy, and those bugs drag your performance far enough down, you don't deserve a Top500 spot.
If they fix their software, rerun the test, and perform better than Linux, then they will have won that battle (the battle for the Top500 spot, not the battle for market share) fair and square.
Let q be a radix > 1. I am in ur base-q, killing 10 d00ds.
Yes, noticed that.
Here's the actual benchmark used for Top500: "HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers". It solves linear equations spread across a cluster. The clustered machines have to communicate at a high rate, using MPI 1.1 message passing, to run this program. See this discussion of how the algorithm is parallelized. You can't run this on a set of machines that don't talk much, like "Folding@home" or like cryptanalysis problems.
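For anyone who hasn't looked at what the benchmark computes, here is a toy single-machine sketch of the idea: solve a dense random system, time it, and divide an operation count by the elapsed time. This is not HPL. There is no MPI and no blocking, the function names are made up for illustration, and the flop count uses only the leading (2/3)n^3 term, so treat the rate as a rough figure.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: use the row with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from every row below the pivot.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def max_residual(a, x, b):
    """Largest absolute entry of a x - b, as a sanity check on the answer."""
    return max(
        abs(sum(aij * xj for aij, xj in zip(row, x)) - bi)
        for row, bi in zip(a, b)
    )


def toy_linpack(n=60, seed=0):
    """Return (approximate flop/s, residual) for one random n x n solve."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid dividing by zero
    flops = (2.0 / 3.0) * n ** 3  # leading-order operation count only
    return flops / elapsed, max_residual(a, x, b)
```

Calling `toy_linpack(200)` gives a flop rate and a residual for one machine. The real benchmark spreads the matrix across nodes, which is why the interconnect matters.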
Linpack is a reasonable approximation of computational fluid dynamics and structural analysis performance. Those are problems that are broken up into cells, with communication between machines about what's happening at the cell boundaries. Those are also the problems for which governments spend money on supercomputers. (The private market for supercomputers is very small.)
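The "communication at the cell boundaries" part can be sketched in a few lines. This is a toy 1D diffusion problem split into two halves that swap one edge value per step, standing in for two machines exchanging boundary data. The function names, the 0.25 coefficient, and the zero boundary are all assumptions for illustration; a real code would do the swap with message passing across many nodes.

```python
def step(cells, left, right):
    """One explicit diffusion step; left/right are the neighbouring ghost values."""
    padded = [left] + cells + [right]
    return [
        padded[i] + 0.25 * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def run_single(cells, steps):
    """Reference run: the whole domain lives on one 'machine'."""
    for _ in range(steps):
        cells = step(cells, 0.0, 0.0)  # fixed zero value outside the domain
    return cells


def run_split(cells, steps):
    """Same problem cut into two halves that trade edge cells every step."""
    mid = len(cells) // 2
    a, b = cells[:mid], cells[mid:]
    for _ in range(steps):
        # Boundary exchange: each half needs the other's edge cell
        # from the previous step before it can update its own edge.
        a_edge, b_edge = a[-1], b[0]
        a = step(a, 0.0, b_edge)
        b = step(b, a_edge, 0.0)
    return a + b
```

Both runs give the same answer, but the split version cannot take a single step without the exchange. That per-step dependency is what separates this class of problem from the "machines that don't talk much" kind.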
So, quit whining. China built the biggest one. Why not? They have more cash right now.
The advantage is that, contrary to the arguments of TFA, the test is very representative of scientific and engineering problems.
No, it really isn't. I work in HPC at a national lab, and our bureaucrats buy these computers based on these benchmark numbers and then expect us to adapt our codes to fit these machines, rather than buying machines that are better suited to the problems we are solving. For example, one of our machines peaked at #2 on the Top500 list, and was essentially useless for real codes. Another machine of ours held the #1 spot for quite a while, and worked well for a small class of problems, but was so limited in functionality that it couldn't even run many of our codes. I've heard similar stories from people using other machines near the top of the Top500.
Real science codes often do not look anything like LINPACK, and the computers that run these benchmarks fast aren't necessarily good for true HPC.