The Supercomputer Race
CWmike writes "Every June and November a new list of the world's fastest supercomputers is revealed. The latest Top 500 list marked the scaling of computing's Mount Everest: the petaflops barrier. IBM's 'Roadrunner' topped the list, burning up the bytes at 1.026 petaflops. A computer to die for if you are a supercomputer user for whom no machine ever seems fast enough? Maybe not, says Richard Loft, director of supercomputing research at the National Center for Atmospheric Research in Boulder, Colo. 'The Top 500 list is only useful in telling you the absolute upper bound of the capabilities of the computers ... It's not useful in terms of telling you their utility in real scientific calculations.' The problem with the rankings: a decades-old benchmark called Linpack, which is Fortran code that measures the speed of processors on floating-point math operations. One possible fix: invoking specialization. Of petaflops, peak performance, benchmark results and positions on a list, Loft says, 'it's a little shell game that everybody plays. ... All we care about is the number of years of climate we can simulate in one day of wall-clock computer time. That tells you what kinds of experiments you can do.' State-of-the-art systems today can simulate about five years per day of computer time, he says, but some climatologists yearn to simulate 100 years in a day."
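Loft's metric, simulated years per wall-clock day, is simple arithmetic once you know a model's time step and how long each step takes to compute. The sketch below is only an illustration: the 30-minute model time step and the 365-day model calendar are assumptions, not figures from the article.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumption: a 365-day ("no leap") model calendar


def simulated_years_per_day(model_step_s: float, wall_s_per_step: float) -> float:
    """Years of simulated climate produced per day of wall-clock time."""
    steps_per_wall_day = SECONDS_PER_DAY / wall_s_per_step
    simulated_seconds = steps_per_wall_day * model_step_s
    return simulated_seconds / (DAYS_PER_YEAR * SECONDS_PER_DAY)


def wall_budget_per_step(target_sypd: float, model_step_s: float) -> float:
    """Wall-clock seconds each model step may take to reach target_sypd."""
    return model_step_s / (target_sypd * DAYS_PER_YEAR)


MODEL_STEP_S = 1800.0  # hypothetical 30-minute model time step

for target in (5, 100):
    budget = wall_budget_per_step(target, MODEL_STEP_S)
    print(f"{target:>3} years/day -> at most {budget:.4f} s of wall time per step")
```

Whatever step size you assume, going from five to 100 simulated years per day means cutting the wall time per step twentyfold.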
Is how many Libraries of Congress it can read in a fortnight.
Don't hold your breath; it'll disrupt the predictions.
Simulate 100 years of climate in a day? Here's my code:
echo -e "sunny\nrainy\ncloudy" | rl -rc 36525
Ask me about repetitive DNA
Sadly, while predicting the weather and better understanding it ultimately helps a lot of people, I suspect a LOT more computing power is thrown at more mundane things like predicting where the financial markets are going to be based on a gazillion data inputs. Probably even better funded are the vast datacenters around the world that fondle communications and other data for the spymasters. I doubt those computing resources are represented in the annual supercomputing lists. :)
But... the whole point is to test the models, and the models change, don't they? Surely we're not just simulating more "years" of climate with the current batch of models, but also improving resolution, making fewer simplifying assumptions, and, hopefully, finding ways to do the same operations in fewer cycles.
How can you possibly evaluate supercomputers in any way other than by how many mathematical operations they can perform in some reference time? And... some serial metric if the math is highly parallel, since simply shrinking the vectors in those cases wouldn't make those flops useful for other tasks.
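One standard way to see why a serial figure still matters on highly parallel machines is Amdahl's law: if only a fraction p of the work parallelizes, N processors can never speed it up by more than 1/(1 - p). A minimal sketch; the 99% parallel fraction and the 2,048-processor count in the example are arbitrary assumptions.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup when only parallel_fraction of the runtime parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical workload: 99% parallel, run on 2,048 processors.
speedup = amdahl_speedup(0.99, 2048)
print(f"speedup: {speedup:.1f}x (capped at 100x no matter how many processors)")
```

The remaining 1% of serial work holds 2,048 processors to well under 100x, which is why single-processor speed does not disappear from the picture.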
Can you be Even More Awesome?!
A quality HPC vendor will give you the opportunity to benchmark your application before you buy a system or cluster. Most will have standard codes installed, but you should also be able to arrange for a login to build and run your own code on their test clusters. This is the only way to guarantee you're getting the best bang for the buck, because the bottleneck in your particular application may be memory, IO, interconnect, CPU, chipset, libraries, OS... An HPC cluster can be a big purchase, and its performance and reliability can make or break careers. Don't trust generalized benchmarks unless you know that they accurately reflect your workload on the hardware you'll be purchasing.
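In that spirit, the first step is timing your own kernel rather than trusting a published number. Below is a minimal sketch using Python's standard `timeit` module. `my_kernel` is a hypothetical stand-in for whatever your application actually spends its time on.

```python
import statistics
import timeit


def my_kernel() -> float:
    """Hypothetical stand-in workload: replace with a call into your own code."""
    return sum(i * 0.5 for i in range(100_000))


def benchmark(fn, repeat: int = 5, number: int = 3) -> tuple[float, float]:
    """Return (best, median) seconds per call over `repeat` batches of `number` calls."""
    batch_times = timeit.repeat(fn, repeat=repeat, number=number)
    per_call = [t / number for t in batch_times]
    return min(per_call), statistics.median(per_call)


best, median = benchmark(my_kernel)
print(f"best {best * 1e3:.2f} ms, median {median * 1e3:.2f} ms per call")
```

The best time filters out noise from other processes, and the median shows what a typical run looks like. On a vendor's test cluster you would run the same kind of harness, or better, your real job scripts, on each candidate configuration.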
The mondo-flop race,
As the hair on your face,
You yearn to displace,
So do it with grace.
Burma Shave
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
I write massively parallel scientific code that runs on these supercomputers for a living... and this is what I've been preaching all along.
The thing about RoadRunner and others (such as Red Storm at Sandia) is that they are special pieces of hardware that run highly specialized operating systems. I can say from experience that these are an _enormous_ pain in the ass to code for... and reaching anything near the theoretical computing limit on these machines with real-world engineering applications is essentially impossible... not to mention all of the extra time it costs you just to get your application to compile on the machine and to debug it...
My "day-to-day" supercomputer is a 2048 processor machine made up of generic Intel cores all running a slightly modified version of Suse Linux. This is a great machine for development _and_ for execution. My users have no trouble using my software and the machine... because it's just Linux.
When looking at a supercomputer I always think in terms of utility... not in terms of Flops. It's for this reason that I think the guys down at the Texas Advanced Computing Center got it right when they built Ranger ( http://www.tacc.utexas.edu/resources/hpcsystems/#constellation ). It's about a half a petaflop... but guess what? It runs Linux! And is actually made up of a bunch of Opteron cores... the machine itself is also a huge, awesome looking beast (I've been inside it... the 2 huge Infiniband switches are really something to see). I haven't used it myself (yet), but I have friends working at TACC and everyone really likes the machine a lot. It definitely strikes that chord between ultra-powerful and ultra-useful.
Friedmud
...ever looked at gaming benchmarks? Server benchmarks? Productivity benchmarks? Rendering benchmarks? In fact, any kind of benchmark? Seen how they all differ depending on the product and test run? Same with supercomputers, you got some synthetic benchmarks, and you got some real world benchmarks. But the weather simulation may not be a relevant benchmark at all if you're doing nuke simulations or gene decoding or finite deformation or some other kind of simulation. Synthetics are the lowest common denominator - you'd rather see benchmarks in your field, and most of all benchmarks with your exact application. That doesn't change that those are individual wants and synthetic benchmarks are the only ones with any value to everyone.
Live today, because you never know what tomorrow brings
44 years ago, 1-5 megaflops was hot! What excitement we felt when the CDC 6600 was installed at my university!
Back in '85 I was part of a startup building a mini-Cray, reimplementing the Cray instruction set in a smaller, cheaper box. I remember we focused on the Whetstone benchmark a lot, and it turned out that the Whetstone code really was bound up by moving characters around while formatting output strings, etc. We paid very careful attention to efficiently coding the C library string handling routines, and that got us more performance payback than anything we could do to optimize the arithmetic. One needs to understand the benchmark being used.
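The lesson generalizes: profile a benchmark before optimizing for it. The toy program below is not Whetstone. It is a hypothetical stand-in that mixes a little arithmetic with a lot of output formatting, and Python's standard `cProfile` and `pstats` modules show where its time actually goes.

```python
import cProfile
import io
import pstats


def arithmetic(iterations: int) -> float:
    """The 'real' floating-point work the benchmark claims to measure."""
    x = 1.0
    for i in range(1, iterations):
        x = (x + i) * 0.5
    return x


def format_report(values) -> str:
    """Output formatting: string building that has nothing to do with flops."""
    return "\n".join(f"{i:6d} {v:14.6f}" for i, v in enumerate(values))


def toy_benchmark() -> str:
    results = [arithmetic(200) for _ in range(200)]
    report = ""
    for _ in range(50):
        report = format_report(results * 10)  # print the results repeatedly
    return report


profiler = cProfile.Profile()
profiler.enable()
toy_benchmark()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
profile_text = stream.getvalue()
print(profile_text)
```

In this toy, most of the cumulative time lands in `format_report`, not `arithmetic`, so speeding up the floating-point loop would barely move the score.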
Just with a lot more dollars behind it...
Everyone remotely engaged with Top500 systems knows how very specific the thing being measured is. It's most sensitive to aggregate clock cycles and processor architecture, and not as sensitive to memory throughput/architecture or networking as many real-world workloads are.
HPC Challenge (http://icl.cs.utk.edu/hpcc/) is an attempt to be more comprehensive, at least, by specifying a whole suite of independently scored benchmarks to reflect the strengths and weaknesses of systems in a more holistic way. Sure, it's still synthetic, but it can give a better 'at-a-glance' indicator of several generally important aspects of a supercomputer configuration.
The thing probably inhibiting acceptance of this is that very fact: it is holistic, and the winner 'depends' on how you sort the data. This is excellent for those wanting to understand more comprehensively where their configuration stands in the scheme of things, but hard for vendors and facilities to use for marketing leverage. Being able to say 'we built *the* fastest supercomputer according to the list' is a lot stronger than 'depending on how you count, we could be considered number one.' Vendors will price aggressively knowing about the attached bragging rights, and facilities that receive research grant money similarly want to be able to make statements without disclaimers.
Rest assured, though, that more thorough evaluations are done, and not every decision in the Top500 is just about that benchmark. For example, AMD platforms are doing better than they would if only HPL score counted. AMD's memory performance is still outrageously better than Intel's and is good for many HPC applications, but Intel's current generation trounces AMD in HPL score. Of course, Intel did become overwhelmingly popular upon the release of their 64-bit Core-architecture-based systems, but still...
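The "winner depends on how you sort" problem is easy to demonstrate. The systems and scores below are entirely made up, and the metric names only loosely echo HPC Challenge components (HPL, STREAM, PTRANS). They are not the suite's actual output format.

```python
# Entirely hypothetical systems and scores, for illustration only.
systems = {
    "System A": {"hpl_tflops": 120.0, "stream_gb_s": 40.0, "ptrans_gb_s": 5.0},
    "System B": {"hpl_tflops": 90.0, "stream_gb_s": 70.0, "ptrans_gb_s": 8.0},
    "System C": {"hpl_tflops": 100.0, "stream_gb_s": 55.0, "ptrans_gb_s": 12.0},
}


def ranking(metric: str) -> list[str]:
    """System names ordered best-first by a single metric (higher is better)."""
    return sorted(systems, key=lambda name: systems[name][metric], reverse=True)


for metric in ("hpl_tflops", "stream_gb_s", "ptrans_gb_s"):
    print(f"{metric:>12}: {' > '.join(ranking(metric))}")
```

Each hypothetical system "wins" one column, which is why a suite like this resists being boiled down to a single headline number.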
XML is like violence. If it doesn't solve the problem, use more.
Most of the locations listed are educational institutions, R&D centers, and computer companies. The results were probably submitted unofficially. There are exceptions, but they are just that: few. It makes you wonder what the Big Data companies (Google, Yahoo!, etc.) actually have running. They have no reason to participate, after all...
Consider something like Yahoo!'s research cluster. Why isn't it on this list? Why don't they run the tests?
computing's Mount Everest - the petaflops barrier
Two bad clichéd metaphors in one! It's not a peak, and it's not a barrier, just another arbitrary milestone. Who writes this crap? ... a "professional" writer from an industry magazine. That figures.
Oh
This guy should enter the Bulwer-Lytton Fiction Contest.
It's about a half a petaflop... but guess what? It runs Linux!
This sounds kind of nice, but why should this make it any easier to write parallel programs for it? You still have to manage hundreds if not thousands of threads, right? This will not magically turn it into a computer for the masses, I guarantee you that. I have said it elsewhere, but parallel computing will not come of age until they do away with multithreading and the traditional CPU core. There is a way to build and program parallel computers that does not involve the use of threads or CPUs. This is the only way to solve the parallel programming crisis. Until then, supercomputing will continue to be a curiosity that we mainstream programmers and users can only dream about.
RoadRunner is running relatively stock Fedora (the ppc distribution). True, it's RAM-hosted, but the OS is hardly specialized in terms of libraries and such. You could say the Cell SDK is a tad specialized, but the underlying platform is not as custom as implied.
In fact, every single Top500 system I've ever touched has been far more typical linux than most people ever expect.
In any event, the most compelling aspect of RoadRunner in my view is the flops/watt. Application developers who can leverage highly parallel clusters are those who have the best shot at taking adequate advantage of something like the Cell architecture, which is admittedly a pain for those who are accustomed to no more than one or two concurrent heavily loaded processes or threads.
BTW, I still hate InfiniBand cabling with a passion. Even though they've made it less bad over time, it's still a huge connector. Nothing like Quadrics, mind you, but still reminiscent in bulk of 10BASE5.
As for Top500: really, quit it with this political joke of a benchmark. In molecular simulations alone, for example, you will spend your money either on a computer with broad memory access, for the matrix inversions in quantum calculations, or on a high-clock-speed, low-bandwidth one for MD, which does basically nothing but floating-point operations. The Top500 score gives you zero information about which machine to choose.
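A common way to reason about this split is the roofline model: a kernel's attainable flop rate is capped either by peak arithmetic throughput or by memory bandwidth times its arithmetic intensity (flops per byte moved). Below is a minimal sketch for a hypothetical node with 100 GFLOP/s peak and 10 GB/s memory bandwidth. Dense matrix work, which dominates HPL, sits near the compute roof, while a streaming kernel such as DAXPY sits on the bandwidth slope.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gb_s: float, flops_per_byte: float) -> float:
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


PEAK_GFLOPS = 100.0    # hypothetical node peak
BANDWIDTH_GB_S = 10.0  # hypothetical sustained memory bandwidth

# DAXPY (y = a*x + y) in double precision: 2 flops per element,
# 24 bytes moved (read x, read y, write y).
daxpy_intensity = 2 / 24

# Dense n x n matrix multiply: 2n^3 flops; assuming ideal cache reuse,
# each of the three matrices crosses the memory bus once (3 * 8 * n^2 bytes).
n = 1000
gemm_intensity = (2 * n**3) / (3 * 8 * n**2)

for name, intensity in (("DAXPY", daxpy_intensity), ("DGEMM", gemm_intensity)):
    rate = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, intensity)
    print(f"{name}: {intensity:.3f} flops/byte -> {rate:.2f} GFLOP/s attainable")
```

On this made-up node, the bandwidth-bound kernel reaches under 1% of peak, so a machine's Linpack rank says little about how it will run memory-bound codes.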
molmod.com - computing tips from a molecular modeling
I just threw away a couple of mod points to bring you this announcement: Climate != weather, climate is the long term statistics of weather. Two different numerical analysis models, both computationally expensive.
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
IANAM (I am not a meteorologist)
That's for sure.
Here's an analogy: Say you pour two different colored cans of paint into a bucket and start stirring. Weather is like predicting the exact patterns of swirls that you'll see as the colors mix. Very hard to do looking ahead more than a couple of stirs.
Climate is more like predicting the final color that will result after the mixing is done. Not nearly so intractable. The summary is talking about climate, not weather.
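Edward Lorenz's three-variable convection model is the classic toy for this distinction. The sketch below integrates it twice from starting points that differ by one part in a billion. The individual trajectories (the "weather") soon disagree completely, yet a long-run statistic such as the mean of z (the "climate") comes out nearly the same. It assumes the textbook parameters (sigma = 10, rho = 28, beta = 8/3) and a fixed-step fourth-order Runge-Kutta integrator. It is an illustration, not a climate model.

```python
import math


def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz 1963 equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance one fixed step with the classical fourth-order Runge-Kutta method."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = lorenz(state)
    k2 = lorenz(shifted(k1, dt / 2))
    k3 = lorenz(shifted(k2, dt / 2))
    k4 = lorenz(shifted(k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def trajectory(start, dt=0.02, steps=50_000):
    """Integrate from `start` and return every state from t = 0 to steps * dt."""
    states = [start]
    for _ in range(steps):
        states.append(rk4_step(states[-1], dt))
    return states


run_a = trajectory((1.0, 1.0, 1.0))
run_b = trajectory((1.0 + 1e-9, 1.0, 1.0))  # a one-in-a-billion nudge

separation = [math.dist(p, q) for p, q in zip(run_a, run_b)]
mean_z_a = sum(s[2] for s in run_a) / len(run_a)
mean_z_b = sum(s[2] for s in run_b) / len(run_b)

print(f"separation at t=1:  {separation[50]:.2e}")
print(f"largest separation: {max(separation):.1f}")
print(f"mean z, run A / B:  {mean_z_a:.2f} / {mean_z_b:.2f}")
```

Both runs use identical arithmetic, so the divergence comes only from the initial nudge. Shrinking the nudge merely delays the divergence, while lengthening the run tightens the agreement between the two means.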
It's fair to criticize Linpack for being a one-trick pony. It measures system performance for dense linear algebra, and nothing else. Jack Dongarra (the guy who wrote Linpack and maintains the Top 500 lists) is quite up-front about Linpack's limitations, and he thinks that using a single number as the be-all and end-all of a computer's performance is a bad idea. It's a simple fact of life that certain kinds of computers do better on certain problems. The good guys out at Berkeley even sat down a couple of years ago and enumerated all of the problems they found in real-world HPC applications (see the tables on pages 11-12). The real truth here is that people should stop treating Linpack like it's the final measure of system performance. If you are doing pure linear algebra problems, it's a pretty good measurement for your purposes; if you are not, you use it at your own peril.
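To make concrete what a Linpack-style run times, the sketch below solves a dense system A x = b by Gaussian elimination with partial pivoting (an LU factorization applied to b as it goes). It then divides an approximate operation count by the elapsed time. This pure-Python version is nothing like HPL's tuned, distributed implementation. The count of 2/3 n^3 + 2 n^2 (factorization plus triangular solves) is the usual approximation, and the exact lower-order term a given benchmark reports may differ. The matrix size and random seed are arbitrary choices.

```python
import random
import time


def lu_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Inputs are copied, not modified; `a` is assumed to be nonsingular.
    """
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [A | b]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))  # largest |entry|
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            row_r, row_k = m[r], m[k]
            for c in range(k, n + 1):
                row_r[c] -= factor * row_k[c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution on the upper triangle
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


n = 120
rng = random.Random(0)
a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
b = [rng.uniform(-1.0, 1.0) for _ in range(n)]

start = time.perf_counter()
x = lu_solve(a, b)
elapsed = time.perf_counter() - start

flops = 2 / 3 * n**3 + 2 * n**2  # approximate dense-solve operation count
residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
print(f"{flops / elapsed / 1e9:.4f} GFLOP/s, max residual {residual:.1e}")
```

The resulting number describes dense linear algebra and nothing else. A sparse, bandwidth-bound, or communication-heavy code could rank the same machines very differently.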
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
I seem to recall a Nova special I watched many moons ago about "strange attractors" and "fractal behavior" that seemed to indicate that for a large class of complex-valued iterative functions there was a weird phenomenon called the "Butterfly Effect". Apparently... according to this show I saw 20 years ago (and I think that Mandelbrot mentioned it in a lecture I attended a few years later), initial variables which are as intertwined as the rational and irrational numbers can have drastically divergent outcomes in these situations.
It seems that the reason this was called the Butterfly Effect was actually because the disturbance caused by a butterfly could be enough to change the track of a massive storm some days later. (Reference)
The fact is that, in one study, most weather forecasters on local broadcast channels were less accurate than they would have been by always predicting no rain:
"The graph above shows that stations get their precipitation predictions correct about 85 percent of the time one day out and decline to about 73 percent seven days out.
"On the surface, that would not seem too bad. But consider that if a meteorologist always predicted that it would never rain, they would be right 86.3 percent of the time. So if a viewer was looking for more certainty than just assuming it will not rain, a successful meteorologist would have to be better than 86.3 percent. Three of the forecasters were about 87 percent at one day out, a hair over the threshold for success."
(ref: http://freakonomics.blogs.nytimes.com/2008/04/21/how-valid-are-tv-weather-forecasts/)
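The comparison in the quoted study is an instance of a generic skill score, which measures a forecast against a trivial reference forecast: (accuracy - reference) / (1 - reference). Positive values mean better than always predicting no rain, and negative values mean worse. The accuracies below are the figures quoted above. The function name is just illustrative.

```python
def skill_score(accuracy: float, reference_accuracy: float) -> float:
    """Fraction of the reference forecast's errors that the forecast eliminates."""
    return (accuracy - reference_accuracy) / (1.0 - reference_accuracy)


NO_RAIN_BASELINE = 0.863  # accuracy of always predicting "no rain", from the quote

for label, accuracy in (("all stations, 1 day out", 0.85),
                        ("all stations, 7 days out", 0.73),
                        ("top three, 1 day out", 0.87)):
    print(f"{label}: skill {skill_score(accuracy, NO_RAIN_BASELINE):+.3f}")
```

Even the best forecasters eliminate only about 5% of the baseline's errors, which is the study's point expressed as a number.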
It's a wonderful idea that we can model the incredibly complex climate of our huge planet, but I'll believe it once I can trust the weekend forecast before Friday.
Any other ideas about useful purposes to put these huge computers to? Perhaps accounting and auditing for the new Emergency Financial Legislation?
Are you sure about that? You see, you opened a hole so broad that your statement isn't accurate even today. It's always summer and always winter on the earth. So it could be the same temperature in summer and winter in 4008.
Also, you didn't explain which future climate predictions led you to that conclusion. I might not have been specific enough for a fanboy like you, but I was making a claim about deriving future weather predictions from climate predictions. Of course, a prediction is an attempt to explain what the future will be like, so it is about the future too. You can infer the seasonal climate differences at a particular spot from historical records, but not from future climate predictions. In other words, you can't claim the climate will be X in the future and then make a claim about the weather from X. Or do you know something the rest of us thinking individuals don't? I mean, how do you come to the conclusion that summer will not be as cold as winter by using climate predictions alone?
That's a cop-out. I really wish you guys wouldn't get your panties in a knot when someone questions the premise of your faith. The only flaw in my logic is where it interferes with your beliefs.
And the site you're talking about uses some false logic and logical fallacies itself. I remember it, in one article, attempting to reference a claim that had recently been refuted in order to refute the claim that had just refuted it. Yes, your head should be spinning by now. It's like saying you're wrong because this stuff you're claiming is wrong shows something different. But that's what I would expect from a site pioneered by a NASA scientist who said he knew the information he was using was flawed but "exaggeration by scientists had its place when it was necessary to mobilize public opinion." Of course, the father of global warming also published his first climate model, claiming global cooling was a threat, in 1971. And making things even worse is the political hijacking of the issue and of almost all of its proposed solutions. But like you've said, we have had this talk before.
I wasn't talking about your faith in science. I am talking about your faith in global warming and how it has to be true.
I wasn't talking about absolute certainty. I was talking about the ability to predict climate independently of weather. It can't be done, which was the point of my post and the reason why separating climate from weather has no bearing on the accuracy claims.
Actually, you don't have evidence in a non-political manner. That is the problem. Hansen fought forever to keep his evidence secret. It wasn't until someone collected enough data of their own to check it and found something was wrong that they started opening it up. I have heard the arguments about peer review and all, but when a lowly blogger attempts to get the data, he runs into roadblock after roadblock, lost data sets, and all the rest. You have heard the saying "Garbage In, Garbage Out"; well, we have no guarantee that there isn't garbage in, and until we do, all the work from the political organization, the IPCC, has to be questioned. But, as you will notice if you use your expert scientific mouse clicker and go back, I have said nothing disputing global warming, now have I? But somehow, because I pointed out the problem with separating climate from weather using a definition that locks them together, and blasted a biased site whose main contributor claims it is perfectly fine to exaggerate because the ends justify the means, you now have me denying science altogether. Well, here is a hint for you: not trusting the source of something says nothing about it at all. I don't get how, if we don't just accept what your deity says without question, we are somehow non-believers committing blasphemy. In fact, that is so wrong that I'm not sure you can even claim to be scientific in your beliefs. The basic premise of science is to question the answers and test everything.