The Supercomputer Race
CWmike writes "Every June and November a new list of the world's fastest supercomputers is revealed. The latest Top 500 list marked the scaling of computing's Mount Everest - the petaflops barrier. IBM's 'Roadrunner' topped the list, burning up the bytes at 1.026 petaflops. A computer to die for if you are a supercomputer user for whom no machine ever seems fast enough? Maybe not, says Richard Loft, director of supercomputing research at the National Center for Atmospheric Research in Boulder, Colo.: 'The Top 500 list is only useful in telling you the absolute upper bound of the capabilities of the computers ... It's not useful in terms of telling you their utility in real scientific calculations.' The problem with the rankings: a decades-old benchmark called Linpack, which is Fortran code that measures the speed of processors on floating-point math operations. One possible fix: invoking specialization. Loft says of petaflops, peak performance, benchmark results, positions on a list: 'it's a little shell game that everybody plays. ... All we care about is the number of years of climate we can simulate in one day of wall-clock computer time. That tells you what kinds of experiments you can do.' State-of-the-art systems today can simulate about five years per day of computer time, he says, but some climatologists yearn to simulate 100 years in a day."
Is how many Libraries of Congress it can read in a fortnight.
I yearn for the day when they can predict the weather 48 hrs from now and be 99% accurate. Also, the same with hurricanes.
Simulate 100 years of climate in a day? Here's my code:
echo -e "sunny\nrainy\ncloudy" | rl -rc 36525
Ask me about repetitive DNA
Sadly, while predicting the weather and better understanding it ultimately helps a lot of people, I suspect a LOT more computing power is thrown at more mundane things like predicting where the financial markets are going to be based on a gazillion data inputs. Probably even better funded are the vast datacenters around the world that fondle communications and other data for the spymasters. I doubt those computing resources are represented in the annual supercomputing lists. :)
Imagine a beowulf cluster of these.
But.. The whole point is to test the model, and the models change, don't they? Surely we're not just simulating more "years" of climate with the current batch, but improving resolution, making fewer simplifying assumptions, and hopefully, finding ways to do the exact same operations with fewer cycles.
How can you possibly evaluate supercomputers in any other way except how many mathematical operations can be performed in some reference time? And.. some serial metric if the math is highly parallel, since just reducing the size of vectors in those cases wouldn't actually result in those flops being useful for other tasks.
Can you be Even More Awesome?!
The people who want to simulate 100 years of climate a day will, when they get it, want to simulate 2000 years a day.
A quality HPC vendor will give you the opportunity to benchmark your application before you buy a system or cluster. Most will have standard codes installed, but you should also be able to arrange for a login to build and run your own code on their test clusters. This is the only way to guarantee you're getting the best bang per buck, because the bottleneck in your particular applications may be memory, IO, interconnect, CPU, chipset, libraries, OS... An HPC cluster can be a big purchase, and its performance and reliability can make or break careers. Don't trust generalized benchmarks unless you know that they accurately reflect your workload on the hardware you'll be purchasing.
I write massively parallel scientific code that runs on these supercomputers for a living... and this is what I've been preaching all along.
The thing about RoadRunner and others (such as Red Storm at Sandia) is that they are special pieces of hardware that run highly specialized operating systems. I can say from experience that these are an _enormous_ pain in the ass to code for... and reaching anything near the theoretical computing limit on these machines with real world engineering applications is essentially impossible... not to mention all of the extra time it costs you in just getting your application to compile on the machine and debug it...
My "day-to-day" supercomputer is a 2048 processor machine made up of generic Intel cores all running a slightly modified version of Suse Linux. This is a great machine for development _and_ for execution. My users have no trouble using my software and the machine... because it's just Linux.
When looking at a supercomputer I always think in terms of utility... not in terms of Flops. It's for this reason that I think the guys down at the Texas Advanced Computing Center got it right when they built Ranger ( http://www.tacc.utexas.edu/resources/hpcsystems/#constellation ). It's about a half a petaflop... but guess what? It runs Linux! And is actually made up of a bunch of Opteron cores... the machine itself is also a huge, awesome looking beast (I've been inside it... the 2 huge Infiniband switches are really something to see). I haven't used it myself (yet), but I have friends working at TACC and everyone really likes the machine a lot. It definitely strikes that chord between ultra-powerful and ultra-useful.
Friedmud
...ever looked at gaming benchmarks? Server benchmarks? Productivity benchmarks? Rendering benchmarks? In fact, any kind of benchmark? Seen how they all differ depending on the product and test run? Same with supercomputers, you got some synthetic benchmarks, and you got some real world benchmarks. But the weather simulation may not be a relevant benchmark at all if you're doing nuke simulations or gene decoding or finite deformation or some other kind of simulation. Synthetics are the lowest common denominator - you'd rather see benchmarks in your field, and most of all benchmarks with your exact application. That doesn't change that those are individual wants and synthetic benchmarks are the only ones with any value to everyone.
Live today, because you never know what tomorrow brings
Isn't "The Turk" supposed to be the world's most powerful computer?
Just with a lot more dollars behind it...
Everyone remotely engaged in Top500 systems knows how very specific the thing being measured is. It's most sensitive to the aggregate clock cycles and processor architecture, and not as sensitive to memory throughput/architecture or networking as many real world things are.
http://icl.cs.utk.edu/hpcc/
Is an attempt to be more comprehensive, at least, by specifying a whole suite of independently scored benchmarks to reflect the strengths and weaknesses of things in a more holistic way. Sure, it's still synthetic, but it can give a better 'at-a-glance' indicator of several generally important aspects of a supercomputer configuration.
The thing probably inhibiting acceptance of this is that very fact: it is holistic, and the winner 'depends' on how you sort the data. This is excellent for those wanting to more comprehensively understand their configuration's standing in the scheme of things, but hard for vendors and facilities to use for marketing leverage. Being able to say 'we built *the* fastest supercomputer according to the list' is a lot stronger than 'depending on how you count, we could be considered number one'. Vendors will aggressively pursue pricing knowing about the attached bragging rights, and facilities that receive research grant money similarly want the ability to make statements without disclaimers.
Rest assured, though, that more thorough evaluations are done and not every decision in the Top500 is just about that benchmark. For example, AMD platforms are doing better than they would if only HPL score were counted. AMD's memory performance is still outrageously better than Intel's and is good for many HPC applications, but Intel's current generation trounces AMD in HPL score. Of course, Intel did become overwhelmingly popular upon release of their 64-bit Core architecture based systems, but still..
XML is like violence. If it doesn't solve the problem, use more.
The locations listed are mostly educational institutions, R&D centers, and computer companies. The results were probably submitted unofficially. There are a few exceptions, but they are just that--few. It makes you wonder what the Big Data companies (Google, Yahoo!, etc) actually have running. They have no reason to participate, after all...
Consider something like Yahoo!'s research cluster. Why isn't it on this list? Why don't they run the tests?
computing's Mount Everest - the petaflops barrier
Two bad cliched metaphors in one! It's not a peak, and it's not a barrier, just another arbitrary milestone. Who writes this crap? ... a "professional" writer from an industry magazine. That figures.
Oh
This guy should enter the Bulwer-Lytton Fiction Contest
That an article featuring IBM supercomputers comes shortly after a few misguided individuals were posting that "IBM is no longer relevant, they are an OEM reseller nowadays" or that they "only make bloated, slow software"
No sig for the moment.
It's about a half a petaflop... but guess what? It runs Linux!
This sounds kind of nice but why should this make it any easier to write parallel programs for it? You still have to manage hundreds if not thousands of threads, right? This will not magically turn it into a computer for the masses, I guarantee you that. I have said it elsewhere but parallel computing will not come of age until they do away with multithreading and the traditional CPU core. There is a way to build and program parallel computers that does not involve the use of threads or CPUs. This is the only way to solve the parallel programming crisis. Until then, supercomputing will continue to be a curiosity that us mainstream programmers and users can only dream about.
Is running relatively stock Fedora (the ppc distribution). True, it's ram hosted, but the OS is hardly specialized in terms of libraries and such. You could say the Cell SDK is a tad specialized, but the underlying platform is not so custom as implied.
In fact, every single Top500 system I've ever touched has been far more typical Linux than most people ever expect.
In any event, the most compelling aspect of RoadRunner in my view is the flops/watt. Application developers who can leverage highly parallel clusters are those who have the best shot of taking adequate advantage of something like the Cell architecture, which is admittedly a pain for those that are accustomed to no more than one or two concurrent heavy-loaded processes or threads.
BTW I still hate the Infiniband cabling with a passion. Even as they've made it less bad over time, it's still a huge connector. Nothing like Quadrics, mind you, but still reminiscent in bulk to 10base5.
"State-of-the-art systems today can simulate about five years per day of computer time, he says, but some climatologists yearn to simulate 100 years in a day."
IANAM (I am not a meteorologist) like Mr Loft, so excuse me please if I am wrong, but does not the current state of the art in weather modeling provide something like a 3 day preview of the future, with only 50% accuracy?
I submit that Mr Loft's complaints have much more to do with the current mathematical model limitations than with the ability of current hardware.
This is not a hardware issue (yet). This is still a mathematical issue, and has not one iota to do with the prowess of any computational hardware.
When Mr Loft presents a 100 year weather model with 50% accuracy, I might begin to worry about whether our CPUs can handle it.
Until then I keep my eyes glued on the Weather Underground.
Number of simulated years per day isn't exactly the metric you want. I can simulate a million years in a minute on my home PC, just not very accurately. As you get more accurate, the sim years/CPU day will decrease.
So knowing the number of simulated years per cpu day doesn't tell you anything unless you know exactly what algorithm you're using.
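To put a rough number on that: the sketch below assumes a common rule of thumb (mine, not from the story) that making the horizontal grid twice as fine gives four times as many grid columns and needs a time step half as long, so the cost goes up about 8x. The function name and parameters are made up for illustration; the 5 years/day starting point is the figure quoted in the summary.

```python
def years_per_day(base_years_per_day: float, refinement: float) -> float:
    """Estimate throughput after refining the horizontal grid.

    ASSUMPTION: cost scales with refinement**3 (two horizontal dimensions
    plus a proportionally shorter time step); vertical levels unchanged.
    """
    if refinement <= 0:
        raise ValueError("refinement must be positive")
    return base_years_per_day / refinement ** 3


if __name__ == "__main__":
    # refinement < 1 means a coarser grid, > 1 a finer one.
    for factor in (0.5, 1, 2, 4):
        rate = years_per_day(5.0, factor)
        print(f"grid {factor}x finer: {rate:.3f} simulated years/day")
```

Under that assumption the same machine can report anything from a fraction of a year to dozens of years per day, which is why the figure is meaningless without the model's resolution next to it.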
Give me Classic Slashdot or give me death!
As for top500: really, quit with this political joke benchmark. E.g., in molecular simulations alone you will spend either on a computer with broad memory access for matrix inversions in quantum calculations, or on a high-clock-speed, low-bandwidth one for MD, which basically does nothing but floating-point operations. The score in the top500 will give you 0 information about which machine to choose.
molmod.com - computing tips from a molecular modeling
90% of Roadrunner CPU time is reserved for the military as I recall.
Great!
But will it run Crysis?
Why don't they code LINPACK in COBOL?
Sig this!
IBM retains their engineering, but sometimes the business decision makers fail to understand the value of that, and try to profit by slapping the IBM logo on other companies' products.
Let's presume we start with an outsourced generation of products. IBM does it, gets slapped with warranty and service costs, and gets complaints from the services organizations saying they cannot build quality solutions on random white-box systems. Customers, meanwhile, see little product differentiation from the companies whose products IBM is reselling at an increased price.
Then they have a generation of good, IBM-engineered product that uses their in-house engineering teams to produce product. The products work well, and the customers and IBM services business can consistently build upon them.
Then, they decide it's going so well, if only they could cut costs by outsourcing, and the vicious cycle continues.
I was actually referring to the difference between read as in the system call and read as in what you do with a book.
It's fair to criticize Linpack for being a one-trick pony. It measures system performance for dense linear algebra, and nothing else. Jack Dongarra (the guy who wrote Linpack and maintains the top 500 lists) is quite up-front about Linpack's limitations, and he thinks that using a single number as the end-all-be-all of a computer's performance is a bad idea. It's a simple fact of life that certain kinds of computers do better on certain problems. The good guys out at Berkeley even sat down a couple years ago and enumerated all of the problems they found in real-world HPC applications (See the tables on pages 11-12). The real truth here is that people should stop treating Linpack like it's the final measure of system performance. If you are doing pure linear algebra problems, it's a pretty good measurement for your purposes; if you are not, then you use it at your own peril.
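For anyone who hasn't seen what "dense linear algebra, and nothing else" looks like, here is a toy sketch. It is not Linpack or HPL, just a pure-Python solve of a random dense system, timed and converted to a rate using the leading-order operation count of about (2/3)n^3. The function names, the matrix size, and the seed are arbitrary choices for illustration.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's data is untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def measure_mflops(n=120, seed=0):
    """Time one solve and convert it to MFLOPS using the ~(2/3) n^3 estimate."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [rng.random() for _ in range(n)]
    start = time.perf_counter()
    solve_dense(a, b)
    elapsed = time.perf_counter() - start
    flops = (2.0 / 3.0) * n ** 3  # leading-order operation count only
    return flops / elapsed / 1e6


if __name__ == "__main__":
    print(f"{measure_mflops():.1f} MFLOPS (pure Python, dense solve only)")
```

Note what the number leaves out: no I/O, no irregular memory access, no communication between nodes. That is exactly the one-trick-pony complaint.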
To make laws that man cannot, and will not obey, serves to bring all law into contempt.
--E.C. Stanton
They should just run Vista on them to use as a benchmark. That will effectively flood bottlenecks on all kinds of levels.
We need a new moderation: -1 Wikipedia Googlebomb. Yes we know you can look things up in Wikipedia. But every time you make a link to Wikipedia from Slashdot, Wikipedia goes up in the Google Page Rankings. And then people act all surprised when Wikipedia is in the top ten for every Google search. Every time you link to Wikipedia, it gets a little bit more powerful.
So instead, why not link to some other relevant page? In this case, link to the owner of the Roadrunner supercomputer. You can probably even go to Wikipedia to get the link. If Wikipedia has a great page on something, don't link to it, just put the plaintext name in like this: "Search for IBM_Roadrunner on Wikipedia."
Please everybody, stop linking to Wikipedia. You're destroying the internet.
The thing about climate models is that they get more complex and higher resolution as soon as the computers get faster. We will always take about 3 months to run a simulation. You can run it faster? Make it more detailed. It takes longer? Wait a few months to a year and it'll only take 3 months. Why 3 months? Not sure. Partly because that is about the length of a funding cycle of design experiment, run it, analyze, and write it up.
If you want to run 100 years per day, you can do so with an older model. The EdGCM project has wrapped a NASA global climate model (GCM) in a GUI (OS X and Win). You can add CO2 or turn the sun down by a few percent all with a checkbox and a slider. Supercomputers and advanced FORTRAN programmers are no longer necessary to run your own GCM.
Disclaimer: I'm the project developer.
Space and Computers.
I seem to recall a Nova special I watched many moons ago about "strange attractors" and "fractal behavior" that seemed to indicate that for a large class of complex-valued iterative functions there was a weird phenomenon called the "Butterfly Effect". Apparently... according to this show I saw 20 years ago (and I think that Mandelbrot mentioned it in a lecture I attended a few years later), initial variables which are as intertwined as the rational and irrational numbers can have drastically divergent outcomes in these situations.
It seems that the reason this was called the Butterfly Effect was that the disturbance caused by a butterfly could be enough to change the track of a massive storm some days later.
In fact, one study found that the weather forecasters on local broadcast channels are less accurate than if they always predicted sun:
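You can see the same sensitivity in a few lines of code. The sketch below uses the logistic map, a standard toy chaotic system (my choice for illustration, not a weather model and not something from the show), and runs it twice from starting points that differ by one part in ten billion. The function names and the starting value 0.2 are arbitrary.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit


def divergence(x0=0.2, nudge=1e-10, steps=60):
    """Gap at each step between two runs whose starts differ by `nudge`."""
    a = logistic_orbit(x0, steps)
    b = logistic_orbit(x0 + nudge, steps)
    return [abs(p - q) for p, q in zip(a, b)]


if __name__ == "__main__":
    gaps = divergence()
    for step in range(0, 61, 10):
        print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

The gap stays tiny for the first handful of steps and then grows until the two runs have nothing to do with each other, which is the "butterfly" in miniature.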
"The graph above shows that stations get their precipitation predictions correct about 85 percent of the time one day out and decline to about 73 percent seven days out.
"On the surface, that would not seem too bad. But consider that if a meteorologist always predicted that it would never rain, they would be right 86.3 percent of the time. So if a viewer was looking for more certainty than just assuming it will not rain, a successful meteorologist would have to be better than 86.3 percent. Three of the forecasters were about 87 percent at one day out, a hair over the threshold for success."
(ref: http://freakonomics.blogs.nytimes.com/2008/04/21/how-valid-are-tv-weather-forecasts/)
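To make the comparison in that quote concrete, here is a small sketch that scores a forecaster against the "never rains" baseline. The skill formula (improvement over the baseline divided by the room left for improvement) is a generic way of expressing this and is my framing, not the article's; the percentages are the ones quoted above, and the function name is made up.

```python
def skill_vs_baseline(accuracy_pct, baseline_pct=86.3):
    """Fraction of the gap between baseline and perfection that is closed.

    Positive: better than always saying "no rain". Negative: worse.
    """
    return (accuracy_pct - baseline_pct) / (100.0 - baseline_pct)


if __name__ == "__main__":
    # Figures quoted above: typical one-day, best three stations, seven-day.
    quoted = [("one day out", 85.0), ("best three", 87.0), ("seven days out", 73.0)]
    for label, pct in quoted:
        print(f"{label}: {skill_vs_baseline(pct):+.2f}")
```

Anything at or below zero means the forecast told you less than the calendar did.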
It's a wonderful idea that we can model the incredibly complex climate of our huge planet, but I'll believe it once I can trust the weekend forecast before Friday.
Any other ideas about useful purposes to put these huge computers to? Perhaps accounting and auditing for the new Emergency Financial Legislation?
Way to clarify that for the AC out in left field!! [His plug for Transmeta was complete OT, but you managed to use it as a legitimate 'question'.] Congrats! ;-)
Thanks ... I think. ;-)
He had to plug Transmeta because he anticipated your plug for Apple's powerpc-based "I think." That's quite a crystal ball he has. I wonder if he can predict the next 128-bit Hitachi SuperH processor comme!@# oh he's good...
get back to work.
Really I'm an LAM-MPI freak.. let all the processes talk, gather data and not share memory. it skips the pthreads issue, but some would call the mpirun a thread-launcher.
Nope, MPI won't make parallel code a computer for the masses... I don't have ANY clue as to how to pull the CPU out of the equation.
However there are a BUNCH of ways that parallel code can be commoditized. Sure there are limits to what can be parallelized.. but most modern code is begging to be transformed. The OS's need some upgrading to do a slicker job, and I think that the CPU could add some parallel functionality.
Anyway "only way to solve the parallel programming crisis" seems pretty bunk to me.. of course after reading some of your links... you have some "ideal parallel" concept where in the real world "fake parallelism" is quite preferable...
Storm
I remember hearing all about Cell processors. Uh Huh. IBM loves them. But they are a very very specialized beast. I hoped to use one in a PS3 as a scan line renderer. Bad idea. IBM says 'Oh, that should be wonderful'. Except they keep the code to fire up the SPEs. Without it, the Cell runs like a Pentium4 at 1.8 GHz. Problem 2: The memory 'full load' is 256 MB. You can't upgrade (there is no manufacturer that makes bigger and the stock bus won't allow more). For large data sets (large images) it's untenable. It cannot deal with that much data at a time, and the data must be dealt with as a single complete piece. The new line of multi-core processors that can use many gigabytes of memory per core are much better suited to this kind of application. IBM will yelp about subdividing the data, but doing a complete rewrite of the application is a lot more work than getting a processor chain suited to the job.
Synthetic benchmarks aren't applicable to all users! Who knew?
TZ
Imagine a Beowulf cluster of these....
The Colombia Supacomputa is still much cooler looking than this.
I'll save those huge computers a few million years: The answer is 42.
Now try producing that on a mere Milliard Gargantubrain.
I am officially gone from
Build a faster computer. User expands simulation parameters (which bogs down the faster computer). Rinse. Repeat.
'Cause if the Thunderbolt Greaseslapper ain't in it, I ain't interested.
Someone finally comes up with a machine that can handle Vista Ultimate... who else did you think could do it - Gateway?!?
Software is the weak point in supercomputing nowadays. Funding for hardware SO outstrips the development of software that runs on the expensive hardware that it's becoming a SERIOUS problem.
contest over /
apple.com
Something just doesn't jibe here. So it takes you 20 days instead of 1 to compute the next hundred years. Why is this a problem? It's still a hundred years in 20 days. Don't you have even a little bit of patience?
The only reason to compute a hundred years in one day is if you're going to restart the computation each morning to see what the next hundred years is looking like. But that means you're throwing away the previous day's computation each morning and starting over. Which means you believe that the previous day's computation is bad. Which means why are you even running it in the first place?
If you can't compute the next hundred years accurately anyway, why don't you work on getting the next 5 years working instead? And hey, you've already got that amount of computing power. And when you do get the thing working right then I can wait an additional 19 days for the answers to the remaining 95 years.
In short, quit yer complaining and get your models running correctly. Then you'll only have to compute it once!
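For the record, the arithmetic behind those 20 and 19 days, using the story's figure of five simulated years per day of computer time. A trivial sketch; the function name is made up.

```python
def wall_clock_days(simulated_years, years_per_day):
    """Wall-clock days needed to simulate a span at a given throughput."""
    if years_per_day <= 0:
        raise ValueError("years_per_day must be positive")
    return simulated_years / years_per_day


if __name__ == "__main__":
    print("100 years at 5 years/day:  ", wall_clock_days(100, 5), "days")
    print("remaining 95 years:        ", wall_clock_days(95, 5), "days")
    print("100 years at 100 years/day:", wall_clock_days(100, 100), "days")
```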
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Moving from PDPs to VAXen. Doesn't seem all that long ago.
About 13, 14 years for each new 1000x level.
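Taking that 13 to 14 years per 1000x at face value, you can back out the implied doubling time: 1000x is log2(1000), or roughly ten doublings. A quick sketch, with a made-up function name:

```python
import math


def doubling_time_months(years_per_1000x):
    """Months per doubling implied by a 1000x speedup over the given span."""
    doublings = math.log2(1000.0)  # about 9.97 doublings in a factor of 1000
    return 12.0 * years_per_1000x / doublings


if __name__ == "__main__":
    for years in (13, 14):
        months = doubling_time_months(years)
        print(f"1000x in {years} years -> doubling every {months:.1f} months")
```

That works out to a doubling somewhere between 15 and 17 months.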
"State-of-the-art systems today can simulate about five years per day of computer time, he says, but some climatologists yearn to simulate 100 years in a day."
Strangely enough, I'm reminded of the Heisenberg uncertainty principle. Considering the power and infrastructure requirements to run a supercomputer with that much processing power today, you might change the climate by simulating it.
A group at NASA Ames Research Center created a benchmark suite in the 1990s to measure the relative performance of parallel supercomputers. This benchmark suite is perhaps more useful than LINPACK, if the kernels represent YOUR application. However, as with all benchmarks (including LINPACK), companies worked hard to show their results in the "best light", sometimes unrealistically so. The tricks played in reporting benchmark results prompted a paper called "Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers": crd.lbl.gov/~dhbailey/dhbpapers/twelve-ways.pdf. While the units are MFLOPS and not PFLOPS, the points made about what people will do to make their machine look great in benchmark results are still true today.
some climatologists yearn to simulate 100 years in a day
Why does the yearning stop there? Why not yearn to simulate, say, 1000 years in three seconds?
That that is is that that that that is not is not.