Has Supercomputing Hit a Brick Wall?

Posted by timothy on Tuesday May 14, 2013 @04:11AM from the complicating-factors dept.

anzha writes "Horst Simon, Deputy Director of Lawrence Berkeley National Laboratory, has stood up at conferences of late and said the unthinkable: supercomputing is hitting a wall, and no one will build an exaFLOPS HPC system by 2020. Such a system is defined as one that sustains one exaFLOPS or better on Linpack. He's even placed money on it. You can read the original presentation here."

No? by oGMo · 2013-05-14 04:19 · Score: 4, Informative

"Japan to develop new exaflop computer by 2020" ... why not? And if it's even a few microseconds into 2021 I suppose that supercomputing has failed, will pack up, and go home.

--
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
1. Re:No? by gentryx · 2013-05-14 05:18 · Score: 5, Informative
  
  Power consumption and MTBF: power consumption (high operating costs) can perhaps be solved by a larger budget, but the mean time between failures (MTBF) means that the machine will fail before it can compute anything meaningful. Right now the machines we build and, even more importantly, the software we build rely on all parts of the machine to function. If even a single node fails, then the data it holds becomes inaccessible and the rest of the compute job collapses like a house of cards.
  This can be remedied by taking frequent snapshots and then restarting from the last snapshot, but the time for checkpoint/restart has grown steadily over the last few generations of systems. No one really expects exascale systems to do full system checkpoint/restart in a reasonable time frame. They'd spend more time taking snapshots than actually computing.
  Source: I'm doing my PhD in supercomputing.
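A rough sketch of the checkpoint arithmetic described above. It is not from the comment itself: it assumes independent, exponentially distributed node failures and uses Young's first-order approximation for the checkpoint interval. The node MTBF, node counts, and checkpoint time in the usage lines are hypothetical numbers chosen only for illustration.

```python
import math


def system_mtbf_hours(node_mtbf_hours: float, node_count: int) -> float:
    """MTBF of the whole machine when any single node failure kills the job.

    Assumption: node failures are independent and exponentially distributed,
    so failure rates add up across nodes.
    """
    return node_mtbf_hours / node_count


def young_interval_hours(checkpoint_hours: float, mtbf_hours: float) -> float:
    """Young's first-order approximation of the time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_hours * mtbf_hours)


def wasted_fraction(checkpoint_hours: float, mtbf_hours: float) -> float:
    """First-order estimate of time lost to checkpointing plus recomputation.

    The estimate is capped at 1.0, meaning no useful work gets done.
    """
    interval = young_interval_hours(checkpoint_hours, mtbf_hours)
    waste = checkpoint_hours / interval + interval / (2.0 * mtbf_hours)
    return min(waste, 1.0)


if __name__ == "__main__":
    node_mtbf = 5 * 365 * 24  # hypothetical: one node fails every 5 years
    checkpoint = 0.5          # hypothetical: 30 minutes per full snapshot
    for nodes in (10_000, 1_000_000):
        mtbf = system_mtbf_hours(node_mtbf, nodes)
        print(nodes, round(mtbf, 4), round(wasted_fraction(checkpoint, mtbf), 3))
```

Under these made-up inputs, the larger machine fails more often than a snapshot can complete, which is the "more time taking snapshots than actually computing" regime.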
  
  --
  Computer simulation made easy -- LibGeoDecomp
Re:Ha, not the first by ssam · 2013-05-14 04:31 · Score: 5, Insightful

Moore's law only talks about transistor counts. Building a supercomputer means getting thousands of CPUs to cooperate, which is a much harder challenge.
Anyone (with a large wallet) can stick an exaflop worth of CPUs in a large room. By 2020 you'll be able to do that with a not so large wallet. But that does not result in a useful exaflop computer.
Clarke's Three Laws by Tokolosh · 2013-05-14 04:45 · Score: 5, Interesting

Clarke's Three Laws are three "laws" of prediction formulated by the British writer Arthur C. Clarke. They are:
1. When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2. The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
3. Any sufficiently advanced technology is indistinguishable from magic.

--
Prove anything by multiplying Huge Number times Tiny Number
Re:Ha, not the first by fuzzyfuzzyfungus · 2013-05-14 04:56 · Score: 5, Insightful

It's a particular nuisance because the speed of light is pretty strictly enforced...
Even if you went full-on-nuts and replaced fiber interconnects with little tubes full of hard vacuum, to squeak out that slight improvement over the speed of light in glass or air, you'll still see latency that meaningfully hinders the cooperation of multi-GHz CPUs and RAM across systems of any nontrivial size.
For loosely coupled problems, that barely matters; but not all problems are loosely coupled.
Re:Ha, not the first by fuzzyfuzzyfungus · 2013-05-14 05:37 · Score: 4, Insightful

I'm no expert on the refined world of supercomputers, but my money would be on latency. If you are made of money, bandwidth is a problem that you can substantially brute force. Not 100% efficiently, and layout gets to be a real headache; but if the state of the art in serial interconnects isn't good enough, you can bolt a bunch of them together and have a parallel interconnect (it'll be harder to do board layout for, the wiring will suck more, and it'll cost more; but the major sticking point is money).
If you want to cut latency, even the most exotic photonics-on-die-with-hollow-fiber arrangement imaginable still gives you surprisingly short distances before you start losing CPU cycles to waiting for the return photon.
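To put rough numbers on "surprisingly short distances", here is a minimal sketch of the speed-of-light budget. The 3 GHz clock, the 30 m cable run, and the refractive index of 1.47 for glass fiber are assumptions picked for illustration, and the calculation ignores switch and serialization delays entirely.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum, exact by definition


def metres_per_cycle(clock_hz: float, refractive_index: float = 1.0) -> float:
    """Distance a signal covers during one clock cycle in the given medium."""
    return C_VACUUM_M_PER_S / refractive_index / clock_hz


def round_trip_cycles(distance_m: float, clock_hz: float,
                      refractive_index: float = 1.0) -> float:
    """Clock cycles that elapse while a signal goes out and comes back."""
    one_way_seconds = distance_m * refractive_index / C_VACUUM_M_PER_S
    return 2.0 * one_way_seconds * clock_hz


if __name__ == "__main__":
    clock = 3e9  # assumed 3 GHz core clock
    print("metres per cycle, vacuum:", metres_per_cycle(clock))
    print("round trip over 30 m, vacuum:", round_trip_cycles(30.0, clock))
    print("round trip over 30 m, glass (n=1.47 assumed):",
          round_trip_cycles(30.0, clock, 1.47))
```

Even in the ideal vacuum case, a request to a node 30 m away costs on the order of hundreds of cycles before any switching hardware is counted.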
The Nanosecond by wcrowe · 2013-05-14 05:57 · Score: 4, Interesting

Back in the early 80's I got the opportunity to hear Grace Hopper speak. One of the stories she used to like to tell at her talks was about the time that she was having trouble visualizing a nanosecond. Eventually she sent a memo to her engineers which said, "Please send up one nanosecond." She waited, curious as to how they would respond. After a couple of days a response came back in the form of a metal rod 11-3/4 inches in length with the note attached, "One Nanosecond", and no other explanation. After puzzling over the metal rod she called down to the engineering department and asked, "I give up, what is it?" "That's the distance light travels in a nanosecond", was the response. Later, she sent another memo to the engineers with the request, "Please send up one picosecond." The engineers immediately responded with a memo instructing her to "put the nanosecond in a pepper grinder and you can make picoseconds all over your desk."
Grace Hopper's humorous anecdote underlines the serious problems faced by researchers when they push the boundaries. In her case, it was a real concern over how far a bit can travel at the speed of light. I have no idea if that has any bearing on the exascale problem, but it might illustrate the kinds of problems they might be running into.
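The rod length is easy to check. A small sketch, assuming light in vacuum and the exact definition of the inch as 0.0254 m:

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum, exact by definition
METRES_PER_INCH = 0.0254          # exact by definition


def light_distance_inches(seconds: float) -> float:
    """Distance light travels in vacuum during the given time, in inches."""
    return C_VACUUM_M_PER_S * seconds / METRES_PER_INCH


if __name__ == "__main__":
    print("one nanosecond:", light_distance_inches(1e-9), "inches")
    print("one picosecond:", light_distance_inches(1e-12), "inches")
```

One nanosecond comes out at a little over 11.8 inches, which agrees with the 11-3/4 inch rod in the story, and a picosecond is a thousandth of that, hence the pepper grinder.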

--
Proverbs 21:19
so what? by markhahn · 2013-05-14 06:05 · Score: 4, Insightful

I'm an HPC professional, and do not see much value in these "hero" machines. Yes, you can go on all you want about the march of progress and tier-1 and grand challenges, but you're just reiterating an unquestioned manifest destiny-based view of history. Why do we need an Exaflop machine? is it because some particular set of applications need it? where is the threshold for those applications where the compute facility will be fast enough to achieve some breakthrough?
it's hard to find areas that are primarily limited by compute facilities. for instance, genetics/proteomics/metabolomics/whatever are *not* compute-limited, especially at the high end. they're laboratory-limited, the same way weather simulations are good and getting better, but cannot get past the quality of their input data.
we need more compute in general, but not necessarily in one machine. a single exaflop machine will cost much more than a thousand petaflop machines. letting a thousand flowers bloom is much prettier than one excruciatingly beautiful flower...
and no, hero machines do not provide an efficient way to improve the tech of lesser or later machines. they have to be justified by their own need.