Has Supercomputing Hit a Brick Wall?

Posted by timothy on Tuesday May 14, 2013 @04:11AM from the complicating-factors dept.

anzha writes "Horst Simon, Deputy Director of Lawrence Berkeley National Laboratory, has stood up at conferences of late and said the unthinkable: supercomputing is hitting a wall and will not build an exaFLOPS HPC system by 2020. This is defined as one that passes linpack with a performance of one exaFLOPS sustained or better. He's even placed money on it. You can read the original presentation here."

No? by oGMo · 2013-05-14 04:19 · Score: 4, Informative

"Japan to develop new exaflop computer by 2020" ... why not? And if it's even a few microseconds into 2021 I suppose that supercomputing has failed, will pack up, and go home.

--
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
1. Re:No? by gentryx · 2013-05-14 05:18 · Score: 5, Informative
  
  Power consumption and MTBF: power consumption (high operating costs) can perhaps be solved by a larger budget, but the mean time between failures (MTBF) means that the machine will fail before it can compute anything meaningful. Right now the machines we build, and even more importantly the software we build, rely on all parts of the machine to function. If even a single node fails, then the data it holds becomes inaccessible and the rest of the compute job collapses like a house of cards.
  This can be remedied by taking frequent snapshots and then restarting from the last snapshot, but the time for checkpoint/restart has been growing continuously over the last few generations of systems. No one really expects exascale systems to do full system checkpoint/restart in a reasonable time frame. They'd spend more time taking snapshots than actually computing.
  Source: I'm doing my PhD in supercomputing.
  
  --
  Computer simulation made easy -- LibGeoDecomp
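
A rough sketch of the checkpoint/restart squeeze described above. It uses Young's first-order approximation for the checkpoint interval, which is a swapped-in textbook model and not something the commenter cites. The 30-minute snapshot time and the MTBF values are made-up numbers, and the estimate ignores restart time, so it only means something while the lost fraction stays well below 100%.

```python
import math


def young_interval(checkpoint_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the best time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_s * mtbf_s)


def wasted_fraction(checkpoint_s: float, mtbf_s: float, interval_s: float) -> float:
    """First-order estimate of time lost to writing checkpoints and redoing work.

    Ignores restart time and assumes failures are rare within one interval.
    """
    return checkpoint_s / interval_s + interval_s / (2.0 * mtbf_s)


checkpoint_s = 30 * 60  # hypothetical: 30 minutes to write a full snapshot
for mtbf_h in (168, 24, 4, 1):  # hypothetical whole-machine MTBF values, in hours
    mtbf_s = mtbf_h * 3600
    tau = young_interval(checkpoint_s, mtbf_s)
    waste = min(wasted_fraction(checkpoint_s, mtbf_s, tau), 1.0)
    print(f"MTBF {mtbf_h:4d} h: checkpoint every {tau / 60:6.1f} min, ~{waste:.0%} of time lost")
```

Under these assumptions the lost fraction at the optimal interval works out to sqrt(2 * checkpoint time / MTBF), so it climbs quickly once the snapshot time becomes comparable to the MTBF.
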
Re:Ha, not the first by ssam · 2013-05-14 04:31 · Score: 5, Insightful

Moore's law only talks about transistor counts. Building a supercomputer means getting thousands of CPUs to cooperate, which is a much harder challenge.
Anyone (with a large wallet) can stick an exaflop worth of CPUs in a large room. By 2020 you'll be able to do that with a not so large wallet. But that does not result in a useful exaflop computer.
Clarke's Three Laws by Tokolosh · 2013-05-14 04:45 · Score: 5, Interesting

Clarke's Three Laws are three "laws" of prediction formulated by the British writer Arthur C. Clarke. They are:
1. When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2. The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
3. Any sufficiently advanced technology is indistinguishable from magic.

--
Prove anything by multiplying Huge Number times Tiny Number
1. Re:Clarke's Three Laws by tgd · 2013-05-14 06:07 · Score: 3, Funny
  
  You don't seem to understand the "concept" behind "warp."
  You are not exceeding the speed of light, you are just not traveling the linear distance between the two points.
  That's like saying that he doesn't understand the concept behind a Stargate. Made up is made up is made up.
  You can't have an honest discourse on the speed of light when you're trying to involve fiction. You might as well go full Star Trek and say that thetalon radiation transmorphs subspace and changes the value of C, but only in the presence of an extradimensional rift, and if-and-only-if you have a humpback whale.
Re:Ha, not the first by fuzzyfuzzyfungus · 2013-05-14 04:56 · Score: 5, Insightful

It's a particular nuisance because the speed of light is pretty strictly enforced...
Even if you went full-on-nuts and replaced fiber interconnects with little tubes full of hard vacuum, to squeak out that slight improvement over the speed of light in glass or air, you'll still see latency that meaningfully hinders the cooperation of multi-GHz CPUs and RAM across systems of any nontrivial size.
For loosely coupled problems, that barely matters; but not all problems are loosely coupled.
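
To put rough numbers on that, here is a small sketch that counts how many clock cycles pass while a signal makes a round trip over a cable. The 3 GHz clock, the cable lengths, and the refractive index of 1.47 for glass fiber are assumptions picked for illustration, and switch and serialization delays are ignored entirely.

```python
C_VACUUM_M_PER_S = 299_792_458  # exact, by definition of the metre
FIBER_INDEX = 1.47  # assumed typical refractive index of silica fiber


def round_trip_cycles(distance_m: float, clock_hz: float, refractive_index: float = 1.0) -> float:
    """Clock cycles that elapse while a signal travels distance_m and back."""
    speed_m_per_s = C_VACUUM_M_PER_S / refractive_index
    return 2.0 * distance_m / speed_m_per_s * clock_hz


clock_hz = 3e9  # hypothetical 3 GHz core
for distance_m in (1, 10, 100):  # hypothetical cable lengths
    vacuum = round_trip_cycles(distance_m, clock_hz)
    fiber = round_trip_cycles(distance_m, clock_hz, FIBER_INDEX)
    print(f"{distance_m:4d} m: {vacuum:7.0f} cycles in vacuum, {fiber:7.0f} cycles in fiber")
```
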
I just woke up... by Kaenneth · 2013-05-14 05:21 · Score: 3, Funny

And still a little fuzzy headed, but the first thing I thought of was arranging the racks for shortest maximum path: instead of one big football-field-sized room, stacking the datacenter into a cube shape... Then I thought, "That's probably why Borg ships are Cubes."
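
The cube idea can be checked with a toy calculation. This sketch assumes racks sit on a regular grid with the same spacing in every direction and that cables run along the grid axes (Manhattan distance). Both are simplifications that ignore cooling, floor loading, and everything else that makes real machine rooms flat. The rack count of 4096 is hypothetical, chosen because it is both 64 x 64 and 16 x 16 x 16.

```python
def max_path_flat(side: int) -> int:
    """Longest Manhattan distance between two racks on a side x side floor grid."""
    return 2 * (side - 1)


def max_path_cube(side: int) -> int:
    """Longest Manhattan distance between two racks in a side x side x side cube."""
    return 3 * (side - 1)


# Same hypothetical rack count in both layouts: 64 * 64 == 16 ** 3 == 4096.
print("flat 64 x 64:      ", max_path_flat(64), "rack spacings")
print("cube 16 x 16 x 16: ", max_path_cube(16), "rack spacings")
```
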
Latency not as important as expected by gentryx · 2013-05-14 05:28 · Score: 3

Latency isn't so much of an issue, though: the #1 systems of the last ~3 years all had torus networks (all Blue Genes, all Crays, and the K computer too). These networks only perform well for nearest-neighbor communication, which is fine since most codes running on these machines are simulation codes and they only need this type of communication. If you scale up the system, you'll typically also scale the size of the simulation instance (this is known as "weak scaling").
This means that your program can still spend the same time waiting for the network as it could on a smaller machine. The cables do not need to become shorter.

--
Computer simulation made easy -- LibGeoDecomp
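
A minimal sketch of why weak scaling on a torus keeps per-node communication flat. It assumes a 3D torus, a fixed 64 x 64 x 64 block of cells per node, and a one-cell-deep halo exchanged through the six faces only; the function names and sizes are invented for illustration and do not come from any particular machine or library.

```python
def torus_neighbors(coord, dims):
    """The six nearest neighbors of a node in a 3D torus (wraparound on each axis)."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            shifted = list(coord)
            shifted[axis] = (shifted[axis] + step) % dims[axis]
            neighbors.append(tuple(shifted))
    return neighbors


def halo_cells(block):
    """Cells on the six faces of a local block, i.e. data exchanged per time step."""
    x, y, z = block
    return 2 * (x * y + y * z + x * z)


block = (64, 64, 64)  # hypothetical per-node subdomain, held fixed under weak scaling
for dims in ((4, 4, 4), (16, 16, 16), (64, 64, 64)):
    nodes = dims[0] * dims[1] * dims[2]
    links = len(torus_neighbors((0, 0, 0), dims))
    print(f"{nodes:7d} nodes: {links} neighbors per node, {halo_cells(block)} halo cells per node")
```

The node count grows by a factor of 4096 from the first row to the last, but each node still talks to six neighbors one hop away and sends the same amount of data.
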
Re:Ha, not the first by fuzzyfuzzyfungus · 2013-05-14 05:37 · Score: 4, Insightful

I'm no expert on the refined world of supercomputers; but my money would be on latency. If you are made of money, bandwidth is a problem that you can substantially brute force. Not 100% efficiently; and layout gets to be a real headache; but if the state of the art in serial interconnects isn't good enough, you can bolt a bunch of them together and have a parallel interconnect (it'll be harder to do board layout for, the wiring will suck more, and it'll cost more; but the major sticking point is money).
If you want to cut latency, even the most exotic photonics-on-die-with-hollow-fiber arrangement imaginable still gives you surprisingly short distances before you start losing CPU cycles to waiting for the return photon.
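
The bandwidth-versus-latency point can be sketched with the usual latency-plus-bandwidth cost model: transfer time is a fixed startup delay plus message size divided by aggregate bandwidth. The 1 microsecond latency, 10 GB/s link speed, and link counts below are hypothetical, and the model assumes a message can be striped perfectly across links.

```python
def transfer_time_s(message_bytes: float, latency_s: float,
                    link_bytes_per_s: float, links: int = 1) -> float:
    """Fixed startup latency plus bytes divided by aggregate link bandwidth."""
    return latency_s + message_bytes / (link_bytes_per_s * links)


latency_s = 1e-6  # hypothetical end-to-end latency
link_bytes_per_s = 10e9  # hypothetical bandwidth of one link
for size in (64, 1_000_000):  # a small message and a large one, in bytes
    for links in (1, 8):
        t = transfer_time_s(size, latency_s, link_bytes_per_s, links)
        print(f"{size:>9d} B over {links} link(s): {t * 1e6:8.3f} us")
```

With these numbers, bolting eight links together speeds up the large transfer several times over but barely changes the small one, because the small message is almost all latency.
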
Re:Ha, not the first by swillden · 2013-05-14 05:46 · Score: 3, Interesting

> building a supercomputer means getting thousands of CPUs to cooperate which is a much harder challenge.

Looking at his presentation, that seems to be his point. He concludes that power efficiency is going to become the limiting factor driving design decisions, and that since the power cost of increasing FLOPS has been so much lower than the power cost of moving larger quantities of data we're heading into an era where connectivity costs will so dominate the cost of cycles that cycles will be essentially free.
He's then basically arguing that it won't be cost-effective to build data transmission architectures that can effectively utilize exaflops, so no one will bother to build an exaflop machine.
He didn't state it, but if the rest of his arguments are correct, perhaps we're going to see the definition of a new metric for HPC, one that somehow captures the ability of a machine to distribute data to its computation nodes.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
The Nanosecond by wcrowe · 2013-05-14 05:57 · Score: 4, Interesting

Back in the early 80's I got the opportunity to hear Grace Hopper speak. One of the stories she used to like to tell at her talks was about the time that she was having trouble visualizing a nanosecond. Eventually she sent a memo to her engineers which said, "Please send up one nanosecond." She waited, curious as to how they would respond. After a couple of days a response came back in the form of a metal rod 11-3/4 inches in length with the note attached, "One Nanosecond", and no other explanation. After puzzling over the metal rod she called down to the engineering department and asked, "I give up, what is it?" "That's the distance light travels in a nanosecond", was the response. Later, she sent another memo to the engineers with the request, "Please send up one picosecond." The engineers immediately responded with a memo instructing her to, "put the nanosecond in a pepper grinder and you can make picoseconds all over your desk."
Grace Hopper's humorous anecdote underlines the serious problems faced by researchers when they push the boundaries. In her case, it was a real concern over how far a bit can travel at the speed of light. I have no idea if that has any bearing on the exascale problem, but it might illustrate the kinds of problems they might be running into.

--
Proverbs 21:19
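
The rod length in the story is easy to check. This sketch only uses the defined speed of light in vacuum and the exact inch-to-metre conversion; signals in real wires and fibers travel slower, so the practical distance per nanosecond is shorter still.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum, exact by definition
M_PER_INCH = 0.0254  # exact by definition of the inch


def light_distance_inches(seconds: float) -> float:
    """Distance light covers in vacuum during the given time, in inches."""
    return C_M_PER_S * seconds / M_PER_INCH


print(f"one nanosecond: {light_distance_inches(1e-9):.2f} inches")
print(f"one picosecond: {light_distance_inches(1e-12) * 25.4:.2f} mm")
```
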
so what? by markhahn · 2013-05-14 06:05 · Score: 4, Insightful

I'm an HPC professional, and do not see much value in these "hero" machines. Yes, you can go on all you want about the march of progress and tier-1 and grand challenges, but you're just reiterating an unquestioned manifest destiny-based view of history. Why do we need an Exaflop machine? is it because some particular set of applications need it? where is the threshold for those applications where the compute facility will be fast enough to achieve some breakthrough?
it's hard to find areas that are primarily limited by compute facilities. for instance, genetics/proteomics/metabolomics/whatever are *not* compute-limited, especially at the high end. they're laboratory-limited, the same way weather simulations are good and getting better, but not past the quality of their input data.
we need more compute in general, but not necessarily in one machine. a single exaflop machine will cost much more than a thousand petaflop machines. letting a thousand flowers bloom is much prettier than one excruciatingly beautiful flower...
and no, hero machines do not provide an efficient way to improve the tech of lesser or later machines. they have to be justified by their own need.
1. Re:so what? by Nite_Hawk · 2013-05-14 07:03 · Score: 3, Insightful
  
  I'm an HPC professional too.
  I don't totally disagree with your premise, but what the heck are you doing talking about genetics and proteomics in reference to giant supercomputers? If you know anything about proteomics codes, you know that the commonly used search engines like sequest and mascot were never designed to run on systems like that. Hell, they barely run on small clusters and yet people are getting enough science done that they just don't care. That doesn't mean that it's hard to find problems that need supercomputers though.
  If you want to talk about the really big systems, you are talking about things like nuclear weapons simulations, astrophysics, molecular dynamics, and quantum mechanics. There are only a handful of guys that will actually make really good use of those systems and scores of folks that would otherwise be perfectly fine running on significantly smaller ones. Having smaller jobs backfill on the big machines when the really hardcore guys are off doing something else isn't such a bad situation though. It lets you get the big science done and still keep the machines being used efficiently in the interim.
  Beyond that, just because some researchers aren't scaling their codes to those levels yet doesn't mean we should give up on big systems. There will always be people pushing the envelope and others playing catch up. Our job is to help the slow guys scale their codes when possible so they can do even better and more intensive science. Yes, not all problems require the big systems, but there are many that do, many that can be made to scale even when they don't appear to at first, and others that can serve as backfill to keep the systems busy. They have their place just as smaller clusters, cloud resources, and big data resources do.
Re:If you ignore the best news in supercomputing . by Anonymous Coward · 2013-05-14 06:07 · Score: 3, Insightful

Even if you ignore all the controversy over D-Wave's system and its nature, and take it all at face value, it is still only applicable to a narrow class of problems. CMOS or not, it amounts to something similar in principle to an ASIC. It is no surprise that a custom-built chip can solve a specific class of problems orders of magnitude faster than a general-purpose processor. This was slightly more popular for a while in the 80s, when a few custom computers were built that were specifically designed for doing things like orbital calculations. And it pops up every so often, like custom chips for playing chess, and now bitcoin mining chips. That is great for a small computer, but when your price gets into the millions or billions of dollars, the people bankrolling it will probably want to build a system that can be used for a wider class of problems even if it means running slower.
Re:It is tough by fisted · 2013-05-14 06:53 · Score: 3, Funny

> Just the yesterday there was another thread where someone was trying to suggest [...] instead of realizing [...].

What?! Someone was wrong on the Internet?

--
CLI paste? paste.pr0.tips!