Has Supercomputing Hit a Brick Wall?
anzha writes "Horst Simon, Deputy Director of Lawrence Berkeley National Laboratory, has stood up at conferences of late and said the unthinkable: supercomputing is hitting a wall and will not build an exaFLOPS HPC system by 2020. This is defined as one that passes linpack with a performance of one exaFLOPS sustained or better. He's even placed money on it. You can read the original presentation here."
"Japan to develop new exaflop computer by 2020" ... why not? And if it's even a few microseconds into 2021 I suppose that supercomputing has failed, will pack up, and go home.
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
moore's law only talks about transistor counts. building a supercomputer means getting thousands of CPUs to cooperate which is a much harder challenge.
Anyone (with a large wallet) can stick an exoflop worth of CPUs in a large room. by 2020 you'll be able to do that with a not so large wallet. but that does not result in a useful exoflop computer
Clarke's Three Laws are three "laws" of prediction formulated by the British writer Arthur C. Clarke. They are:
1. When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2. The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
3. Any sufficiently advanced technology is indistinguishable from magic.
Prove anything by multiplying Huge Number times Tiny Number
It's a particular nuisance because the speed of light is pretty strictly enforced...
Even if you went full-on-nuts and replaced fiber interconnects with little tubes full of hard vacuum, to squeak out that slight improvement over the speed of light in glass or air, you'll still see latency that meaningfully hinders the cooperation of multi-GHz CPUs and RAM across systems of any nontrivial size.
For loosely coupled problems, that barely matters; but not all problems are loosely coupled.
And still a little fuzzy headed, but the first thing I though of was arranging the racks for shortest maximim path, instead of one big football field sized room, stacking the datacenter into a cube shape... Then I thoght, "That's probably why Borg ships are Cubes."
Although latency isn't so much of an issue: the #1 systems of the last ~3 years did all have torus networks (all Blue Genes, all Crays, K computer, too). These networks only perform well for next neighbor communication -- which is fine since most codes running on these machines are simulation codes and they only need this type of communication. If you scale up the system, you'll typically also scale the size of the simulation instance (this is known as "weak scaling").
This means that your program can still spend the same time waiting for the network as it could on a smaller machine. The cables do not need to become shorter.
Computer simulation made easy -- LibGeoDecomp
I'm no expert on the refined world of supercomputers; but my money would be on latency. If you are made of money, bandwidth is a problem that you can substantially brute force. Not 100% efficiently; and layout gets to be a real headache; but if the state of the art in serial interconnects isn't good enough, you can bolt a bunch of them together and have a parallel interconnect(it'll be harder to do board layout for, the wiring will suck more, and it'll cost more; but the major sticking point is money).
If you want to cut latency, even the most exotic photonics-on-die-with-hollow-fiber arrangement imaginable still gives you surprisingly short distances before you start losing CPU cycles to waiting for the return photon.
building a supercomputer means getting thousands of CPUs to cooperate which is a much harder challenge.
Looking at his presentation, that seems to be his point. He concludes that power efficiency is going to become the limiting factor driving design decisions, and that since the power cost of increasing FLOPS has been so much lower than the power cost of moving larger quantities of data we're heading into an era where connectivity costs will so dominate the cost of cycles that cycles will be essentially free.
Hes's then basically arguing that it won't be cost-effective to build data transmission architectures that can effectively utilize exaflops, so no one will bother to build an exaflop machine.
He didn't state it, but if the rest of his arguments are correct, perhaps we're going to see the definition of a new metric for HPC, one that somehow captures the ability of a machine to distribute data to its computation nodes.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Back in the early 80's I got the opportunity to hear Grace Hopper speak. One of the stories she used to like to tell at her talks was about the time that she was having trouble visualizing a nanosecond. Eventually she sent a memo to her engineers which said, "Please send up one nanosecond." She waited, curious as to how they would respond. After a couple of days a response came back in the form of a metal rod 11-3/4 inches in length with the note attached, "One Nanosecond", and no other explanation. After puzzling over the metal rod she called down to the engineering department and asked, "I give up, what is it"? "That's the distance light travels in a nanosecond", was the response. Later, she sent another memo to the engineers with the request, "Please send up one picosecond." The engineers immediately responded with a memo instructing her to, "put the nanosecond in a pepper grinder and you can make picoseconds all over your desk."
Grace Hopper's humorous anecdote underlines the serious problems faced by researchers when they push the boundaries. In her case, it was a real concern over how far a bit can travel at the speed of light. I have no idea if that has any bearing on the exascale problem, but it might illustrate the kinds of problems they might be running into.
Proverbs 21:19
I'm an HPC professional, and do not see much value in these "hero" machines. Yes, you can go on all you want about the march of progress and tier-1 and grand challenges, but you're just reiterating an unquestioned manifest destiny-based view of history. Why do we need an Exaflop machine? is it because some particular set of applications need it? where is the threshold for those applications where the compute facility will be fast enough to achieve some breakthrough?
it's hard to find areas that are primarily limited by compute facilities. for instance, genetics/proteomics/metabilomics/whatever are *not* compute-limited, especially at the high end. they're laboratory-limited, the same way weather simulations are good and getting better, but not past the quality of their input data.
we need more compute in general, but not necessarily in one machine. a single exaflop machine will cost much more than a thousand petaflop machines. letting a thousand flowers bloom is much prettier than one excruciatingly beautiful flower...
and no, hero machines do not provide an efficient way to improve the tech of lesser or later machines. they have to be justified by their own need.
Even if you ignore all the controversy over D-Wave's system and its nature, and take it all at face value, it is still only applicable to a narrow class of problems. CMOS or not, it amounts to something similar in principle to an ASIC. It is no surprised that a custom built chip can solve a specific class of problems orders of magnitudes faster than a general purpose processor. This used to be slightly more popular for a while in the 80s, where a few custom computers were built that were specifically designed for doing things like orbital calculations. And it pops up every so often, like custom chips for playing chess, and now bit coin mining chips. That is great for a small computer, but when your price gets into the millions or billions of dollars, the people bankrolling it will probably want to build a system that can be used for a wider class of problems even if it means running slower.
> Just the yesterday there was another thread where someone was trying to suggest [...] instead of realizing [...].
What?! Someone was wrong on the Internet?
CLI paste? paste.pr0.tips!