Processors and the Limits of Physics
An anonymous reader writes: As our CPU cores have packed more and more transistors into increasingly tiny spaces, we've run into problems with power, heat, and diminishing returns. Chip manufacturers have been working around these problems, but at some point, we're going to run into hard physical limits that we can't sidestep. Igor Markov from the University of Michigan has published a paper in Nature (abstract) laying out the limits we'll soon have to face. "Markov focuses on two issues he sees as the largest limits: energy and communication. The power consumption issue comes from the fact that the amount of energy used by existing circuit technology does not shrink in a way that's proportional to their shrinking physical dimensions. The primary result of this issue has been that lots of effort has been put into making sure that parts of the chip get shut down when they're not in use. But at the rate this is happening, the majority of a chip will have to be kept inactive at any given time, creating what Markov terms 'dark silicon.' Power use is proportional to the chip's operating voltage, and transistors simply cannot operate below a 200 milli-Volt level. ... The energy use issue is related to communication, in that most of the physical volume of a chip, and most of its energy consumption, is spent getting different areas to communicate with each other or with the rest of the computer. Here, we really are pushing physical limits. Even if signals in the chip were moving at the speed of light, a chip running above 5GHz wouldn't be able to transmit information from one side of the chip to the other."
Same ArsTechnica article link and everything
Clockless logic circuits might be an interesting workaround for the communication problem. The other side of the chip starts working when the data CAN make it over there, for example. I don't claim to know much about CPU design beyond how they work on a basic logical level, but I'd love to hear the opinions of someone here who does regarding CPUs and asynchronous logic.
Stacking dies or some other form of going from flat to vertical will get you around some of the signaling limits. If you look back at old supercomputer designs there were a lot of neat tricks played with the physical architecture to work around performance problems (for example, having a curved backplane lets you have a shorter bus but more space between boards for cooling). Heat is probably the major problem, but we still haven't gone to active cooling for chips yet (e.g. running cooling tubes through the processor rather than trying to take the heat off the top).
"Even if signals in the chip were moving at the speed of light, a chip running above 5GHz wouldn't be able to transmit information from one side of the chip to the other." ... in a single clock.
So in the 1980's I was a CPU designer working on what I call "walk-in, refrigerated, mainframes". It was mostly 100K-family ECL in those days and compatible ECL gate arrays. Guess what -- it took most of a clock to get to a neighboring card, and certainly took a whole clock to get to another cabinet. So in the future it will take more than one clock to get across a chip. I don't see how that is anything other than a job posting for new college graduates.
That one statement in the article reminds me of when I first moved to Silicon Valley. Everybody out here was outrageously proud of themselves because they were solving problems that had been solved in mainframes 20 years earlier. As the saying goes: "All the old timers stole all our best ideas years ago."
Each semiconductor node shrink is faster and more power efficient than the previous one. For instance, TSMC's 20nm process offers 30% higher speed, or 25% less power, than 28nm. Likewise, 16nm will provide a 60% power saving over 20nm.
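To put the figures quoted above together, here is a rough sketch of how the two power savings would compound from 28nm to 16nm. It assumes the savings simply multiply at constant speed, which the comment does not state; the function name is made up for illustration.

```python
# Hypothetical helper: compound per-node power savings.
# Assumption: each saving applies multiplicatively to the previous node's power.
def cumulative_power_fraction(savings):
    """Return the fraction of the original power left after all shrinks."""
    fraction = 1.0
    for saving in savings:
        fraction *= 1.0 - saving
    return fraction

# Figures from the comment: 28nm -> 20nm saves 25%, 20nm -> 16nm saves 60%.
remaining = cumulative_power_fraction([0.25, 0.60])
print(f"16nm power relative to 28nm: {remaining:.0%}")
print(f"Total saving over two shrinks: {1.0 - remaining:.0%}")
```

Under that assumption, a 16nm part would draw roughly 30% of the power of its 28nm equivalent at the same speed.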
Your reasoning is false. Most AI algorithms have a high level of parallelism, which makes them less susceptible to the physical limits of a single CPU. You can achieve incredible performance improvements on GPUs and other parallel architectures.
Achille Talon
Hop!
Congratulations, you identified the densest possible circuits we can make. That doesn't even give an upper bound to Moore's Law, let alone an upper bound to performance.
Moore's Law is "the number of transistors in a dense integrated circuit doubles every two years". You can accomplish that by halving the size of the transistors, or by doubling the size of the chip. Some element of the latter is already happening - AMD and Nvidia put out a second generation of chips on the 28nm node, with greatly increased die sizes but similar pricing. The reliability and cost of the process node had improved enough that they could get a 50% improvement over the last gen at a similar price point, despite using essentially the same transistor size.
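A quick sketch of the point above: transistor count is density times die area, so doubling either one doubles the count. The density and area numbers below are made-up round figures for illustration, not data for any real chip, and the function names are hypothetical.

```python
# Transistor count = density * die area, so either factor can do the doubling.
def transistor_count(density_per_mm2, die_area_mm2):
    return density_per_mm2 * die_area_mm2

def moores_law_projection(initial_count, years, doubling_period_years=2.0):
    """Projected count if it doubles every `doubling_period_years`."""
    return initial_count * 2 ** (years / doubling_period_years)

# Hypothetical baseline: 10 million transistors per mm^2 on a 300 mm^2 die.
base = transistor_count(10_000_000, 300)
via_shrink = transistor_count(20_000_000, 300)      # double the density
via_bigger_die = transistor_count(10_000_000, 600)  # double the die area

print(base, via_shrink, via_bigger_die)
print(moores_law_projection(base, years=2))
```

Both routes land on the same count that Moore's Law asks for after two years; which one is cheaper depends on yield and process maturity, as the comment says.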
You could also see more fundamental shifts in technology. RSFQ seems like a very promising avenue. We've seen this sort of thing with the hard drive -> SSD transition for I/O bound problems. If memory-bound problems start becoming a priority (and transistors get cheap enough), we might see a shift back from DRAM to SRAM for main memory.
So yeah, the common restatement of Moore's Law as "computer performance per dollar will double every two years" will probably keep running for a while after we hit the physical bounds on transistor size.
The speed of light is approximately 3 × 10^8 m/s in a vacuum. It's about half as fast in a semiconductor like silicon, so closer to 6 inches per nanosecond, or a bit over an inch per clock cycle at 5GHz. Nearly all chips are less than one inch across. Even if this were not the case, that would not be an upper limit; data does not have to reach the end of the chip before the next clock cycle. This is an example of the author having a bit of knowledge (erroneous, as you point out) and extrapolating an incorrect answer.
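The arithmetic behind this is easy to check. The sketch below assumes c of about 3 × 10^8 m/s and takes the comment's "about half as fast" as a velocity factor of 0.5; the function names and the 2 cm die width are illustrative assumptions. It only gives the speed-of-light bound, since real on-chip wires are slower than that.

```python
import math

C_VACUUM_M_PER_S = 3.0e8  # approximate speed of light in a vacuum
VELOCITY_FACTOR = 0.5     # assumption taken from the comment: about half of c
METERS_PER_INCH = 0.0254

def distance_per_cycle_m(freq_hz, velocity_factor=VELOCITY_FACTOR):
    """Farthest a signal could travel in one clock period, in meters."""
    return C_VACUUM_M_PER_S * velocity_factor / freq_hz

def cycles_to_cross(die_width_m, freq_hz, velocity_factor=VELOCITY_FACTOR):
    """Whole clock cycles needed to cross a die of the given width."""
    per_cycle = distance_per_cycle_m(freq_hz, velocity_factor)
    return math.ceil(die_width_m / per_cycle)

d = distance_per_cycle_m(5e9)
print(f"At 5GHz: {d * 100:.1f} cm ({d / METERS_PER_INCH:.2f} in) per cycle")
print("Cycles to cross a 2 cm die at 5GHz:", cycles_to_cross(0.02, 5e9))
print("Cycles to cross a 2 cm die at 50GHz:", cycles_to_cross(0.02, 50e9))
```

At 5GHz the bound is about 3 cm per cycle, so a 2 cm die can still be crossed in one clock. Push the clock far enough and a crossing simply takes several cycles, which is a pipelining problem rather than a hard wall.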
Nobody says 5GHz is impossible. Read it.
It says that you can't traverse the entire chip while running at 5GHz. Most operations don't - why? Because the chips are small and any one set of instructions tends to operate in a certain smaller-again area.
What they are saying is that chips will no longer be synchronous - if chips get any bigger, your clock signal takes too long to traverse the entire length of the chip and you end up with different parts of the chip needing different clocks.
It's all linked. The size of the chip can get bigger and still pack in the same density, but then the signals get more out of sync, the voltages have to be higher, the traces have to be straighter, the routing becomes more complicated, and the heat will become higher. Oh, and you'll have to have parts of it "go dark" to avoid overheating neighbours, etc. This is exactly what the guy is saying.
At some point, there's a limit at which it's cheaper and easier to just have a bucket load of synchronous-clock chips tied together loosely than one mega-processor trying to keep everything ticking nicely.
And current overclocking records are only around 8GHz. Nobody says you can't make a processor operating at 10THz if you want. The problem is that it has to be TINY and not do very much. Frequency, remember, is high in anything dealing with radio - your wireless router can do some things at 5GHz and, somewhere inside it, is an oscillator doing just that. But not the SAME kinds of things as we expect modern processors to do.
Taking into account that most of those overclocking benchmarks probably operate in small areas of the silicon, are run in mineral oil or similar, and reflect the literal speed of a benchmark on a complicated chip that ALREADY accounts for signals taking so long that clocks can get out of sync across the chip, we don't have much leeway at all. We hit a huge wall at 2-3GHz and that's where people are tending to stay despite it being - what, a decade or more? - since the first 3GHz Intel chip. We add more processors and more cores and more threading, but pretty much we haven't got "faster" over the last decade; we're just able to have more processors at that speed.
No doubt we can push it further, but not forever, and not with the kind of on-chip capabilities you expect now.
With current technology (i.e. no quantum leaps of science making their way into our processors), I doubt you'll ever see a commercially available 10GHz chip that'll run Windows. Super-parallel machines running at a fraction of that but delivering more gigaflops - yeah - but basic core sustainable frequency? No.
Wrong.
Wrong. Get the voltage too low and they won't be fast, but they won't necessarily stop working.
And of course, the analysis of the communications issue is also wrong.
There are obvious and non-obvious physical limitations that limit scaling, but nobody is being helped by this muddy, error-ridden presentation.
Contribute to civilization: ari.aynrand.org/donate