Processors and the Limits of Physics
An anonymous reader writes: As our CPU cores have packed more and more transistors into increasingly tiny spaces, we've run into problems with power, heat, and diminishing returns. Chip manufacturers have been working around these problems, but at some point, we're going to run into hard physical limits that we can't sidestep. Igor Markov from the University of Michigan has published a paper in Nature (abstract) laying out the limits we'll soon have to face. "Markov focuses on two issues he sees as the largest limits: energy and communication. The power consumption issue comes from the fact that the amount of energy used by existing circuit technology does not shrink in a way that's proportional to their shrinking physical dimensions. The primary result of this issue has been that lots of effort has been put into making sure that parts of the chip get shut down when they're not in use. But at the rate this is happening, the majority of a chip will have to be kept inactive at any given time, creating what Markov terms 'dark silicon.' Power use is proportional to the chip's operating voltage, and transistors simply cannot operate below a 200 milli-Volt level. ... The energy use issue is related to communication, in that most of the physical volume of a chip, and most of its energy consumption, is spent getting different areas to communicate with each other or with the rest of the computer. Here, we really are pushing physical limits. Even if signals in the chip were moving at the speed of light, a chip running above 5GHz wouldn't be able to transmit information from one side of the chip to the other."
Same ArsTechnica article link and everything
Well, except that every other technology has hit limits, except computers! They'll just endlessly get better. Forever.
Clockless logic circuits might be an interesting workaround for the communication problem. The other side of the chip starts working when the data CAN make it over there, for example. I don't claim to know much about CPU design beyond how they work on a basic logical level, but I'd love to hear the opinions of someone here who does regarding CPUs and asynchronous logic.
Stacking dies or some other form of going from flat to vertical will get you around some of the signaling limits. If you look back at old supercomputer designs there were a lot of neat tricks played with the physical architecture to work around performance problems (for example, having a curved backplane lets you have a shorter bus but more space between boards for cooling). Heat is probably the major problem, but we still haven't gone to active cooling for chips yet (e.g. running cooling tubes through the processor rather than trying to take the heat off the top).
So why don't we use Alpha radiation particles?
"Even if signals in the chip were moving at the speed of light, a chip running above 5GHz wouldn't be able to transmit information from one side of the chip to the other." ... in a single clock.
So in the 1980's I was a CPU designer working on what I call "walk-in, refrigerated, mainframes". It was mostly 100K-family ECL in those days and compatible ECL gate arrays. Guess what -- it took most of a clock to get to a neighboring card, and certainly took a whole clock to get to another cabinet. So in the future it will take more than one clock to get across a chip. I don't see how that is anything other than a job posting for new college graduates.
That one statement in the article reminds me of when I first moved to Silicon Valley. Everybody out here was outrageously proud of themselves because they were solving problems that had been solved in mainframes 20 years earlier. As the saying goes: "All the old timers stole all our best ideas years ago."
Each semiconductor node shrink is faster and more power efficient than the previous one. For instance, TSMC's 20nm process offers 30% higher speed, or 25% less power, than 28nm. Likewise, 16nm will provide a 60% power saving over 20nm.
You don't need to constantly shrink everything. My computer is about 2 feet tall and wide. I don't care if it's a couple more inches in any direction. Make a giant processor that weighs 20 pounds.
Yet another reason to find a way around the speed of light.
Actually I've always said (jokingly) that if anyone does find a way to go FTL, it'll be the computer chip manufacturers. In fact Brad Torgersen and I had a story to that effect in Analog magazine a couple of years ago, "Strobe Effect".
-- Alastair
Didn't you get the memo? Hemp seeds are better than graphene. Plus you can get high while growing the seeds.
Congratulations, you identified the densest possible circuits we can make. That doesn't even give an upper bound to Moore's Law, let alone an upper bound to performance.
Moore's Law is "the number of transistors in a dense integrated circuit doubles every two years". You can accomplish that by halving the size of the transistors, or by doubling the size of the chip. Some element of the latter is already happening - AMD and Nvidia put out a second generation of chips on the 28nm node, with greatly increased die sizes but similar pricing. The reliability and cost of the process node had improved enough that they could get a 50% improvement over the last gen at a similar price point, despite using essentially the same transistor size.
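To put numbers on the two routes described above (shrinking the transistor versus growing the die), here is a minimal Python sketch. The function names and all input values are hypothetical and chosen only for illustration; the only thing taken from the comment is the doubling-every-two-years rule.

```python
def projected_transistors(initial_count, years, doubling_period_years=2.0):
    """Transistor count after `years`, assuming a fixed doubling period."""
    return initial_count * 2 ** (years / doubling_period_years)


def transistor_count(density_per_mm2, die_area_mm2):
    """Count is density times area, so doubling either one doubles the count."""
    return density_per_mm2 * die_area_mm2


if __name__ == "__main__":
    # Hypothetical starting point: one billion transistors.
    print(projected_transistors(1e9, 10))
    # Same (made-up) density, twice the die area.
    print(transistor_count(1e7, 200.0) / transistor_count(1e7, 100.0))
```

The second function is the whole argument in one line: the law counts transistors per chip, so a bigger die at the same feature size still satisfies it.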
You could also see more fundamental shifts in technology. RSFQ seems like a very promising avenue. We've seen this sort of thing with the hard drive -> SSD transition for I/O bound problems. If memory-bound problems start becoming a priority (and transistors get cheap enough), we might see a shift back from DRAM to SRAM for main memory.
So yeah, the common restatement of Moore's Law as "computer performance per dollar will double every two years" will probably keep running for a while after we hit the physical bounds on transistor size.
President Romney agrees with you too.
I see you are from the reality where the Republican Senate repealed the laws of physics. The time-space continuum is altering already.
"First they came for the slanderers and i said nothing."
Right, which is why we live in the leisure society with 10 hour workweeks, everyone has a flying car, a Star Trek replicator and personal warp drive space ships.
You are clueless. You live in a bubble of technology created by people infinitely smarter than you and you are happy with comic-book levels of understanding.
The speed of light is approximately .3 x 10^8 m per sec in a vacuum. It's about half as fast in a semiconductor like silicon. So closer to 6 inches per nanosecond. Nearly all chips are less than one inch. Even if this were not the case, that would not be an upper limit: data does not have to reach the end of the chip before the next clock cycle. This is an example of the author having a bit of knowledge (erroneous, as you point out) and extrapolating an incorrect answer.
Please see propagation delay.
Write failed: Broken pipe
I see increasing emphasis in the future on unconventional architectures to solve certain problems
http://www.research.ibm.com/ar...
http://en.wikipedia.org/wiki/Q...
and a little further into the future, single molecule switches and gates.
http://en.wikipedia.org/wiki/M...
We have a ways to go, but at some point we are going to have to say bye-bye to the conventional transistor.
My rights don't need management.
As Einstein showed, yes things are relative.
"Things," eh? Any particular "things"?
He also showed that one particular thing was absolute, if you recall.
systemd is Roko's Basilisk.
The human brain is a marvel of technology. Brain waves move through it as waves of activity. It only consumes (most) energy where the wave of intensified activity is passing through it. If a 3D circuit could be made to sense when a signal is incoming then it could be more efficient. In this paradigm it's not 1's and 0's, but rather circuit on vs circuit off. In addition, if you could turn those on/off cycles into charge pump circuits then you could essentially recycle part of that charge and reuse it in a cascade-like or layered circuit. I believe Sun Micro was working on one such design, but the cost benefits were not there at the time to make it to production. Things have changed.
remember 512k is good enough for everyone.
we aren't quite there yet...
Right. There's no way you'd run a signal across a one inch chip and expect to get anything useful out the other end.
In days of yore, the signal would be buffered a few times.
These days it would pass through 5 clock domains and power boundaries and so have to be rebuffered, resynchronized, levelshifted and firewalled at each stage. But this is normal and we do it all the time.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
There is also the assumption that the chip structure is 2-D. This is already not totally true, though there are tremendous heat problems as you start stacking layers. This is one of the attractions of "spintronics"...state can be switched with less heat.
I think we've pushed this "anyone can grow up to be president" thing too far.
Maybe Markov should go back to school.. Power use is modeled as voltage squared, not as proportional.
Apologies to Markov if it is just the summary that is wrong.
Power use is proportional to the chip's operating voltage, and transistors simply cannot operate below a 200 milli-Volt level
Wow. To me it is like P~U^2. So proportional, but not linear.
And where would that 200 mV level come from? In my understanding it depends very much on the semiconductor used.
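The voltage-squared point raised in the comments above can be sketched with the textbook CMOS switching-power model, P = a * C * V^2 * f. This is a rough illustration, not the model used in Markov's paper; the capacitance, frequency, and activity values below are made up, and leakage power is ignored entirely.

```python
def dynamic_power_w(c_farads, v_volts, f_hz, activity=1.0):
    """Textbook CMOS switching power: activity * C * V^2 * f (leakage ignored)."""
    return activity * c_farads * v_volts ** 2 * f_hz


if __name__ == "__main__":
    # Hypothetical values: 1 nF of switched capacitance at 1 GHz.
    base = dynamic_power_w(1e-9, 1.0, 1e9)
    half = dynamic_power_w(1e-9, 0.5, 1e9)
    print(f"power ratio at half the voltage: {half / base:.2f}")
```

Halving the supply voltage in this model cuts switching power to a quarter, which is why a floor on the operating voltage matters so much more than a linear relationship would suggest.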
You need single isotope silicon. Silicon-28 seems best. That will reduce the number of defects, thus increasing the chip size you can use, thus eliminating chip-to-chip communication, which is always a bugbear. That gives you effective performance increase.
You need better interconnects. Copper is way down on the list of conducting metals for conductivity. Gold and silver are definitely to be preferred. The quantities are insignificant, so price isn't an issue. Gold is already used to connect the chip to outlying pins, so metal softness isn't an issue either. Silver is trickier, but probably solvable.
People still talk about silicon-on-insulator and stressed silicon as new techniques. After ten bloody years? Get the F on with it! These are the people who are breaking Moore's Law, not physics. Drop 'em in the ocean for a Shark Week special or something. Whatever it takes to get people to do some work!
SoI, since insulators don't conduct heat either, can be made back-to-back, with interconnects running through the insulator. This would give you the ability to shorten distances to compute elements and thus effectively increase density.
More can be done off-cpu. There are plenty of OS functions that can be shifted to silicon, but where the specialist chips have barely changed in years, if not decades. If you halve the number of transistors required on the CPU for a given task, you have doubled the effective number of transistors from the perspective of the old approach.
Finally, if we dump the cpu-centric view of computers that became obsolete the day the 8087 arrived (if not before), we can restructure the entire PC architecture to something rational. That will redistribute demand for capacity, to the point where we can actually beat Moore's Law on aggregate for maybe another 20 years.
By then, hemp capacitors and memristors will be more widely available.
(Heat is only a problem for those still running computers above zero Celsius.)
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
The speed of light in a vacuum is about 3.0 x 10^8 m/sec, not 0.3 x 10^8 m/sec. Still, your 6" per nanosec at half the speed of light in a vacuum is about right.
Heat is only a problem for those still running computers above zero Celsius.
Good luck fitting your frozen computer into a laptop case or something else that can be used while riding public transit. Not everybody is content to just "consume" on a "mobile device" while away from mains power.
Local (narrow) interconnects are several times slower than that, though. You need wide ones across longer distances. And I'm not really sure the whole thing with routing an impulse from place A to place B is that simple anymore.
Ezekiel 23:20
one day, computers will be twice as fast and ten times as big -- vacuum tubes? meet transistors.
computers can't get any more popular because we'll run out of copper. . . zinc. . . nickel -- welcome to silicon. Is there enough sand for you?
everything will stay the way it is now forever. things will never get any faster because these issues that aren't problems today will eventually become completely insurmountable.
relax. take it easy. we don't solve problems in-advance. capitalism is about quickly solving huge problems, while totally ignoring small and medium problems.
wait for it. computers will be different in twenty years. I promise.
Nope. Einstein showed consequences of the speed of light being a constant of nature. He didn't show or even predict that it was one, that was done by Maxwell's equations and various attempts to measure Earth's velocity relative to luminous aether (which turned out to be "zero").
And as it happens, one of those consequences is that timewise and spacewise distance are relative.
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
And you think printf() and strtol() are major bottlenecks worth dedicated silicon area why?
Modern CPUs already have many accelerators for high end functions, such as numerical computations, cryptography, and the all important memcpy. (Memory copies are a traditional bottleneck, and general enough that they can be easily offloaded.) They come in two forms—specialized SIMD/vector instruction sets, and dedicated blocks for high-level functions that take multiple microseconds. An example of the former are the SIMD-oriented AVX instructions found on modern x86 chips. As an example of the latter, chips aimed at high end signal processing often have discrete blocks such as FFT accelerators. Others aimed at network tasks (especially DPI) have regular expression engines.
The problem with accelerator blocks is that they do take up area. And if they're powered up, they leak. Leakage current is a significant factor in modern designs. To get faster transistors, you need to drive their threshold voltage down. As you lower the threshold voltage, their leakage current goes up exponentially. So, that circuit better be bringing a lot of bang for the buck if it's going to be sitting there taking up space and leaking.
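The exponential relationship mentioned above can be sketched with the usual subthreshold-swing rule of thumb: leakage changes by a factor of ten for every S millivolts of threshold-voltage change. The 80 mV/decade default below is an assumed, illustrative figure (the ideal room-temperature limit is roughly 60 mV/decade), and the function name is hypothetical.

```python
def relative_leakage(delta_vth_mv, swing_mv_per_decade=80.0):
    """Leakage multiplier for a threshold-voltage change of delta_vth_mv.

    A negative delta means the threshold was lowered, which raises leakage.
    The swing value is an assumption, not a measured device parameter.
    """
    return 10 ** (-delta_vth_mv / swing_mv_per_decade)


if __name__ == "__main__":
    for delta in (0.0, -40.0, -80.0, -160.0):
        print(f"Vth change {delta:+.0f} mV -> leakage x{relative_leakage(delta):.1f}")
```

Under that assumed swing, shaving 80 mV off the threshold costs ten times the leakage, and 160 mV costs a hundred times, which is the "better be a lot of bang for the buck" problem in numbers.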
Another issue with dedicating area to fixed functions is the impact it has on distance between functions on the die. In the Old Days, you could get anywhere on the die in a single clock cycle. With modern designs and modern clock rates, cross-die communication is slow, taking many many cycles. So, when you plop down your custom accelerator, you have to figure out where to put it. Do you put it right in the middle of the rest of the computational units, slowing down the communication between their functions (either lowering clock rate or increasing cycle counts), or do you put it on the other side of the cache, meaning it takes several cycles to send it a request and several cycles to see the result?
This is why many custom accelerator blocks out there today focus on meaty workloads. A large FFT still takes a good bit of time to execute, and there's usually other work the main CPU can do while it executes. Thus, the communication overhead doesn't tank your performance. printf(), on the other hand, generally shows up right in the middle of a bunch of other serial steps. You can't overlap that with anything. Hauling off to a printf() accelerator block generally would make zero sense. If you're really spending that much time in printf(), you're better off rewriting the code to use a less general facility.
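A back-of-the-envelope way to see the "meaty workloads" argument is a break-even model: offloading only pays when the cycles saved exceed the cycles spent getting the request to the block and the result back. The function name and every cycle count below are hypothetical, and the model deliberately ignores any overlap with other CPU work.

```python
def offload_speedup(cpu_cycles, accel_cycles, round_trip_cycles):
    """Speedup of offloading versus running on the CPU, with no overlap assumed.

    Values above 1.0 mean the accelerator wins; below 1.0 means it loses.
    """
    return cpu_cycles / (accel_cycles + round_trip_cycles)


if __name__ == "__main__":
    # Made-up numbers: a large FFT versus a short printf()-sized call.
    print(f"large FFT:   x{offload_speedup(100_000, 10_000, 40):.2f}")
    print(f"short call:  x{offload_speedup(200, 100, 400):.2f}")
```

With these invented figures the big job barely notices the round trip, while the small one runs slower on the accelerator than it would have on the CPU, even though the accelerator itself is twice as fast.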
A final issue with dedicated hardware is that you can't patch it. Someone finds a bug in your printf() and you're back to using a library version. I could go on, but I think I've made my point.
Program Intellivision!
Wires on silicon aren't a vacuum. The dominant effect is actually RC delay. As you make wires smaller, the resistance goes up (inversely proportional to cross-sectional area). As you make the wires closer together, capacitance goes up (inversely proportional to distance between the conductors). So, as geometries shrink, propagation delays for real signals in real wires on real silicon go up.
I won't even get into buffers which are required to recondition the signal on long routes... (Someone elsewhere on the thread already did.)
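The RC argument above can be sketched with a toy model in normalized units: resistance goes as length over cross-sectional area, and coupling capacitance to a neighboring wire goes as facing sidewall area over spacing. This is a simplified parallel-plate assumption that ignores fringing fields and capacitance to the layers above and below; the function name and the 0.7 shrink factor are illustrative only.

```python
def wire_rc(length, width, thickness, spacing, rho=1.0, eps=1.0):
    """RC product of a wire next to one neighbor, in arbitrary normalized units."""
    resistance = rho * length / (width * thickness)      # R ~ L / cross-section
    capacitance = eps * length * thickness / spacing     # C ~ sidewall area / gap
    return resistance * capacitance


if __name__ == "__main__":
    base = wire_rc(1.0, 1.0, 1.0, 1.0)
    # Same length (a cross-chip wire), but width, thickness, and spacing shrink.
    shrunk = wire_rc(1.0, 0.7, 0.7, 0.7)
    print(f"RC grows by x{shrunk / base:.2f}")
```

In this model a wire that keeps its length while its cross-section and spacing shrink by 0.7 ends up with about twice the RC product, which is why long routes get slower rather than faster as geometries shrink.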
Reminds me of Grace Hopper and her nano second samples...
https://www.youtube.com/watch?v=1-vcErOPofQ
It's a 10 minute Letterman interview but well worth the time...
You have the right to remain sentient. If you give up the right to remain sentient, you will be elected to public office
Your reasoning is false. Most AI algorithms have a high level of parallelism, which makes them less susceptible to the single-CPU physical limit. You can achieve incredible performance improvements on GPUs and other parallel architectures.
Good luck finding enough programmers that can write code with that level of parallelism.
Most of the multithreaded code I encounter in the real world simply slaps mutexes around things, whether or not they're needed, or even applied consistently. Most of the time, the mutex could be replaced with something cheaper, like atomic operations, or even unique state-transitions on a single volatile global variable.
Your experience may differ. Maybe I just have the bad luck of working with morons most of the time.
"Once we've identified and embraced our sickness, we'll have strength...and that's when we get dangerous." - John Waters
Intel's iAPX 432 was a 1981 attempt to do what you suggest, the reference language being Ada. It was a resounding flop.
Contribute to civilization: ari.aynrand.org/donate
You are clueless. You live in a bubble of technology created by people infinitely smarter than you and you are happy with comic-book levels of understanding.
So you're saying that Cyril M. Kornbluth was right? Race you to Venus!
Wrong.
Wrong. Get the voltage too low and they won't be fast, but they won't necessarily stop working.
And of course, the analysis of the communications issue is also wrong.
There are obvious and non-obvious physical limitations that limit scaling, but nobody is being helped by this muddy, error-ridden presentation.
" Even if signals in the chip were moving at the speed of light, a chip running above 5GHz wouldn't be able to transmit information from one side of the chip to the other."
Eh?
At 300 Megameters per second, the signal would travel 6cm during one clock cycle. Just how large of a "chip" are we talking about, and how much clock skew can we design into our processor?
I call bullshit on the above statement.
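The arithmetic in the comment above is easy to check. The sketch below computes how far a signal gets in one clock period at a given fraction of the vacuum speed of light. The 0.5 fraction is the rough "about half as fast in silicon" figure quoted elsewhere in this thread, not a measured value, and real on-chip wires are limited by RC delay rather than by light speed.

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum (exact by SI definition)


def distance_per_cycle_cm(clock_hz, fraction_of_c=1.0):
    """Centimeters a signal covers in one clock period at fraction_of_c times c."""
    period_s = 1.0 / clock_hz
    return C_VACUUM_M_PER_S * fraction_of_c * period_s * 100.0


if __name__ == "__main__":
    for frac in (1.0, 0.5):
        cm = distance_per_cycle_cm(5e9, frac)
        print(f"{frac:.1f}c at 5 GHz: {cm:.2f} cm per cycle")
```

At 5 GHz this comes out to roughly 6 cm per cycle at full light speed and roughly 3 cm at half of it, both larger than a typical die, so the quoted claim only holds once much slower real-wire propagation is taken into account.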
Pipelining increases performance and instructions per cycle, but at the cost of power efficiency as branches cause a pipeline flush.
The problem is balancing area, power, and performance.
There are obviously limits to the ability to make smaller circuits; even the ones described as 14nm are not really 14 in the same way 160 was 160. There is a lot of wasted space because of the LELE process and the need to minimise crosstalk and distortion.
The real limit however is not how much better X-ray exposure will shrink the size, but how much it costs to make circuits; 28nm is likely to be the most cost efficient size for some time to come. Many fabs are making chips in larger process sizes for fast turnaround and cheap masks.
Every ten years I hear the same thing: "We have reached the limits of processor technology." I remember hearing it in 1994 upon the arrival of the Pentium, that the x86 processor was maxed out, and again during 2004, when the next-gen x86 chips arrived. Now it's 2014, and it's the same tune again. Suuure. They'll find a breakthrough. Count on it.
But at the rate this is happening, the majority of a chip will have to be kept inactive at any given time, creating what Markov terms 'dark silicon.'
When it's believed that computers only use 10% of their silicon, imagine if we could use 100% of our processors' capacity at the same time!
*mind blown*
*processor also blown*
But hey.. We also only use 10% of our carbon...
It is very plain that many parts of the Bible are not meant to be taken literally. The age of the earth being the most obvious.
Also, there's the issue of assuming that there's one instruction per clock. It's common for some instructions to take longer than one cycle, and it's possible to have fuzzy logic, and not even link output to clock, though those usually fail.
Learn to love Alaska
We all know Bees can't fly.
I don't think you design chips do you?