Ask Slashdot: Why Are There No Huge Leaps Forward In CPU/GPU Power?
dryriver writes: We all know that CPUs and GPUs and other electronic chips get a little faster with each generation produced. But one thing never seems to happen -- a CPU/GPU manufacturer suddenly announcing a next generation chip that is, say, 4-8 times faster than the fastest model they had 2 years ago. There are moderate leaps forward all the time, but seemingly never a HUGE leap forward due to, say, someone clever in R&D discovering a much faster way to process computing instructions. Is this because huge leaps forward in computing power are technically or physically impossible/improbable? Or is nobody in R&D looking for that huge leap forward, and rather focused on delivering a moderate leap forward every 2 years? Maybe striving for that "rare huge leap forward in computing power" is simply too expensive for chip manufacturers? Precisely what is the reason that there is never a next-gen CPU or GPU that is, say, advertised as being 16 times faster than the one that came 2 years before it due to some major breakthrough in chip engineering and manufacturing?
Most likely, there is no major competition in the market, and PC sales on the whole have slowed considerably. A modern 6800K processor is as close as you'll come to a leap forward, but it's $1100 Canadian and requires a similarly expensive motherboard + memory. Same with similar chips.
Meanwhile the cheapest system on the market is as fast as a moderately high-grade enthusiast computer from 2010 and probably has reasonable 3D graphics onboard, with a SSD drive it will feel quite snappy.
So, a) not a lot of market demand for faster systems, b) lots of tablets and game consoles for entertainment out there, c) moderately faster systems exist but cost keeps them low-volume, d) very low-percentage demand for faster computers - definitely less than 1% that will pay a premium for it, e) the majority of gamers are young-ish and they play largely twitch games even on PCs which are more GPU limited than CPU limited.
...Steve
Instruction level parallelism in superscaler core designs have hit a limit. More pipeline stages becomes counter productive when a misprediction requires a flush. Thread level parallelism exploited by multi core designs can only go so far; only certain tasks can exploit massive parallelism(e.g. ray tracing).
Increases in clock speed have hit a wall with current silicon based semiconductors. Exotic semiconductors and incredible cooling systems aren't practical for the mass market.
The sole reason Kaby lakes got hot and clocked in so fast is because of AMD just around the corner and it worked to beat Ryzen. I expect the CPU race to heat back up again as physics has not killed innovation yet.
Proof is GPU's and Phones are still improving at breakneck speed. It is only because of an INtel monopoly that on the desktop it has went to a standstill.
http://saveie6.com/
To elaborate: We can't reliably clock Silicon much faster than we're doing right now.
There are other semiconductors (such as GaAs) which can operate reliably at higher frequencies, but they are absurdly expensive, produce too much heat, consume too much power, and so on -- not to mention the fact our tiny process sizes for silicon don't exactly work for entirely different materials (chemistry bites again).
We're running into a similar wall for die shrinkage, on multiple fronts:
- We're getting into the size territory where bits flip due to quantum tunneling, which tends to hurt reliability. Flash storage has started to reach that territory, if my colleagues working for ${SSD MANUFACTURER} are telling me the truth.
- Yields of working units are going down significantly as the die shrinks, and it's taking a lot longer to figure out how to bring yields back up.
In the end, every material has its limits, and we're starting to run into them with Silicon, and there isn't a material that 'stands out' as worth betting the business on.
-- Sometimes you have to turn the lights off in order to see.
CPU architect here. I'll try to provide some insight.
Performance for CPU/GPU or any computational tool isn't exactly just a number you hit. It's not like bandwidth for storage or communications nor is it like a battery's capacity.
A CPU and to a lesser extent a GPU is able to perform all sorts (all logical) computational functions. Each of these involves different usage patterns of the different computational paths inside a piece of silicon. And thus, speeding up each of these usage patterns requires different structures.
A single piece of code running something complex like launching an app or opening a webpage will generate hundreds of millions of instructions with lots of different patterns. Think about all those API's you call. How much code do you think is similar between them?
And thus the problem of improving "performance". The goalpost is a shifty one. Speed up one code pattern, and you risk your changes hurting another. Or you can spend extra transistors making a specialized accelerator for that code pattern. But then...it'll be idle 95% of the time.
And if you speed up a particular function by 1000x (it's happened), your average speed increase for a typical benchmark or API call will still be 0-1%. Because that function is only a small piece of the larger codebase.
Think about how many non-similar libraries and functions there are in typical software, and think about how there's any way to speed them *all* up. You can make memcpy or memset (malloc uses these) faster by 5x and that'll speed up javascript processing by....0.01% or so.
The reason "performance" doesn't increase as drastically in the computer world is because computing "performance" is very very multifaceted. Much like how "intelligence" can't just be increased by 5x -- someone can get 5x better at specific tasks, like memorizing or image recognition, but that doesn't make them 5x more "intelligent".
Compare this with a simple metric like 0-60 acceleration or network bandwidth.
Architecture-wise, Pascal was mostly an incremental upgrade to Maxwell.
The big difference from Maxwell to Pascal was a process upgrade from 28 nm to 16/14 nm which allowed the clock speed to bump 50% from around 1 GHz to around 1.5 GHz.
Couple that improved memory and a good balance of different types of units for the best performance in typical games of its time.
"We mustn't be caught by surprise by our own advancing technology" -- Aldous Huxley
Moore's law had a great run: ~40 years from early 60s to early 00s.
During that time, every generation boosted density, gate count, clock speed, and value per dollar.
The (exponential!) rule of thumb was 2x more every 18 months.
Everyone knew it had stop sometime: you can't make things smaller than atoms.
What finally did stop it (considerably north of atom-scale) was gate tunnelling current.
In a MOS-FET, the gate is separated from the channel by an insulator (SiO2).
As you scale the transistor down, that insulator gets thinner, along with everything else.
When the insulator thickness is less than the wavelength of an electron, you start to get significant tunnelling current.
This acts like short-circuit from the power to ground.
The technology hit the wall around 2003.
Gate tunnelling current was then over half of total power dissipation.
The power density of the CPU chip was 150 W/cm^2 (like a stove top),
and going further was clearly impractical.
As it happens, the clock speed at that design node was 3 GHz,
and that's pretty much were we are today.
Everything since then has been building bigger, not faster: multi-core, caches, SoC;
plus architecture tweaks and optimizations, like pipelining and super-scalar.
It was a great run while it lasted, but it's over,
and we're not getting another one without a fundamental scientific/technological breakthrough,
on the order of coal, or steel, or quantum mechanics.
Risk averse CEOs who don't want to sink in the R&D to make carbon based chips because there is risk of it not working.
A synthetic diamond transistor was first built and tested over 13 years ago at 81GHz: http://www.geek.com/blurb/81gh...
More recently they developed a 300GHz Graphene transistor, but that was still 7 years ago: https://www.bit-tech.net/news/...
The technology is there and proven, but scaling it up to processor scale would be a massive investment and a big risk.
If you disagree, please post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like
This kind of thing was rather common until about 2000. Each process node was better in every way than the last. Big jumps in performance at each node advance. Power went down too. And, of course it was much cheaper per gate. You could get doubled performance and 1/4 the cost by just porting over the same design, trace for trace, to the next full node. These "die shrinks" were quite common. Through the 90's you got an extra bonus for new designs. That is because the industry was brimming with ideas that were known to work but were just not practical to implement because they took too much silicon area.
First the idea spigot sputtered. The good mainframe ideas had already been implemented. It was longer clear what to do with all those gates. New ideas were tried. Some worked. Some didn't. Also, about this time, complexity started to threaten the ability to make chips that actually worked. Bugs became more common. Design progress slowed.
Then process starting acting up. Power scaling stopped. More transistors were available but if you used them, your chip consumed proportionally more power. Run the transistors faster and you had the same problem, only worse. A hot chip was no longer a marketing problem, it was a chip that would not work. More effort and more complexity were needed to tame power. A simple die shrink wouldn't do that much.
Then process started getting messier. The new nodes were not better in every way. Leakage current went up instead of down. Variability went up. Performance scaling slowed. Getting any improvement at all required more development time and money. Progress always slows when development time and cost rise.
Then 20nm planer came and it was awful. Terrible leakage. Required double patterning. Double patterning means more masks mean more expense up front and during manufacturing. It actually cost more per transistor than 28nm. What was the point, really?
That is pretty much the mess were are in now. Can't significantly increase clock rate. Can't throw gates at the problem and wouldn't really know what to do with the gates if we had them. Finfets temporarily tamed power but are only available in nodes hobbled by the need for multi-patterning.
I would also argue the money isn't there to make the insane investment required to make that "next leap" in chip tech.
If you look at reports on how old the average PC in the field is? You are looking at 4-7 years and the reason why is obvious...software hasn't kept up with hardware. Even gamers these days can have 7+ year old CPUs and play the latest games at 1080P so there really isn't a huge market** just begging for a new CPU as sales from Intel and AMD have shown. Now if some new "killer app" like the next Lotus 123 comes up? This may change but so far I've seen nothing on the horizon that would fill that slot.
**.- Before any of you that do some niche job like wave simulation or 3D rendering scream "But I needs moar power!"? You and your ilk are less than 2% max of the market, just too niche to be a big enough market to support the insane amount of R&D required to make that next leap.
ACs don't waste your time replying, your posts are never seen by me.
My SSD based laptop boots a lot faster than Windows 3.1.
As far as "planned obsolescence", I'm running Windows 10 on a Core 2 Duo 2.66Ghz laptop with 4Gb of RAM - a computer that was first sold in 2009. It runs my Plex Server and my PlexConnect server.
My mom still uses my 2006 era Mac Mini (Core Duo 1.66) with Windows 7, Office, and Chrome. It has 1.5Gb or RAM. When I go home and use it, it's not unusable as long as you don't try to run too many things at once.
My secondary laptop that I keep upstairs is a circa 2009 2Ghz Pentium Dual Core with 4Gb of RAM running Windows 7. In day to day use, the only thing wrong with it is a battery that won't hold a charge.
You can accuse MS of a lot of things, but not optimizing Windows to run well on fairly old hardware isn't one.
For certain operations, AVX made a huge difference. AVX2 made an even huge-r difference. Depending on what you're doing, you can see a 2x to 10x speedup on the outside vs. using a chip without AVX2 with similar performance characteristics.
My Other Computer Is A Data General Nova III.
It happened about ten years ago with the rise of GPUs for general purpose computing. Suddenly we could do a lot of things 10-100 times faster than before. You program GPUs really differently than CPUs, so we had to rewrite a lot of code and design new algorithms. But the benefit was huge.
It may be happening again with specialized chips for deep learning, like Google's TPU. These chips are designed for just one class of applications, but it's a really important class, and they can be 10x faster or more efficient for those applications.
There've been other times when a new generation brought a sudden major improvement in speed, like with vector units or multicore CPUs. But always at the cost of having to rewrite how your code works.
Now if you want new chips that work just like the old ones and run the same programs as before, just 10x faster, sorry. That isn't likely to happen. Huge jumps like that require major changes of approach.
"I'm too busy to research this and form an educated opinion, but I do have time to tell everyone my uninformed opinion."
This is nonesense. Concurrent software programming - modifying shared memory without control is a big no no. Its out of control. Out if software flow control. Means unpredictable results.
I'm not sure what you are rambling on about. Google concurrency synchronization and you will see what you write - careful consideration has been take to prevent exactly what you describe.
You are an idiot or just read an article and think you are a programmer.
And btw, like Linux, you can't magically convert sycronized program glow to something else.
D needs the result of C, C needs the result of B, B needs the result of A. Therefore the execution flow or the software is A>B>C>D.
Period. No exceptions.
I think the real issue is, semiconductors are so competitive, the current shipping product is always very close to the state of the manufacturing and physics arts. Intel, AMD, nVidia, Samsung, Toshiba, Apple, and others spend billions pushing the processes and architectures to the limit in every product so it stays competitive as long as possible.
To get a 4x or 8x improvement in size, power, or speed would imply there's a revolutionary way to do things that we just don't quite know yet. And it better be something which can be quickly turned to production because Moore's Law hasn't stopped yet. If you have a 4x improvement idea but it takes five years to release, it won't get funded. Plain CMOS silicon has too good a chance of catching up.
There's plenty of times people rolled the dice on processor moon shots. I was at HP when Itanium was first developed (~95). We thought we'd have working silicon in a few years (~98 or 99) at the astounding clock rate of 500 MHz (oh, and that was potentially retiring something like 6 to 12 instructions per cycle, I forget the details). This was when a good Pentium processor ran at around 45 MHz. We thought Itanium was going to be so frickin' fast there was no way Intel could compete. Then AMD started a clock rate war, x86 got faster really fast, Itanium took much longer to produce than we anticipated, and the rest was history.
I think the bottom line is, it's really hard to produce a system which really is even 2x faster than the competition. 4x is incredible and 8x probably has never been done.
As an analogy, consider cars and mileage. My car, a diesel Passat (which shortly will not be road legal :() actually exceeds 50 MPG on a good day. What would it take to make a car which gets 100 MPG with a 600 mile range? How about 200 MPG? With no compromises? And a sales price of $28k? It's pretty hard to imagine.