Intel Reveals 10nm Sunny Cove CPU Cores That Go Deeper, Wider, and Faster (pcworld.com)


Posted by msmash on Wednesday December 12, 2018 @04:00AM from the about-time dept.

Long criticized for reusing old cores in its recent CPUs, Intel on Wednesday showed off a new 10nm Sunny Cove core that will bring faster single-threaded and multi-threaded performance along with major speed bumps from new instructions. From a report: Sunny Cove, which many believe will go into Intel's upcoming Ice Lake-U CPUs early next year, will be "deeper, wider, and smarter," said Ronak Singhal, director of Intel's Architecture Cores Group.

Singhal said the three approaches should boost the performance of Sunny Cove CPUs. By going "deeper," Sunny Cove cores find greater opportunities for parallelism by increasing the cache sizes. "Wider" means the new cores will execute more operations in parallel. Compared to the Skylake architecture (which is also the basis of Kaby Lake and Coffee Lake chips), the chip goes from a 4-wide design to 5-wide. Intel says Sunny Cove also increases performance in specialized tasks by adding new instructions that will improve the speed of cryptography, AI, and machine learning.
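
As a rough illustration of the "wider" claim: if width is taken to mean the number of operations the core can handle per cycle (an assumption; the summary does not define it), going from 4-wide to 5-wide raises the theoretical ceiling by 25% at equal clock speed. The function name below is hypothetical, and this is only a back-of-the-envelope sketch.

```python
def peak_speedup(old_width: int, new_width: int) -> float:
    """Upper bound on throughput gain from widening alone, at equal clock speed.

    Assumes every slot is filled on every cycle, which real code rarely manages.
    """
    return new_width / old_width


# Skylake-style 4-wide versus Sunny Cove's 5-wide, per the summary above.
print(f"{peak_speedup(4, 5):.2f}x")
```

This is a ceiling, not a forecast: actual gains depend on how often the workload can fill the extra slot.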

8 of 90 comments

Re:Have they fixed Spectre & Meltdown yet? by SurenEnfiajyan · 2018-12-12 04:10 · Score: 2

Spectre has many variants and it's almost impossible to fix all of them; it's actually the price paid for the performance that caching provides. Meltdown is a horrible issue, and it would be a shame for Intel if they do not fix it in the upcoming CPU.
what about more pci-e lanes? by Joe_Dragon · 2018-12-12 04:20 · Score: 2

what about more pci-e lanes?
More people have seen the Loch Ness monster by JoeyRox · 2018-12-12 04:22 · Score: 3, Insightful

Than have seen a 10nm Intel microprocessor.
Intel shills LIE about spectre by Anonymous Coward · 2018-12-12 04:43 · Score: 5, Interesting

We all know how all Intel CPUs are broken, but the why is very important.
AMD invented 64-bit for x86 chips AND invented the first true dual-core x64 part. At that time AMD had a massive lead over Intel and its god-awful, hyper-long-pipeline NetBurst, though outlets like Slashdot and AnandTech informed you, at the time, that NetBurst, with its race to 10GHz, was the WINNING architecture.
Then NetBurst went bust, and Intel went back to the Pentium 3, updated it with AMD's best ideas (legal due to the cross-patent agreement) and produced the Core 2 architecture.
But here's the thing. Intel made the NSA-friendly and performance-friendly decision to BREAK multi-threading on the CPU.
Proper on-chip multi-threading MUST be 'lock and key'. This means each thread has a unique ID, and that ID acts as a 'key' to open the 'lock' of memory resources that thread has the right to access. Intel NEVER implemented 'lock and key' but AMD always did.
So what did Intel's CHEAT achieve apart from ensuring the NSA always has low level access to your Intel CPU?
1) massively improved memory latency, for the hardware mechanism that implements the 'lock' has a real impact on access speeds.
2) massive improvements on power efficiency (the lock and key takes power for each memory access)
3) much higher clock speeds due to 1 and 2
In other words, ALL the advantages Intel seemed to have over AMD from the Core 2 onwards were down to Intel using an illegal (in CS terms) broken-by-design CPU architecture.
Today the ONLY way to fix the Intel issue is to run ONE thread at a time on the CPU, and do a complete state flush between multi-tasking thread exchanges. The performance hit would approach 80-95%, which is why no solution uses this extreme but correct adjustment.
Next year, AMD's Zen 2 (third-generation Ryzen) utterly wipes out Intel, and Intel will never recover. But Intel sits on a literal mountain of cash, so expect no end of PAID Intel promotion on sites like Slashdot in the continuing future.
Speed bumps? by chthon · 2018-12-12 04:58 · Score: 3, Interesting

Isn't their purpose to reduce speed?
Re:Finally by Targon · 2018-12-12 05:52 · Score: 2

Fab process improvements will not fix or change the actual CPU design; they only change the implementation. The shift from the old Pentium 3 to the Pentium 4 was a significant change to the actual CPU design. Then Intel went back to the Pentium 3 as the basis for much of the Core design. Improvements have been made, but Intel hasn't been forced to actually come up with a fully new design in a VERY long time, so all we have seen are tweaks. IPC being stagnant for years is how you see that fundamental problem. More L1, L2, and L3 cache will help feed the existing design better, but it's still the same basic design. Make the same design wider and it's still the same design, just improved.
AMD, on the other hand, has gone through changes over the generations, from the old Athlon/Athlon64, then the X2 and Phenom series. They went with Bulldozer (FX series desktop chips), but the IPC wasn't very good compared to Intel, so high clock speed but poor efficiency. AMD is on the Zen cores now, which are not based on previous designs, and the performance proves that point. A new fab process will improve the implementation with higher clock speeds, lower voltages, or a combination of the two, but no fab process can save a bad design, and fab process improvements won't get around a stagnant design either.
Intel is trying to claim that their only way to improve the design of the Core series is through fab process improvements? Did all the real innovators at Intel die or retire 5+ years ago? This is NOT a difficult concept. A better design on an old fab process will still be better than a bad design on that old fab process. Clock speeds might be lower, but the DESIGN is key, along with having enough cache to feed the CPU cores. Yes, cache helps, but honestly, Intel BS still stinks.
Re:Finally by epine · 2018-12-12 09:37 · Score: 4, Informative

We're both old timers, but apparently I've kept up better than you have.
First of all, cache (and the rest of the communications fabric) is more than half the design of a high-performance CPU. Long gone are the days when the core itself was the anchor tenant and the rest of the chip amounted to window dressing. The primacy of the core to the chip (and the ISA to the core) was the central (and false) conceit of the original RISC paradigm. If the window dressing hadn't been more important than they wished to acknowledge, there's a good chance that one of the RISC designs would have succeeded in unseating Intel long ago.
Intel was almost forced into this by accident. Starved of registers in the ISA, but having a tight read/modify/write instruction format that efficiently allowed the local stack to function as an extended register set (more efficiently than for RISC), Intel was forced to accept that their competitive foundation was memory agility (without taking this view, their ISA was the crippling liability all their RISC competitors so loudly proclaimed).
When Intel's first OOO chip came out in the mid-1990s with the first Pentium Pro, there was a great day of reckoning in the RISC camp. They had all naively assumed that x86 would never achieve those kinds of performance numbers on heavy server workloads. RISC people read the numbers and muttered under their breath "oh, shit, we're doomed". And they were right.
RISC still easily won single threaded workloads, and floating point workloads by a factor of 2:1, but on a heavily loaded server, the P6 simply never caved. Small register sets make for faster task switching. Intel had provisioned several layers of cache, with lots of internal concurrency, and an external split-transaction data bus. Departmental file and mail servers all went straight to the P6, while dedicated COTS workstations, especially engineering workstations, went in for Alpha or MIPS (you could obtain Windows NT in a variety of flavours back then). Which market would you rather have? COTS Windows NT workstations were a niche market poaching from Sun's well-defended back yard.
The press roundly thrashed the P6 because it wasn't very good at running Windows 95. Talk about short-term, small-minded priorities. Meanwhile, it ran 32-bit protected OSes like a champ. Most important chip in Intel's history, in my opinion, and the one true reason why x86.die.die.die never came to pass as confidently foretold by every enlightened RISC chip-head to ever awaken under a juniper bush after eating way too much majestic, desert-sunset peyote.
Except for the Pentium IV debacle, every major chip Intel has released since is basically just a P6 fitted out with a king cab and jacked suspension. AMD kindly contributed an expanded ISA with more and wider registers. Intel gradually provided wider decoders, more dispatch paths, more execution units, more in-flight instructions, better branch prediction, larger TLBs, larger caches, better cache prediction, some fancy new SIMD instructions, etc. but it was all just more of the same.
As the multicore era progressed, an actual new technology was the invisible core added to manage the thermal envelope. This was not something the P6 needed to do. There was no instruction mix that would burn the chip out, if it didn't self limit (though some especially pernicious instruction mixes would separate the men from the boys in your CPU's cooling system.)
This ushered in a new design regime where peak performance (aka bragging rights) had to compromise with performance/watt. Just because a clever design would make some subsystem faster didn't mean that design would win (you had to also look at the thermal cost). Gradually, the performance/watt criterion became the senior cook in the kitchen.
Performance/watt is joined at the hip with your fabrication node. Modern nodes don't offer just a single transistor dimension, but multiple choices of transistor dimension, depending on whether you wish to emphasize speed or thermal efficie
Re:Finally by epine · 2018-12-12 09:59 · Score: 2

I should note that the improperly maligned P6 was also trashed by a second camp, the assembly language power optimizers, such as Michael Abrash (though I don't recall his complaints, specifically).
The superscalar Pentium was deterministic. You always got the same clock count from the same initial conditions.
But on the P6, the OOO pipeline had its own complex internal history, and it inserted random bubbles into the pipeline that no-one ever explained.
The problem with a bubble is that it can knock your instruction decode cadence into a different alignment and that could change dispatch order, and then you'd get weird, fluctuating benchmark scores that would be 2.7 IPC on one pass through the loop, then 2.9 IPC on the next pass through the loop.
People who naturally go into this line of work were almost uniformly more irritated that 2.7 != 2.9 than they were impressed that 2.7 >> 2 (the best IPC the Pentium ever achieved).
Daniel Kahneman could have studied this and included it in Thinking, Fast and Slow. It's not just Israeli parole judges suffering from an empty stomach who defy rational comprehension; it turns out our own tribe is also far from immune.
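
For reference, the IPC (instructions per cycle) figures in the comment above are just retired instructions divided by elapsed cycles. A minimal sketch, using made-up instruction and cycle counts chosen only to reproduce the quoted 2.7, 2.9, and 2.0; nothing here is measured data:

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle: retired instructions divided by elapsed cycles."""
    return instructions / cycles


# Hypothetical counts, picked so the ratios match the figures in the comment.
LOOP_INSTRUCTIONS = 1566
p6_pass_1 = ipc(LOOP_INSTRUCTIONS, 580)     # 2.7 IPC
p6_pass_2 = ipc(LOOP_INSTRUCTIONS, 540)     # 2.9 IPC
pentium_best = ipc(LOOP_INSTRUCTIONS, 783)  # 2.0 IPC

# Pass-to-pass variation on the P6, relative to the slower pass.
jitter = (p6_pass_2 - p6_pass_1) / p6_pass_1
# Advantage of even the slower P6 pass over the Pentium's best case.
gain = p6_pass_1 / pentium_best - 1

print(f"pass-to-pass jitter on the P6: {jitter:.1%}")
print(f"slower P6 pass vs. best Pentium: +{gain:.0%}")
```

On these numbers the run-to-run wobble is roughly 7%, while the slower P6 pass still beats the Pentium's best case by 35%, which is the comparison the comment is making.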