Slashdot Mirror


Intel Reveals 10nm Sunny Cove CPU Cores That Go Deeper, Wider, and Faster (pcworld.com)

Long criticized for reusing old cores in its recent CPUs, Intel on Wednesday showed off a new 10nm Sunny Cove core that will bring faster single-threaded and multi-threaded performance along with major speed bumps from new instructions. From a report: Sunny Cove, which many believe will go into Intel's upcoming Ice Lake-U CPUs early next year, will be "deeper, wider, and smarter," said Ronak Singhal, director of Intel's Architecture Cores Group.

Singhal said the three approaches should boost the performance of Sunny Cove CPUs. By doing "deeper," Sunny Cove cores find greater opportunities for parallelism by increasing the cache sizes. "Wider" means the new cores will execute more operations in parallel. Compared to the Skylake architecture (which is also the basis of Kaby Lake and Coffee Lake chips), the chip goes from a 4-wide design to 5-wide. Intel says Sunny Cove also increases performance in specialized tasks by adding new instructions that will improve the speed of cryptography and AI and machine learning.

1 of 90 comments (clear)

  1. Re:Finally by epine · · Score: 4, Informative

    We're both old timers, but apparently I've kept up better than you have.

    First of all, cache (and the rest of the communications fabric) is more than the half the design of a high performance CPU. Long ago now are the days where the core itself was the anchor tenant, and the rest of chip amounted to window dressing. The primacy of the core to the chip (and the ISA to the core) was the central (and false) conceit of the original RISC paradigm. If the window dressing hadn't been more important than they wished to acknowledged, there's a good chance that one of the RISC designs would have succeed in unseating Intel, long ago.

    Intel was almost forced into this by accident. Starved of registers in the ISA, but having a tight read/modify/write instruction format that efficiently allowed the local stack to function as an extended register set (more efficiently than for RISC), Intel was forced to accept that their competitive foundation was memory agility (without taking this view, their ISA was the crippling liability all their RISC competitors so loudly proclaimed).

    When Intel's first OOO chip came out in the mid 1990s with the first Pentium Pro there was the great day of reckoning in the RISC camp. They had all naively assumed that x86 would never achieve those kinds of performance numbers on heavy, server workloads. RISC people read the numbers and muttered under their breath "oh, shit, we're doomed". And they were right.

    RISC still easily won single threaded workloads, and floating point workloads by a factor of 2:1, but on a heavily loaded server, the P6 simply never caved. Small register sets make for faster task switching. Intel had provisioned several layers of cache, with lots of internal concurrency, and an external split-transaction data bus. Departmental file and mail servers all went straight to the P6, while dedicated COTS workstations, especially engineering workstations, went in for Alpha or MIPS (you could obtain Windows NT in a variety of flavours back then). Which market would you rather have? COTS Windows NT workstations were a niche market poaching from Sun's well-defended back yard.

    The press roundly thrashed the P6 because it wasn't very good at running Window 95. Talk about short-term small-minded priorities. Meanwhile, it ran 32-bit protected OSes like a champ. Most important chip in Intel's history, in my opinion, and the one true reason why x86.die.die.die never came to pass as confidently foretold by every enlightened RISC chip-head to ever awaken under a juniper bush after eating way too much majestic, desert-sunset peyote.

    Except for the Pentium IV debacle, every major chip Intel has released since is basically just a P6 fitted out with a king cab and jacked suspension. AMD kindly contributed an expanded ISA with more and wider registers. Intel gradually provided wider decoders, more dispatch paths, more execution units, more in-flight instructions, better branch prediction, larger TLBs, larger caches, better cache prediction, some fancy new SIMD instructions, etc. but it was all just more of the same.

    As the multicore era progressed, an actual new technology was the invisible core added to manage the thermal envelope. This was not something the P6 needed to do. There was no instruction mix that would burn the chip out, if it didn't self limit (though some especially pernicious instruction mixes would separate the men from the boys in your CPU's cooling system.)

    This ushered in a new design regime where peak performance (aka bragging rights) had to compromise with performance/watt. Just because a clever design would make some subsystem faster, didn't mean that design would win (you had to also look at the thermal cost). Gradually, the performance/watt criteria became the senior cook in the kitchen.

    Performance/watt is joined at the hip with your fabrication node. Modern nodes don't offer just a single transistor dimension, but multiple choices of transistor dimension, depending on whether you wish to emphasize speed or thermal efficie