Slashdot Mirror


Intel Reveals 10nm Sunny Cove CPU Cores That Go Deeper, Wider, and Faster (pcworld.com)

Long criticized for reusing old cores in its recent CPUs, Intel on Wednesday showed off a new 10nm Sunny Cove core that will bring faster single-threaded and multi-threaded performance along with major speed bumps from new instructions. From a report: Sunny Cove, which many believe will go into Intel's upcoming Ice Lake-U CPUs early next year, will be "deeper, wider, and smarter," said Ronak Singhal, director of Intel's Architecture Cores Group.

Singhal said the three approaches should boost the performance of Sunny Cove CPUs. By going "deeper," Sunny Cove cores find greater opportunities for parallelism by increasing the cache sizes. "Wider" means the new cores will execute more operations in parallel. Compared to the Skylake architecture (which is also the basis of Kaby Lake and Coffee Lake chips), the chip goes from a 4-wide design to a 5-wide one. Intel says Sunny Cove also increases performance in specialized tasks by adding new instructions that will improve the speed of cryptography, AI, and machine learning.
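
To put the 4-wide versus 5-wide figure in perspective, here is a minimal sketch of the best-case arithmetic. It assumes "N-wide" can be read as an upper bound on operations handled per cycle and that every operation is independent; the function names and the operation count are hypothetical, not anything from Intel or the report.

```python
import math


def ideal_cycles(num_ops: int, width: int) -> int:
    """Best-case cycles to push num_ops fully independent operations
    through a core that handles at most `width` operations per cycle."""
    if width <= 0:
        raise ValueError("width must be positive")
    return math.ceil(num_ops / width)


def peak_speedup(old_width: int, new_width: int) -> float:
    """Upper bound on throughput gain from widening alone."""
    return new_width / old_width


if __name__ == "__main__":
    ops = 1000  # hypothetical count of independent operations
    print(ideal_cycles(ops, 4))  # cycles on a 4-wide design
    print(ideal_cycles(ops, 5))  # cycles on a 5-wide design
    print(peak_speedup(4, 5))    # ceiling on the gain from width alone
```

This is only a ceiling of 25 percent. Real code has data dependencies, branch mispredictions, and cache misses, so the gain from extra width alone would typically be smaller, which is presumably why the wider core is paired with the larger caches described above.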

2 of 90 comments

  1. Intel shills LIE about spectre by Anonymous Coward · Score: 5, Interesting

    We all know how all Intel CPUs are broken, but the why is very important.

    AMD invented 64-bit for x86 chips AND invented the first true dual-core x64 part. At that time AMD had a massive lead over Intel and its god-awful, hyper-long-pipeline Netburst. Yet outlets like Slashdot and Anandtech informed you at the time that Netburst, with its race to 10GHz, was the WINNING architecture.

    Then Netburst went bust, and Intel went back to the Pentium 3, updated it with AMD's best ideas (legal due to cross patent agreement) and produced the Core 2 architecture.

    But here's the thing: Intel made the NSA- and performance-friendly decision to BREAK multi-threading on the CPU.

    Proper on-chip multi-threading MUST be 'lock and key'. This means each thread has a unique ID, and that ID acts as a 'key' to open the 'lock' of memory resources that thread has the right to access. Intel NEVER implemented 'lock and key' but AMD always did.

    So what did Intel's CHEAT achieve apart from ensuring the NSA always has low level access to your Intel CPU?

    1) massively improved memory latency, for the hardware mechanism that implements the 'lock' has a real impact on access speeds.
    2) massive improvements in power efficiency (the lock and key takes power for each memory access)
    3) much higher clock speeds due to 1 and 2

    In other words, ALL the advantages Intel seemed to have over AMD from the Core 2 onwards were down to Intel using an illegal (in CS terms), broken-by-design CPU architecture.

    Today the ONLY way to fix the Intel issue is to run ONE thread at a time on the CPU, and do a complete state flush between multi-tasking thread exchanges. The performance hit would approach 80-95%, which is why no solution uses this extreme but correct adjustment.

    Next year, AMD's Zen 2 (Ryzen 3) utterly wipes out Intel, and Intel will never recover. But Intel sits on a literal mountain of cash, so expect no end of PAID Intel promotion on sites like Slashdot in the future.

  2. Re:Finally by epine · Score: 4, Informative

    We're both old timers, but apparently I've kept up better than you have.

    First of all, cache (and the rest of the communications fabric) is more than half the design of a high-performance CPU. Long gone are the days when the core itself was the anchor tenant, and the rest of the chip amounted to window dressing. The primacy of the core to the chip (and the ISA to the core) was the central (and false) conceit of the original RISC paradigm. If the window dressing hadn't been more important than they wished to acknowledge, there's a good chance that one of the RISC designs would have succeeded in unseating Intel long ago.

    Intel was almost forced into this by accident. Starved of registers in the ISA, but having a tight read/modify/write instruction format that efficiently allowed the local stack to function as an extended register set (more efficiently than for RISC), Intel was forced to accept that their competitive foundation was memory agility (without taking this view, their ISA was the crippling liability all their RISC competitors so loudly proclaimed).

    When Intel's first OOO chip came out in the mid-1990s with the first Pentium Pro, there was a great day of reckoning in the RISC camp. They had all naively assumed that x86 would never achieve those kinds of performance numbers on heavy server workloads. RISC people read the numbers and muttered under their breath "oh, shit, we're doomed". And they were right.

    RISC still easily won single-threaded workloads, and floating-point workloads by a factor of 2:1, but on a heavily loaded server, the P6 simply never caved. Small register sets make for faster task switching. Intel had provisioned several layers of cache, with lots of internal concurrency, and an external split-transaction data bus. Departmental file and mail servers all went straight to the P6, while dedicated COTS workstations, especially engineering workstations, went in for Alpha or MIPS (you could obtain Windows NT in a variety of flavours back then). Which market would you rather have? COTS Windows NT workstations were a niche market poaching from Sun's well-defended back yard.
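
The remark above that small register sets make for faster task switching can be illustrated with rough arithmetic. This is a sketch under stated assumptions, not a measurement: it counts only architectural integer registers (8 x 32-bit on 32-bit x86, 32 x 64-bit on Alpha) and ignores floating-point state, flags, control registers, hardwired-zero registers, and cache effects. The function name is hypothetical.

```python
def context_bytes(num_regs: int, reg_bits: int) -> int:
    """Bytes of integer register state an OS must save (and later restore)
    on a task switch, counting only the architectural integer registers."""
    return num_regs * reg_bits // 8


# Assumption: integer register files only; FP and control state ignored.
x86_32 = context_bytes(num_regs=8, reg_bits=32)   # 32-bit x86 general-purpose registers
alpha = context_bytes(num_regs=32, reg_bits=64)   # Alpha integer registers

if __name__ == "__main__":
    print(x86_32)          # bytes per switch, 32-bit x86
    print(alpha)           # bytes per switch, Alpha
    print(alpha / x86_32)  # ratio of state moved per switch
```

Under these assumptions the RISC part moves eight times as much integer state per switch. How much that matters on a real server depends on switch frequency and on everything else the kernel does during a switch.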

    The press roundly thrashed the P6 because it wasn't very good at running Windows 95. Talk about short-term, small-minded priorities. Meanwhile, it ran 32-bit protected OSes like a champ. Most important chip in Intel's history, in my opinion, and the one true reason why x86.die.die.die never came to pass as confidently foretold by every enlightened RISC chip-head to ever awaken under a juniper bush after eating way too much majestic, desert-sunset peyote.

    Except for the Pentium IV debacle, every major chip Intel has released since is basically just a P6 fitted out with a king cab and jacked suspension. AMD kindly contributed an expanded ISA with more and wider registers. Intel gradually provided wider decoders, more dispatch paths, more execution units, more in-flight instructions, better branch prediction, larger TLBs, larger caches, better cache prediction, some fancy new SIMD instructions, etc. but it was all just more of the same.

    As the multicore era progressed, an actual new technology was the invisible core added to manage the thermal envelope. This was not something the P6 needed to do. There was no instruction mix that would burn the chip out, if it didn't self limit (though some especially pernicious instruction mixes would separate the men from the boys in your CPU's cooling system.)

    This ushered in a new design regime where peak performance (aka bragging rights) had to compromise with performance/watt. Just because a clever design would make some subsystem faster didn't mean that design would win (you also had to look at the thermal cost). Gradually, the performance/watt criterion became the senior cook in the kitchen.

    Performance/watt is joined at the hip with your fabrication node. Modern nodes don't offer just a single transistor dimension, but multiple choices of transistor dimension, depending on whether you wish to emphasize speed or thermal efficie