Slashdot Mirror


Leaked Benchmarks Suggest Intel Will Drop Hyperthreading From Core i7 Chips (arstechnica.com)

According to leaked benchmarks found in the SiSoft Sandra database, there is an Intel Core i7-9700K processor that doesn't appear to have hyperthreading available. "This increases the core count from the current six cores in the 8th generation Coffee Lake parts to eight cores, but, even though it's an i7 chip, it doesn't appear to have hyperthreading available," reports Ars Technica. "Its base clock speed is 3.6GHz, peak turbo is 4.9GHz, and it has 12MB cache. The price is expected to be around the same $350 level as the current top-end i7s." From the report: For the chip that will sit above the i7-9700K in the product lineup, Intel is extending the use of its i9 branding, initially reserved for the X-series High-End Desktop Platform. The i9-9900K will be an eight-core, 16-thread processor. This bumps the cache up to 16MB and the peak turbo up to 5GHz -- and the price up to an expected $450. Below the i7s will be i5s with six cores and six threads, and below them, i3s with four cores and four threads. Even without hyperthreading, the new i7s should be faster than old i7s. A part with eight cores is going to be faster than the four-core/eight-thread chips of a couple of generations ago and should in general also be faster than the six-core/12-thread 8th generation chips. Peak clock speeds are pushed slightly higher than they were for the 8th generation chips, too.

2 of 199 comments

  1. Never been a fan of hyperthreading by InvalidsYnc · Score: 5, Interesting

    I've always seen them as "pseudo" CPUs, and I haven't been all that happy with them overall. Yeah, some workloads benefit from it just fine, but others get tanked, and you'll never know, because it just looks like those CPUs are flying along (according to task mangler or whatever).

    Anyway, glad to see that there will be some parts out there that people can choose to buy that don't have it.

    1. Re:Never been a fan of hyperthreading by gman003 · Score: 5, Interesting

      A lot of them, actually.

      A modern "core" has several "execution units". Unlike very early x86, where the work was divided between an ALU and an FPU, these are divided more finely and evenly - one might do integer math, vector shifts, and branches, while another might do integer math, vector logic, and data stores. There's usually redundancy on common instruction types (e.g. Haswell has three units that can compute store addresses, but only one that can do divides).
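      To make the port idea concrete, here is a minimal Python sketch. The port map is hypothetical and simplified (loosely echoing the description above, not Haswell's real layout), and `min_cycles` is an illustrative helper, not any tool's API. It computes a throughput lower bound for a bag of independent operations when each port can start one operation per cycle: for every subset of operation types, the work in that subset divided by the number of ports able to run it, taking the worst case. It ignores dependencies, latency, and the front end.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical, simplified port map (not any real CPU's): each port lists the
# operation types it can start, at most one per cycle.
PORTS = {
    "p0": {"int", "vec_shift", "branch", "div"},
    "p1": {"int", "vec_logic", "store_data"},
    "p2": {"load", "store_addr"},
    "p3": {"load", "store_addr"},
    "p4": {"store_addr"},
}


def min_cycles(mix, ports=PORTS):
    """Throughput lower bound, in cycles, for a bag of independent ops.

    `mix` maps op type -> count. For every subset S of op types, the ops in S
    can only go to ports able to run something in S, so they need at least
    count(S) / |those ports| cycles; the worst subset gives the bound.
    """
    kinds = [k for k, n in mix.items() if n > 0]
    best = Fraction(0)
    for r in range(1, len(kinds) + 1):
        for subset in combinations(kinds, r):
            capable = [p for p, ops in ports.items() if ops & set(subset)]
            if not capable:
                raise ValueError(f"no port can execute {subset}")
            demand = sum(mix[k] for k in subset)
            best = max(best, Fraction(demand, len(capable)))
    return best


print(min_cycles({"div": 4}))            # 4: only p0 divides
print(min_cycles({"store_addr": 6}))     # 2: three ports share the work
print(min_cycles({"int": 2, "div": 2}))  # 2: the divides tie up p0
```

      Redundant ports cut the bound for common operations, while a single divider makes divides the bottleneck no matter how many other ports sit idle.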

      In a single thread, this is used for superscalar execution. If you have code something like "a = b / c; d = e * f;", both instructions can be run in parallel since neither depends on the other. This also hides the cost of x86's more complicated addressing modes - computing the address gets dispatched to an execution unit just like a normal multiply/add, and the result just gets sent to the store unit.
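      A minimal sketch of why independent instructions overlap, assuming unlimited execution units and register renaming, so that only true data dependencies matter. The latencies in `LATENCY` are made-up round numbers for illustration, not figures for any specific CPU, and `finish_time` is a hypothetical helper.

```python
# Hypothetical latencies in cycles, illustrative only.
LATENCY = {"div": 20, "mul": 3, "add": 1}


def finish_time(program):
    """Cycle when the last result is ready.

    `program` is a list of (dest, op, sources). Assumes unlimited execution
    units and register renaming, so an instruction starts as soon as its
    sources are ready; inputs never written by the program are ready at 0.
    """
    ready = {}  # register name -> cycle its value becomes available
    for dest, op, srcs in program:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = start + LATENCY[op]
    return max(ready.values(), default=0)


# a = b / c; d = e * f;   versus   a = b / c; d = a * f;
independent = [("a", "div", ("b", "c")), ("d", "mul", ("e", "f"))]
dependent = [("a", "div", ("b", "c")), ("d", "mul", ("a", "f"))]
print(finish_time(independent))  # 20: the multiply hides under the divide
print(finish_time(dependent))    # 23: the multiply must wait for a
```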

      But sometimes a thread has lots of dependencies, or does mainly a single type of operation. Maybe it's crunching through a bunch of multiply-adds. Rather than let the remainder of the core sit idle, you can run another thread, or even another process, on it. If this second one mainly hits a different execution unit (EU) - say, it's doing a lot of shifting and bit-twiddling - you can get a 100% speedup.
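      The same kind of throughput bound can sketch this case. Assuming a hypothetical four-port core where two ports handle multiply-adds and the other two handle shifts, two threads with complementary mixes finish together in the time either one needs alone (a speedup of 2, i.e. 100%), while two threads with the same mix gain nothing. The port map, thread mixes, and helper names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical four-port core: every port does plain integer math, two also
# do multiply-adds ("fma"), the other two also do shifts and bit-twiddling.
PORTS = {
    "p0": {"int", "fma"},
    "p1": {"int", "fma"},
    "p2": {"int", "shift"},
    "p3": {"int", "shift"},
}


def min_cycles(mix):
    """Throughput lower bound: max over subsets S of op types of
    count(S) / number of ports that can run any op in S."""
    kinds = [k for k, n in mix.items() if n > 0]
    best = Fraction(0)
    for r in range(1, len(kinds) + 1):
        for subset in combinations(kinds, r):
            capable = [p for p, ops in PORTS.items() if ops & set(subset)]
            best = max(best, Fraction(sum(mix[k] for k in subset), len(capable)))
    return best


def smt_speedup(mix_a, mix_b):
    """Time to run A then B one after another, divided by time to run them
    interleaved on the same core."""
    merged = {k: mix_a.get(k, 0) + mix_b.get(k, 0)
              for k in mix_a.keys() | mix_b.keys()}
    return (min_cycles(mix_a) + min_cycles(mix_b)) / min_cycles(merged)


fma_thread = {"fma": 1000}
shift_thread = {"shift": 1000}
print(smt_speedup(fma_thread, shift_thread))  # 2: disjoint ports
print(smt_speedup(fma_thread, fma_thread))    # 1: same ports, no gain
```

      This is only a steady-state throughput bound; real cores also share the front end, caches, and reorder buffer between threads, which is part of why the full 100% is rare.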

      You rarely get so much of a boost in practice. A worker-thread type of program, splitting a parallel task across cores, will generally be using the same execution units in each thread. And SMT (simultaneous multithreading, the generic name for what Intel calls Hyper-Threading) doesn't help if you're bottlenecked on something besides execution - well-optimized code, as often as not, is limited by memory throughput rather than execution.
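      The memory-throughput point can be sketched as a roofline-style bound: a core's throughput is the smallest of what its threads can issue, what the shared execution units can retire, and what memory bandwidth can feed. The function name and all parameter values below are hypothetical; the sketch only shows why a second thread helps a dependency-limited loop but not a bandwidth-limited one.

```python
def ops_per_cycle(threads, per_thread_ilp, peak_issue, bytes_per_cycle, bytes_per_op):
    """Roofline-style upper bound on a core's throughput, in ops per cycle.

    threads         -- SMT threads sharing the core
    per_thread_ilp  -- ops/cycle one thread sustains (limited by dependencies)
    peak_issue      -- ops/cycle the shared execution units can retire
    bytes_per_cycle -- memory bandwidth available to this core
    bytes_per_op    -- memory traffic each op needs
    """
    compute_bound = min(threads * per_thread_ilp, peak_issue)
    memory_bound = bytes_per_cycle / bytes_per_op
    return min(compute_bound, memory_bound)


# Hypothetical numbers: a dependency-limited loop with little memory traffic...
print(ops_per_cycle(1, 1.5, 4, 16, 2), ops_per_cycle(2, 1.5, 4, 16, 2))    # 1.5 3.0
# ...versus a streaming loop that needs 16 bytes per op.
print(ops_per_cycle(1, 1.5, 4, 16, 16), ops_per_cycle(2, 1.5, 4, 16, 16))  # 1.0 1.0
```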

      The other boost comes from covering memory latency. If one thread hits a load that isn't in L1 cache, it will stall while the load is served. If it's in L2 cache, that's not too long - a dozen cycles or so. If you're going out to main memory, you're looking at a few hundred, maybe a few thousand cycles of NOPs - so why not switch to another thread that has all its data in L1 cache already? Modern x86 processors have pretty low memory latency compared to other architectures, so two threads is generally the most you'd find useful for this, but other systems with harsher memory latency will go even wider - the latter-day SPARCs do eight threads per core, and some parts of a GPU will operate in the hundreds. This is why some non-superscalar architectures will still have multiple threads per core - it's only ever actually running one instruction, but it will rarely be running zero.
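      The latency-hiding argument has a classic back-of-the-envelope form. Assuming each thread computes for `run_cycles`, then stalls for `miss_cycles`, and the core can switch to any ready thread for free, utilization grows linearly with thread count until it saturates. The helper names and cycle counts below are hypothetical, chosen only to contrast mild and harsh memory latency.

```python
import math


def utilization(threads, run_cycles, miss_cycles):
    """Steady-state fraction of cycles the core has work to issue.

    Simple model: each thread computes for `run_cycles`, then waits
    `miss_cycles` on a cache miss; switching threads costs nothing.
    """
    return min(1.0, threads * run_cycles / (run_cycles + miss_cycles))


def threads_to_hide(run_cycles, miss_cycles):
    """Smallest thread count that keeps the core busy in this model."""
    return math.ceil((run_cycles + miss_cycles) / run_cycles)


# Hypothetical cycle counts: mild latency vs. harsh latency.
print(threads_to_hide(200, 200))  # 2
print(threads_to_hide(25, 175))   # 8
print(utilization(1, 200, 200), utilization(2, 200, 200))  # 0.5 1.0
```

      The more the stall time dominates the compute time between misses, the more threads it takes to fill the gaps, which is the trade-off between two-way x86 SMT and the much wider designs mentioned above.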