Slashdot Mirror


Intel Flagship Core i7-6950X Broadwell-E To Offer 10-Cores, 20-Threads, 25MB L3 (hothardware.com)

MojoKid writes: Intel has made a habit of launching enthusiast versions of previous generation processors after it releases a new architecture. As was the case with Intel's Haswell architecture, high-end Broadwell-E variants are expected and it looks like Intel is readying a doozy. Recently revealed details show four new processors under the new HEDT (High-End Desktop) banner for Broadwell, which is one more SKU than Haswell-E brought to the table. The most intriguing of the new chips is the Core i7-6950X, a monster 10-core CPU with Hyper Threading support. That gives the Core i7-6950X 20 threads to play with, along with a whopping 25MB of L3 cache. The caveat is the CPU's clockspeed — it will run at just 3.0GHz (base), so for applications that aren't properly tuned to take full advantage of large core counts and threads, it could potentially trail behind the Core i7-6700K, a quad-core Skylake processor clocked at 3.4GHz (base) to 4GHz (Turbo).

19 of 167 comments

  1. Wrong specs on Skylake by SeeManRun · · Score: 4, Informative

    I have been seeing this a lot lately for some reason. The i7-6700K runs at a 4.0 GHz base clock and turbos up to 4.2 GHz. So it will be quite a performance beatdown by Skylake if clock speed rather than thread count is what matters.

  2. Re:20 cores DOES matter by fyngyrz · · Score: 4, Interesting

    I was under the (admittedly vague) impression that was true only if the thread was using floating point.

    CPUs that offer more cores and/or threads than they do FPUs are one of the reasons I write a lot of my multi-threaded stuff (image and baseband RF processing) using appropriately scaled integer math.

    I have 8 cores with 8 FPUs on my desk, but many of my users are stuck with some of the wheezier i5 variants.

    --
    I've fallen off your lawn, and I can't get up.
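
A minimal sketch of the scaled-integer approach described in the comment above, assuming 8-bit pixel values and a gain stored in Q8 fixed point (value times 256). The names `FRAC_BITS`, `to_fixed`, and `scale_pixel` are hypothetical and are not taken from the poster's code.

```python
FRAC_BITS = 8                 # assumed Q8 format: 8 fractional bits
ONE = 1 << FRAC_BITS          # the integer 256 represents 1.0


def to_fixed(x: float) -> int:
    """Convert a float to Q8 once, outside the hot loop."""
    return int(round(x * ONE))


def scale_pixel(pixel: int, gain_q8: int) -> int:
    """Multiply an 8-bit pixel by a Q8 gain using only integer operations."""
    scaled = (pixel * gain_q8 + (ONE >> 1)) >> FRAC_BITS  # add half, then shift: round to nearest
    return min(255, max(0, scaled))                        # clamp to the 8-bit range


gain = to_fixed(0.75)                          # 192 in Q8
row = [0, 64, 128, 255]                        # hypothetical pixel row
out = [scale_pixel(p, gain) for p in row]      # no floating point in the loop
```

The only float operation is the one-time conversion of the gain; the per-pixel work is a multiply, an add, and a shift, so it does not compete for a shared FPU.
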
  3. AMD's response? by Khyber · · Score: 3, Interesting

    Assuming Intel doesn't go Xeon-scale in pricing for this CPU (who am I kidding, of course they will) I wonder how AMD plans to respond to this.

    For now, they've got the consoles holding them afloat. And while I am an AMD fan, I see they are rapidly losing out on the desktop space when it comes to performance (despite both companies having rather meager performance gains for the past several years.)

    They'd better figure out what the fuck they're doing, and come up with some competing responses, quickly. Hell, I've got ideas for them, all involving that HBM tech.

    1. Use a modified version of that HBM tech to stack their CPU cores and load it up with tons of cache memory (for their non-APU line.) And don't forget to drop a process node, for fuck's sake.
    2. Use modified HBM tech to create stacked CPU/GPU/RAM/CACHE on the same die (for their APU line.)
    3. Use modified HBM to create stacked single-die CrossFire GPUs that don't consume gobs of power (GPU line.)
    4. Use modified HBM tech to create a true monolithic SOC package that integrates EVERYTHING, thus eliminating the need for motherboards - at that point in time, it just becomes a breakout board with a socket. They could probably do away with the interposer as well if they were clever enough in the design.

    --
    Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    1. Re:AMD's response? by Nemyst · · Score: 4, Informative

      HBM only works for stacking memory (hence why it's called High Bandwidth Memory). You can't stack CPU cores because they output waaaaaay too much heat. You can dissipate heat from memory passively, so stacking them and slapping an active cooler can work. Good luck stacking CPU cores in the same way.

    2. Re:AMD's response? by gman003 · · Score: 5, Informative

      AMD has been developing a new microarchitecture, Zen, which will replace the horribly-designed Bulldozer. It's rumored to be made on a 14nm node, and they re-hired the guy who designed the K10 architecture (aka the last good CPUs AMD made), so I expect it to be reasonably competitive with Intel. I really hope it is, at least.

      Your terminology is completely out of whack ("stacked single-die CrossFire GPU" is a phrase with more contradictions than whitespace characters), but I'll analyze what you were trying to say instead of what you actually said:
      #1. Current chip-stacking tech doesn't allow for all that much bandwidth between chips, especially when going above two layers. CPU cores need a pretty hefty amount of bandwidth to their cache, so that's already problematic. Stacking dies also limits thermal performance - if you stack two dies, you have 2x the heat in 1x the heat-conducting surface area. For low-power stuff, that's fine, but CPU cores get pretty hot. Many high-performance dies are already performance-constrained by how much heat they can conduct to their cooler.
      #2. This is a good idea. Or rather, the good idea is "APU on an interposer using HBM for main memory". You'd need bigger CPU caches - HBM is ridiculously high-latency even by VRAM standards, it will really hurt CPU performance otherwise. And it will limit upgradability - no way to just pop another DIMM of DDR3 in there. But the GPU gains should be worth it.
      #3. Again, thermals will absolutely prevent you from stacking GPU dies. HBM and stacking don't do ANYTHING for the power efficiency of the chips you're stacking, so that's two 100W+ dies on top of each other. Not gonna happen. You could place them side-by-side on an interposer, but at that point why not just fabricate them as one die?
      #4. The cost of an interposer is significantly greater than that of a printed circuit board, and a lot of stuff won't benefit from the greater bandwidth to the CPU - stuff like a USB controller or audio chipset. Stacking the dies is also more expensive than just using a PCB - it's done in phones where space is REALLY constrained, but even the smallest desktops aren't that tight for space yet. So all that's left is putting everything onto one die - which runs into yield problems, because with bigger individual dies, a single defect will wipe out a lot more silicon. AMD actually *is* already doing this with their lowest-end laptop/desktop parts - look at Socket AM1, there's not much on the motherboard besides external connectors and power-delivery circuits. But they're also pretty low-end in performance.
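
The yield argument in #4 can be illustrated with the simple Poisson yield model, Y = exp(-D * A). This is a textbook approximation swapped in for illustration, not something the poster cited, and the defect density and die areas below are made-up numbers.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to have zero defects under the Poisson model."""
    return math.exp(-defects_per_cm2 * area_cm2)


D = 0.5  # hypothetical defect density, in defects per cm^2

small = poisson_yield(1.0, D)   # a 1 cm^2 die
large = poisson_yield(4.0, D)   # a 4 cm^2 die that integrates "everything"

print(f"1 cm^2 die yield: {small:.1%}")
print(f"4 cm^2 die yield: {large:.1%}")
```

Under these assumptions, quadrupling the area does far worse than quartering the number of good dies per wafer, because the yield falls exponentially with area.
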

    3. Re:AMD's response? by gman003 · · Score: 3, Insightful

      Being able to describe something in five words does not make it easy.

    4. Re:AMD's response? by Khyber · · Score: 2

      We have microfluidics for stacking dies and removing heat. We do it on p-n junctions on some of the latest LEDs (which are fucking MASSIVE at nearly 7mm x 7mm on just the die alone, not including any mount, circuitry, etc.) to keep them very cool.

      I don't speak of ideas unless I already know we've got the technology to handle it.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
  4. Re:20 cores DOES matter by beelsebob · · Score: 2

    Apparently the last time you checked was in 2004. Hyper threading gets you a lot more than a 10% gain on modern CPUs.

  5. Next step: Send consultants to MySQL. by SuricouRaven · · Score: 2

    "Hi. We're from Intel, and we'd like to take a look at your multithreading, such as it is."

  6. Re:20 cores DOES matter by fyngyrz · · Score: 2

    Yep. Although I have to say, I really enjoy working with the domain 0.0-1.0; there are so many neat tricks that can be pulled. You can do them in integer too, but there is hoop-jumping involved.

    --
    I've fallen off your lawn, and I can't get up.
  7. Re: 20 cores DOES matter by m.dillon · · Score: 4, Interesting

    Actually, parallel builds barely touch the storage subsystem. Everything is basically cached in ram and writes to files wind up being aggregated into relatively small bursts. So the drives are generally almost entirely idle the whole time.

    It's almost a pure-cpu exercise and also does a pretty good job testing concurrency within the kernel due to the fork/exec/run/exit load (particularly for Makefile-based builds which use /bin/sh a lot). I've seen fork/exec rates in excess of 5000 forks/sec during poudriere runs, for example.

    -Matt

  8. Re:20 cores DOES matter by gman003 · · Score: 3, Informative

    You likely have not checked for a while. I saw figures of 120% performance ("each core at 60% performance" as you put it) back under the Pentium 4 HT, 140% under Nehalem/Sandy Bridge, and 150% under Haswell.

  9. Re:Nice, but... by jeffb+(2.718) · · Score: 3, Funny

    If you don't have at least 10 cores, how can you expect to run the ads, tracking software and gratuitous animations required to fully participate in the online society of the late 20-teens?

  10. Re:20 cores DOES matter by m.dillon · · Score: 4, Informative

    Hyperthreading on Intel gives about a +30 to +50% performance improvement. So each core winds up being about 1.3 to 1.5 times the performance with two threads versus 1.0 with one. Quite significant. It depends on the type of load, of course.

    The main reason for the improvement is of course due to one thread being able to make good use of execution units while the other thread is stalled on something (like memory or TLB, significant integer shifts, or dependent integer or FPU multiply and divide operations).

    -Matt
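
As a rough worked example of the +30 to +50% figure, applied to the 10-core part from the summary: the sketch below assumes a perfectly parallel workload with both hardware threads busy on every core, which real loads will not always achieve. The helper name `core_equivalents` is hypothetical.

```python
def core_equivalents(cores: int, ht_gain: float) -> float:
    """Total throughput, in units of one core running a single thread."""
    return cores * (1.0 + ht_gain)


def per_thread_share(ht_gain: float) -> float:
    """Throughput of each of the two threads sharing one core."""
    return (1.0 + ht_gain) / 2.0


low = core_equivalents(10, 0.30)    # roughly 13 core-equivalents
high = core_equivalents(10, 0.50)   # roughly 15 core-equivalents

print(f"10 cores / 20 threads: {low:.1f} to {high:.1f} core-equivalents")
print(f"each thread runs at {per_thread_share(0.30):.0%} to {per_thread_share(0.50):.0%} of a full core")
```

So 20 threads do not behave like 20 cores, but under these assumptions they do behave like noticeably more than 10.
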

  11. Re: Intel often communicates poorly. by Fwipp · · Score: 3, Interesting

    "Intel has made a habit of launching enthusiast versions of previous generations processors after it releases it a new architecture."

  12. Re: 20 cores DOES matter by m.dillon · · Score: 4, Informative

    Urm. And you've investigated this and found that your drive is pegged because? Of What? Or you haven't investigated this and you have no idea why your drive is pegged. I'll take a guess... you are running out of memory and the disk activity you see is heavy paging.

    Let me rephrase... we do bulk builds with poudriere of 20,000 applications. It takes a bit less than two days. We set the parallelism to roughly 2x the number of cpu threads available. There are usually several hundred processes active in various states at any given moment. The cpu load is pegged. Disk activity is zero for most of the time.

    If I do something less strenuous, like a buildworld or buildkernel, almost the same result. Cpu is mostly pegged, disk activity is zero for the roughly 30 minutes the buildworld takes. However, smaller builds such as a buildworld or buildkernel, or a linux kernel build, regardless of the -j concurrency you specify, will certainly have bottlenecks in the build subsystem that have nothing to do with the cpu. A little work on the Makefiles will solve that problem. In our case there are always two or three ridiculously huge source files in the GCC build that the Make has to wait for before it can proceed with the link pass. Similarly with a kernel build there is a make depend step at the beginning which is not parallelized and the final link at the end which cannot be parallelized which actually take most of the time. Compiling the sources in the middle finishes in a flash.
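
The effect of the serial "make depend" and link steps described above is what Amdahl's law captures: speedup = 1 / (s + (1 - s) / n), where s is the serial fraction of the build and n is the number of parallel workers. The 20% serial fraction below is an assumed figure for illustration, not a measurement from these builds.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the job can run in parallel."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


s = 0.20  # assumed: 20% of wall time is the serial depend and link steps

for n in (4, 10, 20):
    print(f"-j {n}: at most {amdahl_speedup(s, n):.2f}x faster")
```

With that serial fraction the speedup can never reach 5x no matter how many cores are added, which is why trimming the serial steps in the Makefiles matters more than raising -j.
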

    But your problem sounds a bit different... kinda sounds like you are running yourself out of memory. Parallel builds can run machines out of memory if the dev specifies more concurrency than his memory can handle. For example, when building packages there are many C++ source files which #include the kitchen sink and wind up with process run sizes north of 1GB. If someone only has 8GB of ram and tries a -j 8 build under those circumstances, that person will run out of memory and start to page heavily.

    So it's a good idea to look at the footprint of the individual processes you are trying to parallelize, too.

    Memory is cheap these days. Buy more. Even those tiny little BRIX one can get these days can hold 32G of ram. For a decent concurrent build on a decent cpu you want 8GB minimum, 16GB is better, or more.

    -Matt
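
A small sketch of the sizing rule implied above: cap the job count by whichever runs out first, cpu threads or memory. The helper `safe_jobs`, the 1 GB reserved for the OS, and the 1.2 GB per-process footprint (standing in for "north of 1GB") are assumptions for illustration; the 2x oversubscription factor comes from the comment.

```python
def safe_jobs(ram_gb: float, per_process_gb: float, cpu_threads: int,
              reserve_gb: float = 1.0, oversubscribe: float = 2.0) -> int:
    """Suggest a -j value that should not push the machine into paging."""
    by_cpu = int(cpu_threads * oversubscribe)              # roughly 2x the cpu threads
    by_ram = int((ram_gb - reserve_gb) // per_process_gb)  # how many processes fit in RAM
    return max(1, min(by_cpu, by_ram))


# 8 GB machine, 8 threads, compiler processes of about 1.2 GB each
tight = safe_jobs(ram_gb=8, per_process_gb=1.2, cpu_threads=8)

# same cpu with 32 GB of RAM
roomy = safe_jobs(ram_gb=32, per_process_gb=1.2, cpu_threads=8)

print(f"8 GB:  -j {tight}")
print(f"32 GB: -j {roomy}")
```

On the 8 GB machine memory is the limit, so the suggested value falls below the thread count; with 32 GB the cpu becomes the limit instead.
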

  13. Re:20 cores DOES matter by nadaou · · Score: 3, Interesting

    It depends on the task. For double precision FP calculations using MPI multi-processing (e.g. FORTRAN CFD), the extra overhead of the extra cores talking to each other mostly cancels out the gains.

    For many many small short-lifetime processes you'll probably do better.

    --
    ~.~
    I'm a peripheral visionary.
  14. VMWare whitebox heaven by barc0001 · · Score: 2

    This will be nice to pop into a whitebox VMWare ESXi machine. Definitely cheaper than a 2 x 6 core build.

  15. Re:Software needs to catch up by iamacat · · Score: 2

    Really? Every browser window or tab should hang if Javascript in one of them is slow? Loading and decoding an image for one of the icons on screen should prevent the UI from processing touch events? Many of those problems have been solved by important applications ad-hoc, but sane behaviour by default would be great.