Intel Flagship Core i7-6950X Broadwell-E To Offer 10-Cores, 20-Threads, 25MB L3 (hothardware.com)

Posted by timothy on Sunday November 15, 2015 @04:15AM from the upping-the-count dept.

MojoKid writes: Intel has made a habit of launching enthusiast versions of previous generation processors after it releases a new architecture. As was the case with Intel's Haswell architecture, high-end Broadwell-E variants are expected and it looks like Intel is readying a doozy. Recently revealed details show four new processors under the new HEDT (High-End Desktop) banner for Broadwell, which is one more SKU than Haswell-E brought to the table. The most intriguing of the new chips is the Core i7-6950X, a monster 10-core CPU with Hyper-Threading support. That gives the Core i7-6950X 20 threads to play with, along with a whopping 25MB of L3 cache. The caveat is the CPU's clockspeed — it will run at just 3.0GHz (base), so for applications that aren't properly tuned to take full advantage of large core counts and threads, it could potentially trail behind the Core i7-6700K, a quad-core Skylake processor clocked at 3.4GHz (base) to 4GHz (Turbo).

6 of 167 comments

Wrong specs on Skylake by SeeManRun · 2015-11-15 04:49 · Score: 4, Informative

I have been seeing this a lot lately for some reason. The i7-6700K runs at a 4.0 GHz base clock and turbos up to 4.2 GHz. So if clock speed matters more than thread count, Skylake will deliver quite a performance beatdown.
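To see where the trade-off between clock speed and thread count flips, here is a minimal Amdahl's-law sketch. It assumes equal per-clock performance (IPC) for Broadwell-E and Skylake, uses only base clocks (3.0 GHz for the i7-6950X, 4.0 GHz for the i7-6700K), and ignores Turbo and Hyper-Threading; `parallel_fraction` is the hypothetical share of a workload that scales across cores.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core under Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def relative_throughput(parallel_fraction: float, cores: int, clock_ghz: float) -> float:
    """Throughput in 'one core at 1 GHz' units. Assumes equal IPC (an assumption)."""
    return clock_ghz * amdahl_speedup(parallel_fraction, cores)


# Base clocks only; Turbo, Hyper-Threading and IPC differences are ignored.
for p in (0.0, 0.5, 0.9, 0.99):
    big = relative_throughput(p, cores=10, clock_ghz=3.0)   # i7-6950X
    quad = relative_throughput(p, cores=4, clock_ghz=4.0)   # i7-6700K
    winner = "6950X" if big > quad else "6700K"
    print(f"p={p:.2f}  6950X={big:5.2f}  6700K={quad:5.2f}  winner={winner}")
```

Under these assumptions the quad-core stays ahead until roughly 74% of the work is parallel (the curves meet at p = 1/1.35). Real IPC, Turbo and memory-bandwidth differences will move that crossover.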
Re:AMD's response? by Nemyst · 2015-11-15 05:37 · Score: 4, Informative

HBM only works for stacking memory (hence the name High Bandwidth Memory). You can't stack CPU cores because they output waaaaaay too much heat. Memory gives off little enough heat to be cooled passively, so stacking memory dies and slapping an active cooler on top can work. Good luck stacking CPU cores the same way.
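A rough way to see the thermal problem is heat flux through the package footprint: stacking N dies multiplies the heat but not the area it has to leave through. The sketch below uses hypothetical round numbers (a 140 W CPU die on 250 mm², a 3 W memory die on 100 mm²), not measured figures for any real part.

```python
def power_density(total_watts: float, footprint_mm2: float) -> float:
    """Heat flux through the package footprint, in W/mm^2."""
    return total_watts / footprint_mm2


def stacked_density(die_watts: float, footprint_mm2: float, layers: int) -> float:
    """N identical dies stacked vertically: N times the heat, same footprint."""
    return power_density(die_watts * layers, footprint_mm2)


# Hypothetical round numbers, for illustration only.
CPU_DIE = (140.0, 250.0)  # (watts, mm^2)
MEM_DIE = (3.0, 100.0)    # (watts, mm^2)

for name, (watts, area), stacks in (("CPU", CPU_DIE, (1, 2, 4)),
                                    ("memory", MEM_DIE, (1, 4, 8))):
    for layers in stacks:
        print(f"{name:6s} x{layers}: {stacked_density(watts, area, layers):.2f} W/mm^2")
```

With these made-up figures, an 8-high memory stack still has well under half the heat flux of a single unstacked CPU die, while two stacked CPU dies double it.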
Re:AMD's response? by gman003 · 2015-11-15 06:06 · Score: 5, Informative

AMD has been developing a new microarchitecture, Zen, which will replace the horribly-designed Bulldozer. It's rumored to be made on a 14nm node, and they re-hired the guy who designed the K10 architecture (aka the last good CPUs AMD made), so I expect it to be reasonably competitive with Intel. I really hope it is, at least.
Your terminology is completely out of whack ("stacked single-die CrossFire GPU" is a phrase with more contradictions than whitespace characters), but I'll analyze what you were trying to say instead of what you actually said:
#1. Current chip-stacking tech doesn't allow for all that much bandwidth between chips, especially when going above two layers. CPU cores need a pretty hefty amount of bandwidth to their cache, so that's already problematic. Stacking dies also limits thermal performance - if you stack two dies, you have 2x the heat in 1x the heat-conducting surface area. For low-power stuff, that's fine, but CPU cores get pretty hot. Many high-performance dies are already performance-constrained by how much heat they can conduct to their cooler.
#2. This is a good idea. Or rather, the good idea is "APU on an interposer using HBM for main memory". You'd need bigger CPU caches, since HBM is ridiculously high-latency even by VRAM standards and will really hurt CPU performance otherwise. And it will limit upgradability - no way to just pop another DIMM of DDR3 in there. But the GPU gains should be worth it.
#3. Again, thermals will absolutely prevent you from stacking GPU dies. HBM and stacking don't do ANYTHING for the power efficiency of the chips you're stacking, so that's two 100W+ dies on top of each other. Not gonna happen. You could stack them side-by-side on an interposer, but at that point why not just fabricate them as one die?
#4. The cost of an interposer is significantly greater than that of a printed circuit board, and a lot of stuff won't benefit from the greater bandwidth to the CPU - stuff like a USB controller or audio chipset. Stacking the dies is also more expensive than just using a PCB - it's done in phones where space is REALLY constrained, but even the smallest desktops aren't that tight for space yet. So all that's left is putting everything onto one die - which runs into yield problems, because with bigger individual dies, a single defect will wipe out a lot more silicon. AMD actually *is* already doing this with their lowest-end laptop/desktop parts - look at Socket AM1, there's not much on the motherboard besides external connectors and power-delivery circuits. But they're also pretty low-end in performance.
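The yield argument in #4 is commonly illustrated with the simple Poisson yield model, Y = exp(-A·D0), where A is die area and D0 is defect density. The sketch below applies it with a hypothetical D0 of 0.1 defects/cm² on a 300 mm wafer and ignores dies lost at the wafer edge; real fabs use more elaborate models (for example, ones that account for defect clustering).

```python
import math

WAFER_AREA_CM2 = math.pi * 15.0 ** 2  # 300 mm wafer; edge loss ignored


def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Probability that a die has zero defects under a Poisson defect model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


def good_dies_per_wafer(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Gross dies (simple area ratio) times yield; an optimistic estimate."""
    gross = WAFER_AREA_CM2 / die_area_cm2
    return gross * poisson_yield(die_area_cm2, defects_per_cm2)


D0 = 0.1  # hypothetical defects per cm^2
for area in (1.0, 2.0, 4.0):
    print(f"{area:.0f} cm^2 die: yield {poisson_yield(area, D0):.1%}, "
          f"~{good_dies_per_wafer(area, D0):.0f} good dies per wafer")
```

Quadrupling the die area here drops yield from about 90% to about 67%, because each defect now scraps a die four times as large.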
Re:20 cores DOES matter by gman003 · 2015-11-15 06:09 · Score: 3, Informative

You likely have not checked for a while. I saw figures of 120% performance ("each core at 60% performance" as you put it) back under the Pentium 4 HT, 140% under Nehalem/Sandy Bridge, and 150% under Haswell.
Re:20 cores DOES matter by m.dillon · 2015-11-15 06:16 · Score: 4, Informative

Hyperthreading on Intel gives about a +30% to +50% performance improvement. So each core winds up delivering about 1.3 to 1.5 times the performance with two threads versus 1.0 with one. Quite significant. It depends on the type of load, of course.
The main reason for the improvement is of course due to one thread being able to make good use of execution units while the other thread is stalled on something (like memory or TLB, significant integer shifts, or dependent Integer or FPU multiply and divide operations).
-Matt
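Both comments above describe Hyper-Threading gains as a per-core throughput multiplier. The sketch below turns the quoted range (about 1.2x for the Pentium 4 up to about 1.5x) into a per-thread share and an aggregate figure for a 10-core part like the i7-6950X. It assumes the multiplier holds uniformly on every core and that the workload scales perfectly across cores, which real loads rarely do.

```python
def ht_estimates(cores: int, ht_factor: float) -> tuple[float, float]:
    """Return (aggregate core-equivalents, per-thread share of one core).

    ht_factor is the throughput of one core running two hardware threads,
    relative to the same core running one thread (e.g. 1.3 for "+30%").
    """
    aggregate = cores * ht_factor
    per_thread = ht_factor / 2  # two hardware threads share one physical core
    return aggregate, per_thread


# Multipliers in the range quoted in the comments above.
for factor in (1.2, 1.3, 1.4, 1.5):
    agg, share = ht_estimates(10, factor)
    print(f"HT x{factor}: 10 cores ~ {agg:.0f} core-equivalents, "
          f"each thread ~ {share:.0%} of a core")
```

A factor of 1.2 corresponds to "each thread at 60% of a core", and a 10-core chip at 1.3x to 1.5x behaves, at best, like 13 to 15 cores without Hyper-Threading.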
Re: 20 cores DOES matter by m.dillon · 2015-11-15 07:14 · Score: 4, Informative

Urm. And you've investigated this and found that your drive is pegged because? Of What? Or you haven't investigated this and you have no idea why your drive is pegged. I'll take a guess... you are running out of memory and the disk activity you see is heavy paging.
Let me rephrase... we do bulk builds of 20,000 applications with poudriere. It takes a bit less than two days. We set the parallelism to roughly 2x the number of CPU threads available. There are usually several hundred processes active in various states at any given moment. The CPU load is pegged. Disk activity is zero most of the time.
If I do something less strenuous, like a buildworld or buildkernel, the result is almost the same. The CPU is mostly pegged, and disk activity is zero for the roughly 30 minutes the buildworld takes. However, smaller builds such as a buildworld or buildkernel, or a Linux kernel build, will certainly have bottlenecks in the build subsystem that have nothing to do with the CPU, regardless of the -j concurrency you specify. A little work on the Makefiles will solve that problem. In our case there are always two or three ridiculously huge source files in the GCC build that make has to wait for before it can proceed with the link pass. Similarly, a kernel build has a make depend step at the beginning which is not parallelized, and a final link at the end which cannot be parallelized; those two steps actually take most of the time. Compiling the sources in the middle finishes in a flash.
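The effect described above, where a few huge source files and the serial depend and link steps set a floor no matter how high -j goes, can be sketched as follows. The model is an assumption: compile jobs are independent and are scheduled longest-first onto whichever worker is least loaded (the LPT heuristic), which is not how make actually orders work, and all timings are hypothetical.

```python
import heapq


def build_wall_time(compile_times, jobs, serial_before=0.0, serial_after=0.0):
    """Estimated wall time: serial prefix, parallel compiles on `jobs` workers, serial suffix.

    Compiles are assigned longest-first to the least-loaded worker (LPT),
    a greedy heuristic used here only as a stand-in for make's scheduler.
    """
    if jobs < 1:
        raise ValueError("jobs must be >= 1")
    loads = [0.0] * jobs  # a list of equal values is already a valid min-heap
    for t in sorted(compile_times, reverse=True):
        lightest = heapq.heappop(loads)      # least-loaded worker
        heapq.heappush(loads, lightest + t)  # give it this compile job
    return serial_before + max(loads) + serial_after


# Hypothetical workload (seconds): 200 small files, 3 huge ones,
# a 60 s non-parallel "make depend" and a 90 s final link.
compile_times = [2.0] * 200 + [120.0] * 3

for j in (1, 4, 8, 16, 64):
    print(f"-j {j:2d}: {build_wall_time(compile_times, j, 60.0, 90.0):.0f} s")
```

With these numbers the estimate falls from 910 s at -j 1 to 270 s at -j 8 and then stays there: the 120 s file plus 150 s of serial work is the floor, so more cores only help once the Makefiles are restructured.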
But your problem sounds a bit different... kinda sounds like you are running yourself out of memory. Parallel builds can run machines out of memory if the dev specifies more concurrency than his memory can handle. For example, when building packages there are many C++ source files which #include the kitchen sink and wind up with process run sizes north of 1GB. If someone only has 8GB of ram and tries a -j 8 build under those circumstances, that person will run out of memory and start to page heavily.
So it's a good idea to look at the memory footprint of the individual processes you are trying to parallelize, too.
Memory is cheap these days. Buy more. Even those tiny little BRIX one can get these days can hold 32G of ram. For a decent concurrent build on a decent cpu you want 8GB minimum, 16GB is better, or more.
-Matt
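The memory advice above can be folded into the choice of -j. The sketch below takes the smaller of a CPU-based limit (roughly 2x hardware threads, as in the poudriere setup described earlier) and a RAM-based limit (usable RAM divided by the footprint of one compile job). The function name, the 2 GB reserve for the OS, and the per-job footprint are assumptions to replace with your own measurements.

```python
import os


def pick_jobs(hw_threads: int, ram_gb: float, per_job_gb: float,
              reserve_gb: float = 2.0, oversubscribe: float = 2.0) -> int:
    """Pick a parallel job count limited by both CPU threads and RAM.

    oversubscribe: jobs per hardware thread (about 2 per the comment above).
    reserve_gb:    RAM held back for the OS and page cache (an assumption).
    per_job_gb:    peak resident size of one compile job, measured beforehand.
    """
    cpu_limit = max(1, int(hw_threads * oversubscribe))
    mem_limit = max(1, int((ram_gb - reserve_gb) // per_job_gb))
    return min(cpu_limit, mem_limit)


# The case from the comment: ~1 GB C++ compile jobs on an 8 GB machine.
print(pick_jobs(hw_threads=8, ram_gb=8, per_job_gb=1.0))   # 6 (RAM-bound)
print(pick_jobs(hw_threads=8, ram_gb=32, per_job_gb=1.0))  # 16 (CPU-bound)
print(pick_jobs(hw_threads=os.cpu_count() or 1, ram_gb=16, per_job_gb=1.0))
```

Since the comment says those jobs are "north of 1GB", 1.0 GB is a lower bound there; measure the real peak (GNU `time -v` reports it as maximum resident set size) before trusting the result.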