Intel's Knights Landing — 72 Cores, 3 Teraflops

Posted by Soulskill on Saturday January 4, 2014 @11:07AM from the go-big-or-go-home dept.

New submitter asliarun writes "David Kanter of Realworldtech recently posted his take on Intel's upcoming Knights Landing chip. The technical specs are massive, showing Intel's new-found focus on throughput processing (and possibly graphics). 72 Silvermont cores with beefy FP and vector units, mesh fabric with tile based architecture, DDR4 support with a 384-bit memory controller, QPI connectivity instead of PCIe, and 16GB on-package eDRAM (yes, 16GB). All this should ensure throughput of 3 teraflop/s double precision. Many of the architectural elements would also be the same as Intel's future CPU chips — so this is also a peek into Intel's vision of the future. Will Intel use this as a platform to compete with nVidia and AMD/ATI on graphics? Or will this be another Larrabee? Or just an exotic HPC product like Knights Corner?"
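As a rough sanity check on the headline numbers, the sketch below divides the quoted 3 teraflop/s across the 72 cores and asks what clock speed that would imply. The summary gives no clock or vector-unit count, so the two 512-bit fused multiply-add units per core are an assumption made here for illustration, as are all the function and constant names.

```python
# Back-of-envelope check of "72 cores, 3 teraflop/s double precision".
CORES = 72                 # from the summary
PEAK_DP_FLOPS = 3e12       # from the summary

# ASSUMPTIONS (not stated in the summary): two 512-bit vector units per
# core, each able to issue one fused multiply-add per cycle.
ASSUMED_VECTOR_UNITS_PER_CORE = 2
ASSUMED_DP_LANES_PER_UNIT = 512 // 64   # eight 64-bit lanes per 512-bit unit
FLOPS_PER_FMA = 2                       # one multiply plus one add


def per_core_flops(peak_flops, cores):
    """Peak FLOP/s each core must sustain to reach the chip-wide peak."""
    return peak_flops / cores


def flops_per_cycle(units, lanes, flops_per_op):
    """Double-precision FLOPs one core can issue per clock cycle."""
    return units * lanes * flops_per_op


def implied_clock_hz(peak_flops, cores, per_cycle):
    """Clock frequency needed to hit the peak under the assumptions above."""
    return peak_flops / (cores * per_cycle)


if __name__ == "__main__":
    per_cycle = flops_per_cycle(
        ASSUMED_VECTOR_UNITS_PER_CORE, ASSUMED_DP_LANES_PER_UNIT, FLOPS_PER_FMA
    )
    print("GFLOP/s per core:", per_core_flops(PEAK_DP_FLOPS, CORES) / 1e9)
    print("Implied clock (GHz):", implied_clock_hz(PEAK_DP_FLOPS, CORES, per_cycle) / 1e9)
```

Under those assumptions the implied clock is modest, which would fit a design that trades per-core speed for throughput. A different vector width or unit count changes the result proportionally.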

11 of 208 comments

Programmability? by gentryx · 2014-01-04 11:20 · Score: 4, Informative

I wonder how nice these will be to program. The "just recompile and run" promise for Knights Corner was little more than a cruel joke: to get any serious performance out of the current generation of MICs you have to wrestle with vector intrinsics and that stupid in-order architecture. At least the latter will apparently be dropped in Knights Landing.
For what it's worth: I'll be looking forward to NVIDIA's Maxwell. At least CUDA got the vectorization problem sorted out. And no: not even the Intel compiler handles vectorization well.

--
Computer simulation made easy -- LibGeoDecomp
Requires parallelism by tepples · 2014-01-04 11:42 · Score: 5, Informative

Multicore implies more speed only if your process is parallelized. Not all interactive processes on a single-user computer can be, wrote Amdahl.
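For anyone who wants to put numbers on Amdahl's point, here is a minimal sketch of the usual formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the core count. The function name and the sample fractions are illustrative choices, not measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of a job can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: half, 95%, and fully parallel, on 72 cores.
    for fraction in (0.5, 0.95, 1.0):
        print(fraction, round(amdahl_speedup(fraction, 72), 2))
```

A job that is only half parallel cannot even double in speed on 72 cores, which is why the serial part of an interactive program matters more than the core count.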
Re:Yay more cores that I won't be using much of! by H0p313ss · 2014-01-04 11:49 · Score: 4, Insightful

Because you can never have too many cores that you aren't using most of the time.
How about more speed? Or is that too hard?
Pretty sure it wasn't meant for you (or me).
However, for servers, including hypervisors, it would be very interesting. There are lots of client/server products that scale better with more cores.

--
XML is known as a key material required to create SMD: Software of Mass Destruction
Re:No it cannot compete with nVidia and AMD/ATI by rsmith-mac · 2014-01-04 12:04 · Score: 5, Informative

"eDRAM" in this article is almost certainly an error for that reason.
eDRAM isn't very well defined, but it basically boils down to "DRAM manufactured on a modified logic process," allowing it to be placed on-die alongside logic, or at the very least built using the same tools if you're a logic house (Intel, TSMC, etc). This is as opposed to traditional DRAM, which is made on dedicated processes that are optimized for space (capacitors) and follow their own development cadence.
The article notes that this is on-package as opposed to on-die memory, which under most circumstances would mean regular DRAM would work just fine. The biggest example of on-package RAM would be SoCs, where the DRAM is regularly placed in the same package for size/convenience and then wire-bonded to the processor die (although alternative connections do exist). Conversely, eDRAM is almost exclusively used on-die with logic, this being its designed use, chiefly as a higher-density/lower-performance alternative to SRAM. You can do off-die eDRAM, which is what Intel does for Crystalwell, but that's almost entirely down to Intel using spare fab capacity and keeping production in-house (they don't make DRAM) as opposed to technical requirements, which is why you don't see off-die eDRAM regularly used.
Or to put it bluntly, just because DRAM is on-package doesn't mean it's eDRAM. There are further qualifications to making it eDRAM than moving the DRAM die closer to the CPU.
But ultimately, as you note, cost would be an issue. Even taking into account process advantages between now and the Knights Landing launch, 16GB of eDRAM would be huge. Mind-bogglingly huge. Many thousands of square millimeters huge. Based on space constraints alone it can't be eDRAM; it has to be DRAM to make that aspect work, and even then 16GB of DRAM wouldn't be small.
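To see how a figure like "many thousands of square millimeters" can come about, the sketch below turns a capacity and a density into a silicon area. The density is a placeholder assumed purely for illustration; it is not taken from the comment, from the article, or from any Intel data, and the names are hypothetical.

```python
# ASSUMPTION: placeholder eDRAM density in megabytes per square millimeter.
# Swap in a real figure for a given process if you have one.
ASSUMED_EDRAM_MB_PER_MM2 = 1.5


def memory_area_mm2(capacity_gb, mb_per_mm2):
    """Silicon area needed for capacity_gb of memory at the given density."""
    if mb_per_mm2 <= 0:
        raise ValueError("density must be positive")
    return capacity_gb * 1024 / mb_per_mm2


if __name__ == "__main__":
    area = memory_area_mm2(16, ASSUMED_EDRAM_MB_PER_MM2)
    print("Area for 16GB at the assumed density (mm^2):", round(area))
```

Even if the placeholder density is off by a factor of several, the result stays far above the size of any single die, which is the point the comment is making.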
Embarrassingly parallel by tepples · 2014-01-04 12:20 · Score: 4, Informative

You saw a speed-up because video and 3D are in a class of problems that are very easy to parallelize. So is decompressing all the images in an HTML document. Laying out the document, on the other hand, isn't so easy to parallelize, if only because every floating box theoretically affects all the boxes that follow it.
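The contrast can be sketched in a few lines. In this toy version, each compressed blob stands in for an image and can be handed to any worker, while the layout loop has to walk the boxes in order because each position depends on the state left by every earlier box. The function names, the thread pool, and the simple wrap-to-next-row rule are assumptions for illustration and are far simpler than real browser layout.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def decompress_all(blobs, workers=4):
    """Decompress independent blobs; no blob needs any other blob's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(zlib.decompress, blobs))


def layout_floats(boxes, container_width):
    """Place (width, height) boxes left to right, wrapping to a new row.

    Each box's position depends on the cursor and row height left behind
    by all earlier boxes, so the loop runs strictly in order.
    """
    positions = []
    x = y = 0
    row_height = 0
    for width, height in boxes:
        if x > 0 and x + width > container_width:
            # Box does not fit on this row: start a new one below it.
            x = 0
            y += row_height
            row_height = 0
        positions.append((x, y))
        x += width
        row_height = max(row_height, height)
    return positions


if __name__ == "__main__":
    images = [zlib.compress(bytes([i]) * 1000) for i in range(8)]
    print(len(decompress_all(images)), "blobs decompressed")
    print(layout_floats([(60, 10), (50, 20), (40, 5)], container_width=100))
```

Changing the width of the first box can move every box after it, so the layout pass cannot be split across workers the way the decompression can.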
How does the intercommunication work? by Animats · 2014-01-04 12:33 · Score: 4, Informative

OK, we have yet another mesh of processors, an idea that comes back again and again. The details of how processors communicate really matter. Is this a totally non-shared-memory machine? Is there some shared memory, but it's slow? If there's shared memory, what are the cache consistency rules?
Historically, meshes of processors without shared memory have been painful to program. There's a long line of machines, from the nCube to the Cell, where the hardware worked but the thing was too much of a pain to program. Most designs have suffered from having too little local memory per CPU. If there's enough memory per CPU to, well, run at least a minimal OS and some jobs, then the mesh can be treated as a cluster of intercommunicating peers. That's something for which useful software exists. If all the CPUs have to be treated as slaves of a control machine, then you need all-new software architectures to handle them. This usually results in one-off software that never becomes mature.
Basic truth: we only have three successful multiprocessor architectures that are general purpose - shared-memory multiprocessors, clusters, and GPUs. Everything other than that has been almost useless except for very specialized problems fitted to the hardware. Yet this problem needs to be cracked - single CPUs are not getting much faster.
1. Re:How does the intercommunication work? by joib · 2014-01-04 19:47 · Score: 4, Informative
  
  The mesh replaces the ring bus used in the current generation MIC as well as mainstream Intel x86 CPUs. Each node in the mesh is 2 CPU cores and L2 cache. The mesh is used for connecting to the DRAM controllers, external interfaces, L3 cache, and of course, for cache coherency. The memory consistency model is the standard x86 one. So from a programmability point of view, it's a multi-core x86 processor, albeit with slow serial performance and beefy vector units.
Re:Yay more cores that I won't be using much of! by morcego · 2014-01-04 12:59 · Score: 5, Funny

Because you can never have too many cores that you aren't using most of the time.
Install McAfee Antivírus, and problem solved: no more unused cores.

--
morcego
Re:Bitcoin/Litecoin Performance by InvalidError · 2014-01-04 15:01 · Score: 4, Interesting

Bitcoin has ASIC miners with ~10X the mining power per watt of most programmable alternatives such as GPGPU and FPGA. Anything less efficient than that is or soon will become cost-prohibitive to run.
The newer Bitcoin alternatives use memory-bound algorithms to prevent such a steep mining power escalation, since memory capacity and bandwidth scale much more slowly than processing power but cost much more to scale. With Bitcoin, increasing throughput by 10X simply required 10X the processing power, but with the memory-bound alternatives you also need 10X the RAM and 10X the memory bandwidth.
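As a small illustration of the memory side of that scaling, the sketch below uses scrypt, the memory-hard function behind Litecoin, whose working buffer is 128 * N * r bytes per hash in flight. The comment itself does not name an algorithm, so the choice of scrypt and the parameters N=1024, r=1 are additions made here. The helper names are hypothetical.

```python
def scrypt_scratchpad_bytes(n, r):
    """Size of scrypt's main working buffer for cost n and block size r."""
    return 128 * n * r


def total_scratchpad_bytes(hashes_in_flight, n=1024, r=1):
    """Memory needed to keep that many scrypt hashes running at once."""
    return hashes_in_flight * scrypt_scratchpad_bytes(n, r)


if __name__ == "__main__":
    one = total_scratchpad_bytes(1)
    ten = total_scratchpad_bytes(10)
    print("Bytes for 1 hash in flight:", one)
    print("Bytes for 10 hashes in flight:", ten)
    print("Scaling factor:", ten // one)
```

The buffer has to be read and written for every hash, so running ten times as many hashes in parallel needs ten times the fast memory as well as ten times the compute.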
ipad by goombah99 · 2014-01-04 16:43 · Score: 4, Funny

They tested this for the next iPad. While Apple felt the 5-second battery life was too short to be practical, the beta testers were more concerned about the Apple-shaped third-degree burns imprinted on their thighs and palms.

--
Some drink at the fountain of knowledge. Others just gargle.
1. Re: ipad by Macthorpe · 2014-01-05 01:56 · Score: 5, Funny
  
  To be fair, Apple are very committed to branding.
  
  --
  "It does not do to leave a live dragon out of your calculations, if you live near him." - Tolkien