Inside Intel's Core i7 Processor, Nehalem

Posted by Soulskill on Friday August 22, 2008 @11:57AM from the upgrades dept.

MojoKid writes "Intel's next-generation CPU microarchitecture, which was recently given the official processor family name of 'Core i7,' was one of the big topics of discussion at IDF. Intel claims that Nehalem represents its biggest platform architecture change to date. This might be true, but it is not a from-the-ground-up, completely new architecture either. Intel representatives disclosed that Nehalem 'shares a significant portion of the P6 gene pool,' does not include many new instructions, and has approximately the same length pipeline as Penryn. Nehalem is built upon Penryn, but with significant architectural changes (full webcast) to improve performance and power efficiency. Nehalem also brings Hyper-Threading back to Intel processors, and while Hyper-Threading has been criticized in the past as being energy inefficient, Intel claims their current iteration of Hyper-Threading on Nehalem is much better in that regard." Update: 8/23 00:35 by SS: Reader Spatial points out Anandtech's analysis of Nehalem.

Slashdotted by Spatial · 2008-08-22 12:19 · Score: 5, Informative

The article seems to be down; here's Anandtech's analysis.
how much is enough? by Tumbleweed · 2008-08-22 12:59 · Score: 4, Informative

At this point, as long as I can watch HD video without any noticeable slowdowns, I'm good. A GPU or integrated video solution that can do that, plus an energy-efficient CPU, is really all I'm interested in now. The software issues with the 4500HD are disappointing, but hopefully it's *just* a software issue this time, and can be fixed soon enough.
Then again, that's just me; I'm not a gamer or video editor.
Re:That old question by Spatial · 2008-08-22 13:09 · Score: 2, Informative

What fish-philandering flounder modded this troll? Grow a sense of humour, you silly chit!
Re:Here we go again by JorDan+Clock · 2008-08-22 13:15 · Score: 4, Informative

Having read the overview from Anandtech, I can say that Hyper-Threading is far more efficient on Nehalem than any P4 could have hoped to be. Nehalem has better cache, better access to memory, and is a much wider core. Hyper-Threading also allows Nehalem to do more with each clock. I highly suggest reading Anandtech's breakdown of Nehalem. It is very comprehensive and does a great job of explaining things in fine detail.
Re:only the super high desk tops have Quick Path a by RightSaidFred99 · 2008-08-22 14:57 · Score: 2, Informative

Unfortunately, AMD's "advanced technology" in HT (HyperTransport) doesn't help them win anywhere but in multi-socket servers. Intel's FSB is plenty sufficient for single-socket desktops. So... what's your point again?
Re:Here we go again by salimma · 2008-08-22 19:18 · Score: 3, Informative

8 threads per core in Niagara 2; you get up to 64 threads, as the chip is available with 4, 6 or 8 cores.

--
Michel
Fedora Project Contribut
Re:yeah, yeah, yeah.. they said this the last time by TheRaven64 · 2008-08-22 23:23 · Score: 4, Informative

> The problem with hyperthreading is that it fails to deal with the fundamental problem of memory bandwidth and latency
The entire point of SMT (of which HT is an implementation) is that it helps hide memory latency. If one thread stalls waiting for memory then the other gets to use the CPU. Without SMT, a cache miss stalls the entire core. With SMT, it stalls one context but the other can keep executing until it gets a cache miss, which hopefully doesn't happen until the first one has resumed.
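A toy model makes the latency-hiding argument concrete. This is only a sketch under loud assumptions, not how any real core schedules work: the core issues from at most one hardware context per cycle, every thread alternates a fixed compute burst with a fixed-length cache miss, and the `simulate` function and its parameters are invented for illustration.

```python
def simulate(n_contexts, bursts, compute, stall):
    """Toy single-issue core (an assumption, not a real pipeline model).

    Each context runs `bursts` compute bursts of `compute` cycles, and each
    burst ends in a cache miss that blocks that context for `stall` cycles.
    Returns (cycles until all work has issued, cycles the core was busy).
    """
    remaining = [bursts] * n_contexts   # bursts left per context
    work_left = [compute] * n_contexts  # cycles left in the current burst
    ready_at = [0] * n_contexts         # cycle at which a miss has resolved
    cycle = 0
    busy = 0
    while any(r > 0 for r in remaining):
        # Issue from the first context that has work and is not stalled.
        for i in range(n_contexts):
            if remaining[i] > 0 and ready_at[i] <= cycle:
                work_left[i] -= 1
                busy += 1
                if work_left[i] == 0:
                    # Burst finished: this context now waits on memory.
                    remaining[i] -= 1
                    work_left[i] = compute
                    ready_at[i] = cycle + 1 + stall
                break
        cycle += 1
    return cycle, busy


if __name__ == "__main__":
    for contexts in (1, 2):
        cycles, busy = simulate(contexts, bursts=100, compute=10, stall=10)
        print(f"{contexts} context(s): {cycles} cycles, "
              f"core busy {busy / cycles:.0%} of the time")
```

With 10-cycle bursts and 10-cycle misses, a single context leaves the core idle about half the time, while two contexts fill each other's stalls, so two threads finish in roughly the time one took alone. Real workloads are nowhere near this regular, which is why the gain from SMT varies so much.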

--
I am TheRaven on Soylent News
Not on the desktop it isn't by Chemisor · 2008-08-23 00:19 · Score: 5, Informative

> Desktop users think electricity costs.
Bullshit. The difference between a 130W Nehalem and a 65W Core2 is 65W, which is 11 cents per day (at 7c/kWh) or about $40/year if you run the computer 24/7. Most people turn the computer off when it's not in use, and 8 hours per day is more likely, which is under 4 cents per day and about $13/year. I'd say the cost is entirely negligible, especially when you compare it to your $80/month Comcast bill.
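The arithmetic is easy to check. A minimal sketch, assuming a flat tariff of 7 cents per kWh as in the comment and a 365-day year; the helper name `extra_energy_cost` is made up here.

```python
def extra_energy_cost(extra_watts, hours_per_day, cents_per_kwh, days=365):
    """Return (cents per day, dollars per year) for an extra power draw."""
    kwh_per_day = extra_watts / 1000 * hours_per_day
    cents_per_day = kwh_per_day * cents_per_kwh
    dollars_per_year = cents_per_day * days / 100
    return cents_per_day, dollars_per_year


if __name__ == "__main__":
    extra = 130 - 65  # watts: 130W part versus 65W part
    for hours in (24, 8):
        daily, yearly = extra_energy_cost(extra, hours, cents_per_kwh=7)
        print(f"{hours}h/day: {daily:.2f} cents/day, ${yearly:.2f}/year")
```

Under these assumptions the 24/7 case comes to roughly 11 cents a day and the 8-hour case to a bit under 4 cents a day. Note that the rated wattage is a ceiling, so actual draw at idle would be lower and the real difference smaller still.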
Re:Here we go again by amorsen · 2008-08-23 01:58 · Score: 2, Informative

> Most applications have inherently parallel workloads that are implemented in sequential code because context switching on x86 is painfully expensive.
Context switching on x86 is dead cheap. It's probably the cheapest of all general purpose architectures available right now. We're talking a few hundred cycles cheap. Only the P4 is a bit behind, and Nehalem makes things faster, to the point where Intel almost catches up with AMD.
Windows manages to make process switches a lot more expensive than necessary, but thread switching isn't bad. With Linux it hardly matters whether you switch processes or threads, they're both fast.

--
Finally! A year of moderation! Ready for 2019?
Re:Here we go again by segedunum · 2008-08-23 02:47 · Score: 2, Informative

> Sun pushed hyperthreading to its limits to achieve very impressive energy efficiency for certain niche workloads with its Niagara CPUs and derivatives. (IIRC, up to 128 threads per chip.)
Unfortunately those are very, very, very, very, very niche workloads. Your workloads have to be insanely parallel and each thread very independent of the others so that you have little that is blocking. In short, Niagara is just marketing.
Re:Will OS X's Snow Leopard use HT more? by TheRaven64 · 2008-08-23 04:00 · Score: 2, Informative

Actually, scheduling for SMT can be very difficult or very easy, depending on the architecture. Something like the Niagara is easy to schedule for - every context basically gets 1/8th of the CPU, the decoder just issues one instruction from each in turn. In more fine-grained implementations you have one thread running and another thread getting to use the execution units when the first one isn't (e.g. if the first one is issuing a load of floating point operations and the other thread has an integer operation next in line). Scheduling for these is hard because the amount of time a thread has spent running doesn't necessarily correspond to the number of instructions it has been allowed to execute. Worse, threads may actually perform better running as the second context on an SMT core than on the other core, even though they would get more CPU time the other way around, because sharing the L1 cache with the other thread eliminates a lot of time spent waiting for memory and cache coherency locks.
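To illustrate the difference between the two styles, here is a rough sketch. It assumes a single issue slot per cycle, which is a big simplification (a real fine-grained SMT core can issue from both threads in the same cycle), and the function names and the repeating `primary_can_issue` pattern are hypothetical.

```python
from collections import Counter


def round_robin_issue(n_contexts, n_cycles):
    """Niagara-style: each cycle issues one instruction from the next context."""
    issued = Counter()
    for cycle in range(n_cycles):
        issued[cycle % n_contexts] += 1
    return issued


def fill_in_issue(primary_can_issue, n_cycles):
    """Fine-grained style: the primary thread issues whenever it can, and the
    secondary thread only gets the slots the primary leaves empty.

    primary_can_issue is a repeating pattern of booleans, one per cycle.
    """
    issued = Counter()
    for cycle in range(n_cycles):
        if primary_can_issue[cycle % len(primary_can_issue)]:
            issued["primary"] += 1
        else:
            issued["secondary"] += 1
    return issued


if __name__ == "__main__":
    print(dict(round_robin_issue(8, 800)))
    print(dict(fill_in_issue([True, True, True, False], 800)))
```

In the round-robin case every context issues the same number of instructions, so time on the core is a fair proxy for progress. In the fill-in case both threads were resident for the same 800 cycles but made very different progress, which is exactly what makes time-based accounting unreliable for the OS scheduler.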

--
I am TheRaven on Soylent News