Inside Intel's Core i7 Processor, Nehalem
MojoKid writes "Intel's next-generation CPU microarchitecture, which was recently given the official processor family name of 'Core i7,' was one of the big topics of discussion at IDF. Intel claims that Nehalem represents its biggest platform architecture change to date. This might be true, but it is not a from-the-ground-up, completely new architecture either. Intel representatives disclosed that Nehalem 'shares a significant portion of the P6 gene pool,' does not include many new instructions, and has approximately the same length pipeline as Penryn. Nehalem is built upon Penryn, but with significant architectural changes (full webcast) to improve performance and power efficiency. Nehalem also brings Hyper-Threading back to Intel processors, and while Hyper-Threading has been criticized in the past as being energy inefficient, Intel claims their current iteration of Hyper-Threading on Nehalem is much better in that regard."
Update: 8/23 00:35 by SS: Reader Spatial points out Anandtech's analysis of Nehalem.
The article seems to be down; here's Anandtech's analysis.
At this point, as long as I can watch HD video without any noticeable slowdowns, I'm good. A GPU or integrated video solution that can do that plus some energy-efficient CPU is really all I'm interested in now. The software issues with the 4500HD are disappointing, but hopefully it's *just* a software issue this time, and can be fixed soon enough.
Then again, that's just me; I'm not a gamer or video editor.
What fish-philandering flounder modded this troll? Grow a sense of humour, you silly chit!
Anandtech's overview reveals that Hyper-Threading is far more efficient on Nehalem than any P4 could have hoped to be. Nehalem has a better cache, better access to memory, and a much wider core. Hyper-Threading also allows Nehalem to do more with each clock. I highly suggest reading Anandtech's breakdown of Nehalem. It is very comprehensive and does a great job of explaining things in fine detail.
Unfortunately, AMD's "advanced technology" in HT doesn't help them win anywhere but in multi-socket servers. Intel's FSB is plenty sufficient for single socket desktops. So... what's your point again?
At this point CPU brands don't matter much, because they are all as fast as we need them to be. And an OS such as Windows doesn't fully use all the cores of a CPU, and most games are not designed to benefit from dual-core or quad-core processors.
Even veals have more autonomy!
8 threads per core in Niagara 2; you get up to 64 threads, as the chip is available with 4, 6 or 8 cores.
Michel
Fedora Project Contribut
The problem with hyperthreading is that it fails to deal with the fundamental problem of memory bandwidth and latency
The entire point of SMT (of which HT is an implementation) is that it helps hide memory latency. If one thread stalls waiting for memory then the other gets to use the CPU. Without SMT, a cache miss stalls the entire core. With SMT, it stalls one context but the other can keep executing until it gets a cache miss, which hopefully doesn't happen until the other one has resumed.
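To put rough numbers on that, here is a toy Python model. It is only a sketch under loud assumptions: one instruction issues per cycle, every thread computes for a fixed number of cycles and then takes a cache miss of fixed length, and the function name and parameters are made up for illustration. Real cores are nothing like this simple.

```python
def core_utilization(contexts, compute_cycles, miss_latency, total_cycles):
    """Fraction of cycles in which the core issues an instruction.

    Assumed toy model: each hardware context runs `compute_cycles`
    instructions, then waits `miss_latency` cycles on a cache miss.
    At most one context issues per cycle.
    """
    ready_at = [0] * contexts               # first cycle each context may issue again
    remaining = [compute_cycles] * contexts # instructions left before the next miss
    busy = 0
    for cycle in range(total_cycles):
        for ctx in range(contexts):
            if ready_at[ctx] <= cycle:      # this context is not waiting on memory
                busy += 1
                remaining[ctx] -= 1
                if remaining[ctx] == 0:     # cache miss: park this context
                    ready_at[ctx] = cycle + 1 + miss_latency
                    remaining[ctx] = compute_cycles
                break                       # only one issue slot per cycle
    return busy / total_cycles


if __name__ == "__main__":
    print(core_utilization(1, 10, 10, 2000))  # one context: idle during every miss
    print(core_utilization(2, 10, 10, 2000))  # two contexts: misses overlap with work
```

With these made-up numbers, a single context keeps the core busy half the time, while two contexts cover each other's misses completely. Make the miss longer than the other thread's compute burst and the second context only recovers part of the idle time.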
I am TheRaven on Soylent News
I have been using Nvidia graphics hardware for the past 2+ years (before that I had an ATI 9600 XT, another good value-for-money card at that time, and more Nvidia cards from the pre-GeForce days till then).
Recently I got myself an ATI 4850 card, primarily cos of the open-sourcing of the drivers.
I also got a 4870 card for another friend of mine (Gamer + office related work).
I also run Vista on my system whereas my friend dual boots Vista / XP.
We both have had blue screens due to the driver at least once so far (running 8.8 Catalyst, the latest), and under Vista the system had to recover from graphics driver issues.
It is nice to have a good piece of hardware which is very good value for money, but the current Windows drivers have not been very stable so far (both XP / Vista).
As I don't do much graphics work in Linux, I can't comment on that.
> Desktop users think electricity costs.
Bullshit. The difference between a 130W Nehalem and a 65W Core2 is 65W, which is 11 cents per day (at 7c/kWh) or about $40/year if you run the computer 24/7. Most people turn the computer off when it's not in use, and 8 hours per day is more likely, which is under 4 cents per day and maybe $13/year. I'd say the cost is entirely negligible, especially when you compare it to your $80/month Comcast bill.
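For anyone who wants to plug in their own electricity rate, the arithmetic is a few lines of Python. The 65 W difference and the 7 cents per kWh are the figures from this comment; the function name is just something made up for the sketch.

```python
def extra_power_cost(extra_watts, hours_per_day, cents_per_kwh):
    """Return (cents per day, dollars per year) for running `extra_watts` more."""
    kwh_per_day = extra_watts / 1000 * hours_per_day
    cents_per_day = kwh_per_day * cents_per_kwh
    dollars_per_year = cents_per_day * 365 / 100
    return cents_per_day, dollars_per_year


if __name__ == "__main__":
    for hours in (24, 8):
        cents, dollars = extra_power_cost(65, hours, 7)
        print(f"{hours} h/day: {cents:.2f} cents/day, ${dollars:.2f}/year")
```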
Most applications have inherently parallel workloads that are implemented in sequential code because context switching on x86 is painfully expensive.
Context switching on x86 is dead cheap. It's probably the cheapest of all general purpose architectures available right now. We're talking a few hundred cycles cheap. Only the P4 is a bit behind, and Nehalem makes things faster, to the point where Intel almost catches up with AMD.
Windows manages to make process switches a lot more expensive than necessary, but thread switching isn't bad. With Linux it hardly matters whether you switch processes or threads, they're both fast.
Finally! A year of moderation! Ready for 2019?
Unfortunately those are very, very, very, very, very niche workloads. Your workloads have to be insanely parallel and each thread very independent of others so that you have little that is blocking. In short, Niagara is just marketing.
Actually, scheduling for SMT can be very difficult or very easy, depending on the architecture. Something like the Niagara is easy to schedule for - every context basically gets 1/8th of the CPU, the decoder just issues one instruction from each in turn. In more fine-grained implementations you have one thread running and another thread getting to use the execution units when the first one isn't (e.g. if the first one is issuing a load of floating point operations and the other thread has an integer operation next in line). Scheduling for these is hard because the amount of time a thread has spent running doesn't necessarily correspond to the number of instructions it has been allowed to execute. Worse, threads may actually perform better running as the second context on an SMT core than on the other core, even though they would get more CPU time the other way around, because sharing the L1 cache with the other thread eliminates a lot of time spent waiting for memory and cache coherency locks.
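The difference between the two styles can be sketched in a few lines of Python. This is a toy, not a description of any real decoder: it assumes one issue slot per cycle, and the function names and the "can issue" pattern for the primary thread are invented for the example.

```python
from collections import Counter
from itertools import cycle, islice


def round_robin_issue(num_contexts, cycles):
    """Barrel-style issue: one instruction per cycle, contexts taken strictly in turn.

    Returns the context id chosen on each cycle.
    """
    return list(islice(cycle(range(num_contexts)), cycles))


def fill_in_issue(primary_can_issue, cycles):
    """Fill-in issue: context 0 issues whenever its repeating pattern allows,
    and context 1 only gets the slots that context 0 leaves empty.

    Returns [instructions issued by context 0, instructions issued by context 1].
    """
    counts = [0, 0]
    for c in range(cycles):
        if primary_can_issue[c % len(primary_can_issue)]:
            counts[0] += 1
        else:
            counts[1] += 1
    return counts


if __name__ == "__main__":
    # Eight contexts, strict rotation: every context gets the same share.
    print(Counter(round_robin_issue(8, 8000)))
    # Primary thread uses three slots out of four; the secondary gets the rest.
    print(fill_in_issue([True, True, True, False], 8000))
```

In the round-robin case each of the eight contexts ends up with exactly an eighth of the issue slots, so time on the core is a fair proxy for work done. In the fill-in case both contexts were resident for the same 8000 cycles, yet one issued three times as many instructions as the other, which is why "time spent running" is a poor accounting unit for an OS scheduler there.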
I am TheRaven on Soylent News