Inside Intel's Core i7 Processor, Nehalem
MojoKid writes "Intel's next-generation CPU microarchitecture, which was recently given the official processor family name of 'Core i7,' was one of the big topics of discussion at IDF. Intel claims that Nehalem represents its biggest platform architecture change to date. This might be true, but it is not a from-the-ground-up, completely new architecture either. Intel representatives disclosed that Nehalem 'shares a significant portion of the P6 gene pool,' does not include many new instructions, and has approximately the same length pipeline as Penryn. Nehalem is built upon Penryn, but with significant architectural changes (full webcast) to improve performance and power efficiency. Nehalem also brings Hyper-Threading back to Intel processors, and while Hyper-Threading has been criticized in the past as being energy inefficient, Intel claims their current iteration of Hyper-Threading on Nehalem is much better in that regard."
Update: 8/23 00:35 by SS: Reader Spatial points out Anandtech's analysis of Nehalem.
The problem with hyperthreading is that it fails to deal with the fundamental problem of memory bandwidth and latency in the x86 architecture. It's true, some apps will see a 20% or better improvement in performance, but most won't see anything more than a marginal increase.
Still, if you can safely enable hyperthreading without slowing down your system, unlike the last time we went through this, we should consider it a success. Hopefully, QuickPath will provide the needed memory improvements.
Only the super high-end desktops have QuickPath and triple-channel DDR3, and the bigger joke is that there will be two different single-CPU desktop sockets.
Also, the mobile parts will not have QuickPath.
All AMD CPUs use HyperTransport, all desktops will use the same socket, and the upcoming AM3 CPUs will work in the older AM2+ boards. Also, on AMD you can use more than one chipset; with Intel it looks like you will be locked in to an Intel chipset.
Nehalem is really the realization of what many slashdotters have claimed before - the typical user doesn't need that much more performance. Both datacenters and laptop users ask for the same thing - power efficiency - and Intel delivers. The Atom is another part of the strategy, even though it's currently coupled with a very inefficient chipset.
The thing is, today we have the knowledge and complexity to fire up kilowatt systems and more - but they're costly to run. Certainly there are the extreme hardcore gamers who won't mind running the hottest, most power-hungry quad CrossFire system, but they're few and far between. Laptop users think battery life. Desktop users think electricity costs. The result is Nehalem, which promises to deliver a lot more performance per watt.
If the practice is as good as the theory, AMD is unfortunately in deep shit. They've always been good at delivering ok processors at an ok price, but power efficiency has really only been their strength compared to the NetBurst (PIV) processors, not P3 or the Cores. If it amounts to "yeah your processors are cheaper but they cost more to operate" things will fall apart, which is sad since ATI is really doing fine. The 48xx series are kick-ass cards, I just hope they can keep up the competition against Intel...
Live today, because you never know what tomorrow brings
The article seems to be down, here's Anandtech's analysis.
I'm pretty sure the parent post was written by a machine. Turing test: failed.
Don't assume that because Hyper-Threading failed with NetBurst it is forever doomed to fail again. The primary problem with that architecture was that stages along the pipeline didn't support multiple threads, so any thread context switch forced a flush of NetBurst's very, very long pipeline. Intel's next generation of pipelines tracks multiple threads at all stages and makes the prospect of HT much more attractive.
At this point, as long as I can watch HD video without any noticeable slowdowns, I'm good. A GPU or integrated video solution that can do that plus some energy-efficient CPU is really all I'm interested in now. The software issues with the 4500HD are disappointing, but hopefully it's *just* a software issue this time, and can be fixed soon enough.
Then again, that's just me; I'm not a gamer or video editor.
See here
I know it's a Tom's Hardware article, but compared to what people have been posting in the Silent PC Review forums the results are consistent. I do think that with a better chipset and a laptop-style power supply the Atom platform can go down to sub-20 watts, but for now Intel is not making those boards or even allowing Atom platforms to have fancy features like PCI-Express. In fact, with the older AMD 690G chipset, some people at Silent PC Review were able to build sub-30 watt systems.
It's really quite amazing how much the hardware has outstripped the ability of software to keep up.
It's not amazing at all. Most desktop applications are single-threaded because you, the operator, are single-threaded. MS Word could enter words on all 100 pages of your book simultaneously, but you aren't able to produce them. An audio player could decode and play 100 songs to you at the same time, but you want to listen to one song at a time...
I can see niche desktop applications where multiple threads are of use. For example, GIMP (or Paint.net or Photoshop) could apply your filter to 100 independent squares of the photo if you have 100 cores. However the gain would be tiny, the extra coding labor would be considerable, and you still need to stitch these squares... all to gain a second or two of a rare filter operation?
The most effective use of multiple cores today is either in servers, or in finite element modeling applications.
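The filter-on-independent-squares idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is a plain list of rows of 0-255 values, the "filter" is a simple invert, and the names `invert_strip`, `split_rows` and `parallel_invert` are made up for this example. It uses a thread pool to show the split / process / stitch structure; in CPython, threads will not actually speed up pure-Python pixel math, so a real implementation would likely use processes or native code.

```python
from concurrent.futures import ThreadPoolExecutor


def invert_strip(strip):
    """Apply the per-pixel filter (here: invert) to one strip of rows."""
    return [[255 - px for px in row] for row in strip]


def split_rows(image, parts):
    """Split the image into at most `parts` horizontal strips."""
    size = max(1, -(-len(image) // parts))  # ceiling division, at least 1 row
    return [image[i:i + size] for i in range(0, len(image), size)]


def parallel_invert(image, workers=4):
    """Filter each strip independently, then stitch the strips back together."""
    strips = split_rows(image, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so stitching is a simple concatenation.
        results = list(pool.map(invert_strip, strips))
    return [row for strip in results for row in strip]


if __name__ == "__main__":
    image = [[0, 128], [255, 10], [5, 6]]
    print(parallel_invert(image, workers=2))
```

For a per-pixel filter like this the stitching step is trivial; filters that look at neighbouring pixels (blur, sharpen) need overlapping borders between strips, which is where the extra coding labor comes in.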
You probably also want a user interface that does what you mean, not what you said.
Anandtech's overview shows that Hyper-Threading is far more efficient on Nehalem than any P4 could have hoped to be. Nehalem has better cache, better access to memory, and is a much wider core. Hyper-Threading also allows Nehalem to do more with each clock. I highly suggest reading Anandtech's breakdown of Nehalem. It is very comprehensive and does a great job of explaining things in quite a fine grain of detail.
Take a deep breath. It's OK if AMD and Intel both have good chips. The question really comes down to the brand of salsa anyways.
meep
It's not amazing at all. Most desktop applications are single-threaded because you, the operator, are single-threaded....
That's a pretty simplistic view. Other than the obvious historical reasons, I believe that most applications are single-threaded because the languages and tools for writing non-trivial, robust multi-threaded applications are lagging far behind the capability to run them.
Given how closely Apple has worked with Intel before and after the processor switch from PowerPC, I wonder how much more Hyper-Threading aware OS X 10.6 (AKA Snow Leopard) will be? After all, it's supposed to be a "tuning" release focused on full 64-bit performance across the OS, so it wouldn't surprise me to see OS X 10.6 get much greater speed gains from HT than Vista on Nehalem, especially given Anandtech's description of how Vista screws up Turbo mode on Penryn-based systems. (And of course, MS won't go back and put hyperthreading awareness in XP at all...)
Lawrence Person (lawrencepersonh@gmailh.com (remove all "h"s to mail)
http://www.lawrenceperson.com/
Actually I don't know if they are cutting their own throat or not, but I have noticed I'm building a lot more AMD machines lately. And for the first time since the old K6-2 (IIRC, they were the 400MHz ones) I am actually looking at building an AMD board for myself. The price on AMD dual cores has just gotten so cheap I can cut a good 35% off the cost by going AMD. But for most folks the X2 series has enough power that it is frankly overkill. But as always this is my 02c, YMMV
ACs don't waste your time replying, your posts are never seen by me.
I'm not sure what you mean by geometries. SRAM arrays, flops, random logic, carry-lookahead adders, Wallace-tree multipliers (building blocks of processors) generally look similar across all high-performance ASICs over the past 15 years. Circuit geometries themselves have almost certainly changed completely since P6 days - 45nm is a hell of a lot smaller than 350nm, and the rules governing how close things can be have almost certainly changed.
I think what the article really means is that Nehalem shares a lot of the architectural concepts and style of the P6: similar number of pipe stages, similar number of execution units, similar decode/dispatch/execute/retire width (I think Core 2/Penryn/Nehalem are 4 and P6 was 3), similar microcode, etc. Of course enhancements and improvements have been made in things like the branch predictor, load-store unit, and obviously the interconnect/bus...but if you look at Nehalem closely enough, and indeed if you look at Pentium M, Core 2, Penryn too, you can see the architecture of the P6 as an ancestor.
The problem that you describe can also be applied to having multiple cores. If you read the article you will realize that they have taken MANY steps to prevent this.
:-p, just from what I read and learned in school.
For one, they use DDR3 memory. Another thing is that they have much more intelligent pre-fetching mixed with the loop detection thingy. The cache size/design itself allows for many applications to run.
The problem that you describe is a problem with the OS's scheduler. It should understand the architecture that it is running on. It should know about the types of caches, the way each processor shares them, etc. Thus, it only makes sense to use hyper-threading if 1. you are simply out of cores (the choice of using HT cores is iffy) or 2. a single application has spawned multiple threads. Even then you have to take into account the availability of other cores that share the L2 or L3 cache.
I personally think that the intelligent pre-fetching and loop detection thingy is something that needs more tests/statistics thrown at it.
Like you say, there are some applications that take advantage of HT; let them take advantage of it, while writing smarter OSes that understand the problems with doing so.
Maybe they need a feedback mechanism from the processor for the OS to understand what is the best way to schedule tasks.
I don't know much about CPUs
The problem with hyperthreading is that it fails to deal with the fundamental problem of memory bandwidth and latency
The entire point of SMT (of which HT is an implementation) is that it helps hide memory latency. If one thread stalls waiting for memory then the other gets to use the CPU. Without SMT, a cache miss stalls the entire core. With SMT, it stalls one context but the other can keep executing until it gets a cache miss, which hopefully doesn't happen until the other one has resumed.
I am TheRaven on Soylent News
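The latency-hiding argument above can be illustrated with a very coarse toy model. Everything here is an assumption made for the sake of the sketch: each thread computes for a fixed number of cycles and then stalls on a cache miss for a fixed latency, and the core runs at most one ready thread per cycle. Real SMT issues instructions from both contexts in the same cycle and miss patterns are irregular, so treat this only as a picture of why a second context keeps the core busy.

```python
def core_utilization(num_threads, compute_cycles, miss_latency, total_cycles):
    """Fraction of cycles in which the core does useful work (toy model)."""
    ready_at = [0] * num_threads                 # cycle at which each thread can run again
    work_left = [compute_cycles] * num_threads   # cycles of compute before the next miss
    busy = 0
    for cycle in range(total_cycles):
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                # Run the first ready thread for this cycle.
                busy += 1
                work_left[t] -= 1
                if work_left[t] == 0:
                    # Cache miss: this context stalls, the others may still run.
                    ready_at[t] = cycle + 1 + miss_latency
                    work_left[t] = compute_cycles
                break
    return busy / total_cycles


if __name__ == "__main__":
    for threads in (1, 2):
        u = core_utilization(threads, compute_cycles=10, miss_latency=10, total_cycles=2000)
        print(f"{threads} hardware thread(s): utilization {u:.2f}")
```

With equal compute and miss times, one context leaves the core idle half the time while two contexts cover each other's stalls completely. When the miss latency is much longer than the compute burst, the second context helps but cannot hide everything, which is the "hopefully" in the comment above.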
> Desktop users think electricity costs.
Bullshit. The difference between a 130W Nehalem and a 65W Core2 is 65W, which is 11 cents per day (at 7c/kWh) or about $40/year if you run the computer 24/7. Most people turn the computer off when it's not in use, and 8 hours per day is more likely, which is under 4 cents per day and maybe $13/year. I'd say the cost is entirely negligible, especially when you compare it to your $80/month Comcast bill.
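For anyone who wants to plug in their own numbers, here is the same back-of-the-envelope calculation as a small Python sketch. The function name `annual_cost_usd` is made up for this example, and the 65 W difference and 7 cents per kWh rate are the figures assumed in the comment above, not measured values.

```python
def annual_cost_usd(extra_watts, hours_per_day, cents_per_kwh):
    """Yearly cost in dollars of drawing `extra_watts` for `hours_per_day`."""
    kwh_per_day = extra_watts / 1000 * hours_per_day
    dollars_per_day = kwh_per_day * cents_per_kwh / 100
    return dollars_per_day * 365


if __name__ == "__main__":
    # 65 W extra draw at an assumed 7 cents per kWh
    print(round(annual_cost_usd(65, 24, 7), 2))  # running 24/7: roughly 39.86
    print(round(annual_cost_usd(65, 8, 7), 2))   # 8 hours a day: roughly 13.29
```

Even at double the electricity rate the 8-hours-a-day figure stays under $30 a year.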