Inside Intel's Core i7 Processor, Nehalem
MojoKid writes "Intel's next-generation CPU microarchitecture, which was recently given the official processor family name of 'Core i7,' was one of the big topics of discussion at IDF. Intel claims that Nehalem represents its biggest platform architecture change to date. This might be true, but it is not a from-the-ground-up, completely new architecture either. Intel representatives disclosed that Nehalem 'shares a significant portion of the P6 gene pool,' does not include many new instructions, and has approximately the same length pipeline as Penryn. Nehalem is built upon Penryn, but with significant architectural changes (full webcast) to improve performance and power efficiency. Nehalem also brings Hyper-Threading back to Intel processors, and while Hyper-Threading has been criticized in the past as being energy inefficient, Intel claims their current iteration of Hyper-Threading on Nehalem is much better in that regard."
Update: 8/23 00:35 by SS: Reader Spatial points out Anandtech's analysis of Nehalem.
The problem with hyperthreading is that it fails to deal with the fundamental problem of memory bandwidth and latency in the x86 architecture. It's true, some apps will see a 20% or better improvement in performance, but most won't see anything more than a marginal increase.
Still, if one can safely enable hyperthreading without slowing down your system, unlike the last time we went through this, we should consider it a success. Hopefully, Quickpath will provide the needed memory improvements.
Meh. I'm still waiting for multicore quantum computing. Or at least something that can execute code that doesn't exist yet, so I can play Duke Nukem Forever. Actually, what I really want is a processor that can execute code by its spirit, rather than its letter, so buggy code will work correctly anyway. :-)
McCain/Palin '08. Now THAT's hope and change!
'nuff said?
intel is inside.
Do you even lift?
These aren't the 'roids you're looking for.
Is it 3.999999999 more accurate?
One line blog. I hear that they're called Twitters now.
only the super high desk tops have Quick Path and Triple channel DDR3 and the bigger joke is that there will be 2 different 1 cpu desktop sockets.
also the mobile will not have Quick Path.
all AMD cpus use hyper transport and all desktops will use the same socket and the upcoming AM3 cpus will work in the older am2+ boards. Also on amd you can use more than 1 chipset; with intel it looks like you will be locked in to an intel chipset.
Nehalem is really the realization of what many slashdotters have claimed before - the typical user doesn't need that much more performance. Both datacenters and laptop users ask for the same thing - power efficiency - and Intel delivers. The Atom is another part of the strategy, even though it's currently coupled with a very inefficient chipset.
The thing is, today we have the knowledge and complexity to fire up kilowatt systems and more - but they're costly to run. Certainly there are the extreme hardcore gamers who won't mind running the hottest, most power-hungry quad crossfire system, but they're few and far between. Laptop users think battery life. Desktop users think electricity costs. The result is Nehalem, which promises to deliver a lot more performance per watt.
If the practise is as good as the theory, AMD is unfortunately in deep shit. They've always been good at delivering ok processors at an ok price, but power efficiency has really only been their strength compared to the Netburst (PIV) processors, not P3 or the Cores. If it amounts to "yeah your processors are cheaper but they cost more to operate" things will fall apart, which is sad since ATI is really doing fine. The 48xx series are kick-ass cards, I just hope they can keep up the competition against Intel...
Live today, because you never know what tomorrow brings
The article seems to be down, here's Anandtech's analysis.
I for one welcome the death of FSB and all that, but yet again it means a new motherboard, a new CPU socket and all that (DDR3 too). Better save up!
AMD is big on cost and with Intel forcing you to use there chip set it will push costs up where as you can get a AMD 790GX / 780G board with side port ram for about $100 and up lower for boards with out it GeForce board with good on board video are the same price add 4gb of ram for under $100 and get a quad core staring at $150 3 core start at about $100 or a dual start at $50 and you can get a nice for a low cost and a board with 64-128 of board video ram will be good for vista and is better then a intel board that uses system ram and has slower on board video.
Hyperthreading. I thought I was getting an ultra-tech processor when I bought my Dell 8400 some years back, with its 3.2 GHz P4 hyperthreaded power-sucking processor. Once all the reviews and independent technical evaluations and benchmarks were in, it was revealed that outside of a few niche application areas, hyperthreading wasn't all that great.
It's a good sign Nehalem is also focusing on lowering power usage, the reason Intel had to finally abandon their Tejas plans (the old 8400 Prescott P4 was a juice junkie). But why return to a feature like hyperthreading that has been thoroughly debunked? New software being written is still struggling with SMP multiple cores and threads running in parallel. Why gum up the works even more with a questionable feature? It makes very little sense to me.
One justification would be if it had the potential to significantly reduce rendering times in animation and CGI applications. I thought Intel's plans for the mid-term were to go towards many-core processors (many more than 4 or even 8). Maybe hyperthreading is just a way to kick software designers in the arse, because software that can really take advantage of multi-threading is scarce. It's really quite amazing how much the hardware has outstripped the ability of software to keep up.
I'm pretty sure the parent post was written by a machine. Turing test: failed.
At this point, as long as I can watch HD video without any noticeable slowdowns, I'm good. A GPU or integrated video solution that can do that plus some energy efficient CPU is really all I'm interested in now. The software issues with the 4500HD are disappointing, but hopefully it's *just* a software issue this time, and can be fixed soon enough.
Then again, that's just me; I'm not a gamer or video editor.
See here
I know it's a tomshardware article, but compared to what people have been posting in the silent pc review forums the results are consistent. I do think with a better chipset and a laptop-style power supply the atom platform can go down to sub-20 watts, but for now Intel is not making those boards or even allowing atom platforms to have fancy features like PCI-Express. In fact, with the older AMD 690G chipset, some people at silent pc review were able to build sub-30 watt systems.
"completely new architecture either. Intel representatives disclosed that Nehalem 'shares a significant portion of the P6 gene pool,"
That's like saying equations share a significant portion of the numbers gene pool. It's all geometry when you get down to it. I mean really, there are going to be certain circuit geometries that are always good to use and that you can't totally get away from.
Still, if one can safely enable hyperthreading without slowing down your system, unlike the last time we went through this, we should consider it a success.
Aye, I remember the joys of the first HT tick back when tom's hardware was less of a cluster fuck of a webpage. I do remember intel saying that although it wouldn't be found on later chips they did in fact plan on using the technology in one form or another eventually.
On the Oregon Coast born and raised, On the beach is where I spent most of my days
Take a deep breath. It's OK if AMD and intel both have good chips. The question really comes down to the brand of salsa anyways.
meep
>only the super high desk tops have Quick Path and Triple channel DDR3
So, you're saying that Intel is also supplying marijuana with these systems?
Sold!
You won't be locked into an Intel chipset. Obviously NVIDIA will be making chipsets for Nehalem processors. So with Intel processors you will have Intel and NVIDIA chipsets. With AMD processors you will have AMD and NVIDIA chipsets. It won't be much different than it currently is, except most likely VIA will completely drop out of the market in favor of other ventures.
Given how closely Apple has worked with Intel before and after the processor switch from PowerPC, I wonder how much more Hyper-Threading aware OS X 10.6 (AKA Snow Leopard) will be? After all, it's supposed to be a "tuning" release focused on full 64 bit performance across the OS, so it wouldn't surprise me to see OS X 10.6 get much greater speed gains from HT than Vista on Nehalem, especially given Anandtech's description of how Vista screws up Turbo mode on Penryn-based systems. (And of course, MS won't go back and put hyperthreading awareness in XP at all...)
Lawrence Person (lawrencepersonh@gmailh.com (remove all "h"s to mail)
http://www.lawrenceperson.com/
Actually I don't know if they are cutting their own throat or not, but I have noticed I'm building a lot more AMD machines lately. And for the first time since the old K2 (IIRC, they were the 400MHz ones) I am actually looking at building an AMD board for myself. The price on AMD dual cores has just gotten so cheap I can cut a good 35% off the cost by going AMD. But for most folks the X2 series has enough power that it is frankly overkill. But as always this is my 02c, YMMV
ACs don't waste your time replying, your posts are never seen by me.
Isn't that one of the books of Mormon?
I want to delete my account but Slashdot doesn't allow it.
More than any other organization, Intel knows that multithreading is bad. Lots of smart people such as professor Edward Lee (the head of U.C. Berkeley's Parallel Computing Lab) have warned Intel of the disaster down the road. It is time for Intel and everybody else to make a clean break with the old stuff. There is an infinitely better way to design and program parallel computers that does not involve the use of threads at all. Instead of the Penryn, Intel should have picked something similar to the Itanium, which has a superscalar architecture. A sequential (scalar) core has no business doing anything in a parallel multicore processor. Intel will regret this. Sooner or later, a competitor will read the writing on the wall and do things right. Intel and the others will be left holding an empty bag. To find out the right way to design a multicore processor, read Transforming the TILE64 into a Kick-Ass Parallel Machine.
Yes, and it was included in Atom (which looks very much like a Pentium 4 on a 45nm process) before it was reintroduced in Nehalem.
Obviously? I really doubt nVidia will be able to make chipsets for Intel. And if they can it'll be crap ones, and even worse than typical nForces because they won't have QuickPath.
What's with the Hebrew? Nehalem? Are these the chips Mossad uses to accelerate the backdoor access to the Israeli-coded crypto cyphers? :-)
Unfortunately, AMD's "advanced technology" in HT doesn't help them win anywhere but in multi-socket servers. Intel's FSB is plenty sufficient for single socket desktops. So..what's your point again?
Now that the memory controller will be in the CPU, does that mean they'll enable ECC RAM support for their consumer-level systems, the same way most AMD boards do?
The idea of using 4GB or more with no error correction just doesn't interest me.
QuickPath sounds a lot like AMD's HyperTransport. 3 pairs per CPU and an integrated controller are exactly what AMD has been doing for a long, long time.
20-bit wide, 25.6 GB/s per link? HyperTransport was already capable of delivering 41.6 GB/s per link in 2006. (according to Wikipedia)
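For anyone wanting to check the two link figures, both come from the same arithmetic: transfers per second times data bytes per transfer, counted in both directions. Below is a rough sketch, assuming the commonly cited parameters of 6.4 GT/s with 16 data bits per transfer for QuickPath (the rest of the 20 lanes carry non-data bits) and 5.2 GT/s over a 32-bit link for HyperTransport 3.0. Those parameters are assumptions, not numbers from this thread, and the function name is made up for illustration.

```python
def link_bandwidth_gb_s(gigatransfers_per_s, data_bits_per_transfer, directions=2):
    """Aggregate bandwidth in GB/s for a point-to-point link.

    Hypothetical helper: counts only data bits and sums both directions,
    which is how the headline marketing numbers are usually quoted.
    """
    bytes_per_transfer = data_bits_per_transfer / 8
    return gigatransfers_per_s * bytes_per_transfer * directions


# Assumed parameters (not from the thread): see the paragraph above.
qpi = link_bandwidth_gb_s(6.4, 16)   # QuickPath, 16 data bits per transfer
ht3 = link_bandwidth_gb_s(5.2, 32)   # HyperTransport 3.0, widest 32-bit link

print(f"QuickPath: {qpi:.1f} GB/s, HyperTransport 3.0: {ht3:.1f} GB/s")
```

Note that the HyperTransport number assumes the widest link the spec allows; with a 16-bit link the same formula gives half that, so the comparison is not quite apples to apples.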
Nvidia won't be competing with the initial X58 chipset, but they do plan to start supporting Nehalem at some point after launch.
I dunno whether this is common knowledge yet (bracing for karma hit if it is) but the big deal with the new processors should not be that they will have completely different sockets. I happen to know someone who knows someone who knows an engineer who's designing a cooling system for a server that uses one of these new CPUs. The huge architecture change is partly a result that the cores in these new procs will self-scale their own clocks and voltages (SpeedStep) to an extent never before seen (thus the need for a more reactive cooling system). They're also almost preposterously power efficient.
At this point CPU brands don't matter much, because they are as fast as we need them to be. And an OS such as Windows is not fully using all the cores of a CPU -- and most games are not designed to benefit from dual-core or quad-core processors.
Even veals have more autonomy!
Really? Try a Quad core with some memory intensive apps.
The problem that you describe can also be applied to having multiple cores. If you read the article you will realize that they have taken MANY steps to prevent this.
:-p, just from what I read and learned in school.
For one, they use DDR3 memory. Another thing is that they have much more intelligent pre-fetching mixed with the loop detection thingy. The cache size/design itself allows for many applications to run.
The problem that you describe is a problem with the OS's scheduler. It should understand the architecture that it is running on. It should know about the types of caches, the way each processor shares them, etc. Thus, it only makes sense to use hyper-threading if 1. you are simply out of cores (the choice of using HT cores is iffy) or 2. a single application has spawned multiple threads. Even then you have to take into account the availability of other cores that share the L2 or L3 cache.
I personally think that intelligent pre-fetching and the loop detection thingy are something that needs more tests/statistics thrown at it.
Like you say, there are some applications that take advantage of HT. Let them take advantage of it, while writing smarter OSs that understand the problems with doing so.
Maybe they need a feedback mechanism from the processor for the OS to understand what is the best way to schedule tasks.
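The "only use the HT siblings once you are out of real cores" idea can be sketched as a placement policy. This is a toy illustration, not how any particular OS scheduler works; `place_threads` and its parameters are hypothetical names, and it ignores cache sharing between cores entirely.

```python
def place_threads(num_threads, physical_cores, contexts_per_core=2):
    """Assign threads to (core, context) slots, filling physical cores first.

    Toy policy: every thread gets its own physical core until those run
    out; only then do threads start sharing a core via its second context.
    """
    # Order the slots so that context 0 of every core comes before any context 1.
    slots = [(core, ctx)
             for ctx in range(contexts_per_core)
             for core in range(physical_cores)]
    if num_threads > len(slots):
        raise ValueError("more threads than hardware contexts")
    return slots[:num_threads]


# Three threads on a quad core: nobody shares a core.
print(place_threads(3, 4))
# Five threads: the fifth has to double up on core 0.
print(place_threads(5, 4))
```

A real scheduler would also weigh which cores share an L2 or L3 cache, which is exactly the extra topology knowledge the post is asking for.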
I dont know much about CPUS
erm, loop stream detection?
maybe this iteration iNTEL will burn for their sins.
one core for the mouse, one core for the display, no, make that two. Two cores for the s/ata and four cores for the USB3.
That's the monkey goes, ...
Uh, could you repeat that in English and use more than one period instead of having one big long incomprehensible run on sentence with spelling errors 'cause as much as I try my parser is choking on it.
My favorite would be a robot which will clean up my house. Not just hoover or clean up a floor. Also clean up higher standing things, recognize what is a useful thing, what is a piece of rubbish, and what I should decide on before it gets tossed out. That kind of robot would also alert me that something needs to be repaired (like a leaking roof), fix simple things (leaking pipes?), and generally take care of my property, keeping it in good shape by maintaining and fixing things early enough, taking care of all living plants etc. And I would rather talk with this device using natural language than program it by clicking or writing some kind of bizarre script ;)
That kind of thing certainly needs enormous computational power. You need to recognize objects in images coming from its sensors (be it cameras, laser/infrared sensors etc.), solve the kinematic and dynamic equations of robot arms in realtime, and have some advanced AI, both for solving basic problems of geometry and moving objects, and more sophisticated AI, including some non-trivial ontology-like database (so the robot won't close a plant in a cabinet, letting it die). So, you need to crunch incredible amounts of data without consuming too much power. I think that current designs still need some work to keep up with that kind of workload.
The problem with hyperthreading is that it fails to deal with the fundamental problem of memory bandwidth and latency
The entire point of SMT (of which HT is an implementation) is that it helps hide memory latency. If one thread stalls waiting for memory then the other gets to use the CPU. Without SMT, a cache miss stalls the entire core. With SMT, it stalls one context but the other can keep executing until it gets a cache miss, which hopefully doesn't happen until the other one has resumed.
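A toy model makes the latency-hiding argument concrete. The sketch below is an assumption-laden simplification: each thread does a fixed burst of work and then stalls on a cache miss, and the core issues from at most one ready context per cycle (real SMT can issue from both contexts in the same cycle, so this is closer to switch-on-miss multithreading). The function name and the cycle counts are made up for illustration.

```python
def simulate(num_threads, compute_cycles, miss_latency, total_cycles):
    """Return the fraction of cycles in which the core did useful work.

    Toy model: every thread alternates between compute_cycles of work and
    a cache miss that stalls it for miss_latency cycles.
    """
    ready_at = [0] * num_threads                # cycle at which each context can run again
    work_left = [compute_cycles] * num_threads  # work remaining before the next miss
    busy = 0
    for cycle in range(total_cycles):
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                busy += 1
                work_left[t] -= 1
                if work_left[t] == 0:           # cache miss: stall this context only
                    ready_at[t] = cycle + 1 + miss_latency
                    work_left[t] = compute_cycles
                break                           # at most one context issues per cycle
    return busy / total_cycles


one_context = simulate(1, compute_cycles=10, miss_latency=30, total_cycles=4000)
two_contexts = simulate(2, compute_cycles=10, miss_latency=30, total_cycles=4000)
print(f"utilization with 1 context: {one_context:.2f}, with 2 contexts: {two_contexts:.2f}")
```

With these made-up numbers the second context fills part of the stall window, so the core spends about twice as many cycles doing work. If the miss latency is short relative to the compute burst, the benefit shrinks accordingly.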
I am TheRaven on Soylent News
> Desktop users think electricity costs.
Bullshit. The difference between a 130W Nehalem and a 65W Core2 is 65W, which is 11 cents per day (at 7c/kWh) or about $40/year if you run the computer 24/7. Most people turn the computer off when it's not in use, and 8 hours per day is more likely, or under 4 cents per day and maybe $13/year. I'd say the cost is entirely negligible, especially when you compare it to your $80/month Comcast bill.
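The arithmetic is easy to re-run with your own electricity rate. A minimal sketch, assuming the 65W gap is drawn constantly whenever the machine is on (TDP is not the same as actual draw, so this is an upper-bound style estimate); the function name is made up for illustration.

```python
def annual_cost_usd(extra_watts, hours_per_day, usd_per_kwh):
    """Yearly cost of drawing extra_watts for hours_per_day, every day."""
    kwh_per_day = extra_watts / 1000 * hours_per_day
    return kwh_per_day * usd_per_kwh * 365


always_on = annual_cost_usd(65, 24, 0.07)   # machine never turned off
workday = annual_cost_usd(65, 8, 0.07)      # 8 hours of use per day

print(f"24/7: ${always_on:.2f}/year, 8 h/day: ${workday:.2f}/year")
```

At 7 cents per kWh this comes to just under $40 a year for a machine that is never switched off and a bit over $13 a year at 8 hours a day.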
What's with the Hebrew? Nehalem? Are these the chips Mossad uses to accelerate the backdoor access to the Israeli-coded crypto cyphers? :-)
Nehalem is a small town in Oregon, USA.
Of course you do realize that there have been quite a lot of improvements in the front-end, resulting in drastically improved memory bandwidth? I believe this is part of the justification for bringing back SMT to their CPUs. Also, QuickPath doesn't really directly compare to anything in Core, since they have an off-die common memory controller.
only the super high desk tops have Quick Path and Triple channel DDR3 and the bigger joke is that there will be 2 different 1 cpu desktop sockets.
also the mobile will not have Quick Path.
...but then, they won't have as much to use QuickPath for either.
If you already have 4 to 8 cores, why on Earth would you need Hyperthreading?
It's refreshing to see someone who has actually had experience with parallel architectures make a post on this subject. Thank you for refuting the previous AC with actual details concerning what SMT actually does.
It should also be noted that SMT can be used to augment something like a superscalar processor without requiring massive changes to the architecture. Aside from the additional logic required to fetch instructions for the two threads (actual SMT implementations don't use more than two), little extra hardware is required; you don't have to replicate functional units or anything like that. In terms of energy efficiency, this is better than simply replicating an entire processor as in multi-core architectures and, depending on cache miss rates, can yield a similar performance benefit.
"Is not a sentence" is not a sentence. Well damn.
Whoops, meant to reply to TheRaven64's comment; the one which states:
The entire point of SMT (of which HT is an implementation) is that it helps hide memory latency. If one thread stalls waiting for memory then the other gets to use the CPU. Without SMT, a cache miss stalls the entire core.
"Is not a sentence" is not a sentence. Well damn.
Problem being - if most people don't natively benefit from HT then aside from benchmarks or off-the-wall memory intensive apps, HT wouldn't be that impressive.
I've had a core2duo 6600 for over a year now - and from what I've been reading, Nehalem isn't really any large performance boost for the typical user over Penryn. Usually I'll buy new CPU/systems when the performance of mainstream games suffer due to the CPU being outdated; in fact, this e6600 is the first system I've had that I've actually upgraded the video card on without doing a complete swap of mobo/cpu along with it.
Karnal
have all cpus use ht = more chipsets to use with a amd cpu.
You won't be locked into an Intel chipset.
You already are locked into an intel chipset; if you want reliability with an intel CPU, you need an intel chipset. I have always been sadly disappointed with any non-intel chipset. Things vary in AMD-land, but they're not all that much different there; I always shop for a board with an AMD chipset. It will probably not be the fastest, but it will probably work.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
OK. I'll do that. Then you go out and try a quad core showdown between Phenom and C2Q on a smattering of real life applications and tell me who wins. Just kidding - I already know who wins.
It's also worth noting that the last two graphics cards I had, one in a laptop and one in a desktop, both more or less stopped working due to massive driver glitches that ATI just couldn't be bothered to fix.
This despite the desktop chip only being out for a couple of months.
So yeah, guess there's more than performance to consider when grabbing a graphics card..
From TFL you embedded: (wikipedia.org) "Intel has historically named IC development projects after geographical names (since they can never be trademarked by someone else) of towns, rivers or mountains near the location of the Intel facility responsible for the IC. Many of these are in the American West, particularly in the state of Oregon (where most of Intel's CPU projects are designed; see well-known project codenames). As Intel's development activities have expanded, this nomenclature has expanded to Israel and India. Some older codenames refer to celestial bodies."