Inside Intel's Next Generation Microarchitecture
Overly Critical Guy writes "Ars Technica has the technical scoop on Intel's next-generation Core chips. As other architectures move away from out-of-order execution, the from-scratch Core fully adopts it, optimizing as much code as possible in silicon, and relies on shrinking transistor sizes (Moore's Law) for scalability."
Do we get two front page articles because the Core Duo has two cores? Goodie!!
It even links to the same article...
Well, at least it's not verbatim...
www.afterthought.cjb.cc
Ok, so I know I'm going to get a lot of AMD people agreeing with me and a lot of Intel people outright ripping me to shreds. But I'm going to speak my thoughts come hell or high water and you can choose to be a yes-man (or woman) with nothing to add to the conversation or just beat me with a stick.
I believe that AMD had this technology [wikipedia.org] before Intel ever started in on it. Yes, I know it wasn't really commercially available on PCs, but it was there. I would also like to point out a nifty little agreement between IBM and AMD [pcworld.com] that certainly helps them develop chips. Let's face it, IBM's got research money coming out of its ears, and I'm glad to see AMD benefit from it and vice versa. I think these two points alone show that AMD has had more time to refine multicore technology and deliver a superior product.
As a disclaimer, I haven't had the chance to try an Intel dual core, but I'm so happy with my AMD processor that I don't see why I should.
There's a nice little chart in the article, but I like AMD's explanation [amd.com], along with their PDF [amd.com], a bit better. As you can see, AMD is no longer too concerned with dual core and has moved on to targeting multicore.
Do I want to see Intel evaporate? No way. I want to see these two companies go head to head and drive prices down. You may mistake me for an AMD fanboi, but I was simply in agony in high school when Pentium 100s cost an arm and a leg. Then AMD slowly climbed the ranks to become a major competitor to Intel, and thank god for that! Now Intel actually has to price its chips competitively, and I never want that to change. I will support the underdog even if Intel drops below AMD, just to ensure stiff competition. You can call me a young idealist about capitalism!
I understand this article also tackles execution types, and I must admit I'm not too up to speed on that. It's entirely possible that OOOE could beat out the execution scheme AMD has going, but I don't know enough to comment on it. I remember there used to be a lot of buzz about IA-64's OOOE [wikipedia.org] processing used on Itanium, but I'm not sure that was too popular among programmers.
The article presents a compelling argument for OOOE. And I think that with a tri-core or higher processor, we could really start to see a big increase in sales using OOOE. Think about it: a lot of IA-64 code comes to a point where the instruction stream stalls as it waits for data to be computed (in most cases, a branch). If there are enough cores to compute both branches of the conditional (and a third core to evaluate the conditional), then where is the slowdown? This only breaks down on a switch-style statement or when several if-thens follow each other in succession.
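What this describes, running both sides of a branch at once and keeping only the side the condition selects, is usually called eager execution. Below is a minimal Python sketch, assuming side-effect-free branch functions; `eager_if` is a hypothetical helper, and because of CPython's global interpreter lock the threads show the structure of the idea rather than a real speedup:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def eager_if(cond: Callable[[], bool],
             then_branch: Callable[[], T],
             else_branch: Callable[[], T]) -> T:
    """Evaluate the condition and BOTH branches concurrently, then return
    the result of whichever branch the condition selects.

    Hypothetical helper for illustration. The losing branch's result is
    thrown away (wasted work), and the `with` block still waits for it to
    finish before returning. Branches must not have side effects.
    """
    with ThreadPoolExecutor(max_workers=3) as pool:
        cond_future = pool.submit(cond)      # the "third core": the conditional
        then_future = pool.submit(then_branch)
        else_future = pool.submit(else_branch)
        chosen = then_future if cond_future.result() else else_future
        return chosen.result()


if __name__ == "__main__":
    x = 7
    print(eager_if(lambda: x % 2 == 0, lambda: x // 2, lambda: 3 * x + 1))  # prints 22
```

With k conditionals in a row, the number of paths to run eagerly grows as 2^k, which is why the scheme breaks down on switch statements and long if-then chains.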
In any case, it's going to be a while before I switch back to Intel. AMD has won me over for the time being.
Each core can be in two places at once!
It's like a landmark: "Surf until you get to a geekish news site with an anti-Microsoft bent and a couple of dupes on the front page. When you get there, you're on Slashdot."
When politicians are involved, everyone loses.
Seriously this is gonna be so cool, slashdot will never be the same again!
Dupe articles with identical links? Meh. Bring it on. When are we getting dupes with identical summaries?
We can now reveal, for the first time anywhere without a cover charge, the central secret of Slashdot: Everything old is new again.
But not after only 13 hours, hosehead. Gotta let the little sister sites have their turn at it, then you can reference them tomorrow. It's all one big, incestuous, irrelevant family.
So apparently Intel had to go to Israel to find computer engineers to design their flagship architecture for the next 5+ years. With a population of only 7 million how is it that so many brilliant chip designers are in Israel?
Bite me twice.
If the editors can post a dupe story, why can't I post a dupe comment?
The mods gotta loosen up a little. Sheesh.
Can someone summarize, nicely and neatly, the practical difference(s) between out-of-order and in-order execution?
Why is it important that Intel is embracing OOOE while everyone else is moving away from it?
[Fuck Beta]
o0t!
The real problem with dupes isn't the fact that there are the same two articles on the front page, nor the whines that come from it, or even the witty banter chiding the mods.
If I see an article I've already read at the top of the page I QUIT READING.
This has happened to me several times over the years I've read this site. Then I end up coming back, realizing it was a dupe, and finding that I missed several interesting articles in between.
SO FOR THE LOVE OF GOD READ YOUR OWN WEBSITE.
IMAGE VERIFICATION IS EVIL!
Or worse. The paying customers get charged twice.
Wasn't the Achilles heel of the P4 and Itanium crappy code that caused pipeline stalls on their very long pipelines? Every time someone pointed out that AMD didn't have this problem, an Intel fanboy would reply that "with better compilers" you could avoid the conditions that force a pipeline flush, thus maintaining execution speed.
Well, those "better compilers" don't seem to be falling from the sky, and AMD is beating Intel in work/MHz because of it.
Is Intel finally deciding, "Screw it, we'll make the CPU so smart that even the crappiest compiled code will run smoothly"?
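The cost being argued about can be put into rough numbers with the usual back-of-the-envelope formula: effective CPI equals base CPI plus (fraction of instructions that are branches) times (misprediction rate) times (flush penalty in cycles). A Python sketch follows; every input value is hypothetical, and treating the flush penalty as roughly proportional to pipeline length is a simplifying assumption:

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty: int) -> float:
    """Average cycles per instruction once branch mispredictions are counted.

    Each mispredicted branch is charged `flush_penalty` extra cycles to
    refill the pipeline; a longer pipeline means a bigger penalty.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Hypothetical workload: 20% branches, 5% of them mispredicted.
short_pipeline = effective_cpi(1.0, 0.20, 0.05, flush_penalty=10)
long_pipeline = effective_cpi(1.0, 0.20, 0.05, flush_penalty=20)
print(f"short: {short_pipeline:.2f} CPI, long: {long_pipeline:.2f} CPI")
```

In this model, halving the misprediction rate (roughly what "better compilers" were supposed to deliver) brings the long pipeline back to the short one's 1.10 CPI, which is the trade-off the comment is complaining about.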
I want to delete my account but Slashdot doesn't allow it.
I just want a planet with two cores now.
How many escape pods are there? "NONE,SIR!" You counted them? "TWICE, SIR!"
Does this mean we're not going to be seeing mid-ten-digit clock rates any more? That was one thing that really annoyed me about the P4: a 2 GHz P4 was NOT more than twice as fast as an 850 MHz P3. It meant one couldn't compare CPUs by clock rate any more.
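Why clock rate alone stopped being a useful yardstick: for the same program, throughput is roughly instructions per clock (IPC) times clock frequency, so a higher clock can be offset by lower IPC. A small Python sketch; the IPC figures are invented for illustration and are not measurements of real P3 or P4 parts:

```python
def relative_performance(freq_a_mhz: float, ipc_a: float,
                         freq_b_mhz: float, ipc_b: float) -> float:
    """How many times faster CPU A is than CPU B on the same instruction
    stream, using throughput = IPC * clock frequency."""
    return (ipc_a * freq_a_mhz) / (ipc_b * freq_b_mhz)


# Hypothetical IPC values: assume the long-pipeline chip achieves 80% of
# the older chip's IPC on this workload.
speedup = relative_performance(2000, 0.8, 850, 1.0)
print(f"{speedup:.2f}x faster despite a {2000 / 850:.2f}x clock advantage")
```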
Hey, this is great promotion! I own some INTC stock, can you post this same article again tomorrow as well, and the next day, and the next...
I just thought it should be stated for the record: Moore's law isn't a definite fact that cannot be disproven. It has worked so well up to now, and will for a while yet, that it is rather easy to seriously call it a law, but we shouldn't forget that, in the end, there are physical limitations. I don't know how much longer we have until we reach them, though. It could be five years, it could be twenty. But they are there, and eventually we will hit the point where transistors get no smaller no matter what kind of technology you throw at them. At that point, a new method must be put in place to continue growth. This is why I personally like reading Slashdot so much for articles on things like quantum computing. Those may be pipe dreams, but the point is that they are alternate methods that may someday become truly powerful and useful. Perhaps the eventual successor to the current system will arise soon? Let's keep an eye out for it with open minds.
Anyway, I do understand a bit about how it all works. OOOE has amazing potential, but in the end the fact remains that you can only optimize things so much. The idea is to break up an instruction stream so that independent instructions can run in parallel, in effect multithreading a task that was not originally designed for it. A neat idea, I must say, with definite potential. Honestly, though, you will run into a lot of instructions that the processor can't figure out how to break up, or that can't be broken up to begin with. If they continue with this technology they will improve on both situations, but, to be brutally honest, the nature of machine instructions leads me to believe this idea may not take them far.
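A toy model of what out-of-order hardware looks for: given each instruction's destination and source registers, find the earliest cycle it could start if only true data dependencies mattered. This sketch assumes unlimited execution units, perfect register renaming, and a fixed one-cycle latency; the `Instr` type and the register names are hypothetical. Independent instructions all start at once, while a chain in which each step needs the previous result (for example, a load that uses the pointer loaded just before it) gets no parallelism at all:

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str               # register written
    srcs: tuple[str, ...]   # registers read


def schedule(program: list[Instr], latency: int = 1) -> list[int]:
    """Earliest issue cycle for each instruction, tracking only true
    (read-after-write) dependencies, as if every register were renamed."""
    ready_at: dict[str, int] = {}  # register -> cycle its newest value is ready
    issue_cycles = []
    for ins in program:
        start = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        issue_cycles.append(start)
        ready_at[ins.dest] = start + latency
    return issue_cycles


def ilp(program: list[Instr], latency: int = 1) -> float:
    """Instructions completed per cycle over the whole program."""
    total_cycles = max(schedule(program, latency)) + latency
    return len(program) / total_cycles


independent = [Instr("r1", ("a",)), Instr("r2", ("b",)),
               Instr("r3", ("c",)), Instr("r4", ("d",))]
# p = load [p]: every load needs the pointer produced by the one before.
pointer_chase = [Instr("p", ("p",)) for _ in range(4)]

print(schedule(independent), ilp(independent))      # [0, 0, 0, 0] 4.0
print(schedule(pointer_chase), ilp(pointer_chase))  # [0, 1, 2, 3] 1.0
```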
Let's not forget that one of the biggest competitors among processors that focus on SIMD is kind of fading now. Apple is moving to the x86 architecture with all its might (and I must say I'm impressed at how smoothly it is switching; it's actually exciting most Apple fans rather than upsetting them), and I think I read it will no longer be producing anything with PowerPC-style chips, which I suppose isn't good for the people who make them (maybe they wanted to move on to something else anyway?). At this point it's looking more and more like it's just mobile devices that benefit from this style of chip, primarily because, between their lack of need for higher speeds and their overall design to use what they have efficiently, they use very little power and do what they do well in a segment like that.
Multithreading, however, is a viable solution today and in the future as well. It just makes sense, really. You start to run into limits on how fast the processor will run, how many transistors you can squeeze on at once, power and heat, etc. However, if you stop at those limits and simply add more processors handling things, you don't really have to design the code all THAT well to take advantage of it, and growth continues in its own way. I can definitely see multicore having a promising future with a lot of potential for growth, because even when you hit size limitations for a single core you can still squeeze more cores in there. Plus, I wonder if multicore couldn't work in a multiprocessor setup? If it can't today, won't it in the future? Who knows. There are limits on how far you can go with multicore, but those limits are much further away than for a single core, and I really feel they are more promising than relying on smart execution on a single core running at around the same speed. In the end, a well-designed program will be splitting up instructions on an SMP/multicore system much like OOOE will try to do. While OOOE may be somewhat better at poorly designed programs (ignoring for a moment the advantages that multithreading provides to a multitasking OS, since even on a minimal setup a bunch of other stuff is running in the background) overa
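The "split the work yourself" approach can be sketched in a few lines of Python: cut the data into one chunk per worker, let each worker produce a partial result, and combine the partial results. `parallel_sum` is a hypothetical helper; with CPython threads this shows the decomposition rather than a real multicore speedup, which would need processes or native code:

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(data: list[int], workers: int = 4) -> int:
    """Split `data` into at most `workers` contiguous chunks, sum each chunk
    on its own worker thread, then add up the partial sums."""
    if not data:
        return 0
    chunk = -(-len(data) // workers)  # ceiling division
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, parts))


print(parallel_sum(list(range(1001))))  # 500500
```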
Now, frequency isn't everything, but performance scaling is nearly linear if you hold the pipeline depth constant (and scale the bandwidth, which has also been done). For more information about Power6, take a look at:
http://www.serverpipeline.com/showArticle.jhtml?articleId=180200700&pgno=1
Alright mod me offtopic, but if
There is truth in humor.
AMD = Pwned?
No way, Jose, SIMD isn't going out of style at all. What do you think the SPEs of the Cell processor do best? SIMD. What did Intel put a LOT of resources into in its new Core architecture, theoretically doubling the speed of this part? SIMD. What makes it possible for a PII 300 to decode DivX, or a 3 GHz P4 or 2 GHz Athlon 64 to decode video at HDTV resolutions? SIMD. They wouldn't stand a chance with just regular scalar instructions. MMX/SSE2 are essential.
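A rough way to see why SIMD matters: one SIMD instruction applies the same operation to several values (lanes) at once, so the instruction count drops by about the lane count. The Python below only models that bookkeeping (Python itself is not issuing SIMD instructions); the 4-lane width corresponds to four 32-bit floats in a 128-bit SSE register, and `lane_add` is a hypothetical name:

```python
def lane_add(a: list[float], b: list[float], lanes: int = 4) -> tuple[list[float], int]:
    """Element-wise a + b, processed `lanes` elements per simulated
    instruction. Returns (result, number of simulated instructions)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    result: list[float] = []
    instructions = 0
    for i in range(0, len(a), lanes):
        # One simulated SIMD add covers up to `lanes` elements at once.
        result.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return result, instructions


a = [float(i) for i in range(16)]
b = [1.0] * 16
_, scalar_ops = lane_add(a, b, lanes=1)   # 16 instructions
_, sse_ops = lane_add(a, b, lanes=4)      # 4 instructions
print(scalar_ops, sse_ops)
```

Real SIMD code would rely on compiler intrinsics or a vectorizing compiler; the point here is only the instruction-count arithmetic.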
The initial "Google search" in the above should be Google search
2^3 * 31 * 647
Hey Intel peeps, listen to this idea. (Hope this is the right place to blabber about this.)
OK, I have read from sources that the two reasons for making the Celeron and the A series (BTW, the 300A: great processor, guys) are to give people who want a not-so-costly performance solution an option, and to compete with AMD in the low-end market. So far it has worked.
For a gamer, the only really logical processor (unless you have gobs of money to get a Xeon or a P3) is a Celeron: it's fast, it's Intel, hey, it works! BUT now there's the new PIII, with the KNI/SSE (whatever you're calling it) SIMD/SIMD-FP instructions, which people from Intel have said improve the speed of 3D and voice recognition software by as much as 30%. That leaves a great many heavy gamers wanting a PIII, but it's just too darn expensive! ($80-$160 vs. $550-$750.) The solution? Celeron SSE!
Basically the Celeron version of the Pentium III: run it at speeds of 400-500 MHz, price it at $180-$275, and it will sell big. I have a Celeron processor and I love it. I play games heavily, and reading about SSE and its performance boost per clock makes me want to get a PIII, but it's too expensive to really consider raising that kind of money. Now, a Celeron SSE = PIII core, 128K on-die full-speed cache (or more!), the same PIII SSE/KNI/whatever, on a Slot 1 PCB, and you have a big contender for the 'lower-end SIMD fight' (see K6-III). I have no doubt a 'Celeron SSE' would beat the socks off a K6-3. The K6-3 has on-die full-speed cache and 3DNow! (their weaker SIMD) and is priced around $260 for the 450 MHz part. Clearly the PIII beats it, but it's not really what most gamers will get. The Celeron SSE would be a dream come true for many people, driving many more next-gen Intel processors.
Aaron (a gaming enthusiast/hardware junky)
Submission changed n^2 complexities to n complexities.
It's register renaming, choosing which instruction goes next, etc. that grow as n^2 when the core width changes.
Emacs is a good operating system, but it has one flaw: its text editor could be better.
Japan's little-known microprocessors win hands down on energy consumption, as some were designed to. VIA added a lot to come up with something reasonable. However, AMD knows what the market wants, and worked out a decent tradeoff mix while keeping x86.
I would like to see the day when microcode can be loaded into processors, like in the old days. However, trimming code bloat to, say, match Z80 or 8Mb desktop solutions would go a long way.
Not OOOE.
Sounds to me like Intel will run generically compiled Windows code better whereas AMD may run specifically-compiled Linux code better. Guess which one will generate more "I am faster" headlines and make more money?
This is by far Intel's most aggressive, and possibly best, design ever. They've taken their cue not to ignore performance per cycle from AMD, while keeping and extending much of what is good in their processor design.
Out-of-order execution was first adopted by Intel in 1995, when it released the Pentium Pro, its '6th generation' architecture. Intel has used OOOE in all of its x86 CPUs since then.
well lemme see :3
According to this page, Conroe can do 6 FP instructions per cycle, and with SIMD that gives us 4 numbers per FP instruction, or 24 FP calculations per cycle. At 3 GHz this means 72 GFLOPS per core, and as Conroe has two cores, that means 144 GFLOPS, or even 288 if you count the quad-core stuff.
Cell can only do 210 GFLOPS, with no branch prediction, no out-of-order execution, and no 4 MB L2 cache like Conroe has.
Poor, poor Cell, beaten by an x86 before it's even in commercial use.
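The peak figure above is a straight product of its inputs. This sketch reproduces the arithmetic, taking the 6 FP instructions per cycle and 4 values per SIMD instruction from the comment as given (they are not verified here); these are theoretical peaks, not sustained throughput:

```python
def peak_gflops(fp_instr_per_cycle: int, values_per_instr: int,
                clock_ghz: float, cores: int) -> float:
    """Theoretical peak: every FP slot busy with full-width SIMD every cycle."""
    return fp_instr_per_cycle * values_per_instr * clock_ghz * cores


for cores in (1, 2, 4):
    print(cores, peak_gflops(6, 4, 3.0, cores))  # 72.0, 144.0, 288.0
```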
The poster obviously hasn't designed any CPUs, nor does he know about the physics of semiconductor design.
He's a programmer who doesn't need to think about those things.
n^2 or n^3 algorithms (in terms of power and area) are used in MOST parts of the core. So when the guy recommends that the next generation have a single core instead of 4 cores, he is suggesting one core that is twice as wide as each of those 4 cores.
A large fraction of code is pointer chasing, and a large fraction of code has ILP equal to or lower than 1. There are just too many data dependencies.
Just as cache latency depends on the cache's size rather than on whether it is L1 or L2 or whatever, physics says that drive strength and distance matter. The same happens inside a core: when you quadruple the core size, you need to drive all the instructions and data around over longer distances; it's like Prescott. You spend huge resources moving instructions around in pipeline stages instead of doing computation, and you have to, since the distance between different parts of the core is so much bigger now. The register renamers take a lot more die area and have longer latency, and so do the out-of-order queues. Then there are the latencies between ALU instructions, and the latency INSIDE the logic that selects which instruction goes next grows, since the number of locations each queued instruction can go to has gone up, and so has the number of instructions in the queue. The bad part is that this logic isn't linear: it's an n^2 algorithm in terms of width, so its cost goes as width^2 * queue depth. So his recommended doubling costs 8 times the area and power consumption there.
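The scaling argument can be checked in a few lines, assuming the comment's cost model for the instruction-selection logic: cost proportional to width^2 * queue depth (an approximation, not an exact circuit model). Doubling the width alone quadruples the cost; the 8x figure corresponds to doubling the queue depth along with it. The base width and queue depth below are hypothetical:

```python
def select_logic_cost(width: int, queue_depth: int) -> int:
    """Relative cost (area/power) of the issue-selection logic under the
    width^2 * queue_depth model; only ratios between designs are meaningful."""
    return width ** 2 * queue_depth


base = select_logic_cost(width=4, queue_depth=32)           # hypothetical core
wider = select_logic_cost(width=8, queue_depth=32)          # width doubled
wider_deeper = select_logic_cost(width=8, queue_depth=64)   # both doubled
print(wider / base, wider_deeper / base)  # 4.0 8.0
```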
Of course, trying to educate the masses of programmers here is futile; there are plenty of people who know nothing about the costs of doing something proposing solutions that people who have designed CPUs for 20+ years have probably already dismissed as infeasible.
The Merom/Conroe/Woodcrest cores are NOT based on the PM, aka Banias/Dothan/Yonah. Even a cursory look at the architectures, from pipeline depth to functional units, will show they are totally different.
Who keeps perpetuating this stupidity, and when did we as a culture lose the ability to look past shiny things shown to us by guys in lab coats? The new cores rock, the previous cores rock, and they are not the same.
Merom being more like the PM than the P4 means squat.
-Charlie
A few factors. First, as one poster pointed out, the Israeli team has lots of experience designing low-power processors. This started when they were designing system-on-a-chip processors (a project that was ultimately cancelled), while the guys in the US were concentrating on NetBurst. They were able to build up a team with some really good people AND experience, which means a lot for a design team.
Secondly, the Technion. Imagine if, in the US, Intel were located next to MIT and had a job-sharing program where it hired the best and brightest (while paying for their education). That's what you get at the Israel Development Center in Haifa, where the Technion is the major supplier of new grads to Intel. These guys are damn smart, and Intel gets first dibs on them.
Thirdly, Intel gets massive tax concessions from the Israeli government - so you can put a fab and a development center in the same country barely larger than New Jersey. Makes a massive difference in development time, etc.
Fourth, Israeli/Jewish culture stresses education, and the Israeli guys are damn aggressive. They've moved up like crazy in the Intel culture because Intel has a "constructive confrontation" policy, a fairly aggressive work environment in which they do well; and once they get into C-level positions, they can move work to Israel, where they have counterparts and are comfortable.
Lastly, a huge amount of the IDC guys are US-educated and raised. They would be in hot demand anywhere, but they want to live in Israel (for the sake of raising their families there, mostly), and Intel is the biggest game in town. So where do you go when you want to go back to Israel?
If anyone else has worked at Intel or in Haifa, feel free to add or correct. And I agree with another poster - your comment was close to being racist, although I'm sure you didn't mean it that way....