Quad Core Battle, Intel Yorkfield vs AMD Altair
Joe writes "Yorkfield Extreme Edition based on the 45nm Penryn core architecture will meet
head-on with AMD Altair based on the 65nm K8L core in Q3 2007, as
reported by VR-Zone. Due to its
advanced 45nm process technology, Yorkfield XE is able to pack a total of 12MB
L2 cache (2 x 6MB L2) while still achieving a much smaller die size and a higher
clock speed of 3.43-3.73GHz. Yorkfield will feature Penryn
New Instructions (PNI), more officially known as SSE4, with 50 new
instructions. Yorkfield XE will pair up nicely with the
Bearlake-X chipset supporting DDR3
1333, PCI Express 2.0 and ICH9x, coming in the Q3 '07 timeframe as well."
I for one... Will... wait for those 80-core CPUs Intel said they will have in a 'few' years. I'll refuse to upgrade till I get one! :D
Mod me down im a newf (wiki)
Ooooh. Blinkenlights on a processor!
This guy's the limit!
I've said it before, I'll say it again: This is exactly why competition rocks. Soon, we'll say Moore was no prophet, he was a pessimist!
Ok, so we have all this neat info about the Intel chip; what about the AMD processor (it gets a whole sentence and a half)? If this is supposed to be a "battle", it seems that most of the comparison has already been done in favor of Intel before the event even takes place, if this article is any reference. :P
I don't reply to Anonymous posts; if you have something to say to me, identify yourself or I won't reply.
12MB cache divided by four processors = 3MB cache.
No, I didn't RTFA.
Processor speed as well as cores are just numbers to me. The only thing high processor speed means to me is that I am able to write inefficient code and get away with it. for (int i = 0; i < 9000; i++) { for (int j = 0; j < 9000; j++) { for (int l = 0; l < 9000; l++) { System.out.println("More Cores"); } } }
"I think that freedom is America's biggest export. At least until China can stamp it out for 20 cents a unit."
I've mocked Intel in previous threads for not beating AMD by a much larger margin with Core 2 than they actually did. This stuff (mentioned in the post) is the kind of performance jump I was expecting to see. Bravo! If they get this stuff out the door on time, Intel just might make it back onto my vendor list.
Ground invasion is where it's at. Space battles can be equalized with sufficient technology taken from captured planets.
Intel is going to need that HUGE cache because of its limited FSB. It will be interesting to see how they do side by side.
The AMD with its HyperTransport could have an advantage over the Intel chip, but right now it is all pie in the sky.
I wish that AMD had access to the Intel fab tech. Just how fast and low power would their chips be if they were 65nm right now like Intel's?
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Please, no more!
It's shared though, not dedicated (IIRC). So it's a 12MB cache, so one proc could in theory use 5 megs of it while another only used 1, could it not?
-Rick
"Most people in the U.S. wouldn't know they live in a tyrannical state if it walked up and grabbed their junk." - MyFirs
I've often wondered, what are these new instructions Intel keeps thinking up? Are they some sort of fancy array processing, new addressing modes? I'm curious. Whatever happened to RISC?
It's not exactly 3MB a core, and you cannot count it the same way as 12MB for one core either. The reason is that future multicore processors will share L2 cache. As the number of cores per CPU increases, it becomes less likely that a thread will resume on the same core it was last running on. With a shared L2 cache, recently fetched data stays in the cache and can be accessed by any core instead of being refetched from memory. The problem that arises is that multiple threads can be using this cache simultaneously and thrash each other's data. One way to address this is to make the cache large enough that thrashing is kept to a minimum. As a result, the cache hit ratio on a single-core CPU with 12MB would be very high, and on the quad-core version not quite as high, although probably still substantially higher than if each core had only its own 3MB L2 cache.
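A toy sketch of the shared-versus-split argument above, not a model of any real chip: it assumes plain LRU replacement, 1MB "blocks", and made-up per-core working sets of 5, 3, 2 and 1 blocks. The names `lru_hits`, `WORKING_SETS` and `ROUNDS` are invented for the illustration.

```python
from collections import OrderedDict

def lru_hits(accesses, capacity):
    """Count cache hits for a sequence of block IDs under LRU replacement."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return hits

# Hypothetical working sets per core, in 1MB "blocks" (11MB in total).
WORKING_SETS = {0: 5, 1: 3, 2: 2, 3: 1}
ROUNDS = 100  # each core sweeps its working set this many times

# Shared 12MB cache: all four cores' accesses go through one LRU cache.
shared_trace = [(core, i)
                for _ in range(ROUNDS)
                for core, size in WORKING_SETS.items()
                for i in range(size)]
shared_hits = lru_hits(shared_trace, capacity=12)

# Private caches: each core only ever sees its own 3MB.
private_hits = sum(
    lru_hits([(core, i) for _ in range(ROUNDS) for i in range(size)], capacity=3)
    for core, size in WORKING_SETS.items()
)

total = len(shared_trace)
print(f"shared 12MB : {shared_hits}/{total} hits")
print(f"4 x 3MB     : {private_hits}/{total} hits")
```

With these assumed numbers the core that needs 5 blocks never hits in a private 3-block cache, while the shared cache only takes the cold misses. If the combined working sets were larger than 12 blocks, the shared cache would start thrashing too, which is the contention problem described above.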
Penryn new instructions = PNI = SSE4
Prescott new instructions = PNI = SSE3
Therefore SSE3 = SSE4...?
Strikes me that Intel is running out of buzzwords! Was the marketing dept. severely depleted in the last round of purges?
The next 12 months or so are going to be a very interesting time for the CPU world. All Intel needs to do is get their chips' idling power down into the same ballpark as AMD, and AMD need that 65nm process in volume *now*! I've actually been finding myself forcing myself not to look at computer stores and upgrade my workstation because I know that six months down the line there'll be something orders of magnitude better on the scene...
Moderation Total: -1 Troll, +3 Goat
That's because Intel is cheating. They don't have a quad-core die, they have two dual core dies shoved onto a multi-chip package. Each die has a shared 6MB cache.
Like many laws, people mention Moore's without actually knowing what it says.
Pining for the fjords
You've got to be kidding - a secure OS?
The only way to have a secure computer is to have it separately firewalled from the net for worms, and to run with a least-privileged user account, using non-MS software.
Modern games are another ball of wax, and I've actually gotten to the point of creating a separate OS installed partition for any new games.
The cesspool just got a check and balance.
I mean, frankly... isn't 12MB L2 overkill? We're barely putting today's 2-4MB to good use.
Are you kidding me? With a 4-way superscalar processor running at 3GHz, any cache miss can result in the processor being completely idle for 50-100ns. At an aggressive 50ns memory latency, this is up to 600 wasted opportunities to retire instructions.
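The arithmetic behind that figure, as a quick sketch. It assumes the best case where the core would otherwise retire its full issue width every cycle; the function name is made up for the illustration.

```python
def lost_issue_slots(issue_width, clock_ghz, stall_ns):
    """Instruction slots left unused while the core waits on memory.

    GHz times nanoseconds gives cycles directly (3 GHz * 50 ns = 150 cycles).
    """
    stalled_cycles = clock_ghz * stall_ns
    return issue_width * stalled_cycles

for stall_ns in (50, 100):
    slots = lost_issue_slots(issue_width=4, clock_ghz=3, stall_ns=stall_ns)
    print(f"{stall_ns} ns miss at 3 GHz, 4-wide: up to {slots} lost slots")
```

Real code rarely sustains four instructions per cycle, so this is an upper bound on what a single miss costs, not a typical figure.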
I used to upgrade systems about every 3 years when CPU speed typically tripled or more.
So my first system was a 486-25.
Second system was a P-90.
Third was a 300MHz AMD.
Fourth was 1.2 GHz AMD.
Current system is a P4 2.7 GHz and it's at least 3 years old. And I don't feel any urgency to upgrade my basic system, perhaps a video card and some more RAM instead.
I simply don't see CPU horsepower increasing in steps like it used to. Yes, I understand multicore, more-cache, hyperthreaded CPUs are going to offer performance not indicated by something as simple as CPU speed, but is it THAT much?
-Styopa
I hope they come up with a new acronym. PNI is already used for Prescott New Instructions.
Support SETI@home
The Altair, AMD's Quad-Core CPU, being named for the first widely available home computer, the Altair 8800, is just too fun.
Let's hope AMD's Altair is more useful.
You like your new Mac more than you like me, don't you, Dave? Dave? I asked...She said Yes.
I built myself a new system last year to replace my ancient Micron 486. It was so old that the CPU didn't even use a fan (had plenty of dust in the heatsink though), the VLB graphics board had 2 MB vram, 48 MB RAM, the monitor was a 14" CRT that had ghosting probs, and the hard drive space was less than most high-end MP3 players. Even the mouse and keyboard barely worked anymore. I pretty much milked it for every penny I paid.
I did a 100% rebuild. Now I've got an AMD 64 X2 3800+, Lian-Li case, UPS, 19" LCD, 2 GB RAM, 500 GB total SATA HD space, NVIDIA 7800GT, etc. I have no idea how many generations I must've jumped. I felt like a hermit walking out of his cave and blinking at the sun. I've even gone from dial-in to high-end DSL. I'll replace the CPU/MB when apps start grinding on it. 5 years prob?
It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
This is ameliorated to a significant amount by SMT. If you have a cache miss in one context, you switch the execution units over to the next one and let it run for a bit. Of course, this requires developers to realise that they're not writing code for a PDP-11 anymore...
I am TheRaven on Soylent News
Actually based on current information it looks like Yorkfield is two dual core blocks (with related cache) on a single die NOT an MCM. This is much the same as current AMD packages and the coming K8L.
The K8L looks likely to have 512 KiB of cache (L2) per core with 2 MiB of cache (L3) shared among all four cores, while Yorkfield will have two independent 6 MiB cache (L2) blocks, each shared between two cores, and on-die glue between the independent dual-core blocks and the FSB.
Hey, did you notice? You were drooling. Stop.
True, though they're to be lauded for creating an architecture that can take advantage of a ginormous cache as well as it does. They really did learn to "work smarter, not harder" in their current chip. And of course, in the end, the consumer doesn't care how performance is achieved. :)
In the case of Intel's quad core solution, they seem to be achieving higher overall performance, as expected, but at the expense of pushing their thermal envelope back up to 130W even after the die shrink. AMD, on the other hand, has committed to shipping a 68W quad part and that's including the integrated memory controller.
Still, it's not only a few months of bragging rights that Intel is buying - Woodcrest only supports dual socket configs which means four cores at most unless you fall back to Dempsey, which simply isn't price or performance competitive with the Opteron. This at least scales them to eight core configurations. Of course on the AMD side of the house you can pick up an eight-way board, which gets you to sixteen cores today, and will scale up to a whopping thirty-two cores next year. Between that and Intel's lackluster bus technology, they've got serious problems at the high end of the server market.
Sounds great to me. What rulebook are you quoting that says Intel is "cheating"? Where in the article does it say that Intel is using two dies for the processor and why does it matter? This part uses a 45nm process and is most likely a single die.
Hrm, hadn't caught that change in the winds of rumour. Interesting. This looks like a twist in the rumour progression. So far as I can tell it went from Yorkfield being mostly a die shrink four core MCM, to an eight core MCM, to a single die with unified cache, and now to the single die with split cache.
The K8L design has been pretty concrete for a while now. AMD makes a point of talking up the benefits of having separate cores with regards to contention. Of course nothing is really universal. It seems a win for AMD since inter-core communication happens on-die, and so far Intel has that all happen over the FSB.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
No, we're not. Core 2's IPC numbers are nowhere near as bad as NetBurst's. Faster clocks mean faster processors when the core stays the same or improves its IPC. You are welcome to underclock your processors if you think it will help performance, however.
There are half height video cards that provide DVI. Look into the slim chassis HTPCs and you'll find them.
I don't think losing that segment of the server market is that big a deal. There are very few cases where you can't get a better price/performance ratio by scaling out instead of up. The only one I can think of off the top of my head is an OLTP database server. Even with that, some recent HP studies indicate that scaling out may be viable in many cases. They pitted a 32-way cluster of blades vs a 32-way Superdome server. For some operations the blades got trounced, but in others they defeated the Superdome handily.
2 socket servers start around $1k. 4 socket servers start around $7k. 8 socket servers start around $20k.
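Putting those starting prices on a per-socket basis, as a rough sketch. The only inputs are the three ballpark figures quoted above; nothing here accounts for RAM, disks, support or licensing.

```python
# Starting prices quoted above, keyed by socket count (USD, ballpark).
server_prices = {2: 1_000, 4: 7_000, 8: 20_000}

per_socket = {sockets: price // sockets for sockets, price in server_prices.items()}

for sockets, cost in per_socket.items():
    premium = cost / per_socket[2]
    print(f"{sockets}-socket: ${cost} per socket ({premium:.1f}x the 2-socket rate)")
```

On these numbers a socket in an 8-socket box costs about five times what it does in a 2-socket box, which is the price/performance gap the scale-out argument leans on.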
The biggest diff between the two, in my opinion, is the drastically different methodology they took to achieve 4-core status.
:) I'm looking forward to the 80 core CPU another slashdotter mentioned.
Intel took two dual cores and packaged them in one unit (but inside that unit they are actually just two separate dual core CPUs) whereas AMD has made an actual quad core single die CPU.
I'm not saying Intel's method is wrong or even disadvantaged, just that it's quite different. Intel will therefore get to market much quicker than AMD, I believe, but once both are on the shelves (sans benchmarks, which we don't have yet) my money is on AMD's solution being the performance winner. Still, getting to market first is a huge bonus and will give Intel the breathing room to go back and make a true quad core single die CPU. Who knows how this will end? All I know is we win!
Tom Caudron
http://tom.digitalelite.com/
-Tom
AMD Opteron chipsets also have more PCIe lanes than Woodcrest systems have.
I.e., in most systems each CPU links to the 2 chip halves.
          amd                    intel
r |aaa|  |aaa| r          |aaa|    |aaa|
a-|cpu|--|cpu|-a          |cpu|    |cpu|
m |aaa|  |aaa| m          |aaa|    |aaa|
    |      |                |        |
  |chip| |chip|            |chip Set|-ram
                               |
                             |chip|
Why can't Intel give its onboard video some of its own RAM, like ATI HyperMemory and NVIDIA TurboCache?
It's not so stupid a remark.
L2 is what helps a CPU compensate for the disparity between the FSB's speed and main system memory's speed.
The larger it is, the faster the CPU will run- so long as the data and executables remain there.
If you halve the L2 on the Core CPUs (Matching the AMD's Cache size...), you will see a 20% or
more drop in overall performance.
If you drop about 15-20% of the performance, you see that the Core Duo is actually SLOWER than
the comparable AMD and that the only real edge is the overall TDP, which goes to Intel on this
round. The architecture itself isn't as good as AMD's and "wins" speed-wise only because
Intel can jam double or triple the L2 on die because of process shrink. If you run an app that
forces L2 thrash (which is a hell of a lot of them, actually...) you'll see the Cores running
roughly neck and neck with the AMD chips in the same class- and only there because of the larger
L2.
You pop off terms, but do you HONESTLY know what they all mean? From your comment, I'd say not-
I could be wrong, but it strongly looks like you don't get it. I do- it's sort of what I studied
in my Master's studies when I was working on my MSCS years ago.
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
UltraSPARC T1 (Niagara) may be the last chance for server RISC CPUs.
http://revj.sourceforge.net
It is called a design decision... Intel's Core 2 Duo family is designed with a large L2 cache coupled with higher-latency memory access, just as AMD's is designed with a smaller L2 cache and lower-latency memory access.
Anyway, statements like "if you cut this, change that, or run this specific task, then you'll see X" make little sense when trying to compare real devices truthfully. Compare the devices as they are, running the workload YOU need to run, and see which is best for what YOU want to do with them.
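One way to see why "big cache, slow memory" and "small cache, fast memory" are both defensible is the textbook average memory access time formula: hit time plus miss rate times miss penalty. The sketch below uses entirely made-up numbers for two hypothetical designs; they are not measurements of any Intel or AMD part.

```python
def amat_ns(hit_ns, misses_per_1000, miss_penalty_ns):
    """Average memory access time: hit time + miss rate * miss penalty."""
    return hit_ns + misses_per_1000 * miss_penalty_ns / 1000

# Hypothetical design A: large L2, so fewer misses, but each miss is slow.
big_cache = amat_ns(hit_ns=5, misses_per_1000=20, miss_penalty_ns=100)

# Hypothetical design B: small L2, so more misses, but each miss is cheaper.
fast_memory = amat_ns(hit_ns=5, misses_per_1000=40, miss_penalty_ns=50)

print(f"big cache, slow memory  : {big_cache:.1f} ns per access")
print(f"small cache, fast memory: {fast_memory:.1f} ns per access")
```

With these assumed inputs both designs land on the same average. Which one wins in practice depends on the miss rate your own workload produces, which is the point about testing with the workload you actually run.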
Not always the faster one, or better one, but the one the average customer decides to buy.
"To be is to do." --Socrates
"To do is to be." -- Aristotle
"Do-Be-Do-Be-Do..." --Sinatra
There is a ton of money to be made in the high end of the market. Fewer boxes, but much higher revenue (and profit margin) per box.
:)
Whether it makes sense... depends. Some applications really do benefit from big boxes - OLTP, ERP, datamining, application servers, directory servers. Commodity virtualization has made consolidation a buzzword again.
The other thing to keep in mind is that when you move into a large environment the initial purchase price is not necessarily the primary cost - rack space costs money, as do power, switch ports, battery backed power, power distribution, cooling.
Some of the pitfalls of horizontal scaling are addressed by blade centers, but then you're looking at a cost of more like $2-3k for a dual processor system, depending on the vendor and architecture. You're also going to pay more for storage (2.5" disks), and be limited in capacity (typically no more than 2 x 74GB). There's a fair number of blades out there that will only take one disk, which precludes mirrored storage without external storage of some sort. The cost of the chassis also needs to be taken into account. Usually only a few grand, but then the switch module is a few grand more, redundant power supplies another grand or two, storage module if you need it another grand or two. If you're not already running high power equipment, getting new power circuits and equipment will cost thousands more.
Then of course there's the soft cost - aggregate support costs are likely lower with multiple boxes, operating system licensing, software licensing (especially when we're talking about stuff that runs into six figures per machine), system administrators, etc.
Not knocking horizontal scaling, mind you. We should be picking up at least seventy more blades before the end of the year. The demand for big boxes isn't going to disappear anytime soon, though.
(Not to mention that $20k is still firmly in the "volume server" category. They may sound expensive next to a $1-2k x86 machine, but compare it to $200k for an eight-way Sparc or PA-RISC system and it looks like a mighty fine bargain.)
They made a lot of smart decisions in their design. Svartalf is likely right in one sense though - the actual execution pipelines of the Opteron are theoretically more capable than those of the Core architecture. Of course that doesn't mean anything if the pipeline isn't kept fed.
Something that I might be concerned about at Intel is that some of their optimizations aren't necessarily coupled to their design decision, such as the intelligent pre-fetch in their memory controller. It was more necessary in their design, since they have inherently higher latency to overcome. There's no reason the idea couldn't be adapted to do pre-fetch into L3 cache on the new AMD design, though, to further cut down on latency (especially when they add support for FB-DIMMs).
Nope. Not dead. SPARC is up to 8 cores, and unlike above-mentioned quads, it is actually a real processor shipping and running today for reasonable $. I was configuring one a couple weeks ago. Sweeeeeeet box ....
-- Windows is not simply installed on a computer; it is inflicted.
I actually harbor no enmity toward Intel. That statement was meant more tongue-in-cheek than it came off, though.
However, there is a reason to care about how quad core is being achieved, in both the first-generation design (where they have two distinct dies on an MCM) and the second-generation design (the Yorkfield, with two dual-core blocks on one die): the cache. Not only is there likely to be data duplicated in the two L2 cache sets (thus reducing the "effective" amount of cache), any cache traffic has to travel over the FSB.
"some recent totally unbiased HP studies indicate"
You can't be serious...
Show me any possible way to design a microprocessor that can do number crunching faster if the memory subsystem is hampered. Do you realize what would happen to that Athlon if we disabled the on-board memory controller and forced it to communicate with memory through an intermediate chipset?
All, I repeat *all* processors need fast memory accesses. It's simply how they chose to make memory accesses faster that differs. The Athlon went with an on-board memory controller and the Core uses a load of cache and more intelligent pre-fetch algorithms. They *all* need some form of "cheating" to overcome the fact that they have to go to memory.
In the situation we're talking about, an on-board memory controller à la the Opteron would require *more* software intelligence, not less. In a shared bus architecture, the most that can happen is that data is in the wrong cache and an inter-cache transfer (I'm not sure how Intel does this but I would assume there's a bus between the two caches) is required. In the case of separate memory paths as in the Opteron multiprocessor system, information can be in the wrong *memory* bank. It would then have to be read out of memory, travel to one die, get transferred over the HT link (which isn't exactly low-latency) and used in the other die. Yes, this can be avoided with smart software. But only smart *low-level* software. The OS will have to do this, as applications do not even have access to physical memory addresses.
(C) You know who.
- Penryn is a quad-core notebook processor with 6MB of L2 cache. Unlike today's quad-core Kentsfield, Penryn is a "true/real" quad-core CPU with all four cores on one die. The 6MB of L2 cache is shared among the four cores.
- Yorkfield is an 8-core desktop processor with 2x6MB of L2 cache. Like Kentsfield, Yorkfield is a "cheat" and is really just two quad-core dies (two Penryns) on a single package. However, Kentsfield's early benchmarks have looked pretty good for a "cheat," so I'm reserving judgement on this design decision. I've also read that, similar to how Kentsfield trailed Conroe by a few months, Yorkfield will be released several months after Penryn.
Intel may be "cheating" with Yorkfield, but it looks like Intel will ship their "8 cores in one socket" CPU long before AMD can release their "real" 8-core CPU. Some links:
TO START
PRESS ANY KEY
Where's the 'ANY' key? I see Esk, Kitarl, and Pig-Up...
Are you running a distributed microkernel OS on it?
You said "Unfortunately, as long as it needs a full-height graphics card, it'll take up just as much space."
There are half height DVI cards so your statement isn't true. You can get a small chassis that takes either size card anyway. It's not that big a deal.
Enjoy your mini and its slow hard drive if that's your solution. I think you have more options than that.
The rumours of Yorkfield being eight core have largely fallen by the wayside. See here for more recent speculation. (The link you provided points to rumours from 2005-12, mine are from 2006-09.)
One thing you guys need to understand is that memory latency is a secondary issue in PC architecture (much as AMD would love you to believe otherwise).
Fact 1: while CPUs have gone from 2GHz, commodity DRAM latency has gone from ~100ns to ~50ns. Clearly density, cost, and power are the priorities.
Fact 2: memory vendors could reduce latency drastically if there was a desire to. Take RLDRAM-2 from Micron... 20ns latency, slightly larger die. Current cost: 10X the price of DDR2 because the market is so much smaller. I have actually asked DDR2 program managers whether they would make the part 20% lower latency at 5% more cost and they replied "absolutely not".
Fact 3: Intel may have an off-chip memory controller, but in a four-socket AMD board, 75% of RAM is attached to an off-chip controller also. 8-way is worse.
Conclusion: Winning CPU architectures have been dealing with memory latency for ages. Architectures that work around memory latency, versus needing less of it, will scale better given facts #1 and #2. AMD and Intel have just chosen different bus/cache solutions. They both deal with horrendous memory latencies. AMD added a bunch of pins to their CPU, Intel added a bunch of cache; both add cost. IMHO, AMD's N-way socket story (good due to HT) is hurt by the fact that most memory is now off-chip, while they may not have enough cache to deal with that.
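The 75% figure in Fact 3 falls out of simple counting, sketched here under the assumption that RAM is spread evenly across sockets and each socket's own controller only reaches its local share. The function name is made up for the illustration.

```python
from fractions import Fraction

def remote_ram_fraction(sockets):
    """Share of total RAM sitting behind another socket's memory controller,
    assuming memory is split evenly across all sockets."""
    return Fraction(sockets - 1, sockets)

for sockets in (1, 2, 4, 8):
    share = remote_ram_fraction(sockets)
    print(f"{sockets}-socket: {float(share):.1%} of RAM is remote to any given CPU")
```

So a four-socket board gives 3/4 remote and an eight-socket board 7/8, matching the "8-way is worse" remark. A NUMA-aware OS can keep a process's pages local, so the share of remote accesses in practice can be much lower than the share of remote RAM.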