Intel's Big Chip
DeadBugs writes "News.com has an article about the size of the upcoming revision for the Itanium. The "McKinley" chip will be 464 square millimeters which would make it one of the largest ever produced. Most of this is due to the 64 bit registers and 3MB of Level 3 Cache. There is also a link to an article about "Chivano" an Itanium which will include concepts from the Alpha architecture"
Is this the start of the manly "Mine is bigger than yours" battle?
The Athlon chips i have are around 2-2.5 inches on a side, however, the die in the middle is quite small, i'd estimate it it be 200-250 square mm, so a 400+ square millimeter is huge, compared to that.
Anyone have any exact numbers for the chips? I didn't get a ruler out to measure it.
Ace's Hardware has this bit with more information including links to an Intel presentation.
"Slide 22 of the presentation features a die photo of McKinley. The large 3 MB L3 cache is notable, and according to the presentation, it consumes 20% less area than traditional designs and is overall 85% efficient (~70% for traditional designs)."
And here's a story with the photo from that same article (no need to download 2.5 meg pdf...)
-Russ
Me
Just for some minor clarifications: The 464 mm squared is the area of the actual cpu die. Like the little square on top of an athlon. So 2 cm per side die is kind of huge for a processor. The actual processor out of the box would have to be much larger than previous models. Next, 3 MB cache sounds nice, but L3? It may be on die, but by that point the clock reduction probably makes it perform equivalently to a 256 k L1 cache, or a 512 or larger L2. Not that it won't help a lot for complicated instructions, and it's probably less expensive/difficult to engineer to hook a larger amount of cache to a slower pipeline than to add more cache deeper into the cpu's core. 64-bit cpu's will be important in the future, but only when compatible apps and OS designs become mainstream.
Way back when the 386 was hot stuff there was a series of mother boards that had a 64K of cache that was outperformed by a board that had 16K of cache.
How? The 16K board cache was four way set associative. This allowed for the CPU to determine in one clock cycle if the next instruction was in cache. The 64K cache design could not always do this. Thus it was often slower. Why not make the 64K cache 4 way set associative? Cost. The overhead in silcon and motherboard space made this impossible at the time.
Wouldn't a larger surface area allow for better cooling? Isn't that the whole principle of a heatsink in the first place?
If the die uniformly heats, then yes, this is true. But that's not always the case. The latest P3's are so low power that you just need a heatsink or fan-sink, depending on frequency. The first P4s had a head spreader that sat on the back of the die and connected to the fansink.
Plus heat in a die goes up/down easier then left/right because the thermal conductivity of the heatsink is much better than that of silicon, and is closer than the edge of the die. If you've got local hot spots on the die, a bigger die doesn't by you anything. The thermal properties and requirements of the heatsink are driven more by local heat density than by overall heat.
Tom Pabst had a good discussion about this a while ago, but I can't remember the article's URL.
https://www.accountkiller.com/removal-requested
Yeah, right. Intel is the big player. Right.
;)
My calculator's processor has 64 bit registers. You think i'm trolling? Check it out for yourself:
google search
There are a lot more (and more powerful) procs out there, but this one just seems more appropriate for intel bashing
Looking for people to chat about multicopters, coding, music. skype: gtsiros
Actually, according to this article over at The Inquirer Intel can now demo a 5GHz chip using the .13 micros process that can run at room temperature. That'd be interesting to see in action if it benches as well as a 5GHz chip should in comparison to current chips.
http://www.lostcircuits.com/cpu/hp_pa8800
8 70 0wp.pdf
Has 3Mbyte L1 cache and 32Mbyte L2 cache and
a transistor count of 300 million.
To quote:
"The HP PA-8800 L1 cache is probably the biggest L1 that ever existed so far with separate 750 KBytes of data and instruction cache for each core. This results in no less of 4 blocks of ¾ MB density each for a total of an unprecedented 3 MB L1 cache, physically twice as much as the combined L1+L2 on IBM's Power4. Accordingly, the transistor count of the HP-PA8800 is with 300 Million transistors almost twice as high as the 170 Million transistors of the IBM Power4 and results in a die size of 23.6x15.5 mm2 or 361 mm2. The L2 cache of the PA-8800 is off-chip and consists of four 72 Mbit "1 Transistor SRAM" chips developed by Enhanced Memory Systems.
http://www.cpus.hp.com/technical_references/PA-
has a roadmap of the hp-pa and Itanium chips so
really there is nothing new or exciting to report
that hasn't already been said 9 months ago.
... if you can't run the apps.
Intel x86 is restricted to 48-bit addressing (with segment registers), and practically 64GB with modern OSes. (http://linux-mm.org/)
If I want more than 64GB of addressable physical memory (which I do for some apps), then who cares if you can give me a 32-bit x86 running at 900GHz, it's not going to do diddly squat for me, since _going over the PCI bus_ for swap is going to kill me vs a 1.6GHz 64-bit processor. And since you need to go over the PCI bus just to get to a pseudo-disk stuffed with RAM, that solution is still bogus.
I see your point that this isn't what Joe Blow's gonna put on his desk. But the improved address space is definately a big win, and that's assuming that they can't ramp up the clock speed in a hurry.
Now that you mention AMD. It has been roumoured last week all over the net that intel has a backup plan, an P4 with 64bit extenstions
.18.
.18 and the next generation is on .13. note that this can make a differce of a factor 2 (13^2/18^2= 0.52)
os.opinion article
news.com
by the way, the amd hammer is expected to 105 mmm^2 on 130 nanometer (.13).
the current amd MP (palomino) has a die size of 129mm on
the original P4 has a die size of 217mm and is now at 150 mm^2.(with a bigger cache)
Note that the original article does mention the 424 size is on
As people have pointed out the 800Mhz Itanium chips - the fastest you can buy - have an integer performance slightly less than an 800Mhz PIII.
From the article: "Applications will be about one and a half to two times faster than what you get on a (current) Itanium"
I'm assuming this is WITH the huge L3 cache in pilot systems if they are claimed actual application performance.
Let's compare this to the REAL competition: IBMs Power4.
IBM Power4 1.3GHz - shipping for a while now:
SPECint2000 = 814 SPECint_base2000 = 790
SPECfp2000 = 1169 SPECfp_base2000 = 1098
Even the best Itanium reported int numbers are:
SPECint2000 = 365 SPECint_base2000 = 358
(Same box) SPECfp2000 = 610 SPECfp_base2000 = 526
Even if the McKinley (which doesn't ship for 6 months or so) produces double the Itanium numbers (which it won't) it'll still lag the currently shipping Power4 chips.
And with only an clock speed increase of 60% over the next three years IBM can stay ahead simply by getting the 1.8Ghz models out the door in the next 24 months. (That's assuming that the 1.6Ghz McKinleys will even outperform the current Power4s.)
It looks like Intel has increased clock speed by 25% added a bunch of L3 cache and is claiming 150%-200% gain. I think Intel has a (big) dog on their hands and they're trying to dress it up. The P4 performance will probably continue to outrun their flagship "server" chip and because of AMD Intel can't afford to strangle the P4's performance as they might have been able to in the past.
Intel said, "Wait for Merced." - which we did for years. Then they said, "Well, the Itanium sucks, but wait for McKinley!"
=tkk
Bill Gates - Creationist?!?