Intel's Big Chip
DeadBugs writes "News.com has an article about the size of the upcoming revision for the Itanium. The "McKinley" chip will be 464 square millimeters which would make it one of the largest ever produced. Most of this is due to the 64 bit registers and 3MB of Level 3 Cache. There is also a link to an article about "Chivano" an Itanium which will include concepts from the Alpha architecture"
Sounds silly to me. By the time you get out to the 3rd level of cache, on a 1GHz core, there should be enough slow down that chip to chip interconnect will be adequately fast.
Either Intel has actually put research into this and discovered that it's a good tradeoff performancewise, or they've still got marketing driven engineering and someone said "wow! over 3 MB of on chip cache!"
Any guess on the wattage? Has Intel broken 100 Watts on their upward march of hot chips?
Start Running Better Polls
Way back when the 386 was hot stuff, there was a series of motherboards with 64K of cache that was outperformed by a board with 16K of cache.
How? The 16K board's cache was four-way set associative. This allowed the CPU to determine in one clock cycle whether the next instruction was in cache. The 64K cache design could not always do this, so it was often slower. Why not make the 64K cache four-way set associative? Cost. The overhead in silicon and motherboard space made this impossible at the time.
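A rough way to see how a smaller set-associative cache can beat a bigger one is a toy simulator. The sketch below is only an illustration under loud assumptions: the comment never says how the 64K cache was organized, so it is modeled here as direct-mapped, and the 16-byte line size, LRU replacement, and the access pattern (a loop bouncing between two buffers exactly 64 KiB apart) are all made up for the example. `hit_rate` is a hypothetical helper, not anything from real hardware.

```python
from collections import OrderedDict

def hit_rate(addresses, cache_bytes, ways, line_bytes=16):
    """Fraction of accesses that hit in a set-associative cache with LRU replacement."""
    num_sets = cache_bytes // (line_bytes * ways)
    # One ordered dict per set: keys are tags, order tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        line = addr // line_bytes
        index = line % num_sets      # which set this line maps to
        tag = line // num_sets       # identifies the line within the set
        current = sets[index]
        if tag in current:
            hits += 1
            current.move_to_end(tag)         # mark as most recently used
        else:
            if len(current) >= ways:
                current.popitem(last=False)  # evict least recently used
            current[tag] = True
    return hits / len(addresses)

# Assumed workload: a loop that alternates between two 256-byte buffers
# placed exactly 64 KiB apart, repeated 100 times.
pattern = [base + off
           for _ in range(100)
           for off in range(0, 256, 16)
           for base in (0, 64 * 1024)]

big_direct_mapped = hit_rate(pattern, 64 * 1024, ways=1)
small_four_way = hit_rate(pattern, 16 * 1024, ways=4)
print(f"64K direct-mapped: {big_direct_mapped:.2f}")
print(f"16K 4-way:         {small_four_way:.2f}")
```

With this particular pattern the two buffers keep evicting each other in the direct-mapped cache, while the four-way cache holds both. A different access pattern could easily favor the bigger cache, so treat this as a picture of conflict misses and not as a benchmark.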
Yeah, right. Intel is the big player. Right.
;)
My calculator's processor has 64 bit registers. You think I'm trolling? Check it out for yourself:
google search
There are a lot more (and more powerful) procs out there, but this one just seems more appropriate for Intel bashing.
Looking for people to chat about multicopters, coding, music. skype: gtsiros
http://www.lostcircuits.com/cpu/hp_pa8800
8 70 0wp.pdf
Has 3Mbyte L1 cache and 32Mbyte L2 cache and
a transistor count of 300 million.
To quote:
"The HP PA-8800 L1 cache is probably the biggest L1 that ever existed so far with separate 750 KBytes of data and instruction cache for each core. This results in no less of 4 blocks of ¾ MB density each for a total of an unprecedented 3 MB L1 cache, physically twice as much as the combined L1+L2 on IBM's Power4. Accordingly, the transistor count of the HP-PA8800 is with 300 Million transistors almost twice as high as the 170 Million transistors of the IBM Power4 and results in a die size of 23.6x15.5 mm2 or 361 mm2. The L2 cache of the PA-8800 is off-chip and consists of four 72 Mbit "1 Transistor SRAM" chips developed by Enhanced Memory Systems."
http://www.cpus.hp.com/technical_references/PA-
has a roadmap of the hp-pa and Itanium chips so
really there is nothing new or exciting to report
that hasn't already been said 9 months ago.
Well, more cache is a good thing, but an unmanufacturable chip or an overly pricey chip is really bad. The yields on a 464 sq mm chip will be really low; that's about 21.5mm on a side assuming a square die, and they usually are. 25.4mm = 1 inch, so this is getting close to an inch on a side. I'm working on a 10mm x 10mm chip now and we expect a ~40% yield just due to area. SRAM is twice as likely to fail as general logic, so large caches, L1 or L2, just go to reducing yield. As the chip gets larger, failures go up exponentially.
An example of this would be wanting to make a table out of pine, so you go to the lumber yard and buy some wood. You get what you get, and some pieces have holes or knots that would ruin your table, so you toss those. If you make a small table you may be able to get the table by avoiding the knots, but as the table gets bigger your ability to find a good full-size piece decreases to the point that every piece of wood has at least one knot, so you can never make a knot-free table.
If you get 20 defects per wafer and you get 50 die per wafer, you will probably get 30 good chips (60% yield). If your die grows so that only 30 fit per wafer, you may only get 10 (33% yield). If the die grows to 20 per wafer you may get none (0% yield). All wafers cost pretty much the same, so wafer cost divided by the number of good die pretty much sets the cost: the lower the yield, the higher the cost.
HP has done some interesting work in making chips with extra resources, such that problems can be avoided by routing around the defect. They are quite a ways from making this work, but this is probably what the future of silicon is going to look like. The other big factor in cost is testing, and the more complex a chip, the longer the test. It's getting to the point that testing costs are outweighing silicon costs, so a self-repairing chip can help with both yield and testing issues.
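The arithmetic in that comment can be sketched in a few lines. The first function reproduces the comment's own pessimistic counting, where every defect is assumed to land on a different die. The second uses the simple Poisson yield model, a common textbook approximation that is not something the comment itself mentions; it is less gloomy because several defects can land on the same die. Both function names and the wafer cost are made up for illustration.

```python
import math

def worst_case_good_dies(dies_per_wafer, defects_per_wafer):
    """Comment's assumption: each defect kills a different die."""
    return max(dies_per_wafer - defects_per_wafer, 0)

def poisson_yield(dies_per_wafer, defects_per_wafer):
    """Textbook Poisson model: chance that a die catches zero defects
    when defects are scattered uniformly at random over the wafer."""
    expected_defects_per_die = defects_per_wafer / dies_per_wafer
    return math.exp(-expected_defects_per_die)

# Side length of a square 464 mm^2 die, as in the comment.
side_mm = math.sqrt(464)
print(f"die side: {side_mm:.1f} mm")

WAFER_COST = 1000.0  # hypothetical, same for every die size
for dies in (50, 30, 20):
    good = worst_case_good_dies(dies, 20)
    cost = WAFER_COST / good if good else float("inf")
    print(f"{dies} die/wafer: worst case {good} good "
          f"({good / dies:.0%}), Poisson {poisson_yield(dies, 20):.0%}, "
          f"worst-case cost per good die {cost:.2f}")
```

Either way the trend is the same: with a fixed defect count per wafer, every step up in die size costs you twice, once in fewer die per wafer and again in a lower fraction of them working.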
The cache miss penalty is huge in IA64 because it can't reorder stalled instructions. That's one reason its performance is terrible on irregular memory-intensive applications (i.e., most server workloads). Anything that reduces the cache miss rate has got to help.
Why not just follow IBM and build a good old-fashioned RISC chip like the Power4? No, instead they decide to build this uberweird VLIW chip with a new name and marketing spin. It is never a good idea to make a product out of a research project. Hell, they could have purchased the HP PA-RISC or Compaq Alpha assets, used the instruction set, built an uberfast RISC chip besting their competition in a shorter period of time, and continued research using the money spent on Itanium. It is just a crazy processor with too many registers and instructions oversimplified to the point where they have become complex. Hell, if AMD were to start from scratch they could have just ripped the micro-ops off of the Athlon, used a new instruction set, and had a faster chip than the Itanium. Intel is trying to make too large a step too fast. As for 64 bits, who cares except for servers? For floating point, Intel, AMD, and the PPC all use an 80-bit extended format, effectively allowing for greater than 64-bit FP math, and integer math above 32 bits is not very common. The only reason for the 64-bit chips is 64-bit addressing, which makes large RAM access possible and also increases the rate at which instructions can be sent to the FP units, as they are only sent half at a time due to 32-bit memory addressing.
As people have pointed out, the 800MHz Itanium chips, the fastest you can buy, have an integer performance slightly less than an 800MHz PIII.
From the article: "Applications will be about one and a half to two times faster than what you get on a (current) Itanium"
I'm assuming this is WITH the huge L3 cache in pilot systems if they are claiming actual application performance.
Let's compare this to the REAL competition: IBM's Power4.
IBM Power4 1.3GHz - shipping for a while now:
SPECint2000 = 814 SPECint_base2000 = 790
SPECfp2000 = 1169 SPECfp_base2000 = 1098
Even the best Itanium reported int numbers are:
SPECint2000 = 365 SPECint_base2000 = 358
(Same box) SPECfp2000 = 610 SPECfp_base2000 = 526
Even if the McKinley (which doesn't ship for 6 months or so) produces double the Itanium numbers (which it won't), it'll still lag the currently shipping Power4 chips on integer, and at best edge ahead on peak floating point (2 x 610 = 1220 vs. 1169).
And with only a clock speed increase of 60% over the next three years, IBM can stay ahead simply by getting the 1.8GHz models out the door in the next 24 months. (That's assuming that the 1.6GHz McKinleys will even outperform the current Power4s.)
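For anyone who wants to check the "double the Itanium numbers" scenario against the SPEC figures quoted above, here is a quick sketch. The scores are copied from this comment; the 2x factor is the hypothetical best case from the Intel quote, and `scaled_vs_power4` is just a helper name invented for the example.

```python
# SPEC CPU2000 scores as quoted in the comment above.
power4 = {"SPECint2000": 814, "SPECint_base2000": 790,
          "SPECfp2000": 1169, "SPECfp_base2000": 1098}
itanium = {"SPECint2000": 365, "SPECint_base2000": 358,
           "SPECfp2000": 610, "SPECfp_base2000": 526}

def scaled_vs_power4(itanium_scores, power4_scores, factor=2.0):
    """Ratio of a scaled-up Itanium score to the Power4 score, per benchmark.
    A ratio below 1.0 means the scaled chip would still trail the Power4."""
    return {name: factor * itanium_scores[name] / power4_scores[name]
            for name in power4_scores}

ratios = scaled_vs_power4(itanium, power4)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x Power4")
```

At a flat 2x the integer ratios stay below 1.0, while the peak floating point ratio comes out slightly above it. At the 1.5x low end of Intel's claim, every ratio is under 1.0.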
It looks like Intel has increased clock speed by 25% added a bunch of L3 cache and is claiming 150%-200% gain. I think Intel has a (big) dog on their hands and they're trying to dress it up. The P4 performance will probably continue to outrun their flagship "server" chip and because of AMD Intel can't afford to strangle the P4's performance as they might have been able to in the past.
Intel said, "Wait for Merced." - which we did for years. Then they said, "Well, the Itanium sucks, but wait for McKinley!"
=tkk
Bill Gates - Creationist?!?
I've never been able to figure that out.
That's true.
150-200% is a modest prediction for performance.
This was the prediction of an Intel representative. I can't imagine he was TOO conservative... Then again it's academic since no one is actually running software on an Itanium - who can compare their current results with future ones?
But seriously - the faster clock speed and cache (since int operations are much more sensitive to cache changes) would account for a nice bump in performance. I'd expect nearly a 50% increase in speed simply from the changes I noted. Even if it is twice as fast, the new chip arch is only responsible for a small part of that increase.
My point is that HP decided as early as 1996 that the Merced project would never surpass PA-RISC and essentially took their marbles and went home. McKinley was an attempt to get something out of the project after it was clearly headed for failure. Intel should have known they had a dog on their hands, and yet they flogged the FUD for years, and after billions of dollars they have yet to deploy a compelling technology.
You should also note in your SPEC marks that there are accusations that IBM "cheated" with their submissions.
Thank goodness Intel has never been accused of anything so horrid!
I'm not sure on the details on it, but I was reading parts of it on www.realworldtech.com the other day.
Well if it's on the Internet it MUST be true...
Let me get this straight - because you "heard something" you can't back up, I should note that IBM's officially submitted SPEC results are faked? How do you figure?
=tkk
Bill Gates - Creationist?!?