Slashdot Mirror


Intel's Big Chip

DeadBugs writes "News.com has an article about the size of the upcoming revision for the Itanium. The "McKinley" chip will be 464 square millimeters which would make it one of the largest ever produced. Most of this is due to the 64 bit registers and 3MB of Level 3 Cache. There is also a link to an article about "Chivano" an Itanium which will include concepts from the Alpha architecture"

28 of 282 comments

  1. big chip... big fan by Transient0 · · Score: 3, Insightful

    I wonder if the oversized chip will lead to particular cooling difficulties (i.e., standard fans and heatsinks can't cool the entire surface area)...

    1. Re:big chip... big fan by Sebastopol · · Score: 5, Informative

      Wouldn't a larger surface area allow for better cooling? Isn't that the whole principle of a heatsink in the first place?

      If the die heats uniformly, then yes, this is true. But that's not always the case. The latest P3s are so low-power that you just need a heatsink or fan-sink, depending on frequency. The first P4s had a heat spreader that sat on the back of the die and connected to the fansink.

      Plus, heat in a die moves up/down more easily than left/right, because the thermal conductivity of the heatsink is much better than that of silicon, and the heatsink is closer than the edge of the die. If you've got local hot spots on the die, a bigger die doesn't buy you anything. The thermal properties and requirements of the heatsink are driven more by local heat density than by overall heat.
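A minimal sketch of why local heat density, not total power, sets the cooling requirement. The floorplan below (block names, watts, and areas) is made up for illustration and is not an Itanium or P4 figure; the `Block` type and `densities` helper are hypothetical. Adding a large, cool block of cache lowers the average density but leaves the hot spot exactly where it was.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    power_w: float   # watts dissipated by this region of the die
    area_mm2: float  # area of this region in mm^2


def densities(blocks):
    """Return (average, peak) power density in W/mm^2 across the die."""
    total_power = sum(b.power_w for b in blocks)
    total_area = sum(b.area_mm2 for b in blocks)
    peak = max(b.power_w / b.area_mm2 for b in blocks)
    return total_power / total_area, peak


# Hypothetical floorplan: a large, cool cache and a small, hot integer core.
die = [
    Block("cache", 10.0, 300.0),
    Block("integer core", 30.0, 20.0),
    Block("fp unit", 30.0, 40.0),
    Block("other logic", 30.0, 104.0),
]
avg, peak = densities(die)
print(f"average {avg:.2f} W/mm^2, hot spot {peak:.2f} W/mm^2")

# Growing the die with more cool cache lowers the average, not the hot spot.
bigger = die + [Block("extra cache", 2.0, 100.0)]
avg2, peak2 = densities(bigger)
print(f"average {avg2:.2f} W/mm^2, hot spot {peak2:.2f} W/mm^2")
```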

      Tom Pabst had a good discussion about this a while ago, but I can't remember the article's URL.

    2. Re:big chip... big fan by Anonymous Coward · · Score: 3, Funny

      Nope. I'm already moving on this issue, with the assistance of several eager venture capitalists.

      We're going to start up a business modifying Sears deep freezers, providing a means of placing a PC directly into it.

      Although your entire computer system will be the size of a bathtub and double your electric bill, you WILL be able to use PCs based on these new CPUs.

      We're also going to figure out how to work rain forest defoliation into the process.

    3. Re:big chip... big fan by fobbman · · Score: 4, Troll

      Actually, according to this article over at The Inquirer, Intel can now demo a 5GHz chip using the 0.13-micron process that can run at room temperature. That'd be interesting to see in action, if it benchmarks as well as a 5GHz chip should compared to current chips.

    4. Re:big chip... big fan by haruharaharu · · Score: 3, Informative

      Intel can now demo a 5GHz chip using the 0.13-micron process that can run at room temperature.

      Big deal. It has 12 instructions, is ~2mm^2, and consumes 267mW. This looks more like research than something that you would use for real work.

      --
      Reboot macht Frei.
  2. Die size war? by Guitarzan · · Score: 5, Funny

    Is this the start of the manly "Mine is bigger than yours" battle?

  3. Note, this is the *DIE* size by jaxdahl · · Score: 4, Informative

    The Athlon chips I have are around 2-2.5 inches on a side; however, the die in the middle is quite small. I'd estimate it to be 200-250 square mm, so a 400+ square millimeter die is huge compared to that.

    Anyone have any exact numbers for the chips? I didn't get a ruler out to measure it.

  4. Die Photo and Size by rbeattie · · Score: 5, Informative

    Ace's Hardware has this bit with more information including links to an Intel presentation.

    "Slide 22 of the presentation features a die photo of McKinley. The large 3 MB L3 cache is notable, and according to the presentation, it consumes 20% less area than traditional designs and is overall 85% efficient (~70% for traditional designs)."

    And here's a story with the photo from that same article (no need to download 2.5 meg pdf...)

    -Russ

    --
    Me
  5. A few minor points by mtnharo · · Score: 4, Insightful

    Just for some minor clarifications: the 464 square millimeters is the area of the actual CPU die, like the little square on top of an Athlon. So a die roughly 2 cm on a side is kind of huge for a processor. The actual processor out of the box would have to be much larger than previous models. Next, 3 MB cache sounds nice, but L3? It may be on die, but by that point the clock reduction probably makes it perform equivalently to a 256 KB L1 cache, or a 512 KB or larger L2. Not that it won't help a lot for complicated instructions, and it's probably less expensive/difficult to engineer a larger amount of cache hooked to a slower pipeline than to add more cache deeper into the CPU's core. 64-bit CPUs will be important in the future, but only when compatible apps and OS designs become mainstream.

  6. It's how you use it by ouija147 · · Score: 4, Interesting

    Way back when the 386 was hot stuff, there was a series of motherboards with 64K of cache that was outperformed by a board with 16K of cache.

    How? The 16K board's cache was four-way set associative. This allowed the CPU to determine in one clock cycle whether the next instruction was in the cache. The 64K cache design could not always do this, so it was often slower. Why not make the 64K cache four-way set associative? Cost. The overhead in silicon and motherboard space made this impossible at the time.
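To make the associativity point concrete, here is a small LRU cache simulator. It is a sketch with a hypothetical class and access trace, and it models only hit/miss counts, not the one-cycle lookup timing the post describes. Two hot lines that map to the same slot in a larger direct-mapped cache keep evicting each other, while a smaller 4-way cache holds both.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """LRU set-associative cache that tracks line addresses only (no data)."""

    def __init__(self, num_lines: int, ways: int):
        assert num_lines % ways == 0
        self.ways = ways
        self.num_sets = num_lines // ways
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, line_addr: int) -> bool:
        s = self.sets[line_addr % self.num_sets]  # index bits pick the set
        tag = line_addr // self.num_sets          # remaining bits form the tag
        if tag in s:
            s.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)  # evict the least recently used line
        s[tag] = True
        return False


def run(cache, trace):
    for addr in trace:
        cache.access(addr)
    return cache.hits, cache.misses


trace = [0, 64] * 50  # two hot lines that collide in a 64-line direct-mapped cache
print(run(SetAssociativeCache(num_lines=64, ways=1), trace))  # bigger, direct-mapped
print(run(SetAssociativeCache(num_lines=16, ways=4), trace))  # smaller, 4-way
```

With this trace every access misses in the direct-mapped cache, while the 4-way cache takes only the two compulsory misses.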

  7. Itanium at 1.6 GHz in 2003 ? by Utopia · · Score: 3, Insightful

    From the article:
    Madison is expected to come out in 2003 and run between 1.2GHz and 1.6GHz, according to sources.

    I wonder how Intel expects people to adopt Itanium-based processors, considering that x86 processors will be running at 4GHz in 2003.

    1. Re:Itanium at 1.6 GHz in 2003 ? by roca · · Score: 3, Informative

      > There's nothing on any recent Intel roadmaps that
      > will have Itanic replacing x86 on the desktop.

      Which is really going to hurt them. The latest version of EverQuest recommends 512MB of RAM. High-end gamers are going to need 64-bit addressing in a few years. AMD will be able to supply cheap 64-bit chips; Intel will be playing catch-up at best.

    2. Re:Itanium at 1.6 GHz in 2003 ? by sconeu · · Score: 3, Funny

      4ghz of Hot P4 Action

      I'm sorry, but I just got the mental image of the geek pr0n site that would use this tagline!

      --
      General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
    3. Re:Itanium at 1.6 GHz in 2003 ? by HiredMan · · Score: 4, Interesting
      AFAIK - Enough said.


      As people have pointed out, the 800MHz Itanium chips (the fastest you can buy) have integer performance slightly below that of an 800MHz PIII.


      From the article: "Applications will be about one and a half to two times faster than what you get on a (current) Itanium"
      I'm assuming this is WITH the huge L3 cache in pilot systems, if they are claiming actual application performance.

      Let's compare this to the REAL competition: IBM's Power4.

      IBM Power4 1.3GHz (shipping for a while now):
      SPECint2000 = 814, SPECint_base2000 = 790
      SPECfp2000 = 1169, SPECfp_base2000 = 1098

      Even the best reported Itanium integer numbers are:
      SPECint2000 = 365, SPECint_base2000 = 358
      (Same box) SPECfp2000 = 610, SPECfp_base2000 = 526

      Even if the McKinley (which doesn't ship for 6 months or so) produces double the Itanium numbers (which it won't), it'll still lag the currently shipping Power4 chips.
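The "double the Itanium numbers" comparison is easy to check against the SPEC CPU2000 scores quoted above. A sketch; the dictionary layout and helper name are just for illustration.

```python
# SPEC CPU2000 scores as quoted in the comment above.
power4 = {"SPECint2000": 814, "SPECint_base2000": 790,
          "SPECfp2000": 1169, "SPECfp_base2000": 1098}
itanium = {"SPECint2000": 365, "SPECint_base2000": 358,
           "SPECfp2000": 610, "SPECfp_base2000": 526}


def gap_if_doubled(metric: str) -> int:
    """Power4 score minus twice the Itanium score (positive = Power4 still ahead)."""
    return power4[metric] - 2 * itanium[metric]


for metric in power4:
    print(f"{metric}: 2 x Itanium = {2 * itanium[metric]}, "
          f"Power4 = {power4[metric]}, gap = {gap_if_doubled(metric)}")
```

On these numbers the doubled integer scores (730 and 716) and the doubled SPECfp_base2000 (1052) still trail, but a doubled SPECfp2000 peak (1220) would edge past Power4's 1169.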
      And with a clock speed increase of only 60% over the next three years, IBM can stay ahead simply by getting the 1.8GHz models out the door in the next 24 months. (That's assuming that the 1.6GHz McKinleys will even outperform the current Power4s.)

      It looks like Intel has increased clock speed by 25%, added a bunch of L3 cache, and is claiming a 150%-200% gain. I think Intel has a (big) dog on their hands and they're trying to dress it up. The P4 will probably continue to outrun their flagship "server" chip, and because of AMD, Intel can't afford to strangle the P4's performance as they might have been able to in the past.

      Intel said, "Wait for Merced." - which we did for years. Then they said, "Well, the Itanium sucks, but wait for McKinley!"

      =tkk

    4. Re:Itanium at 1.6 GHz in 2003 ? by HiredMan · · Score: 3, Interesting
      They did a lot more than that. It has a shorter pipeline, higher clockrate, additional integer units, on-die L3 cache.

      That's true.

      150-200% is a modest prediction for performance.

      This was the prediction of an Intel representative. I can't imagine he was TOO conservative... Then again, it's academic, since no one is actually running software on an Itanium; who can compare their current results with future ones? ;)
      But seriously, the faster clock speed and cache (since integer operations are much more sensitive to cache changes) would account for a nice bump in performance. I'd expect nearly a 50% increase in speed simply from the changes I noted. Even if it is twice as fast, the new chip architecture is only responsible for a small increase in that speed.
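The argument divides the claimed speedup by the part already explained by clock and cache. A sketch of that arithmetic, under the simplifying assumption that the factors multiply; the helper name is hypothetical, and the 1.25, 1.5, and 1.5-2.0 figures are the ones from this exchange.

```python
def residual_speedup(total: float, explained: float) -> float:
    """Speedup left for the new architecture after dividing out known factors.

    Assumes the individual factors compose multiplicatively (a simplification).
    """
    return total / explained


clock_factor = 1.25     # "increased clock speed by 25%"
clock_and_cache = 1.5   # the estimate for clock + cache alone

for claimed in (1.5, 2.0):  # "one and a half to two times faster"
    print(f"claimed {claimed}x: "
          f"{residual_speedup(claimed, clock_factor):.2f}x beyond clock, "
          f"{residual_speedup(claimed, clock_and_cache):.2f}x beyond clock + cache")
```

Even at the optimistic 2x, only about 1.33x would be left over once a 1.5x clock-and-cache gain is divided out.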

      My point is that HP decided as early as 1996 that the Merced project would never surpass PA-RISC and essentially took their marbles and went home. McKinley was an attempt to get something out of the project after it was clearly headed for failure. Intel should have known they had a dog on their hands, and yet they flogged the FUD for years, and after billions of dollars they have yet to deploy a compelling technology.

      You should also note in your SPEC marks that there's accusations that IBM "cheated" with their submissions.

      Thank goodness Intel has never been accused of anything so horrid!


      I'm not sure on the details on it, but I was reading parts of it on www.realworldtech.com the other day.


      Well if it's on the Internet it MUST be true...
      Let me get this straight - because you "heard something" you can't back up I should note that IBM's officially submitted Spec results are faked? How do you figure?

      =tkk

  8. 64 bit regs is new? by gTsiros · · Score: 4, Interesting

    Yeah, right. Intel is the big player. Right.

    My calculator's processor has 64-bit registers. You think I'm trolling? Check it out for yourself:
    Google search

    There are a lot more (and more powerful) procs out there, but this one just seems more appropriate for Intel bashing ;)

    --
    Looking for people to chat about multicopters, coding, music. skype: gtsiros
    1. Re:64 bit regs is new? by morcheeba · · Score: 3, Informative

      A quick summary of the Saturn microprocessor, for those interested...

      The Saturn processor is a proprietary HP chip used in many of its calculators. It's generally considered a 4-bit chip (since this is the internal data bus size), but it has four 64-bit registers. I think the coolest part of the chip is that each instruction can operate on various portions of these registers, for example only the upper nibble, or only the lowest 4 nibbles. Since this is a calculator, math is generally done in BCD format. Externally, the chip connects using an 8-bit data bus. The address bus (and therefore the PC, too) is 20 bits wide, and each address refers to a nibble of data. Maximum addressable memory = 1 meganibble = 512KB. Most of the calculator firmware (such as calculating the sine of a number or matrix manipulation) is written in interpreted RPL to allow code reuse (to save time, and to ensure bug-free implementations).
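A sketch of nibble-field operations on a 64-bit register plus a digit-by-digit BCD add, in the spirit of the partial-register instructions described above. The helper names and the plain nibble-range interface are hypothetical; this does not reproduce the Saturn's actual field designators or instruction set. The last line checks the addressing arithmetic: 2^20 nibbles of 4 bits each is 512 KB.

```python
MASK64 = (1 << 64) - 1


def get_field(reg: int, lo: int, hi: int) -> int:
    """Return nibbles lo..hi (inclusive; nibble 0 is least significant)."""
    width = (hi - lo + 1) * 4
    return (reg >> (lo * 4)) & ((1 << width) - 1)


def set_field(reg: int, lo: int, hi: int, value: int) -> int:
    """Overwrite nibbles lo..hi with value, leaving the other nibbles untouched."""
    width = (hi - lo + 1) * 4
    mask = ((1 << width) - 1) << (lo * 4)
    return ((reg & ~mask) | ((value << (lo * 4)) & mask)) & MASK64


def bcd_add_field(a: int, b: int, lo: int, hi: int) -> tuple[int, int]:
    """Add the BCD digits in nibbles lo..hi of a and b; return (new a, carry out)."""
    result, carry = a, 0
    for n in range(lo, hi + 1):
        digit = get_field(a, n, n) + get_field(b, n, n) + carry
        carry = 1 if digit >= 10 else 0
        result = set_field(result, n, n, digit - 10 * carry)
    return result, carry


a = 0xABCD_0000_0000_0199
b = 0x0000_0000_0000_0001
total, carry = bcd_add_field(a, b, 0, 3)  # only the low 4 nibbles are touched
print(hex(total), carry)

# 20-bit nibble addresses: 2**20 nibbles * 4 bits / 8 bits per byte, in KB.
print(2**20 * 4 // 8 // 1024, "KB")
```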

      HP did a great job with this calculator, including releasing internal documentation and development tools. More info here, or use Google.

      It's a shame that HP shut down their calculator division.

  9. Coffee warmer built-in! by Insightfill · · Score: 3, Funny

    At that size, a smallish mug should fit nicely on it. No use wasting all that heat!

    1. Re:Coffee warmer built-in! by sharkey · · Score: 3, Funny

      At that size, a smallish mug should fit nicely on it

      Now, someone needs to figure out how to mount it on the CD/DVD tray, so the cup-holder will be heated.

      --
      "Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
  10. That's Almost 3 bits per millimetre! by Bobzibub · · Score: 3, Funny

    "464 square millimeters which would make it one of the largest ever produced....due to the 64 bit registers." 464^.5=21.54mm a side.
    64bits/21.54mm=2.97 bits/mm

    They've GOT to start using smaller wavelengths!

  11. Nothing new here - take a look at the hp-pa 8800 by Anonymous Coward · · Score: 5, Interesting

    http://www.lostcircuits.com/cpu/hp_pa8800

    Has 3 MB of L1 cache, 32 MB of L2 cache, and a transistor count of 300 million.

    To quote:

    "The HP PA-8800 L1 cache is probably the biggest L1 that ever existed so far with separate 750 KBytes of data and instruction cache for each core. This results in no less than 4 blocks of ¾ MB density each for a total of an unprecedented 3 MB L1 cache, physically twice as much as the combined L1+L2 on IBM's Power4. Accordingly, the transistor count of the HP-PA8800 is with 300 Million transistors almost twice as high as the 170 Million transistors of the IBM Power4 and results in a die size of 23.6x15.5 mm² or 361 mm². The L2 cache of the PA-8800 is off-chip and consists of four 72 Mbit "1 Transistor SRAM" chips developed by Enhanced Memory Systems."

    http://www.cpus.hp.com/technical_references/PA-8700wp.pdf

    has a roadmap of the HP PA and Itanium chips, so really there is nothing new or exciting to report that hasn't already been said 9 months ago.

  12. Who cares about GHz... by jbf · · Score: 5, Insightful

    ... if you can't run the apps.

    Intel x86 is restricted to 48-bit addressing (with segment registers), and practically 64GB with modern OSes. (http://linux-mm.org/)

    If I want more than 64GB of addressable physical memory (which I do for some apps), then who cares if you can give me a 32-bit x86 running at 900GHz? It's not going to do diddly squat for me, since _going over the PCI bus_ for swap is going to kill me versus a 1.6GHz 64-bit processor. And since you need to go over the PCI bus just to get to a pseudo-disk stuffed with RAM, that solution is still bogus.

    I see your point that this isn't what Joe Blow's gonna put on his desk. But the improved address space is definitely a big win, and that's assuming that they can't ramp up the clock speed in a hurry.
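For scale, the raw size of a flat address space doubles with every extra address bit. A quick sketch with a hypothetical helper: the 64GB figure in the comment matches 36-bit physical addressing (PAE on 32-bit x86), while the 48-bit row is a flat 48-bit space shown for comparison, not what x86 segment:offset addressing actually gives you.

```python
GIB = 1 << 30


def address_space_gib(bits: int) -> int:
    """Bytes reachable with a flat address of the given width, in GiB."""
    return (1 << bits) // GIB


for label, bits in [("32-bit virtual", 32), ("36-bit physical (PAE)", 36),
                    ("48-bit flat", 48), ("64-bit", 64)]:
    print(f"{label:>22}: {address_space_gib(bits):,} GiB")
```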

  13. Amd competition. more numbers. by leuk_he · · Score: 5, Informative

    Now that you mention AMD: it was rumoured all over the net last week that Intel has a backup plan, a P4 with 64-bit extensions.

    os.opinion article
    news.com

    By the way, the AMD Hammer is expected to be 105 mm^2 on 130 nanometer (0.13 micron).

    The current AMD MP (Palomino) has a die size of 129 mm^2 on 0.18 micron.

    The original P4 had a die size of 217 mm^2 and is now at 150 mm^2 (with a bigger cache).

    Note that the original article does mention the 464 mm^2 size is on 0.18 micron and the next generation is on 0.13. This can make a difference of a factor of 2 (0.13^2/0.18^2 ≈ 0.52).
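The factor-of-two estimate is ideal linear scaling: area goes with the square of the feature size. A sketch (helper name hypothetical) applying it to the die sizes mentioned in this thread; it also recovers the side length of a square 464 mm^2 die.

```python
import math


def ideal_shrink(area_mm2: float, old_nm: float, new_nm: float) -> float:
    """Die area after a shrink, assuming every feature scales linearly (ideal case)."""
    return area_mm2 * (new_nm / old_nm) ** 2


print(round(math.sqrt(464), 2), "mm per side for a square 464 mm^2 die")
print(round(ideal_shrink(464, 180, 130)), "mm^2 for a 464 mm^2 die at 130 nm (ideal)")
print(round(ideal_shrink(217, 180, 130)), "mm^2 for the original P4 at 130 nm (ideal)")
```

The P4 numbers above (217 mm^2 shrinking to 150 mm^2 rather than about 113 mm^2) show that a real shrink that also grows the cache lands well above the ideal estimate.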

  14. Re:Less Logic, More Cache? by roca · · Score: 3, Interesting

    The cache miss penalty is huge in IA64 because it can't reorder stalled instructions. That's one reason its performance is terrible on irregular memory-intensive applications (i.e., most server workloads). Anything that reduces the cache miss rate has got to help.

  15. Re:Nothing new here - take a look at the hp-pa 880 by roca · · Score: 3, Informative

    The question with an L1 cache of that size is how many cycles it takes to access the cache. It's easy to make a huge L1 cache, you just pay in increased access time. It's not impressive until we know the latency numbers as well as the size.
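The size-versus-latency tradeoff is usually expressed as average memory access time: hit time plus miss rate times miss penalty. A sketch with hypothetical numbers (not PA-8800 figures) showing how a bigger L1 that misses less can still lose if every hit takes longer.

```python
def amat(hit_cycles: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_cycles + miss_rate * miss_penalty


# Hypothetical numbers: the bigger L1 misses less but takes longer on every hit.
small_l1 = amat(hit_cycles=2, miss_rate=0.05, miss_penalty=20)
big_l1 = amat(hit_cycles=4, miss_rate=0.03, miss_penalty=20)
print(f"small L1: {small_l1:.2f} cycles, big L1: {big_l1:.2f} cycles")
```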

  16. NOT not wow! by plover · · Score: 3, Informative
    You're absolutely correct that a substantially larger die will result in substantially lower yields (barring any magical breakthroughs in chip fabrication, which are always possible).

    But there are segments of today's market that are willing to pay almost any price for a high-performance chip. These people will fork over $1000 without blinking an eye if they think it will speed up their business.

    Look at any commercial server available today. They're priced around $15000 - $20000. If chip prices go to $1000 instead of the $400 they're probably paying, that makes a difference of $2400, or about 12%, in a 4-way box. Even if chip prices went to $2000, it's a $6400 difference, or 32%. If your processors are your bottleneck, then you've gained a lot of improvement for not-very-much delta in money.
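The per-box arithmetic can be checked in a few lines, assuming a 4-way box at $20000 and the $400 baseline chip price from the comment (the helper name is hypothetical).

```python
def cpu_upgrade_delta(sockets: int, old_price: float, new_price: float,
                      box_price: float) -> tuple[float, float]:
    """Extra cost per box, and that cost as a fraction of the box price."""
    delta = sockets * (new_price - old_price)
    return delta, delta / box_price


for new_price in (1000, 2000):
    delta, frac = cpu_upgrade_delta(4, 400, new_price, 20000)
    print(f"${new_price} chips: +${delta:,.0f} per 4-way box ({frac:.0%})")
```

Against a $15000 box, the same $6400 delta would be about 43%.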

    Sure, a $2000 chip is out of reach for most home users today, but there is always a market for just about anything faster they can produce.

    And there are enough crazed overclockers out there that'll spend whatever it takes to raise their frame rates on Quake III. It'll sell. It'll also drive the market to a new standard, which also sells chips.

    --
    John
  17. Thanks! Where would we be without clarifications? by megalomang · · Score: 3, Informative
    Thanks for your "clarifications". You have saved us all from a life of ignorance.

    What you meant to say (and what the article said) is that 464 mm^2 is the size of the actual processor die. This includes the CPU and the caches. The CPU is a relatively small portion of the processor die, and noting there is 3MB of L3, the total cache may amount to 2/3 of the die size. The square on top of the Athlon is also the entire processor die: CPU, caches and all.

    Also, L3 cache can never perform "equivalently" to L2 or L1 cache unless it runs at core speed. And I can tell you now, it doesn't -- or they wouldn't need L1 and L2. The L3 cache probably runs at something like 10 access cycles or more. It's not difficult to engineer 10 access cycles into any pipeline -- it's impossible. Which is precisely why it's not L1.

    I'm quite sure the engineers at Intel have done their modeling homework and determined that however fast the L4 memory may be, the L3 will improve performance by that much more.

    Remember, this processor is not meant to go on your desktop or any other Joe Sixpack's desktop. It is meant to sit inside the workstations on the desks of engineers and in the racks of high-bandwidth servers. These platforms are specifically designed to run hundreds of tasks simultaneously and handle staggeringly high memory bandwidths. It has nothing to do with "complicated instructions." The L3 exists for swapping out large pages of memory in large bursts from a significantly larger L4 memory (think on the order of 100's of GB), which is in turn backed by L5 memory (local drives and SANs) with an incomprehensibly large virtual memory space.

    This has absolutely nothing to do with mainstream. I'm quite certain an OS already exists that will run on the platform. An IA-64 Linux is well under way (try http://www.linuxia64.org) and you can bet that Compaq, HP, Dell, and Intel have put a total of more than 100x your lifetime earnings into developing software for that platform.

    Intel could not care less whether you or 99.9% of the /. readers out there ever buy an IA-64. They don't give a crap about your market segment, but I'm sure if you want to drop $10K+ on an IA-64 workstation, be my guest. Your choices are limited. Either choose IA-64 or UltraSparc. Or maybe if AMD ever gets a design win, you might get a chance to buy a Hammer box.

  18. Cache Design 101. by Christopher+Thomas · · Score: 3, Informative

    Next, 3 MB cache sounds nice, but L3? It may be on die, but by that point the clock reduction probably makes it perform equivalently to a 256 k L1 cache, or a 512 or larger L2.

    A fundamental rule of building caches is that a larger cache is slower and dissipates more energy per lookup than a smaller cache. This is why multilevel caching exists in the first place (otherwise we'd just have a huge L1 cache - and before you mention it, due to architectural sneakiness, HP's giant L1 cache isn't really an L1 cache).

    So, you can't just spend the L3 area on making a bigger L2. You'd end up with a slower, hotter L2, which could easily _degrade_ performance.

    As long as the L3 cache is faster to access than main memory, it'll be useful for some things. Whether it'll be useful for *most* things is another issue. This depends on the "working set" of the applications you're running (how much memory they repeatedly access). I guess Intel's banking on working sets being larger than most caches.
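The "useful only if it's faster than memory and the working set fits" point can be put in numbers with the standard multi-level average-memory-access-time recurrence. A sketch with hypothetical latencies and local miss rates (not McKinley figures): an L3 that catches half of the L2 misses helps, while one that almost always misses just adds its lookup time and makes things slightly worse.

```python
def amat(levels, memory_cycles):
    """Multi-level average memory access time.

    levels: (hit_cycles, local_miss_rate) pairs ordered from L1 outward;
    each level's miss penalty is the average access time of everything beyond it.
    """
    t = memory_cycles
    for hit_cycles, miss_rate in reversed(levels):
        t = hit_cycles + miss_rate * t
    return t


l1, l2 = (1, 0.05), (10, 0.3)  # hypothetical latencies and local miss rates
no_l3 = amat([l1, l2], 200)
l3_fits = amat([l1, l2, (30, 0.5)], 200)     # working set mostly fits in L3
l3_thrash = amat([l1, l2, (30, 0.95)], 200)  # working set far larger than L3
print(f"no L3: {no_l3:.2f}, L3 fits: {l3_fits:.2f}, L3 thrashing: {l3_thrash:.2f} cycles")
```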

    Another possibility is that they're testing the cache architecture for use in future SMT or CMP designs (both of which would have multiple independent execution contexts running). If you're running multiple *independent* contexts, the working set grows with the number of contexts.