Intel's Big Chip

Nice review. by Psmylie · 2002-02-04 07:51 · Score: 2, Informative

Another analyst said, "Jesus, that's big."

Straight and to the point. Nice.

--

psmylie's dictionary: Godzillion (noun) Any number large enough to destroy Tokyo

Note, this is the *DIE* size by jaxdahl · 2002-02-04 07:55 · Score: 4, Informative

The Athlon chips I have are around 2-2.5 inches on a side; however, the die in the middle is quite small. I'd estimate it to be 200-250 square mm, so a 400+ square millimeter die is huge compared to that.

Anyone have any exact numbers for the chips? I didn't get a ruler out to measure it.

Die Photo and Size by rbeattie · 2002-02-04 08:01 · Score: 5, Informative

Ace's Hardware has this bit with more information including links to an Intel presentation.

"Slide 22 of the presentation features a die photo of McKinley. The large 3 MB L3 cache is notable, and according to the presentation, it consumes 20% less area than traditional designs and is overall 85% efficient (~70% for traditional designs)."

And here's a story with the photo from that same article (no need to download 2.5 meg pdf...)

-Russ

--
Me

Re:big chip... big fan by Sebastopol · 2002-02-04 08:06 · Score: 5, Informative

Wouldn't a larger surface area allow for better cooling? Isn't that the whole principle of a heatsink in the first place?

If the die heats uniformly, then yes, this is true. But that's not always the case. The latest P3's are so low power that you just need a heatsink or fan-sink, depending on frequency. The first P4s had a heat spreader that sat on the back of the die and connected to the fansink.

Plus heat in a die goes up/down more easily than left/right, because the thermal conductivity of the heatsink is much better than that of silicon, and the heatsink is closer than the edge of the die. If you've got local hot spots on the die, a bigger die doesn't buy you anything. The thermal properties and requirements of the heatsink are driven more by local heat density than by overall heat.

Tom Pabst had a good discussion about this a while ago, but I can't remember the article's URL.

--
https://www.accountkiller.com/removal-requested

Re:It's how you use it by Anonymous Coward · 2002-02-04 08:18 · Score: 2, Informative

That's not what set associativity means, although you're quite correct that 16KB of 4-way set-associative cache will usually be better than 64KB of direct-mapped cache.

If you actually want to know what you're talking about, I'd suggest reading "Computer Organization and Design : The Hardware/Software Interface" by Patterson and Hennessy.

Re:CmdrtTaco is Terrorist!!! by Anonymous Coward · 2002-02-04 08:27 · Score: 0, Informative

Don't you get it? Page-widening posts are making people NOT want to read at -1....not only is it as dumb as a fucking turd, but it also has adverse effects on trolls....fucking idiot

Re:big chip... big fan by haruharaharu · 2002-02-04 08:30 · Score: 3, Informative

Intel can now demo a 5GHz chip using the .13 micron process that can run at room temperature.

Big deal. It has 12 instructions, is ~2mm^2, and consumes 267mW. This looks more like research than something that you would use for real work.

--
Reboot macht Frei.

Amd competition. more numbers. by leuk_he · 2002-02-04 08:32 · Score: 5, Informative

Now that you mention AMD: it has been rumoured all over the net this past week that Intel has a backup plan, a P4 with 64-bit extensions.

os.opinion article
news.com

By the way, the AMD Hammer is expected to be 105 mm^2 on 130 nanometer (.13).

The current AMD MP (Palomino) has a die size of 129 mm^2 on .18.

The original P4 has a die size of 217 mm^2 and is now at 150 mm^2 (with a bigger cache).

Note that the original article does mention that the 424 size is on .18 and that the next generation is on .13. This can make a difference of a factor of 2 (13^2/18^2 = 0.52).
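To spell out that scaling arithmetic, here is a minimal sketch. It assumes an ideal shrink where every linear dimension scales with the process node, which real designs only approximate; the function name is made up for illustration.

```python
def ideal_shrink_factor(old_nm: float, new_nm: float) -> float:
    """Area scale factor for an ideal linear shrink between two process nodes."""
    return (new_nm / old_nm) ** 2

# .18 micron -> .13 micron, expressed in nanometers
factor = ideal_shrink_factor(180, 130)

# 424 is the die size quoted above, in mm^2
shrunk_area = 424 * factor

print(f"scale factor: {factor:.2f}")                              # scale factor: 0.52
print(f"424 mm^2 at .18 -> about {shrunk_area:.0f} mm^2 at .13")  # about 221 mm^2
```

So under that idealized assumption the same design would land at roughly half the area on the newer process.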

Re:Itanium at 1.6 GHz in 2003 ? by Utopia · 2002-02-04 08:39 · Score: 2, Informative

According to the SPEC CPU2000 results, the Itanium at 800 MHz performs equivalently to a Pentium III at 800 MHz. In fact the Pentium III is a little faster.

Do you have any other figures to substantiate your claim ?

AMD Athlon by Vagrant · 2002-02-04 08:48 · Score: 2, Informative

184 square mm die size (prior to Athlon 800)

102 square mm die size (Athlon 800) ... source

Note that this article also states that: Intel has also incorporated a substantial amount of redundant circuitry in the processor, Krewell said. Chipmakers often use redundant circuitry to boost yields. Sometimes, circuits come out scrambled on a finished chip. If the manufacturer has put in two sets of the same circuits, the chip will function properly because it can use the second set.

You could have a dual Pentium machine and not even know it :)

I guess this redundancy is why the chip has gone up 10% in size in the last couple of months ... (see this article) which quotes: One of the reasons for McKinley's bigger price tag, Krewell said, is that it will cover nearly 440 square millimeters in area--or more than twice that of the Pentium 4.

Re:Itanium at 1.6 GHz in 2003 ? by roca · 2002-02-04 09:20 · Score: 3, Informative

> There's nothing on any recent Intel roadmaps that
> will have Itanic replacing x86 on the desktop.

Which is really going to hurt them. The latest version of Everquest recommends 512MB of RAM. High-end gamers are going to need 64-bit addressing in a few years. AMD will be able to supply cheap 64-bit chips, Intel will be playing catch-up at best.

Re:Nothing new here - take a look at the hp-pa 880 by roca · 2002-02-04 09:31 · Score: 3, Informative

The question with an L1 cache of that size is how many cycles it takes to access the cache. It's easy to make a huge L1 cache, you just pay in increased access time. It's not impressive until we know the latency numbers as well as the size.

NOT not wow! by plover · 2002-02-04 09:43 · Score: 3, Informative

You're absolutely correct in that a substantially larger die will result in substantially lower yields (excepting any magical breakthroughs in chip fabrication, which are always possible.)
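For a rough feel of how fast yield drops with area, here is a sketch using the simple Poisson yield model (yield = exp(-area x defect density)). The defect density is an assumed, illustrative number, not a figure from Intel or the article, and the model ignores things like redundant circuitry. The two areas are die sizes quoted elsewhere in this thread.

```python
import math

def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dice under the simple Poisson yield model."""
    area_cm2 = area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)

# Assumption for illustration only: 0.5 defects per cm^2
ASSUMED_DEFECT_DENSITY = 0.5

for area in (217, 424):
    y = poisson_yield(area, ASSUMED_DEFECT_DENSITY)
    print(f"{area} mm^2 die: about {y:.0%} of dice defect-free")
```

The exact percentages mean little since the defect density is a guess, but the shape is the point: yield falls off exponentially with area, not linearly.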

But there are segments of today's market that are willing to pay almost any price for a high-performance chip. These people will fork over $1000 without blinking an eye if they think it will speed up their business.

Look at any commercial server available today. They're priced around $15000 - $20000. If chip prices go to $1000 instead of the $400 they're probably paying, that makes a difference of $2400, or about 12% of a $20000 4-way box. Even if chip prices went to $2000, it's a $6400 difference, or about 32%. If your processors are your bottleneck, then you've gained a lot of improvement for not-very-much delta in money.
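The arithmetic, worked through as a quick sketch. It assumes the $400 baseline chip price, a 4-way box, and the $20000 end of the server price range from the paragraph above; the function name is just for illustration.

```python
def price_increase(old_chip: int, new_chip: int, chips: int, server_price: int):
    """Return (extra dollars, extra as a fraction of the server price)."""
    extra = (new_chip - old_chip) * chips
    return extra, extra / server_price

for new_chip in (1000, 2000):
    extra, share = price_increase(400, new_chip, chips=4, server_price=20000)
    print(f"${new_chip} chips: +${extra}, about {share:.0%} of a $20000 server")
# $1000 chips: +$2400, about 12% of a $20000 server
# $2000 chips: +$6400, about 32% of a $20000 server
```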

Sure, a $2000 chip is out of reach for most home users today, but there is always a market for just about anything faster they can produce.

And there are enough crazed overclockers out there that'll spend whatever it takes to raise their frame rates on Quake III. It'll sell. It'll also drive the market to a new standard, which also sells chips.

--
John

Re:Wow! by fitten · 2002-02-04 10:51 · Score: 2, Informative

This is correct. The way your software uses the cache(s) is what determines the performance. For instance, a naively written app may have horrible cache behavior even on large caches. Some study of the underlying algorithms and adjustment to the cache access patterns can speed up the app manyfold.
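To make that concrete, here is a toy sketch that counts misses in a simulated direct-mapped cache for two ways of walking the same matrix. The cache size, line size, and matrix dimensions are all assumptions picked for illustration, and a real cache hierarchy is more forgiving than this model.

```python
def count_misses(addresses, cache_bytes=16 * 1024, line_bytes=64):
    """Count misses in a toy direct-mapped cache for a sequence of byte addresses."""
    num_lines = cache_bytes // line_bytes
    tags = [None] * num_lines          # which memory block each cache line holds
    misses = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block containing this address
        index = block % num_lines      # the one cache line that block can occupy
        if tags[index] != block:       # not resident: count a miss and load it
            tags[index] = block
            misses += 1
    return misses

N, ELEM = 256, 8  # 256x256 matrix of 8-byte elements, stored row by row

# Same elements, two visiting orders
row_major = [(r * N + c) * ELEM for r in range(N) for c in range(N)]
col_major = [(r * N + c) * ELEM for c in range(N) for r in range(N)]

print("row-major misses:   ", count_misses(row_major))
print("column-major misses:", count_misses(col_major))
```

In this model the row-by-row walk misses once per cache line, while the column-by-column walk misses on every single access, because each step jumps a whole row ahead and keeps evicting lines before they get reused. Same data, same work, very different cache behavior.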

Re:Nothing new here - take a look at the hp-pa 880 by Saidin · 2002-02-04 11:25 · Score: 2, Informative

The latency is no secret. It is a 2 cycle latency cache. Pseudo 2-way set associative (you can load from an even and an odd row at the same time, but not 2 even or 2 odd)

Re:64 bit regs is new? by morcheeba · 2002-02-04 12:26 · Score: 3, Informative

A quick summary of the Saturn microprocessor, for those interested...

The Saturn processor is a proprietary HP chip used in many of its calculators. It's generally considered a 4-bit chip (since this is the internal data bus size), but it has four 64-bit registers. I think the coolest part of the chip is that each instruction can operate on various portions of these registers -- for example, only the upper nibble, or only the lowest 4 nibbles. Since this is a calculator, math is generally done in BCD format. Externally, the chip connects using an 8-bit data bus. The address bus (and therefore the PC, too) is 20 bits wide, and each address refers to a nibble of data. Maximum addressable memory = 1 meganibble = 512KB. Most of the calculator firmware (such as calculating the sine of a number or matrix manipulation) is interpreted RPL to allow code reuse (to save time, and to ensure bug-free implementations).
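A small sketch of the numbers above, in Python rather than Saturn assembly. The `nibble_field` helper is hypothetical and only mimics the idea of operating on part of a 64-bit register; it is not how the Saturn instruction set actually names or encodes its register fields.

```python
ADDRESS_BITS = 20
NIBBLE_BITS = 4

# Each address names one nibble, so 20 address bits reach 2^20 nibbles.
nibbles = 2 ** ADDRESS_BITS
total_bytes = nibbles * NIBBLE_BITS // 8
print(nibbles, "nibbles =", total_bytes // 1024, "KB")   # 1048576 nibbles = 512 KB

def nibble_field(register: int, low: int, count: int) -> int:
    """Return `count` nibbles of a 64-bit value, starting at nibble `low` (0 = lowest)."""
    mask = (1 << (count * NIBBLE_BITS)) - 1
    return (register >> (low * NIBBLE_BITS)) & mask

reg = 0xFEDCBA9876543210               # a 64-bit register holds 16 nibbles
print(hex(nibble_field(reg, 15, 1)))   # upper nibble only: 0xf
print(hex(nibble_field(reg, 0, 4)))    # lowest 4 nibbles only: 0x3210
```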

HP did a great job with this calculator, including releasing internal documentation and development tools. More info here, or use google.

It's a shame that HP shut down their calculator division.

--
HIV Crosses Species Barrier... into Muppets

Thanks! Where would we be without clarifications? by megalomang · 2002-02-04 13:21 · Score: 3, Informative

Thanks for your "clarifications". You have saved us all from a life of ignorance.

What you meant to say (and what the article said) is that 464mm^2 is the actual die size of the processor. This includes the CPU and the caches. The CPU is a relatively small portion of the processor die, and noting there is 3MB of L3, the total cache may amount to 2/3 of the die size. The square on top of the Athlon is also the entire processor die: CPU, caches and all.

Also, L3 cache can never perform "equivalently" to L2 or L1 cache unless it runs at core speed. And I can tell you now, it doesn't -- or they wouldn't need L1 and L2. The L3 cache probably runs at something like 10 access cycles or more. It's not difficult to engineer 10 access cycles into any pipeline -- it's impossible. Which is precisely why it's not L1.

I'm quite sure the engineers at Intel have done their modeling homework and determined that however fast the L4 memory may be, the L3 will improve performance by that much more.

Remember, this processor is not meant to go on your or any other Joe Sixpack's desktop. It is meant to sit inside the workstations on the desks of engineers and in the racks of high-bandwidth servers. These platforms are specifically designed to run hundreds of tasks simultaneously and handle staggeringly high memory bandwidths. It has nothing to do with "complicated instructions." The L3 exists for swapping out large pages of memory in large bursts from a significantly larger L4 memory (think on the order of 100's of GB) from L5 memory (local drives and SANs) that has an incomprehensible virtual memory space.

This has absolutely nothing to do with mainstream. I'm quite certain an OS already exists that will run on the platform. An IA-64 Linux is well under way (try http://www.linuxia64.org) and you can bet that Compaq, HP, Dell, and Intel have put a total of more than 100x your lifetime earnings into developing software for that platform.

Intel could not care less whether you or 99.9% of the /. readers out there ever buy an IA-64. They don't give a crap about your market segment, but if you want to drop $10K+ on an IA64 workstation, be my guest. Your choices are limited. Either choose IA64 or UltraSparc. Or maybe if AMD ever gets a design win, you might get a chance to buy a Hammer box.

Cache Design 101. by Christopher+Thomas · 2002-02-04 17:37 · Score: 3, Informative

Next, 3 MB cache sounds nice, but L3? It may be on die, but by that point the clock reduction probably makes it perform equivalently to a 256 k L1 cache, or a 512 or larger L2.

A fundamental rule of building caches is that a larger cache is slower and dissipates more energy per lookup than a smaller cache. This is why multilevel caching exists in the first place (otherwise we'd just have a huge L1 cache - and before you mention it, due to architectural sneakiness, HP's giant L1 cache isn't really an L1 cache).

So, you can't just spend the L3 area on making a bigger L2. You'd end up with a slower, hotter L2, which could easily _degrade_ performance.

As long as the L3 cache is faster to access than main memory, it'll be useful for some things. Whether it'll be useful for *most* things is another issue. This depends on the "working set" of the applications you're running (how much memory they repeatedly access). I guess Intel's banking on working sets being larger than most caches.
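A back-of-the-envelope way to see this is the usual average memory access time calculation (hit time plus miss rate times the cost of going one level further out). The sketch below uses made-up latencies and miss rates purely for illustration; none of these numbers come from Intel or the article.

```python
def amat(levels, memory_cycles):
    """Average memory access time, in cycles, for a chain of cache levels.

    `levels` is a list of (hit_cycles, local_miss_rate) pairs, L1 first.
    """
    cost = memory_cycles
    # Work inward from main memory: each level adds its own hit time plus
    # its miss rate times the cost of going to the next level out.
    for hit_cycles, miss_rate in reversed(levels):
        cost = hit_cycles + miss_rate * cost
    return cost

# All values below are assumptions for illustration only.
L1 = (2, 0.05)     # 2-cycle hit, 5% of accesses miss
L2 = (8, 0.30)     # 8-cycle hit, 30% of L1 misses also miss here
L3 = (20, 0.40)    # 20-cycle hit, 40% of L2 misses also miss here
MEMORY = 200       # cycles to reach main memory

without_l3 = amat([L1, L2], MEMORY)
with_l3 = amat([L1, L2, L3], MEMORY)
print(f"without L3: {without_l3:.1f} cycles, with L3: {with_l3:.1f} cycles")
```

With these assumed numbers the L3 helps because it catches a good share of L2 misses at a tenth of the cost of main memory. If the working set already fits in L2, the L2 miss rate drops toward zero and the L3 contributes almost nothing.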

Another possibility is that they're testing the cache architecture for use in future SMT or CMP designs (both of which would have multiple independent execution contexts running). If you're running multiple *independent* contexts, the working set grows with the number of contexts.
