Slashdot Mirror


Pentium IV Problems?

zottl writes: "German tech site computerchannel.de has an article about various problems concerning Intel's Pentium IV. It says that the new processors will draw lots of power (66 watts for the 1.4 GHz version), need special copper-core coolers, might need radiation shields for the socket pins for FCC compliance and will remain expensive for quite some time. It also says that the P4 will only get mass-market appeal with the introduction of the slimmed-down 0.13 micron version. Oh, and best of all, it seems to be slower for certain apps than a P3 at the same MHz. Seems like a repetition of the problems the P6 architecture had when the Pentium Pro was first introduced." Isn't this pretty much what they say about every generation of Intel chips when first released? Anyway, the article is in German, so you'll need to feed it to the fishy until translations crop up.

2 of 147 comments

  1. Moore's Law speaks only of transistors! by Mr Z · · Score: 5

    Folks, get it right. Moore's Law simply states that the number of transistors on a chip doubles every N months, where N = 12 in the first statement of the "law" (1965), and was revised a decade later to N = 24. The N = 18 figure you usually hear quoted is a popular variant.
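As a rough illustration of that statement, here is a minimal Python sketch of the doubling rule. The function name `transistor_growth` and its parameters are made up for this example, and the doubling period (the N above) is an argument rather than a fixed constant.

```python
def transistor_growth(months, doubling_period_months):
    """Factor by which transistor count grows after `months`,
    assuming it doubles every `doubling_period_months` (the N above)."""
    if doubling_period_months <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    # Compare two commonly quoted doubling periods over six years.
    for n in (18, 24):
        print(f"N = {n}: x{transistor_growth(72, n):.1f} over six years")
```

Note that nothing in this formula says anything about performance; it only counts transistors.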

    Typically, performance scales with the number of transistors, but that is not always true! There are three main reasons performance does go up by roughly the same ratio as the number of transistors:

    • Some of those transistors can be used for new functions. For example, additional functional units (such as the three-way issue pipeline on PentiumPro/PentiumII vs. the U-pipe and V-pipe on Pentium vs. the single-issue pipe on 486). This is a direct application of transistors to performance, but it only addresses computation-bottlenecked applications. Additionally, some of those transistors can be used to build wider pathways on the chip, leading to improved bandwidth to help bandwidth-starved applications.
    • Smaller transistors switch faster, and so can operate at higher clock rates. This has the dual effect of increasing the number of computations per second (again helping compute-bottlenecked applications), as well as increasing bandwidth--at least on the die. Going off-chip can still be a bottleneck. That brings us to the third bullet:
    • Smaller transistors can be used to build a bigger cache, so that the clock rate and on-chip bandwidth benefits can be used to greater effect.
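One way to picture the compute-bound versus bandwidth-bound distinction in the three bullets above is a toy model where run time is set by whichever resource is the bottleneck. This is only a sketch: the function `run_time_seconds`, its parameters, and all the numbers are invented for illustration, and it assumes computation and memory traffic overlap perfectly.

```python
def run_time_seconds(operations, bytes_moved, ops_per_cycle, clock_hz,
                     bytes_per_second):
    """Toy model: time is the larger of compute time and transfer time.

    Assumes compute and memory traffic overlap perfectly (an idealization).
    """
    compute_time = operations / (ops_per_cycle * clock_hz)
    transfer_time = bytes_moved / bytes_per_second
    return max(compute_time, transfer_time)


if __name__ == "__main__":
    # Hypothetical workload: 1e9 operations touching 4e9 bytes.
    base = run_time_seconds(1e9, 4e9, ops_per_cycle=1, clock_hz=1e9,
                            bytes_per_second=1e9)
    wider = run_time_seconds(1e9, 4e9, ops_per_cycle=3, clock_hz=1e9,
                             bytes_per_second=1e9)
    # Extra issue slots do not help a job that is waiting on bandwidth.
    print(base, wider)
```

In this model, extra functional units only pay off once the workload is limited by computation rather than by data movement.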

    Sounds great, but what's bad?

    Well, one big thing that is not addressed by faster transistors is latency. As transistors get smaller and the wires that connect them get smaller, communication between transistors starts to become the true bottleneck. In the "Good Old Days", you could send a signal anywhere on the die in a single cycle, and you could treat a wire as an instantaneous link. In these smaller technologies, though, transport time for signals burns a significant portion of the time for any computation. This is why pipelines get deeper and deeper with each generation. Essentially, you can only make effective use of all of those transistors if you can minimize the amount of communication between them, and that's what pipelining is all about. Unfortunately, this limits how much you can speed up many applications, especially general-purpose compute problems.
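The trade-off described above can be sketched with a toy pipeline model. Everything here is an assumption made for illustration: the function names, the fixed per-stage overhead standing in for latch and wire delay, the absence of result forwarding, and the numbers themselves.

```python
def cycle_time_ns(logic_ns, stages, overhead_ns):
    """Clock period when `logic_ns` of work is split into equal stages,
    each paying a fixed `overhead_ns` (latches, wire delay between stages)."""
    return logic_ns / stages + overhead_ns


def independent_time_ns(n, logic_ns, stages, overhead_ns):
    """n independent instructions: fill the pipe once, then one per cycle."""
    return (stages + n - 1) * cycle_time_ns(logic_ns, stages, overhead_ns)


def dependent_time_ns(n, logic_ns, stages, overhead_ns):
    """n instructions where each needs the previous result from the last
    stage (no forwarding assumed): each one walks the whole pipe alone."""
    return n * stages * cycle_time_ns(logic_ns, stages, overhead_ns)


if __name__ == "__main__":
    # Hypothetical numbers: 10 ns of logic, 0.5 ns overhead per stage.
    for stages in (5, 10, 20):
        print(stages,
              independent_time_ns(1000, 10.0, stages, 0.5),
              dependent_time_ns(1000, 10.0, stages, 0.5))
```

Under these assumptions, a deeper pipeline finishes independent work sooner because the clock is faster, but a chain of dependent instructions gets slower, since each one pays the per-stage overhead at every stage.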

    Newer architectures address latency problems by exposing their pipeline (see EPIC or VLIW), or by providing extensive resources for dealing with it. The Alpha CPUs, for instance, have an aggressive cache and reorder buffer that allow many pending cache misses to be serviced while non-dependent instructions are executed happily. (IIRC, the 21264 allows up to 4 hits under miss in the cache -- that is, you can have up to four misses outstanding and still take hits in the cache and allow instruction execution to proceed. I don't have Hennessy and Patterson handy to check, though.) The reason this is even conceivable is that the Alpha provides a huge bank of architecturally-visible registers, and an even larger bank of rename registers for rescheduling code. Since compiled code spends most of its time moving data between registers, the architecture can easily determine which instructions are dependent on each other and very effectively hide the latency of the pipeline by reordering instructions and renaming registers.
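To make the renaming idea concrete, here is a minimal sketch of register renaming in Python. It is not how the 21264 or any real CPU implements it; the three-field instruction format and the names `rename` and `depends_on` are hypothetical, chosen only to show how fresh physical names separate true dependencies from mere reuse of a register name.

```python
def rename(instructions):
    """Give every result a fresh physical register.

    `instructions` is a list of (dest, src1, src2) architectural register
    names. Returns the same list rewritten with physical names like "p0".
    """
    mapping = {}      # architectural name -> current physical name
    next_free = 0
    renamed = []
    for dest, src1, src2 in instructions:
        sources = []
        for src in (src1, src2):
            if src not in mapping:          # live-in value: allocate once
                mapping[src] = f"p{next_free}"
                next_free += 1
            sources.append(mapping[src])
        mapping[dest] = f"p{next_free}"     # fresh name removes name reuse
        next_free += 1
        renamed.append((mapping[dest], sources[0], sources[1]))
    return renamed


def depends_on(later, earlier):
    """True if `later` reads the register that `earlier` writes."""
    return earlier[0] in later[1:]


if __name__ == "__main__":
    program = [
        ("r1", "r2", "r3"),   # r1 = r2 op r3
        ("r4", "r1", "r5"),   # r4 = r1 op r5   (true dependency on the first)
        ("r1", "r6", "r7"),   # r1 = r6 op r7   (only reuses the name r1)
    ]
    for instruction in rename(program):
        print(instruction)
```

After renaming, the third instruction no longer shares a register name with the first two, so a scheduler working from names alone is free to run it early.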

    In contrast, the x86's highly bizarre and rather small register file creates a huge bottleneck to reordering, since the compiler ends up spilling many intermediate values to the stack or other memory locations. As a result, the CPU can't use register names to determine instruction dependencies as often, and so it cannot aggressively reorder instructions. That in turn means it cannot hide the latency in the pipeline as effectively, and it gets bitten with poor performance. All those transistors sit idle more often. (This, BTW, is why the Alpha can beat the Athlon on some apps, despite a 2x clock-speed advantage on the Athlon's part.)
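A small sketch of why spills hurt, under heavy simplification: every instruction takes one cycle, issue width is unlimited, registers are perfectly renamed, and all of memory is treated as a single opaque name `"mem"` so that any load waits for the most recent store. Real hardware is more sophisticated than that; the function `schedule_depth` and both instruction sequences are invented for this example.

```python
def schedule_depth(instructions):
    """Length of the longest chain of read-after-write dependencies.

    Each instruction is (writes, reads): the name it writes and a tuple of
    names it reads. With unlimited issue width, this is the minimum number
    of cycles needed if every instruction takes one cycle.
    """
    ready_at = {}     # name -> cycle in which its latest value is produced
    depth = 0
    for writes, reads in instructions:
        start = max((ready_at.get(name, 0) for name in reads), default=0)
        finish = start + 1
        ready_at[writes] = finish
        depth = max(depth, finish)
    return depth


# Plenty of registers: both partial sums stay in registers.
many_registers = [
    ("t1", ("a", "b")),     # t1 = a + b
    ("t2", ("c", "d")),     # t2 = c + d
    ("t3", ("t1", "t2")),   # t3 = t1 + t2
]

# Few registers: the first sum is spilled to memory and reloaded.
few_registers = [
    ("r0", ("a", "b")),     # r0 = a + b
    ("mem", ("r0",)),       # spill r0 to the stack
    ("r0", ("c", "d")),     # r0 = c + d, reusing the register
    ("r1", ("mem",)),       # reload the spilled sum
    ("r2", ("r0", "r1")),   # r2 = (c + d) + (a + b)
]

if __name__ == "__main__":
    print(schedule_depth(many_registers), schedule_depth(few_registers))
```

The same arithmetic ends up with a longer dependency chain once a value has to travel through memory, which leaves less independent work to overlap.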

    There are plenty of other reasons why x86 can't keep up performance-wise, but this is not the forum to discuss them. Just remember, x86 is keeping up with Moore's Law just fine. Don't expect its performance to keep scaling at the same rate.

    --Joe
    --
  2. Like the PPro? by be-fan · · Score: 5

    It seems that the pundits spend most of their time doubting Intel, while Intel becomes the de facto standard with their new chips. Take the Pentium. Soon, everyone (AMD & Cyrix) moved to the superscalar design. Intel added MMX, and AMD and Cyrix added 3DNow!. People originally thought that the PII would be a failure (it's slower than a PPro at the same clock-speed) but it became THE high-end standard for years. People thought that the Pentium wouldn't make it because it ran 486-optimized code slower than a 486. Instead, people just reoptimized their code. All these chips had quirks. Just like the P4 has quirks. However, the software industry will work around these quirks, just like they have for all the other Intel chips.

    --
    A deep unwavering belief is a sure sign you're missing something...