Slashdot Mirror


Intel Pentium 4 NetBurst Architecture Explained

fr0child writes "Next week is Intel's Developer Forum (IDF) and it seems they'll be releasing quite a bit of information (aka hype) about the Pentium 4. Anandtech seems to have gotten the scoop on Intel's NetBurst Architecture, basically covering the P4's internal architecture."

10 of 130 comments

  1. Problems with longer pipelines, as in P4 by SpinyNorman · · Score: 5

    1) Pipeline stalls / operand latency:

    If the compiler and/or CPU is unable to reorder instructions effectively (or if a particular piece of code is not amenable to reordering), then an instruction in the pipeline may not have its operands ready when it needs them and will stall the pipeline waiting for them. With a longer pipeline it will take more clock ticks for the necessary operands to work their way through the pipeline to clear the stall. Intel have added a double clock speed arithmetic unit (ALU) to the P4 to try to mitigate operand latency.

    2) Branch mispredict penalty:

    When a modern CPU such as the P4 encounters a branch instruction, it predicts whether the branch will be taken or not (by using the execution history) in order to be able to continue processing instructions through the pipeline. When the branch is finally evaluated near the end of the pipeline it may turn out that the prediction was wrong, and that all the instructions following the branch (now in the pipeline) should not be executed. In this case the processor has to flush the pipeline and instead take the correct branch. This "pipeline flush" branch mispredict penalty is obviously higher the longer the pipeline is: a 20 stage pipeline means you are throwing away around 20 stages' worth of partially executed instructions when a branch is mispredicted.
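
    A rough way to put numbers on the mispredict penalty is the usual back-of-the-envelope formula: average cycles per instruction = base CPI + (fraction of instructions that are branches) x (mispredict rate) x (flush penalty in cycles). The sketch below assumes the flush penalty is about equal to the pipeline depth, and every figure in it (branch fraction, mispredict rate, the 10-stage comparison pipeline) is an illustrative assumption, not a measurement of any real chip.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once mispredict flushes are charged."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Illustrative assumptions only, not measured values.
BASE_CPI = 1.0          # ideal pipeline: one instruction retires per cycle
BRANCH_FRACTION = 0.20  # assume 1 in 5 instructions is a branch
MISPREDICT_RATE = 0.05  # assume the predictor is wrong 5% of the time

# Assume the flush penalty is roughly the pipeline depth in cycles.
short_pipe_cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 10)
long_pipe_cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, 20)

print(f"10-stage pipeline: {short_pipe_cpi:.2f} cycles/instruction")
print(f"20-stage pipeline: {long_pipe_cpi:.2f} cycles/instruction")
```

    With the same predictor accuracy, doubling the depth doubles the cycles lost to flushes, which is why a deeper pipeline puts so much more weight on the branch predictor.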

    P4 was designed with a long pipeline so that each pipeline step could be very simple/quick and therefore the processor could have a very high clock rate. The downside of doing this is the above two problems, which mean that the average number of instructions executed per clock cycle (IPC - aka processor efficiency) gets reduced.

    P4 at 1.4GHz may be faster than P3 at 1GHz, but because P4 will have a lower IPC than P3, it won't be as fast as a 1.4GHz P3 (if we ever see one) or 1.4GHz Athlon (which we will see).
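
    The comparison above boils down to performance being roughly IPC x clock rate. A minimal sketch of that arithmetic follows; the IPC values are hypothetical placeholders chosen only to show the shape of the argument (nobody in this thread has measured them), and only the clock speeds come from the comment.

```python
def relative_performance(ipc, clock_ghz):
    """Instructions retired per nanosecond: IPC times clock rate in GHz."""
    return ipc * clock_ghz


def break_even_ipc_ratio(slow_clock_ghz, fast_clock_ghz):
    """IPC ratio (fast chip / slow chip) at which both chips perform equally."""
    return slow_clock_ghz / fast_clock_ghz


# Hypothetical IPC figures, for illustration only.
P3_IPC = 1.0
P4_IPC = 0.8  # assumed lower because of the longer pipeline

p3_at_1000 = relative_performance(P3_IPC, 1.0)
p4_at_1400 = relative_performance(P4_IPC, 1.4)
p3_at_1400 = relative_performance(P3_IPC, 1.4)

# The P4 at 1.4GHz stays ahead of a 1GHz P3 as long as its IPC is above this
# fraction of the P3's IPC.
threshold = break_even_ipc_ratio(1.0, 1.4)

print(f"P3 @ 1.0GHz: {p3_at_1000:.2f}")
print(f"P4 @ 1.4GHz: {p4_at_1400:.2f}")
print(f"P3 @ 1.4GHz: {p3_at_1400:.2f}")
print(f"break-even IPC ratio: {threshold:.3f}")
```

    Under these assumed numbers the P4 lands between the two P3 data points, which is the ordering the comment predicts.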

    The one area where P4 should excel is in SSE2 optimized floating point math intensive applications, which is why Intel are now trying to reposition the P4 as an Internet/multimedia CPU rather than a general purpose one. The fallacy of this is that once you can decode your DivX in real-time, you don't need to go any faster!

  2. SMP by cybaea · · Score: 3

    Hmm, it's there at the bottom of the page:

    Intel also informed us that the Pentium 4 would strictly be a uniprocessor part, meaning it won't even work in multiprocessor boards.

    So, yes, you are right: they don't support SMP so why would they split the bus?

    But I question your "intel probably can't afford to design it's next consumer level chip around a few percent of the market" comment.

    First of all, if Intel can't afford it, who can?

    But more to the point: Is it really only a few percent of the market? I've just ordered a dual PIII and I selected the chip specifically because I could get SMP support. Does anybody have any statistics on single- versus multiple CPU PIII systems shipped? Is it really only "a few percent"?

    --
    Hi!
  3. Wow, a real RISC chip... ;) by Mike+Connell · · Score: 3

    From the CNET article:

    > The chip also comes with 144 new multimedia instructions for better graphics and sound.

    I'm weeping! I *know* that they're multimedia instructions and so on, and probably really useful, and that people aren't hand coding this stuff... but doesn't anyone else think this is ugly?

    Whatever happened to RISC?

    Mike.

  4. CPU Rant by pantherace · · Score: 3


    I am tired of seeing Intel put out more and more vaporware. RDRAM, IA-64, etc, etc... I don't know of any other chip maker that puts out so much vapor. AMD's chips did what they were intended to do. DEC (Compaq) Alphas haven't failed yet (supposed to be 1.5GHz+ by the end of the year).
    I am willing to bet that AMD will have a 64-bit arch out (mainstream) before Intel.
    IA-64 has 1/5 the performance of an Alpha under gcc, which is not optimised for the Alpha. (likely the kind that is 3x an Athlon or more for a P3)
    Even a 2 year old Alpha can beat most P3s (1.5 -2x P3 MHz = Alpha MHz in performance)
    Another thing: 550 P3 $159, 600 Duron $99 (or 109, can't remember exactly). Duron is not 2/3 a P3's performance. Is Intel too greedy? In SV, I talked to an Intel CAD engineer and he said as long as it sold for a 24 or 26% profit Intel would make anything. I wonder what AMD's profit level is.

    btw anyone ever looked at Alpha vs Intel's touted FP performance? hint, Intel is in the dust.

  5. NetBurst? by be-fan · · Score: 3

    Is it just me, or is the name not necessarily superfluous?

    1) The P4 has very long pipes.
    2) The P4 has small caches.
    3) The P4 has huge bus bandwidth.
    4) The regular FPU has been largely deprecated in favor of SSE2.

    What does all this add up to? A chip to accelerate 3D. This feature list reads largely like the list of the Playstation 2. (Aside from the long pipelines thing.) You've got the small caches, high bandwidth, and the vector pipes. My guess is that Intel, seeing NVIDIA cramming more and more into the GPU, is trying to come back and thoroughly blow them out of the water. This chip might process slower per clock for many uses, but the high clock makes up for that. On the other hand, things that are extremely regular without any branches (ahem, 3D geometry processing) will absolutely fly through this thing.

    --
    A deep unwavering belief is a sure sign you're missing something...
  6. Re:I got me some questions about this here CPU by Pulzar · · Score: 4

    Could someone explain to me how having a longer pipeline speeds things up? This seems kinda counterintuitive to me. Guess it's like the pipelines in the 3D GPUs, but I don't see how that would work in a general purpose CPU.

    The longer the pipeline is, the smaller each stage (of the pipeline) is. The smaller the stages are, the higher the frequency you can run them at. If you cut each of the existing stages exactly down the middle, you could run your CPU at twice the frequency, without making any other changes! (Of course, you can never cut a stage exactly in half, so you'll never reach a 2x increase.)

    Why don't we make 10,000-stage pipelines, then, you might ask :). In the ideal world, a completed instruction "comes out" of the pipeline at each clock cycle, so with 2x frequency, your CPU is twice as fast. The problem is, with a huge pipeline, you increase the chance that the instruction will "stall" along the way, and you'll get less than 1 instruction (on average) coming out on each clock cycle (the "IPC" thing the article talks about). If you add enough stalls to your pipeline, you might effectively decrease your CPU's performance.
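
    The trade-off can be sketched with a toy model. It assumes a fixed amount of logic delay split evenly across the stages, a fixed per-stage latch overhead (the reason halving a stage never quite doubles the clock), and a stall cost per instruction that grows in proportion to pipeline depth. All three constants are made-up illustrative values, not figures for the P4 or any real design.

```python
# Toy model constants: illustrative assumptions, not real chip data.
LOGIC_DELAY_NS = 10.0     # total logic delay for one instruction, unpipelined
LATCH_OVERHEAD_NS = 0.25  # fixed cost added to every stage by its latch
HAZARD_RATE = 0.04        # stall cycles per instruction, per pipeline stage


def clock_ghz(stages):
    """Clock rate when the logic is split evenly into `stages` stages."""
    cycle_ns = LOGIC_DELAY_NS / stages + LATCH_OVERHEAD_NS
    return 1.0 / cycle_ns


def instructions_per_ns(stages):
    """Throughput: clock rate divided by average cycles per instruction."""
    cycles_per_instruction = 1.0 + HAZARD_RATE * stages
    return clock_ghz(stages) / cycles_per_instruction


# Search every depth from 1 to 10,000 stages for the best throughput.
best_depth = max(range(1, 10001), key=instructions_per_ns)

for depth in (1, 10, 20, best_depth, 1000, 10000):
    print(f"{depth:>6} stages: {clock_ghz(depth):6.2f} GHz, "
          f"{instructions_per_ns(depth):.4f} instructions/ns")
```

    In this model the clock keeps rising with depth, but throughput peaks at a moderate depth and then falls away, so the 10,000-stage pipeline ends up far slower than a much shorter one.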

    --
    Never underestimate the bandwidth of a 747 filled with CD-ROMs.
  7. No-hype article at The Register by cybaea · · Score: 3

    The Register has a nice anti-hype article about the P4.

    My favourite is

    There are two key words and phrases you, our readers must note. First of all, the Pentium 4 marchitecture is now to be described as Netburst, and the second phrase is that this architecture should be described as the repeated engineer execution (REE). We know what REE stands for but we prefer our version.
    --
    Hi!
  8. Re:Hmm? Server down? by stx23 · · Score: 3

    Were you actually planning on reading the article before speculating wildly?
    You must be new round these parts...

  9. How big an impact from the bus architecture? by cybaea · · Score: 3

    From the article:

    The P4's bus, unlike the Athlon's EV6, isn't a Point-to-Point bus, meaning that all CPUs must share the same 3.2GB/s of available system bandwidth. With a Point-to-Point bus, although it's more complicated to implement, each CPU in a multiprocessor environment gets its own connection to the North Bridge ...

    IANACD (I am not a chip designer), but this seems to me like a major disadvantage compared with the Athlon. Am I missing something obvious?
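
    For what it's worth, the arithmetic behind the quoted passage can be sketched as below. The 3.2GB/s figure is from the article; the per-link point-to-point bandwidth is a hypothetical placeholder, and dividing a shared bus evenly between CPUs is a simplifying assumption (it ignores arbitration overhead and uneven traffic).

```python
SHARED_BUS_GB_S = 3.2  # total shared bandwidth quoted in the article
P2P_LINK_GB_S = 1.6    # hypothetical per-CPU link bandwidth, for illustration


def shared_bus_per_cpu(total_gb_s, cpus):
    """Even split of one shared bus between all CPUs (simplifying assumption)."""
    return total_gb_s / cpus


def point_to_point_per_cpu(link_gb_s, cpus):
    """Each CPU has its own link, so its share does not shrink with CPU count."""
    return link_gb_s


for cpus in (1, 2, 4):
    shared = shared_bus_per_cpu(SHARED_BUS_GB_S, cpus)
    dedicated = point_to_point_per_cpu(P2P_LINK_GB_S, cpus)
    print(f"{cpus} CPU(s): shared bus {shared:.1f} GB/s each, "
          f"point-to-point {dedicated:.1f} GB/s each")
```

    With a single CPU the shared bus is not divided at all, so the sharing penalty only shows up in multiprocessor boards.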

    --
    Hi!
  10. More information here by Jon+Erikson · · Score: 3

    Try this link at CNET for more information.

    ---
    Jon E. Erikson

    --

    Jon Erikson, IT guru