Slashdot Mirror


Intel Pentium 4 NetBurst Architecture Explained

fr0child writes "Next week is Intel's Developer Forum (IDF) and it seems they'll be releasing quite a bit of information (aka hype) about the Pentium 4. Anandtech seems to have gotten the scoop on Intel's NetBurst Architecture, basically covering the P4's internal architecture."

43 of 130 comments

  1. Re:HAHAHAHAHA by ZanshinWedge · · Score: 2

    P.S. You can also get the scoop over at Sharky Extreme.

  2. Questionable data. by Christopher+Thomas · · Score: 2

    I am tired of seeing Intel put out more and more vaporware. RDRAM, IA-64, etc, etc..

    You can buy RDRAM right now if you want to. Hardly vapour.

    Engineering prototypes of the IA-64 have been around for a while, with every indication that they will ship. Doesn't look very vaprous to me.

    IA-64 has 1/5 the performance of an alpha under gcc, which is not optimised for the alpha. (likely the kind that is 3x an Athlon or more for a P3)

    Firstly, GCC is not the best compiler in the world. When comparing an Alpha to an IA-64 chip, I'd use Intel's compiler on the IA-64 and Compaq's compiler on the Alpha. Both companies have a history of writing compilers that were extremely well optimized for their platforms.

    Secondly, I don't see much support for your figures. See my next point.

    Even a 2 year old alpha can beat most P3s (1.5 -2x P3 MHz = alpha MHz in performance)

    Not really. Alpha chips are about even in everything except floating point (where the Alpha blows *everyone* out of the water - Sun, HP, IBM, Motorola, etc).

    They do this with the higher speed grades of their chips that were released _recently_. Older chips used the same design but were clocked more slowly, and don't blow away present chips.

    Check http://www.spec.org for reasonably accurate benchmark information. They use the fairest system for evaluation that I've seen (standard test code supplied by SPEC, compilation and system tweaking handled by the companies owning the platforms being tested).

    As far as the performance of the Alpha or an Athlon vs. the P4 goes... The P4 is still in the final debugging stages. Wait six months and look for SPEC marks.

    Personally, I'd like to see SPEC marks for the G4. Apple has been allergic to SPEC of late.

  3. Re:"complete architectural overhaul"??? by VAXman · · Score: 2

    The author of that quote was clueless and should have said "microarchitecture". The Pentium 4's architecture (the IA-32 instruction set it executes) is very similar to the Pentium III's, but the microarchitecture is 100% different and is a complete overhaul.

  4. Problems with longer pipelines, as in P4 by SpinyNorman · · Score: 5

    1) Pipeline stalls / operand latency:

    If the compiler and/or CPU is unable to reorder instructions effectively (or if a particular piece of code is not amenable to reordering), then an instruction in the pipeline may not have its operands ready when it needs them and will stall the pipeline waiting for them. With a longer pipeline it takes more clock ticks for the necessary operands to work their way through the pipeline and clear the stall. To mitigate operand latency, Intel have given the P4 an arithmetic logic unit (ALU) that runs at twice the core clock speed.

    2) Branch mispredict penalty:

    When a modern CPU such as the P4 encounters a branch instruction, it predicts whether the branch will be taken or not (using the execution history) so that it can keep feeding instructions into the pipeline. When the branch is finally evaluated near the end of the pipeline, the prediction may turn out to have been wrong, meaning that none of the instructions following the branch (now in the pipeline) should be executed. In this case the processor has to flush the pipeline and fetch from the correct branch target instead. This "pipeline flush" branch mispredict penalty obviously grows with pipeline length: with a 20-stage pipeline, roughly 20 clock cycles' worth of in-flight work is thrown away on each misprediction.

    P4 was designed with a long pipeline so that each pipeline step could be very simple/quick and therefore the processor could have a very high clock rate. The downside of doing this is the above two problems, which mean that the average number of instructions executed per clock cycle (IPC - aka processor efficiency) gets reduced.
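
    The two problems above can be put into rough numbers with the usual cycles-per-instruction bookkeeping: start from an ideal of one instruction per clock, then add the cycles lost to operand stalls and to mispredict flushes. This sketch assumes that doubling the pipeline length roughly doubles both penalties; every rate and penalty in it is an illustrative assumption, not a measured P3 or P4 figure.

```python
# Back-of-the-envelope CPI model for operand stalls and branch mispredicts.
# All rates and penalties are illustrative assumptions, not measured figures.

def effective_ipc(base_cpi: float,
                  branch_fraction: float,
                  mispredict_rate: float,
                  flush_penalty_cycles: float,
                  stall_fraction: float,
                  stall_cycles: float) -> float:
    """Return instructions per clock once stall and flush cycles are added."""
    # Extra cycles per instruction lost to flushing after a mispredicted branch.
    branch_cpi = branch_fraction * mispredict_rate * flush_penalty_cycles
    # Extra cycles per instruction lost waiting for operands.
    stall_cpi = stall_fraction * stall_cycles
    return 1.0 / (base_cpi + branch_cpi + stall_cpi)


# Same workload; the longer pipeline is assumed to double both penalties.
short_pipe = effective_ipc(1.0, 0.20, 0.10, 10, 0.10, 2)
long_pipe = effective_ipc(1.0, 0.20, 0.10, 20, 0.10, 4)
print(f"short pipeline IPC: {short_pipe:.3f}")
print(f"long pipeline IPC:  {long_pipe:.3f}")
```

    Under these assumptions the longer pipeline loses IPC even though nothing about the workload changed, which is exactly the trade-off the clock-rate gain has to pay for.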

    P4 at 1.4GHz may be faster than P3 at 1GHz, but because P4 will have a lower IPC than P3, it won't be as fast as a 1.4GHz P3 (if we ever see one) or 1.4GHz Athlon (which we will see).
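
    The argument above boils down to performance being roughly clock rate times IPC. A minimal sketch, where the IPC values are hypothetical placeholders chosen only to illustrate the reasoning, not measurements of either chip:

```python
# Relative performance ~ clock rate x IPC. IPC values are hypothetical.

def instructions_per_second(clock_ghz: float, ipc: float) -> float:
    return clock_ghz * 1e9 * ipc


p3_1000 = instructions_per_second(1.0, ipc=0.9)  # assumed P3 IPC
p4_1400 = instructions_per_second(1.4, ipc=0.7)  # assumed, lower P4 IPC
p3_1400 = instructions_per_second(1.4, ipc=0.9)  # hypothetical 1.4GHz P3

print(p4_1400 / p3_1000)  # above 1: the 1.4GHz P4 beats the 1GHz P3
print(p4_1400 / p3_1400)  # below 1: but loses to a P3 at the same clock
```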

    The one area where P4 should excel is SSE2-optimized, floating-point-intensive applications, which is why Intel are now trying to reposition the P4 as an Internet/multimedia CPU rather than a general-purpose one. The fallacy of this is that once you can decode your DivX in real time, you don't need to go any faster!

    1. Re:Problems with longer pipelines, as in P4 by be-fan · · Score: 2

      I want to be able to decode 3 P0rn DVD's at the same time! One ALWAYS NEEDS MORE SPEED!

      --
      A deep unwavering belief is a sure sign you're missing something...
  5. Re:How big an impact from the bus architecture? by eagl · · Score: 2

    What you're missing is that the P4 is going to be a single-CPU part, so there's no reason to split up the bus. Even in a dual-processor setup, each CPU isn't using the bus anywhere close to its full capacity 100% of the time unless it's running a loop or accessing memory that doesn't fit inside its cache, in which case the software design is holding it back more than the system bus anyhow.

    I don't think many users would ever notice the difference, and Intel probably can't afford to design its next consumer-level chip around a few percent of the market.

  6. Re:hmmm by 1010011010 · · Score: 2

    Worse: cold fusion. Guess their server needs more deuterium, or fantasium, or something.

    ---- ----

    --
    Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
  7. SMP by cybaea · · Score: 3

    Hmm, it's there at the bottom of the page:

    Intel also informed us that the Pentium 4 would strictly be a uniprocessor part, meaning it won't even work in multiprocessor boards.

    So, yes, you are right: they don't support SMP so why would they split the bus?

    But I question your "intel probably can't afford to design it's next consumer level chip around a few percent of the market" comment.

    First of all, if Intel can't afford it, who can?

    But more to the point: Is it really only a few percent of the market? I've just ordered a dual PIII and I selected the chip specifically because I could get SMP support. Does anybody have any statistics on single- versus multiple CPU PIII systems shipped? Is it really only "a few percent"?

    --
    Hi!
    1. Re:SMP by be-fan · · Score: 2

      Probably less than a few percent. Consider that almost all business machines are uniprocessor, as are all consumer machines (except the dual G4); the only SMP machines are (some) servers and workstations. The midrange-and-up segment (where you find the bulk of multiprocessor machines) and the server market are much smaller, in terms of machines shipped, than the consumer and business markets (I think the figure is in the hundreds of millions of consumer machines shipped per year).

    2. Re:SMP by be-fan · · Score: 2

      I mentioned server space. However, server space is a lot smaller. Think of it this way: a business may buy 100 desktops for a workgroup and only two or three servers for that workgroup. Most of the time servers are expensive, low-volume items. At home there are no servers, and the home market is huge. So SMP machines make up a very small percentage of all machines.

    3. Re:SMP by be-fan · · Score: 2

      Okay, I've looked up some of the statistics. First, even large server manufacturers like Dell say that servers account for only about 14% of their sales. Windows NT has 38% of the server market, and IDC pegs its sales at about 2.1 million units last year (the report is from July). Do the math, and that means about 5.5 million server OSs shipped last year, which is a pretty good indication of the number of servers shipped. Now take into account that IDC says 112.5 million PCs were shipped last year, and that not all servers are SMP machines, and you can easily see that the SMP market IS just a few percent of the computing market.
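
      The estimate above, redone from the quoted IDC figures (the variable names are illustrative; the numbers are the ones in the comment). The division gives about 5.5 million server OSs, or roughly 5% of the 112.5 million PCs; since not every server is SMP, that is an upper bound on the SMP share.

```python
# Server-share estimate from the figures quoted in the comment.
nt_units_millions = 2.1       # Windows NT server units (IDC, as quoted)
nt_share = 0.38               # NT's share of the server OS market (as quoted)
pcs_shipped_millions = 112.5  # total PCs shipped (IDC, as quoted)

# If 2.1 million units is 38% of the market, the whole market is 2.1 / 0.38.
server_os_millions = nt_units_millions / nt_share
server_fraction = server_os_millions / pcs_shipped_millions

print(f"server OSs shipped: ~{server_os_millions:.1f} million")
print(f"servers as share of all PCs: ~{server_fraction:.1%}")
```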

    4. Re:SMP by be-fan · · Score: 2

      The 14% isn't SMP; it's how much of Dell's sales come from servers. I'm assuming that a large percentage of Dell's servers are SMP machines. Dell is a pretty big player in the server market, so I'd expect a larger share of its sales to come from servers (and workstations); however, even for Dell, servers account for only 14% of sales. And I wasn't limiting myself to x86 machines. The 5.x million server operating systems include Windows NT, Linux, and other UNIXes.

  8. Wow, a real RISC chip... ;) by Mike+Connell · · Score: 3

    From the CNET article:

    > The chip also comes with 144 new multimedia instructions for better graphics and sound.

    I'm weeping! I *know* that they're multimedia instructions and so on, and probably really useful, and that people aren't hand coding this stuff... but doesn't anyone else think this is ugly?

    Whatever happened to RISC?

    Mike.

    1. Re:Wow, a real RISC chip... ;) by YU+Nicks+NE+Way · · Score: 2

      SHR? It's easy: take the register, shift it right by one bit, and subtract the right-shifted version from the original...

      Oh. You meant SHift Right, not Shift-right is Halfway Recursive? Sorry.

    2. Re:Wow, a real RISC chip... ;) by luckykaa · · Score: 2

      Whatever happened to RISC?

      Intel happened. It's a shame, really. The x86 architecture is all a bit of a mess. I don't think the 386 designers expected it to last until dual-pipelined 1GHz chips.

    3. Re:Wow, a real RISC chip... ;) by warlock · · Score: 2

      The x86 IA was, is and will remain a CISC design.

      Besides, these are extensions for SIMD and perhaps even vector stuff. Even if it was a RISC design they'd have to add instructions for such radically new features.

    4. Re:Wow, a real RISC chip... ;) by fatphil · · Score: 2

      To have MMIs is not anti-RISC.
      RISC is more an instruction decoding/orthogonality issue than an instruction set richness issue. Many RISCs have far richer instructions than CISCs!

      I remember one (thanks to those large foreheaded types in Texas) which had _every_ possible bitwise logical operation available.
      So x86 gives you AND, OR, and XOR, big deal.
      This had NAND, NOR, IMP, NIMP, RIMP, NRIMP, ...
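
      A minimal sketch of how one instruction can cover every two-input bitwise operation: treat a 4-bit field as the truth table of the function, so sixteen table values give all sixteen functions, including the ones listed. This encoding is a hypothetical illustration, not the actual instruction format of the Texas chip mentioned above.

```python
# Hypothetical "any bitwise function" op: a 4-bit truth table selects the
# function. Bit k of `table` is the result for (a_bit, b_bit) == (k >> 1, k & 1).

MASK = 0xFFFFFFFF  # treat operands as 32-bit registers


def bitwise_op(a: int, b: int, table: int) -> int:
    na, nb = ~a & MASK, ~b & MASK
    # One term per input combination, in truth-table order k = 0..3.
    minterms = (na & nb, na & b, a & nb, a & b)
    result = 0
    for k, term in enumerate(minterms):
        if (table >> k) & 1:  # include this combination if the table says 1
            result |= term
    return result


AND, OR, XOR = 0b1000, 0b1110, 0b0110
NAND, NOR = 0b0111, 0b0001
IMP = 0b1011  # a implies b, i.e. (not a) or b
```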

      Back onto MMIs:
      If you have two spare bits in your opcode, then you could use those to implement
      00 = act as 64bit words
      01 = act as 2 32bit words
      10 = act as 4 16bit words
      11 = act as 8 8bit words
      et voila! every instruction can be turned into a "MMI" instruction!

      (OK, in reality you'd only need the arithmetic ones to have this feature,
      e.g. with 3 bits:
      0xx = MMI arithmetic instructions as above
      100 = logical operations
      101 = control flow
      110 = moves
      111 = something else)

      Nothing non-RISC about this at all. (I assume all operations are (R1, R2) -> R3 type or similar).

      Of course, it's perfectly possible to throw orthogonality and symmetry out of the window and implement this as a complete dog's breakfast too! Intel would never do that, I'm sure.
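
      The two-spare-bits idea can be sketched as a packed ADD whose width field picks the lane size, following the 00/01/10/11 list above. In hardware this amounts to breaking the adder's carry chain at lane boundaries; this Python version just loops over lanes. The function name and everything beyond the comment's encoding are illustrative assumptions, not any real ISA.

```python
# Packed add on a 64-bit register; the 2-bit width field picks the lane size
# (encoding from the list above). Carries never cross lane boundaries.

LANE_BITS = {0b00: 64, 0b01: 32, 0b10: 16, 0b11: 8}


def packed_add(a: int, b: int, width_field: int) -> int:
    """Add two 64-bit registers lane by lane, wrapping within each lane."""
    lane = LANE_BITS[width_field]
    lane_mask = (1 << lane) - 1
    result = 0
    for shift in range(0, 64, lane):
        x = (a >> shift) & lane_mask
        y = (b >> shift) & lane_mask
        result |= ((x + y) & lane_mask) << shift  # drop the carry out of the lane
    return result
```

      For example, adding 0xFF and 0x01 with width field 11 (8-bit lanes) wraps to 0, while width field 00 (one 64-bit lane) gives 0x100.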

      FatPhil

      --
      Also FatPhil on SoylentNews, id 863
  9. Hmm? Server down? by arcade · · Score: 2

    I get a filter error message when I try to access that page, and I'm not running any filter, so it seems their server has b0rked already.

    hmf, slashdotting is too powerful. :)


    --
    "Rune Kristian Viken" - http://www.nwo.no - arca
  10. CPU Rant by pantherace · · Score: 3


    I am tired of seeing Intel put out more and more vaporware. RDRAM, IA-64, etc., etc. I don't know of any other chip maker that puts out so much vapor. AMD's chips did what they were intended to do. DEC (now Compaq) Alphas haven't failed yet (they're supposed to reach 1.5GHz+ by the end of the year).
    I am willing to bet that AMD will have a 64-bit arch out (mainstream) before Intel.
    IA-64 has 1/5 the performance of an alpha under gcc, which is not optimised for the alpha. (likely the kind that is 3x an Athlon or more for a P3)
    Even a 2 year old alpha can beat most P3s (1.5 -2x P3 MHz = alpha MHz in performance)
    Another thing: 550MHz P3 $159, 600MHz Duron $99 (or $109, can't remember exactly). Duron is not 2/3 a P3's performance. Is Intel too greedy? In SV, I talked to an Intel CAD engineer and he said as long as it sold for a 24 or 26% profit, Intel would make anything. I wonder what AMD's profit level is.

    btw anyone ever looked at Alpha vs Intel's touted FP performance? hint, Intel is in the dust.

    1. Re:CPU Rant by be-fan · · Score: 2

      AMD's profit margin is less than Intel's. If you've seen the kind of price decreases AMD has put up, you'll know why. You can get a 1GHz Thunderbird Athlon for around $650 on pricewatch.

      I think you've confused the term "vaporware." Vaporware is stuff that doesn't show up: it is announced but never actually ships. Given that RDRAM is shipping and the P4 is close to shipping, you can hardly call them "vaporware." Vaporware has a much narrower definition than most /.ers realize. The Nintendo 64 magnetic drive was vaporware (it was called the 64DD; is it just me, or does that read like a cup size?). The Matsushita M2 (another proposed console using dual PPC 603e chips) was vaporware.

  11. 3D on CPU? by Crazy+Man+on+Fire · · Score: 2
    In a preview of the chip at the company's headquarters, technicians showed how a Pentium 4 computer can rapidly render, or draw, 3D images downloaded from the Internet. That sort of processing power could make it easy for sellers on eBay to post virtual representations of their products, for example.
    Recently, in a preview in a dorm room in Vermont, technicians showed how a Pentium II (r) computer equipped with an inexpensive 3D graphics card can rapidly render, or draw, 3D images.

    Duh. Is it just me, or is this just a load of crap? With the incredible tech available right now in 3D video cards, which are getting better all the time and will probably hit the ceiling pretty soon, why would any home user want 3D on their CPU? I'd rather spend the extra cash this feature would cost on a kick-ass 3D card. Cut the crap with all this hardware bloat and just give us a fast, reliable chip! Oh, and a motherboard with a reasonably fast bus would be nice as well, but let's not get started on that one...
    1. Re:3D on CPU? by David+Greene · · Score: 2
      Of course I left out FP. Everyone knows x86 has a horrible FP implementation. It's not even worth comparing because it's so crippled.

      This does not invalidate what I stated. CISC does not require a braindead FP stack implementation.

      If it's just Intel's brute force keeping IA32 on top, then may I humbly suggest the other manufacturers hire away some of that brute?

      I'm not sure what other units you're referring to. Memory, perhaps? A lot of that has more to do with the PC platform than anything else -- consumer-level machines don't require huge levels of memory bandwidth and scaling.

      Keep in mind that SPECint measures the entire processor's performance running (mostly) integer applications. This includes memory and control flow.

      Could x86 run faster if it weren't burdened with a huge decoder? Probably. Branch mispredicts are a big problem.

      My point is that you can't judge a processor by its packaging (marketing). It's the guts and the bottom line that count. Speculations based on anecdotes and (in this case) dogma are next to useless.

  12. Re:More information here by ThrobbingGristle · · Score: 2

    In a preview of the chip at the company's headquarters, technicians showed how a Pentium 4 computer can rapidly render, or draw, 3D images downloaded from the Internet. That sort of processing power could make it easy for sellers on eBay to post virtual representations of their products, for example.

    Sweet! I've been waiting for features like that forever! Thanks Intel and thanks CNET! You guys rock!

  13. "complete architectural overhaul"??? by fatphil · · Score: 2

    CNET:
    "The chip [...] represents the first complete architectural overhaul of the company's processor line since 1995, when the original Pentium emerged."

    Erm. I've programmed for z80, 68K, Arm, C80-MP, H8, PPC, Axp, Sparc, HP-PA and the ubiquitous x86 (all varieties).

    If the Pentium is a "complete architectural overhaul", then what the blazes does one call the Vax->Axp change, or the 68K->PPC change, or the C80->C6000 change?

    Some people live in very sheltered worlds, evidently.

    FatPhil

    1. Re:"complete architectural overhaul"??? by David+Greene · · Score: 2
      Ahem.

      P6, anyone? I would think out-of-order execution would be a pretty major overhaul.

    2. Re:"complete architectural overhaul"??? by FascDot+Killed+My+Pr · · Score: 2

      Here's your quote: "The chip [...] represents the first complete architectural overhaul of the company's processor line since 1995, when the original Pentium emerged."

      (my emphasis)

      Assuming "the company" referred to is Intel, then this statement is completely true.
      --
      Linux MAPI Server!
      http://www.openone.com/software/MailOne/
      (Exchange Migration HOWTO coming soon)
  14. Re:How big an impact from the bus architecture? by be-fan · · Score: 2

    Considering the P4 only does a single proc config...

  15. NetBurst? by be-fan · · Score: 3

    Is it just me, or is the name not necessarily just superfluous?

    1) The P4 has very long pipes.
    2) The P4 has small caches.
    3) The P4 has huge bus bandwidth.
    4) The regular FPU has been largely deprecated in favor of SSE2.

    What does all this add up to? A chip to accelerate 3D. This feature list reads largely like the PlayStation 2's (aside from the long pipelines). You've got the small caches, high bandwidth, and the vector pipes. My guess is that Intel, seeing NVIDIA cram more and more into the GPU, is trying to come back and thoroughly blow them out of the water. This chip might process slower per clock for many uses, but the high clock makes up for that. On the other hand, things that are extremely regular, without any branches (ahem, 3D geometry processing), will absolutely fly through this thing.
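
    For a concrete picture of "extremely regular without any branches", here is a minimal geometry kernel: a 4x4 matrix applied to a list of homogeneous (x, y, z, w) vertices. Every vertex gets the same multiply-adds, so apart from easily predicted loop back-edges there is nothing for the branch predictor to get wrong. Plain Python stands in for what would be SSE/SSE2 code on a P4; the names are illustrative.

```python
from typing import List, Sequence

Matrix4 = Sequence[Sequence[float]]
Vertex = Sequence[float]  # homogeneous (x, y, z, w)


def transform(m: Matrix4, vertices: Sequence[Vertex]) -> List[List[float]]:
    """Apply the 4x4 matrix m to every vertex; no data-dependent branches."""
    out = []
    for v in vertices:  # loop back-edges are the only branches, and predictable
        # Same 16 multiplies and 12 adds for every vertex.
        out.append([m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2] + m[r][3] * v[3]
                    for r in range(4)])
    return out


identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
translate = [[1, 0, 0, 2], [0, 1, 0, 3], [0, 0, 1, 4], [0, 0, 0, 1]]
print(transform(translate, [[1.0, 1.0, 1.0, 1.0]]))
```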

  16. Maybe it's a Slashdot Effect shield ? by dingbat_hp · · Score: 2

    How about a filter that tracks referrals from Slashdot and bounces them beyond a certain load level ?

    I've no idea if that's what's happening, but it's something I'd want to have on hand if I ran a site like Anand's that was regularly whacked by Slashdot's million typoing monkeys.
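
    A minimal sketch of such a filter, assuming the server can see the HTTP Referer header and some current-load figure. The threshold, the load metric, and the function name are hypothetical; a real site would more likely do this in its web server or proxy configuration than in application code.

```python
from urllib.parse import urlparse

SLASHDOT_HOSTS = ("slashdot.org",)
MAX_LOAD = 0.8  # hypothetical fraction of capacity before shedding starts


def should_bounce(referer: str, current_load: float) -> bool:
    """Reject Slashdot-referred requests, but only when the server is busy."""
    if current_load < MAX_LOAD:
        return False
    host = urlparse(referer).hostname or ""
    # Match slashdot.org itself and any subdomain such as hardware.slashdot.org.
    return any(host == h or host.endswith("." + h) for h in SLASHDOT_HOSTS)
```

    Regular visitors are never affected, and Slashdot readers only get bounced once load crosses the threshold.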

  17. sounds like a math co processor to me... by Da+w00t · · Score: 2

    so it'll be as much faster as when I put the 387 into my 386?

    da w00t.

    --

    da w00t. mtfnpy?
  18. I got me some questions about this here CPU by Meenky · · Score: 2
    The Internet is going from a text kind of thing to something more visual
    I don't know about the rest of you, but I don't have enough bandwidth for the text-based Internet as it is. (It really sucks living at the end of a copper line, max = 26.4 kbps.)

    In addition, the Pentium 4 will contain a 20-stage pipeline. The pipeline is a processor's assembly line. While this means the Pentium 4 will have a line twice the length of the 10-stage Pentium III, the longer pipeline will create room for speeding up the chip.
    Could someone explain to me how having a longer pipeline speeds things up? This seems kinda counterintuitive to me. I guess it's like the pipelines in 3D GPUs, but I don't see how that would work in a general-purpose CPU.

    It will contain 42 million transistors, compared with 28 million for the Pentium III.
    Even with a smaller feature size, won't this create a lot of heat, especially running at 1.4GHz? IANAExpert, but since PIIIs run at 90C, can we expect this CPU to run ultra hot as well?

    Those who will not reason, are bigots, those who cannot, are fools, and those who dare not, are slaves. --George Gordon Noel Byron (1788-1824), [Lord Byron]

    1. Re:I got me some questions about this here CPU by Pulzar · · Score: 4

      Could someone explain to me how having a longer pipeline speeds things up? this seems kinda counter intuative to me. Guess its like the pipelines in the 3D GPUs, but i don't see how that would work in a general purpose CPU.

      The longer the pipeline, the smaller each stage of the pipeline is. The smaller the stages, the higher the clock frequency you can run them at. If you cut each of the existing stages exactly down the middle, you could run your CPU at twice the frequency without making any other changes! (Of course, you can never cut a stage exactly in half, and each extra stage adds some latch overhead, so you'll never reach a 2x increase.)

      Why don't we make 10,000-stage pipelines, then, you might ask :). In an ideal world, a completed instruction "comes out" of the pipeline at each clock cycle, so with 2x frequency your CPU is twice as fast. The problem is that with a huge pipeline you increase the chance that an instruction will "stall" along the way, and you'll get less than 1 instruction (on average) coming out on each clock cycle (the "IPC" thing the article talks about). If you add enough stalls to your pipeline, you might actually decrease your CPU's performance.
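
      The trade-off described above can be modeled with a toy formula: the clock period shrinks as the logic is split over more stages (plus a fixed latch overhead per stage), while cycles per instruction grow with depth because each stall or flush drains more stages. All constants are assumptions chosen for illustration, so the resulting "best" depth says nothing about real chips; only the shape of the curve matters.

```python
# Toy pipeline-depth model. Every constant is an illustrative assumption.

LOGIC_DELAY_NS = 10.0  # work per instruction if unpipelined (assumed)
LATCH_NS = 0.1         # fixed overhead added by each stage (assumed)
HAZARD_RATE = 0.05     # extra cycles per instruction per stage of depth (assumed)


def time_per_instruction(stages: int) -> float:
    cycle_ns = LOGIC_DELAY_NS / stages + LATCH_NS  # shorter stages, faster clock...
    cpi = 1.0 + HAZARD_RATE * stages               # ...but stalls grow with depth
    return cycle_ns * cpi


best = min(range(1, 201), key=time_per_instruction)
print(best, round(time_per_instruction(best), 3))
```

      Going deeper helps at first, then the growing stall term wins: a very deep pipeline ends up slower per instruction than a moderate one.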

      --
      Never underestimate the bandwidth of a 747 filled with CD-ROMs.
  19. Re:More information here by Emil+Brink · · Score: 2

    Um, I don't know if you're serious, but stuff like this works perfectly fine on earlier CPUs too, of course. ;^) I probably shouldn't be doing this, but Cycore have some neat tech for doing this. According to their download page, the plugin for their technology (Cult 3D) is available for Linux as well as the Other OS...

    --
    main(O){10<putchar(4^--O?77-(15&5128 >>4*O):10)&&main(2+O);}
  20. Re:Ugh by Junks+Jerzey · · Score: 2

    Another processor from Intel? Now dammit, I just gave them a bunch of cash for the PIII, just like I did the PII, and just like the Pentium and the Pro version.

    I didn't really notice a big jump in performance on the last buy, but what can I do... it is Intel.


    Why is this so difficult for people to understand? Unless you are doing something really hardcore, like lots of video work or heavy numerical analysis, you're not going to notice any performance benefit. Additionally, we've reached the point where rethinking or rewriting can pay off much, much more than incremental processor speed upgrades. For example, Borland's Object Pascal compiles 10-100x faster than gcc. If you use it, then you're getting an order of magnitude increase. Compare that to the benefit gained by going from a 400MHz Pentium II to 1GHz Pentium III (less than 3x).

  21. This is bad by Animats · · Score: 2
    The P4 is worse than I'd expected.
    • All that architectural redesign, and it does less per clock. Ouch. Was it worth that much to get bragging rights for a higher clock rate? Remember the DEC Alpha? Huge clock rates, but not much got done per clock. Same idea.
    • It doesn't support multiprocessing. That cuts out the high-end market.
    • It's RAMBUS-only. That cuts out the low-end market.
    • The "Netburst" name is really stupid.

    On the other hand, note that Merced/Itanium/IA64/whatever seems to have gone away. The Register now points out that IA32 is 2x faster than IA64. Oops. Probably just as well; Merced was hell to program. VLIW architectures require miracles from the compiler.

    Besides, even the 1GHz PIII is mostly vaporware. Try to get one. Yes, they exist, but there aren't many of them.

  22. No-hype article at The Register by cybaea · · Score: 3

    The Register has a nice anti-hype article about the P4.

    My favourite is

    There are two key words and phrases you, our readers must note. First of all, the Pentium 4 marchitecture is now to be described as Netburst, and the second phrase is that this architecture should be described as the repeated engineer execution (REE). We know what REE stands for but we prefer our version.
  23. Re:Hmm? Server down? by stx23 · · Score: 3

    Were you actually planning on reading the article before speculating wildly?
    You must be new round these parts...

  24. Re:Hmm? Server down? by arcade · · Score: 2

    I always read the articles first - don't you? :)
  25. hmmm by karmma · · Score: 2
    Access denied to system because of URL Filter Configuration, while attempting to retrieve the URL: http://www.anandtech.com/showdoc.html?i=1301.

    Hmmm... I wonder if their webserver is running on a Pentium 4?

  26. Heat by cybaea · · Score: 2

    The article says:

    The 432-pin Pentium 4 should dissipate around 52W of heat when operating at launch speeds; this puts it below that of the 1GHz Thunderbird that is currently available.
  27. hURLing by GavK · · Score: 2

    Maybe they're still working on a click-through NDA...

    --

    Gav

    "There's no such thing as data that can't be manipulated"

  28. How big an impact from the bus architecture? by cybaea · · Score: 3

    From the article:

    The P4's bus, unlike the Athlon's EV6, isn't a Point-to-Point bus, meaning that all CPUs must share the same 3.2GB/s of available system bandwidth. With a Point-to-Point bus, although it's more complicated to implement, each CPU in a multiprocessor environment gets its own connection to the North Bridge ...

    IANACD (I am not a chip designer), but this seems to me like a major disadvantage compared with the Athlon. Am I missing something obvious?
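
    A rough way to see the difference: on a shared bus each CPU gets at most its slice of the 3.2GB/s, while with point-to-point links each CPU has its own connection. The per-link bandwidth and per-CPU demand used here are hypothetical parameters, and the model ignores that memory behind the North Bridge is still shared either way. As a reply in this thread points out, if each CPU's average demand stays well below its slice, the shared bus costs little.

```python
# Shared bus vs point-to-point, per-CPU bandwidth. 3.2GB/s is from the article;
# the per-link figure and the demand values are hypothetical.

SHARED_BUS_GBPS = 3.2


def shared_bus_per_cpu(n_cpus: int, demand_gbps: float) -> float:
    """Bandwidth each CPU gets when all CPUs split one bus evenly."""
    return min(demand_gbps, SHARED_BUS_GBPS / n_cpus)


def point_to_point_per_cpu(link_gbps: float, demand_gbps: float) -> float:
    """Each CPU has its own link, so the CPU count does not matter here."""
    return min(demand_gbps, link_gbps)


for n in (1, 2, 4):
    print(n, "CPUs at 1.5GB/s demand each:",
          shared_bus_per_cpu(n, 1.5), "GB/s shared vs",
          point_to_point_per_cpu(1.6, 1.5), "GB/s point-to-point")
```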

  29. More information here by Jon+Erikson · · Score: 3

    Try this link at CNET for more information.

    ---
    Jon E. Erikson

    --

    Jon Erikson, IT guru