Intel Pentium 4 NetBurst Architecture Explained
fr0child writes "Next week is Intel's Developer Forum (IDF) and it seems they'll be releasing quite a bit of information (aka hype) about the Pentium 4. Anandtech seems to have gotten the scoop on Intel's NetBurst Architecture, basically covering the P4's internal architecture."
P.S. You can also get the scoop over at sharky extreme.
I am tired of seeing Intel put out more and more vaporware. RDRAM, IA-64, etc, etc..
You can buy RDRAM right now if you want to. Hardly vapour.
Engineering prototypes of the IA-64 have been around for a while, with every indication that they will ship. Doesn't look very vaporous to me.
IA-64 has 1/5 the performance of an alpha under gcc, which is not optimised for the alpha. (likely the kind that is 3x an Athlon or more for a P3)
Firstly, GCC is not the best compiler in the world. When comparing an Alpha to an IA-64 chip, I'd use Intel's compiler on the IA-64 and Compaq's compiler on the Alpha. Both companies have a history of writing compilers that were extremely well optimized for their platforms.
Secondly, I don't see much support for your figures. See my next point.
Even a 2 year old alpha can beat most P3s (1.5 -2x P3 MHz = alpha MHz in performance)
Not really. Alpha chips are about even in everything except floating point (where the Alpha blows *everyone* out of the water - Sun, HP, IBM, Motorola, etc).
They do this with the higher speed grades of their chips that were released _recently_. Older chips used the same design but were clocked more slowly, and don't blow away present chips.
Check http://www.spec.org for reasonably accurate benchmark information. They use the fairest system for evaluation that I've seen (standard test code supplied by SPEC, compilation and system tweaking handled by the companies owning the platforms being tested).
As far as the performance of the Alpha or an Athlon vs. the P4 goes... The P4 is still in the final debugging stages. Wait six months and look for SPEC marks.
Personally, I'd like to see SPEC marks for the G4. Apple has been allergic to SPEC of late.
The author of that quote was clueless and should have said "microarchitecture". The architecture of Pentium 4 is very similar to Pentium III, but the microarchitecture is 100% different, and is a complete overhaul.
1) Pipeline stalls / operand latency:
If the compiler and/or CPU is unable to reorder instructions effectively (or if a particular piece of code is not amenable to reordering), then an instruction in the pipeline may not have its operands ready when it needs them and will stall the pipeline waiting for them. With a longer pipeline it will take more clock ticks for the necessary operands to work their way through the pipeline to clear the stall. Intel have added a double clock speed arithmetic unit (ALU) to the P4 to try to mitigate operand latency.
2) Branch mispredict penalty:
When a modern CPU such as the P4 encounters a branch instruction, it predicts whether the branch will be taken or not (by using the execution history) in order to be able to continue processing instructions through the pipeline. When the branch is finally evaluated near the end of the pipeline it may turn out that the prediction was wrong, and that all the instructions following the branch (now in the pipeline) should not be executed. In this case the processor has to flush the pipeline and instead take the correct branch. This "pipeline flush" branch mispredict penalty is obviously higher the longer the pipeline is - a 20 stage pipeline means you are throwing away 20 instructions when a branch is mispredicted.
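For anyone who wants to see the "execution history" idea in code, here is a minimal sketch of a 2-bit saturating counter, a textbook prediction scheme. It is not claimed to be what the P4 actually implements; the function name and the loop pattern are made up for illustration.

```python
def simulate_two_bit_predictor(outcomes, initial_state=2):
    """Count mispredictions of a single 2-bit saturating counter.

    States 0-1 predict "not taken", states 2-3 predict "taken".
    This is a generic textbook scheme, not the P4's real predictor.
    """
    state = initial_state
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredictions += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions


# Hypothetical loop branch: taken 9 times, then falls through once,
# and the whole loop is run 10 times.
loop_pattern = ([True] * 9 + [False]) * 10
misses = simulate_two_bit_predictor(loop_pattern)
print(f"{misses} mispredictions out of {len(loop_pattern)} branches")
```

With this pattern the counter only misses on each loop exit, which is why regular code suffers far less from the flush penalty than branchy, unpredictable code.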
P4 was designed with a long pipeline so that each pipeline step could be very simple/quick and therefore the processor could have a very high clock rate. The downside of doing this is the above two problems, which mean that the average number of instructions executed per clock cycle (IPC - aka processor efficiency) gets reduced.
P4 at 1.4GHz may be faster than P3 at 1GHz, but because P4 will have a lower IPC than P3, it won't be as fast as a 1.4GHz P3 (if we ever see one) or 1.4GHz Athlon (which we will see).
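To put rough numbers on the clock-versus-IPC argument, here is a back-of-the-envelope sketch. Every figure in it (base IPC, branch fraction, mispredict rate, and treating the flush penalty as equal to the pipeline depth) is an assumption picked for illustration, not a measurement of any real chip.

```python
def effective_ipc(base_ipc, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average instructions per clock once mispredict flushes are included."""
    base_cpi = 1.0 / base_ipc
    # Extra cycles per instruction lost to pipeline flushes.
    stall_cpi = branch_fraction * mispredict_rate * flush_penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)


def instructions_per_second(clock_hz, ipc):
    """Throughput is simply clock rate times average IPC."""
    return clock_hz * ipc


# Assumed workload: 20% branches, 5% of them mispredicted (hypothetical values).
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

# Simplification: the flush penalty is taken to be the full pipeline depth.
ipc_10_stage = effective_ipc(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 10)
ipc_20_stage = effective_ipc(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 20)

p3_1000 = instructions_per_second(1.0e9, ipc_10_stage)
p3_1400 = instructions_per_second(1.4e9, ipc_10_stage)
p4_1400 = instructions_per_second(1.4e9, ipc_20_stage)

for label, rate in [("10-stage @ 1.0 GHz", p3_1000),
                    ("20-stage @ 1.4 GHz", p4_1400),
                    ("10-stage @ 1.4 GHz", p3_1400)]:
    print(f"{label}: {rate / 1e9:.2f} billion instructions/s")
```

Under these assumptions the deeper pipeline at 1.4GHz lands between the shallower one at 1GHz and the shallower one at 1.4GHz, which is the ordering described above. Change the mispredict rate and the gap moves accordingly.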
The one area where P4 should excel is in SSE2 optimized floating point math intensive applications, which is why Intel are now trying to reposition the P4 as an Internet/multimedia CPU rather than a general purpose one. The fallacy of this is that once you can decode your DivX in real-time, you don't need to go any faster!
What you're missing is that the P4 is going to be a single-cpu part, so there's no reason to split up the bus. Even in a dual processor setup, each cpu isn't hitting the bus for its full capacity anywhere close to 100% of the time unless it's running a loop or accessing memory that doesn't fit inside its cache, in which case the software design is holding it back more than the system bus anyhow.
I don't think that many users would ever notice the difference, and Intel probably can't afford to design its next consumer level chip around a few percent of the market.
Worse: cold fusion. Guess their server needs more deuterium, or fantasium, or something.
---- ----
Napster-to-go says "Fill and refill your compatible MP3 player", which is a lie. It's not MP3. It's WMA with DRM.
Hmm, it's there at the bottom of the page:
So, yes, you are right: they don't support SMP so why would they split the bus?
But I question your "Intel probably can't afford to design its next consumer level chip around a few percent of the market" comment.
First of all, if Intel can't afford it, who can?
But more to the point: Is it really only a few percent of the market? I've just ordered a dual PIII and I selected the chip specifically because I could get SMP support. Does anybody have any statistics on single- versus multiple CPU PIII systems shipped? Is it really only "a few percent"?
Hi!
From the CNET article:
> The chip also comes with 144 new multimedia instructions for better graphics and sound.
I'm weeping! I *know* that they're multimedia instructions and so on, and probably really useful, and that people aren't hand coding this stuff... but doesn't anyone else think this is ugly?
Whatever happened to RISC?
Mike.
Tales from behind the Lagom Curtain
I get a filter error message when I try to access that page, and I'm not running any filter, so it seems their server has b0rked already.
:)
hmf, slashdotting is too powerful.
--
"Rune Kristian Viken" - http://www.nwo.no - arca
I am tired of seeing Intel put out more and more vaporware. RDRAM, IA-64, etc, etc... I don't know of any other chip maker that puts out so much vapor. AMD's chips did what they were intended to do. DEC (Compaq) Alphas haven't failed yet (supposed to be 1.5GHz+ by the end of the year).
I am willing to bet that AMD will have a 64-bit arch out (mainstream) before Intel.
IA-64 has 1/5 the performance of an alpha under gcc, which is not optimised for the alpha. (likely the kind that is 3x an Athlon or more for a P3)
Even a 2 year old alpha can beat most P3s (1.5 -2x P3 MHz = alpha MHz in performance)
Another thing: a 550 P3 is $159, a 600 Duron is $99 (or $109, can't remember exactly). The Duron is not 2/3 of a P3's performance. Is Intel too greedy? In SV, I talked to an Intel CAD engineer and he said as long as it sold for a 24 or 26% profit Intel would make anything. I wonder what AMD's profit level is.
btw anyone ever looked at Alpha vs Intel's touted FP performance? hint, Intel is in the dust.
duh. is it just me, or is this just a load of crap. with the incredible tech available right now in 3d video cards, which are getting better all the time and will probably hit the ceiling pretty soon, why would any home user want 3d on their cpu? for the extra cash it would cost to get this feature, i'd rather spend on a kick-ass 3d card. cut the crap with all this hardware bloat and just give us a fast reliable chip! oh, and a motherboard with a reasonably fast bus would be nice as well, but let's not get started on that one...
Sweet! I've been waiting for features like that forever! Thanks Intel and thanks CNET! You guys rock!
CNET:
"The chip [...] represents the first complete architectural overhaul of the company's processor line since 1995, when the original Pentium emerged."
Erm. I've programmed for z80, 68K, Arm, C80-MP, H8, PPC, Axp, Sparc, HP-PA and the ubiquitous x86 (all varieties).
If the Pentium is a "complete architectural overhaul", then what the blazes does one call the Vax->Axp change, or the 68K->PPC change, or the C80->C6000 change?
Some people live in very sheltered worlds, evidently.
FatPhil
Also FatPhil on SoylentNews, id 863
Considering the P4 only does a single proc config...
A deep unwavering belief is a sure sign you're missing something...
Is it just me, or is the name not necessarily just superfluous?
1) The P4 has very long pipes.
2) The P4 has small caches.
3) The P4 has huge bus bandwidth.
4) The regular FPU has been largely deprecated in favor of SSE2.
What does all this add up to? A chip to accelerate 3D. This feature list reads largely like the list of the Playstation 2. (Aside from the long pipelines thing.) You've got the small caches, high bandwidth, and the vector pipes. My guess is that Intel, seeing NVIDIA cramming more and more into the GPU, is trying to come back and thoroughly blow them out of the water. This chip might process slower per clock for many uses, but the high clock makes up for that. On the other hand, things that are extremely regular without any branches (ahem, 3D geometry processing) will absolutely fly through this thing.
How about a filter that tracks referrals from Slashdot and bounces them beyond a certain load level ?
I've no idea if that's what's happening, but it's something I'd want to have on hand, if I ran a site like Anand that was regularly whacked with Slashdot's million typoing monkeys.
so it'll be as much faster as when I put the 387 into my 386?
da w00t.
da w00t. mtfnpy?
I don't know about the rest of you but I don't have enough bandwidth for the text based internet as it is. (It really sucks living at the end of a copper line, Max = 26.4 kbps)
In addition, the Pentium 4 will contain a 20-stage pipeline. The pipeline is a processor's assembly line. While this means the Pentium 4 will have a line twice the length of the 10-stage Pentium III, the longer pipeline will create room for speeding up the chip.
Could someone explain to me how having a longer pipeline speeds things up? This seems kinda counterintuitive to me. Guess it's like the pipelines in the 3D GPUs, but I don't see how that would work in a general purpose CPU.
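A rough way to see it: the clock can only tick as fast as the slowest pipeline stage, so cutting the same work into more, shorter stages lets the clock run faster. Here is a minimal sketch of that relationship. The 10 ns of total logic and the 0.2 ns per-stage latch overhead are invented numbers for illustration, not figures for the Pentium III or Pentium 4.

```python
def max_clock_hz(total_logic_delay_ns, stages, latch_overhead_ns):
    """Upper bound on clock rate when logic is split evenly across stages.

    Each stage costs its share of the logic delay plus a fixed latch
    overhead, so doubling the stage count gives less than double the clock.
    """
    stage_delay_ns = total_logic_delay_ns / stages + latch_overhead_ns
    return 1e9 / stage_delay_ns


# Hypothetical design: 10 ns of logic per instruction, 0.2 ns latch overhead.
TOTAL_LOGIC_NS = 10.0
LATCH_NS = 0.2

clock_10_stage = max_clock_hz(TOTAL_LOGIC_NS, 10, LATCH_NS)
clock_20_stage = max_clock_hz(TOTAL_LOGIC_NS, 20, LATCH_NS)

print(f"10 stages: {clock_10_stage / 1e6:.0f} MHz")
print(f"20 stages: {clock_20_stage / 1e6:.0f} MHz")
print(f"speedup in clock: {clock_20_stage / clock_10_stage:.2f}x")
```

Each instruction still takes about as long from start to finish, but more of them are in flight at once and one can complete every (shorter) tick. The catch is the larger penalty whenever the pipeline has to be flushed or stalled.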
It will contain 42 million transistors, compared with 28 million for the Pentium III.
Even with a smaller feature size won't this create a lot of heat, especially running at 1.4GHz? IANAExpert but since PIIIs run at 90C can we expect this CPU to run ultra hot as well?
Those who will not reason, are bigots, those who cannot, are fools, and those who dare not, are slaves. --George Gordon Noel Byron (1788-1824), [Lord Byron]
Um, I don't know if you're serious, but stuff like this works perfectly fine on earlier CPUs too, of course. ;^) I probably shouldn't be doing this, but Cycore have some neat tech for doing this. According to their download page, the plugin for their technology (Cult 3D) is available for Linux as well as the Other OS...
main(O){10<putchar(4^--O?77-(15&5128 >>4*O):10)&&main(2+O);}
Another processor from Intel? Now dammit, I just gave them a bunch of cash for the PIII, just like I did the PII, and just like the Pentium and the Pro version.
I didn't really notice a big jump in performance on the last buy, but what can I do... it is Intel.
Why is this so difficult for people to understand? Unless you are doing something really hardcore, like lots of video work or heavy numerical analysis, you're not going to notice any performance benefit. Additionally, we've reached the point where rethinking or rewriting can pay off much, much more than incremental processor speed upgrades. For example, Borland's Object Pascal compiles 10-100x faster than gcc. If you use it, then you're getting an order of magnitude increase. Compare that to the benefit gained by going from a 400MHz Pentium II to 1GHz Pentium III (less than 3x).
On the other hand, note that Merced/Itanium/IA64/whatever seems to have gone away. The Register now points out that IA32 is 2x faster than IA64. Oops. Probably just as well; Merced was hell to program. VLIW architectures require miracles from the compiler.
Besides, even the 1GHz PIII is mostly vaporware. Try to get one. Yes, they exist, but there aren't many of them.
The Register has a nice anti-hype article about the P4.
My favourite is
Hi!
Were you actually planning on reading the article before speculating wildly?
You must be new round these parts...
I always read the articles first - don't you? :)
Hmmm... I wonder if their webserver is running on a Pentium 4?
The article says:
Hi!
Maybe they're still working on a click-through NDA...
Gav
"There's no such thing as data that can't be manipulated"
From the article:
IANACD (I am not a chip designer), but this seems to me like a major disadvantage compared with the Athlon. Am I missing something obvious?
Hi!
Try this link at CNET for more information.
---
Jon E. Erikson
Jon Erikson, IT guru