Slashdot Mirror


Architectural Difference Between The P4 And G4

homerJAYsimpson writes: "This article is a great refernce of the differences in the architecture of the P4 and the G4. What is nice is that it is not a holy war of who is better but an explaination of why Intel made its choices and uses the G4 as a point of reference. It has just tons of info on uPs, useful for everyone." Not for the techie novice, but its a well written piece if you're reasonably technical and want to understand more about two of the most important chips on the market.

24 of 78 comments

  1. The difference... by Anonymous Coward · · Score: 5

    is about 70 degrees F.

  2. Predictor of predictors by nadador · · Score: 3

    I'm interested to see what the influence of Alpha IP will be on Intel core designs. When I took computer architecture at CMU we spent a couple of lectures on why the Alpha was the best thing since sliced bread, as far as microprocessors go.

    One of the big things that the Alpha did that was so cool was the branch predictor, which actually implemented two branch prediction algorithms and then had a predictor that watched them both and picked the one that was recently the most correct. Some of that kind of deep knowledge of branch prediction and how to avoid having your long pipeline kill performance would be information that Intel could sorely use on the Pentium 4 core, as well as on the Itanic, I mean Inanium, I mean *Itanium*. There we go.
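
    The "predictor of predictors" idea above can be sketched in a few lines. This is only an illustration, not the Alpha's actual tables: the class name, the table sizes, and the choice of the two component predictors (one counter per branch, one counter per recent-history pattern) are assumptions made for the example.

```python
def bump(counter, up):
    """Move a 2-bit saturating counter (0..3) one step up or down."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class TournamentPredictor:
    """Two component predictors plus a chooser that tracks which one to trust."""

    def __init__(self, history_bits=4):
        self.mask = (1 << history_bits) - 1
        self.history = 0        # last few branch outcomes, newest in bit 0
        self.per_branch = {}    # branch address -> counter (component A)
        self.per_history = {}   # history pattern -> counter (component B)
        self.chooser = {}       # branch address -> counter; >= 2 trusts B

    def _components(self, pc):
        a = self.per_branch.get(pc, 1) >= 2
        b = self.per_history.get(self.history, 1) >= 2
        return a, b

    def predict(self, pc):
        a, b = self._components(pc)
        return b if self.chooser.get(pc, 1) >= 2 else a

    def update(self, pc, taken):
        a, b = self._components(pc)
        if a != b:
            # Only when the components disagree does the chooser learn anything.
            self.chooser[pc] = bump(self.chooser.get(pc, 1), b == taken)
        self.per_branch[pc] = bump(self.per_branch.get(pc, 1), taken)
        self.per_history[self.history] = bump(self.per_history.get(self.history, 1), taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(predictor, pc, outcomes):
    """Fraction of outcomes predicted correctly, training as it goes."""
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(outcomes)


alternating = [i % 2 == 0 for i in range(200)]   # taken, not taken, taken, ...
print(accuracy(TournamentPredictor(), 0x40, alternating))
```

    On a strictly alternating branch the per-branch counter keeps flip-flopping, while the history-indexed one learns the pattern, so the chooser drifts toward the latter after a few mispredictions.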

    Is anyone else surprised that the G4 core seems so vanilla? The difficulty of making a 4 stage pipeline run at upwards of 733 MHz on a .25 or .18 micron process is pretty amazing. I'm impressed. I suppose that the embedded focus at Motorola meant that bells and whistles weren't a high priority, but I wonder what kind of performance improvements G4e will demonstrate with a longer pipeline and all.


    --

    Outside of a dog, a book is a man's best friend. Inside a dog, it's too dark to read.
    1. Re:Predictor of predictors by Wesley+Felter · · Score: 2

      In a talk at UT Austin, one of the Pentium 4 architects claimed that it has the best branch predictor of any CPU, but they haven't published the details yet.

  3. Clock Speeds by Midnight+Thunder · · Score: 5

    One thing that I got from this article is why we shouldn't be depending too much on clock-speeds for comparison, and thus the fact that PPCs aren't yet available at clock speeds of x86 shouldn't really matter. The wide and shallow approach of the PPC certainly means that less clock cycles are needed than the narrow and deep approach of the x86.

    Now I know that the only tests that really matter are the real world tests, simply because at a user level that's the only real place that I'll notice the difference.

    Of course another issue is going to be motherboard differences and how much I/O depends on the processor, but this is another story.

    --
    Jumpstart the tartan drive.
    1. Re:Clock Speeds by CBravo · · Score: 3

      I'll drop the cluebomb again:
      -it is not about processors/instruction sets
      -it is not about MHz

      it is about e.g. compilers, parallelism, shortest path, bandwidth, technology and algorithmz. You _then_ work on the rest.
      Processors are only a means to what you want to accomplish. I've seen DSPs make up a 4x MHz gap just because they had a good architecture. Deep down, information processing (clocked or not) takes time to go through the logic.

      --
      nosig today
    2. Re:Clock Speeds by sracer9 · · Score: 2

      "The wide and shallow approach of the PPC certainly means that less clock cycles are needed than the narrow and deep approach of the x86."

      Remember, the article is about only one "x86" processor: the P4. Clock speed still matters, just not as much with that particular processor. There is a performance penalty to be paid due to this design philosophy. A similarly clocked P3 will eat a P4's lunch because of it. Let's not even get started with what a similarly clocked Athlon does :) I think that Intel's thinking was: "Who cares if it might be less efficient on a per-clock basis, this thing'll ramp up to such a high clock speed that its inefficiencies won't matter anymore."
      By comparison, the G4 and Athlon are very efficient with their clock cycles.

      --

      No thanks. I don't smoke anymore.
    3. Re:Clock Speeds by Diomedes01 · · Score: 2

      Hmm... if you had actually read and understood the post you are replying to, then you should be ashamed of yourself. The poster even said that part of the design philosophy behind the P4 was so that it will reach higher clock speeds, and thus the Intel engineers figured that the benefits in clock speed outweighed the performance hits. Personally, I think that the more elegant chips are nicer, although perhaps not always faster. Sometimes, people can't realize that brute force is not always the way to go...


      -------

      --
      "To hope's end I rode and to heart's breaking: Now for wrath, now for ruin and a red nightfall!"
  4. Intel + Alpha IP = AMD Killer ??? by powerlord · · Score: 2

    I seem to remember reading that AMD had licensed a whole bunch of tech from DEC (bus architecture etc.), and that their biggest issue right now was predictor logic. It seems that the main reason Intel would buy Alpha IP is to keep someone else (like their prime competitor) from doing it first.

    --
    This space for rent. All reasonable inquiries will be entertained at proprietor's discretion.
  5. Re:One thing that's always bothered me... by jezzball · · Score: 2

    Nice superlatives...

    "A 400 MHz G4 ... slower than ... 4GHz P4."

    Hmm. Slowest G4 ever released - 350. So a 400 is about the same as the baseline model. Of course it's going to be slower! :-P

    But to give you some of my own stats...

    My home computer is a G4/400. My work computer is a P4/1.7GHz. Pretty fair comparison in terms of age of chip (although I'd really think the 450 (top of initial line) would compare better to the 1.7).

    In RC5, no altivec optimization, the G4 is about half as fast (1.3Mkeys/s to 2.4Mkeys/s). This is with less than a quarter the clock.

    With altivec optimization, the puny G4 does 3.5Mkeys/s.

    Simple benchmarking, not necessarily too indicative of normal use, but thank you, move along now - nobody's saying that the G4 will always be faster because of more work per clock cycle, but that the speeds don't have to be so phenomenal on them. Mine's a lowly 400. Imagine a 733?
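
    The RC5 figures quoted above can be put on a per-clock basis with a little arithmetic. A minimal sketch: the rates and clock speeds are the ones given in this comment, and the helper name keys_per_cycle is made up for the example.

```python
def keys_per_cycle(mkeys_per_second, clock_mhz):
    """Keys checked per clock cycle: (Mkeys/s) divided by (Mcycles/s)."""
    return mkeys_per_second / clock_mhz


g4_scalar = keys_per_cycle(1.3, 400)     # G4/400, no AltiVec optimization
g4_altivec = keys_per_cycle(3.5, 400)    # G4/400, AltiVec optimization
p4 = keys_per_cycle(2.4, 1700)           # P4/1.7GHz

for name, value in [("G4 scalar", g4_scalar), ("G4 AltiVec", g4_altivec), ("P4", p4)]:
    # Print the per-clock rate and how it compares with the P4's.
    print(f"{name}: {value:.5f} keys/cycle, {value / p4:.1f}x the P4 per clock")
```

    Per clock, that works out to roughly 2.3x for the plain G4 and roughly 6.2x with AltiVec, on this one benchmark only.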

    Dan
    ls: .sig: File not found.

    --
    ls: .sig: File not found.
    (A)bort, (R)etry, (I)gnore?
  6. Re:G4 irrelevant by tbo · · Score: 2

    He said mass market.

    Apple will soon be the world's largest *nix vendor, thanks to OS X. How do you like them Apples, Tux?

  7. Re:mips + ARM x10 more chips than PCs by barracg8 · · Score: 2
    We should be more interested in Mips/Arm chips because there happen to be more of them in use?

    That's kinda like saying that food critics should spend their time reviewing McDonalds burgers.

    There are interesting things going on with Arm designs (Jazelle hardware JVM, Amulet asynchronous designs, etc), but in general the Mips/Arm markets are all about taking simple RISC cores and producing them cheap and running at low power.

    Nothing wrong with being more interested in the high end of the market.

    G.

  8. One thing that's always bothered me... by dbarclay10 · · Score: 2

    One thing that has always bothered me is this nasty attitude towards clock speed. This article isn't too bad, but I've seen worse (and many comments here count towards that).

    Of course clock speed alone doesn't a benchmark make. It's just a number. But it STILL COUNTS. A 400MHz G4[e+], even with its shallower architecture (accomplishing more work per clock) is *still* going to be a helluva lot slower than an insanely-clocked (4GHz) P4. And I mean a *lot* slower.

    Yeah, you heard me. Amazing, isn't it? Even though the P4 does less per clock, it can actually be FASTER than another chip, if its clock speed is high enough.

    Gee, do you think Intel's world-class design team might take that into account? You think that just *maybe* it might be more than a simple marketing gimmick?

    Let's take a look at this. The Pentium Pro core (which is what the Pentium Pro, Celeron, Pentium II, and Pentium III were based on) was designed with a lot of clockspeed headroom in mind - and lo and behold, it actually worked. By the time that core is retired, it will be orders of magnitude faster than the original cores.

    Can you say the same thing for the G4? No. Oh, sure, it might be two or three times faster than the original when it's retired. But nowhere near the improvement we saw with the PPro.

    So, what's my point? Here it is: yeah, you can't go around buying processors based on clockspeed. But please take it into account. It's not like you can say "a 1MHz G4 is faster than a 1GHz P4, because the G4 does more work per clock cycle." Thanks for listening :)
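
    The argument above boils down to throughput = work per clock x clock speed. A rough sketch follows; the instructions-per-cycle values are invented purely for illustration and are not measurements of any G4 or P4.

```python
def throughput(instructions_per_cycle, clock_hz):
    """Instructions completed per second."""
    return instructions_per_cycle * clock_hz


def breakeven_clock(ipc_a, clock_a_hz, ipc_b):
    """Clock speed chip B needs in order to match chip A's throughput."""
    return ipc_a * clock_a_hz / ipc_b


# Hypothetical numbers: a "wide" chip doing twice the work per clock.
wide_ipc, wide_clock = 2.0, 400e6
deep_ipc, deep_clock = 1.0, 4e9

print(throughput(wide_ipc, wide_clock), throughput(deep_ipc, deep_clock))
print(breakeven_clock(wide_ipc, wide_clock, deep_ipc))   # clock the deep chip needs to tie
```

    With these made-up figures the deeper chip only needs 800 MHz to tie, so at 4 GHz it wins comfortably; flip the ratios and the conclusion flips too.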



    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)

  9. The article is about chip architecture by artemis67 · · Score: 2
    > This is basically another very pedestrian hate-on-P4 article with very little substance. P4 does have some performance problems (mostly to do with shifts and multiplies) and they're documented in the optimization manual, but this article does nothing to dig any deeper than what a dozen other pedestrian articles have said.

    No, he says quite plainly that the article is about a comparison of the two chip architectures, and not about which one is the fastest. I don't think there's any question right now that the fastest consumer desktop systems on the market are powered by x86 chips.

    However, reading the article gives one a good understanding of why a G4e running at the same MHz as a Pentium 4 will beat it every time.

  10. Re:The differences by Argy · · Score: 2

    > Think about how piss-poor spelling completely screws up the ability to search for anything, anywhere.

    Ah, but think about how piss-poor spelling will fuel ever-more-powerful search algorithms that can take into account mis-spellings! Goto.com already does a lot of this; try searching for "britteny spiers" or any reasonable variant. Their pay-per-search business model gives them a direct financial incentive to correct such errors.

    Heck, if everyone were as careless as Taco, spelling wouldn't even matter, because browsers would have Autocorrect(tm) built in to the rendering engines!

    All hale Cmrd Tacko! Hes not a looser!

  11. Re:Comparing cycle penalty times is meaningless .. by VAXman · · Score: 2

    > Yeah, but the P4 is not available @ 1.8 GHz yet

    Yes it is. It was launched on Monday.

    > And even with the faster P4 (woah, 6% higher clock), 20 cycles @ 1.8 GHz is more than 7 cycles @ 733 MHz.

    The point is, a 20 cycle penalty is not three times more expensive than a 7 cycle penalty, as the article implies.

    The article also fails to mention that Willamette has the most advanced BPU in the world, which minimizes the number of mispredicts greatly.

  12. Re:Comparing cycle penalty times is meaningless .. by VAXman · · Score: 2

    > And the 1 GHz P3 was launched weeks before it was available. So where can I buy the P4 1.8 GHz?

    Go to any major OEM website such as Gateway or Dell. They are all shipping 1.8 GHz Pentium 4 systems today, and are advertising them on their websites.

    (As an aside: it's really funny how some /.-ers get really confused and don't understand why some processors are not listed on pricewatch and then accuse them of not being available).

  13. Comparing cycle penalty times is meaningless ... by VAXman · · Score: 3

    > The P4's long pipeline means that bubbles take a long time to propagate off the CPU, so a single bubble results in a lot of wasted cycles. When the G4e's shorter pipeline has a bubble, it propagates through to the other end quickly so that fewer cycles are wasted. So a single bubble in the P4's 20 stage pipeline wastes at least 20 clock cycles (more if it's a bubble in one of the longer FPU pipelines), whereas a single bubble in the G4e's 7 stage pipeline wastes at least 7 clock cycles. 20 clock cycles is a lot of wasted work, and even if the P4's clock is running twice as fast as the G4e's it still takes a larger performance hit for each pipeline bubble than the shorter machine.

    What the author apparently fails to grasp is the only thing which matters is wall clock time. P4 may have a 20 cycle mispredict penalty, higher than G4e's penalty of 7, but it also runs at about triple the clock speed. 20 cycles @ 1.8 GHz is less than 7 cycles @ 600 MHz.
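
    The wall-clock comparison is easy to check by converting cycles to nanoseconds. A small sketch using only the cycle counts and clock speeds mentioned in this thread; the function name penalty_ns is an assumption for the example.

```python
def penalty_ns(cycles, clock_mhz):
    """Wall-clock cost of a stall in nanoseconds: cycles / (cycles per ns)."""
    return cycles * 1000.0 / clock_mhz


cases = [
    ("P4, 20 cycles @ 1.8 GHz", 20, 1800),
    ("G4e, 7 cycles @ 600 MHz", 7, 600),
    ("G4e, 7 cycles @ 733 MHz", 7, 733),
]
for label, cycles, mhz in cases:
    print(f"{label}: {penalty_ns(cycles, mhz):.2f} ns")
```

    The 20-cycle penalty at 1.8 GHz comes to about 11.1 ns, slightly less than 7 cycles at 600 MHz but more than 7 cycles at 733 MHz, so the answer depends on which G4e clock you compare against. Either way it is nowhere near three times as expensive.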

    This is basically another very pedestrian hate-on-P4 article with very little substance. P4 does have some performance problems (mostly to do with shifts and multiplies) and they're documented in the optimization manual, but this article does nothing to dig any deeper than what a dozen other pedestrian articles have said.

    Also ...

    > Intel was definitely paying attention, and as the Willamette team labored away in Santa Clara they kept MHz foremost in their minds.

    Willamette was designed entirely in Oregon. Santa Clara had nothing to do with it, and has had nothing to do with IA-32 design since P5 (nearly 10 years ago).

  14. Intel's acquisition of Compaq Alpha by YakumoFuji · · Score: 2
    it will be interesting to see if intel implement any of the design ideas + ip from the acquisition of Compaq/DEC's alpha architecture.

    ideas from the EV bus protocol to scaling. My guess is since processor design has such a long term for each cpu, that future designs are fairly well hard coded, intel couldn't just drop in any compaq IP 'just like that'.

    so now intel have alpha technology and arm technology.. imagine the combination of the two! what a hybrid processor that would be.

    aaah for those that dream anyway...

    i would be interested to read a form of comparison to sun's usIII on a technical design level.

    Write your Own Operating System [FAQ]!

    --

    no sig for you
  15. The differences by BadDoggie · · Score: 2
    "This article is a beginnurs refernce to thee impotence off english speling. What is nice is that it is not a holy war of who is better but an explaination of why speling is impourtint It has sum nfo on useige, useful for everyone." Not for the pedantic novice, but its not two badlee written piece if you're reasonably hyumin and want to understand more about sum things abowt you're langwidje.

    Sorry, Taco, but it's getting worse. Think about how piss-poor spelling completely screws up the ability to search for anything, anywhere. That ought to be reason enough for you and everyone else to at least CONSIDER spell-checking.

    We're geeks, and we all hate being judged on what we look like or what weird idiosyncrasies we have, yet many of us have also learned the hard facts of life: people jedge based on what they see. Bad spelling == worthless. If you can't be bothered to check what you write, why the hell should I be bothered to read it?

    Aren't you supposed to be a programmer or something? Yeah? Then how the fsck do you get anything to run (besides the debugger) if your syntax is even close to its English counterpart and your variable names never have the same spelling on any two lines?

    woof.

    Mod: -2 Pedantism, -1 Taco-spell-flames no longer amusing, +2 Interesting, +2 Insightful, etc.
    Total: +1, exactly what it would be posting after logging in, so don't waste your mod points here.

  16. Different Chips... by zephc · · Score: 4

    for different uses!

    The G4 is meant to be usable in embedded systems, while the P4 is meant to be usable as a space heater

    =P
    ----

    --
    "I would say that 99 per cent of what my father has written about his own life is false." - L. Ron Hubbard Jr.
  17. Learn to read (and I'll learn not to troll) by WIAKywbfatw · · Score: 3

    "2 of the most important chips on the market"

    Jeez, why do people have such a bad grip of the English language? Is it really that hard to understand?

    Yes, "two of". As in "not exclusively of". Yes, the Intel Pentium 4 is one of the most important chips out there. And yes, so is the AMD Athlon. But so is the Motorola G4, and so for that matter is the upcoming Intel Itanium.

    Now if the description of the article said "the two most important", I could understand your gripe. But it doesn't. And besides, haven't we already seen dozens of similar comparisons between Intel and AMD processor families?

    --

    "Accept that some days you are the pigeon, and some days you are the statue." - David Brent, Wernham Hogg
  18. Conclusions & Questions by Marcus+Brody · · Score: 2
    I felt it necessary to post some conclusions to the article, as the author completely failed to do this. Otherwise the article was very well written - if you are fairly new to this stuff, don't be put off by Taco's assertion that this is Not for the techie novice. I know very little about processor architecture, but learnt a lot from this article.

    Some of the /.ers out there who are more au fait with this stuff than myself may want to correct me on some of the following points.

    Basically, the clever folks at the Intel marketing department realised that the only thing the General Public know about processors is G/MHz. Therefore this is their only point of comparison between processors in the fragmented AT market (obviously, the G4 does not suffer from this competition, which reflects in the differences in architecture). Therefore the techies at Intel were given the orders: "make the clock speed as high as possible (and also make the processor fast!)".

    Clearly, the architecture of P4 was thus designed to break up long instructions into many shorter instructions (over-simplification) which can each be completed in a shorter single clock cycle. This leads to a 'long-pipeline', of many instructions:

    > Since each stage always lasts exactly one clock cycle, shorter pipeline stages mean shorter clock cycles and higher clock frequencies. The P4, with a whopping 20 stages in its basic pipeline, takes this tactic to the extreme

    However, using this longer pipeline leads to problems - especially when the processor doesn't have any instructions - thus causing a "bubble" which has to propagate right down the long pipeline, and also when the "branch prediction" (i.e. the prediction of what type of instruction to use on the data) is wrong - again causing a delay as the 'bad' instructions propagate through the processor.
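
    The trade-off described above (more stages give a shorter cycle, but each mispredicted branch costs more cycles) can be put into a toy model. Every number here is an assumption chosen for illustration: the total logic delay, the per-stage latch overhead, the branch statistics, and the simplification that a mispredict costs one full pipeline's worth of cycles. It also ignores how many instructions each chip can issue per clock.

```python
def cycle_time_ns(total_logic_ns, stages, latch_overhead_ns):
    """Clock period when the same logic is cut into `stages` equal slices."""
    return total_logic_ns / stages + latch_overhead_ns


def average_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction including branch mispredict stalls."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def ns_per_instruction(stages, total_logic_ns=10.0, latch_overhead_ns=0.1,
                       branch_fraction=0.2, mispredict_rate=0.05):
    # Assumption: a mispredict flushes the whole pipeline, so penalty == stages.
    cpi = average_cpi(1.0, branch_fraction, mispredict_rate, stages)
    return cpi * cycle_time_ns(total_logic_ns, stages, latch_overhead_ns)


for stages in (7, 20):
    print(stages, "stages:", round(ns_per_instruction(stages), 3), "ns/instruction")
```

    Under these particular assumptions the 20-stage design still comes out ahead; raise mispredict_rate or latch_overhead_ns and the gap narrows, which is roughly why the prediction tricks listed next matter so much.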

    Of course the clever guys at Intel came up with some novel solutions to this. This includes:

    -Using a larger Branch History Table, which records information about the outcomes of branches that have already been executed and helps in branch prediction.

    -The trace cache - which stores already-decoded instructions (uops) in place of a conventional L1 instruction cache, which is particularly useful for blocks of code that are executed thousands and thousands of times. ((This reminds me of MMX, although I think that worked in a different way. Any ideas why MMX isn't used anymore?????)) ...there's no delay associated with looking it up and hence no pipeline bubble

    -A special microcode ROM that holds pre-packaged sequences of uops so that the regular hardware decoder can concentrate on decoding the smaller, faster instructions. This stops these longer instructions from polluting the trace cache.

    -Some others that i forgot/understood even less well?????
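
    For anyone wondering what a Branch History Table looks like in practice, here is a minimal sketch. Branch prediction means guessing whether a conditional branch will be taken before its condition is known; a common textbook scheme keeps a small saturating counter per table entry. The table size, the 2-bit counters, and the indexing by address modulo table size are assumptions for the example, not details of the P4.

```python
class BranchHistoryTable:
    """Table of 2-bit saturating counters indexed by branch address."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.counters = [1] * entries    # 0..3; start at "weakly not taken"

    def _index(self, pc):
        # Distinct branches can share an entry; a larger table makes that rarer.
        return pc % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredicts(table, pc, outcomes):
    """Run a sequence of outcomes through the table, training as it goes."""
    misses = 0
    for taken in outcomes:
        if table.predict(pc) != taken:
            misses += 1
        table.update(pc, taken)
    return misses


# A loop branch: taken 9 times, then falls through once, repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10
print(count_mispredicts(BranchHistoryTable(), 0x1000, loop_branch), "misses out of", len(loop_branch))
```

    The counter needs two wrong guesses in a row to change its mind, so a loop exit costs one mispredict without spoiling the prediction for the next pass through the loop.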

    This all seems to be an interesting case of the public's perversion for clock speed subverting processor architecture (although not necessarily in a bad way).

    Would processors be faster "overall" (I'm sorry, that's terribly vague) if there wasn't such a push for faster clock speeds???

    --The real Marcus Brody doesn't have a Slashdot ID

  19. Clock speed by cosmo7 · · Score: 2
    From what I can understand, Intel has designed the P4 to be able to run at HIGH CLOCK SPEEDS regardless of the actual performance improvement. They astutely see consumers going for an easy metric.

    So, in the same spirit, I have my offering for cpu design: a simple divider on the clock input. This would only take two transistors and yet the processor would double in clock speed! The 3GHz chip is here already. Now, how do I patent the idea?

  20. x86 instructions are bytecodes of the future by Waffle+Iron · · Score: 5
    After reading this article I think that history is repeating itself. I've been scoffing at the P4, but now I think that Intel may be laughing at the end.

    If you remember when the Pentium Pro came out, people (including me) dissed it because it was years behind schedule, huge, expensive and hot. Actually, its architecture was just ahead of the process technology curve. With a few tweaks, the same CPU core came to dominate the world with the P-II and the P-III.

    Looking at the radical changes in the P4, including storing only uOPs in the instruction cache and reserving (currently useless) pipeline stages for speed-of-light cross chip delays, they are planning ahead for future realities. We can think of the current P4 as being like the Pentium Pro, just a short-lived beta release.

    The more interesting question is which approach to driving uOPs will win out: P4, Transmeta or Itanium. P4 and Transmeta convert legacy x86 opcodes to internal wide architecture on-the-fly (P4 in hardware, Transmeta in software); Itanium makes the compiler generate wide architecture directly. Note that the original pre-translated instruction format (CISC, RISC, Java bytecodes, whatever) is now largely irrelevant.
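
    The translate-once, reuse-many-times idea can be sketched as a toy. Nothing here reflects real x86 encodings or the P4's actual uop format: the instruction strings, the UOP_TABLE mapping, and the TraceCache class are all hypothetical, and a real trace cache also follows predicted branches rather than caching simple address-keyed blocks.

```python
# Hypothetical mapping from x86-style instructions to simpler internal uops.
UOP_TABLE = {
    "add [mem], reg": ["load tmp, [mem]", "add tmp, reg", "store [mem], tmp"],
    "inc reg": ["add reg, 1"],
    "push reg": ["sub sp, 4", "store [sp], reg"],
}


class TraceCache:
    """Caches decoded uop sequences so hot code is decoded only once."""

    def __init__(self):
        self.traces = {}     # block start address -> list of uops
        self.decodes = 0     # how many times the slow decoder had to run

    def fetch(self, address, instructions):
        if address not in self.traces:
            self.decodes += 1
            self.traces[address] = [
                uop for ins in instructions for uop in UOP_TABLE[ins]
            ]
        return self.traces[address]


cache = TraceCache()
hot_loop = ["add [mem], reg", "inc reg", "push reg"]
for _ in range(1000):
    uops = cache.fetch(0x400000, hot_loop)   # same block executed repeatedly

print(len(uops), "uops per pass,", cache.decodes, "decode in total")
```

    Whether the translation step lives in hardware, in software, or in the compiler, the payoff is the same: the expensive decode happens off the critical path for code that runs over and over.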

    My view is that in the abstract, Transmeta has the best approach, followed by P4 and Itanium last. This is because the software approach is the most flexible and can even be upgraded in the field. In theory, it could detect and store the individual performance characteristics of each program on a user's machine. Granted, they currently focus on low-power, but if they retargeted their technology at high speed, it could be interesting.

    The P4 approach is hardwired, but at least it can adapt to local code characteristics and translate them to the current internal architecture version.

    The Itanium exposes low-level chip details to the compiler, and the decisions are cast in concrete from there on out. It doesn't seem very future-proof to me; if the IA64 architecture changes in the future, today's compiled code will suffer.