Slashdot Mirror


Intel to Increase Stages in Prescott

Alizarin Erythrosin writes "Further contributing to the MHz Myth, The Register and ZDNet are reporting that the new P4 core, codenamed Prescott, will have a longer pipeline than Northwood. No official numbers have been released, but The Reg is saying an Intel spokesman said that 30 stages seems to be a reasonable estimate. As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls. 'And just as the PIII proved faster than the early P4s in some applications, it's likely that Northwood will similarly prove faster than Prescott, which has clearly been designed for speeds of the order of 4GHz.'"

524 comments

  1. Holy pipelines by Breakfast+Pants · · Score: 3, Funny

    With all these pipelines you'd think intel was Bush and Prescott was Afghanistan.

    --
    WHO ATE MY BREAKFAST PANTS?
    1. Re:Holy pipelines by k4_pacific · · Score: 5, Funny

      Recall that GW Bush's grandfather was Prescott Bush.

      --
      Unknown host pong.
    2. Re:Holy pipelines by Distinguished+Hero · · Score: 1

      Pipelines for what? Opium? There's no oil in Afghanistan.

      --
      Uttering logically derived and empirically supported truths to the disciples of the orthodox establishment.
    3. Re:Holy pipelines by Anonymous Coward · · Score: 0

      do a google search for "caspian sea oil", or even better, "caspian sea natural gas". Then note the location of countries in that region relative to afghanistan.

    4. Re:Holy pipelines by Breakfast+Pants · · Score: 2, Informative

      There may be no oil in Afghanistan, that's why I didn't say oil wells. Also, you must not have heard about the huge plans for a giant ass pipeline that will pass right through Afghanistan.

      --
      WHO ATE MY BREAKFAST PANTS?
    5. Re:Holy pipelines by Distinguished+Hero · · Score: 1

      Now that you mention it, I seem to recall it vaguely. Anyways, thanks for pointing that out.

      --
      Uttering logically derived and empirically supported truths to the disciples of the orthodox establishment.
    6. Re:Holy pipelines by Otter · · Score: 0, Offtopic
      The "giant ass pipeline" is being built in one of the other Stans -- Kazakhstan, IIRC.

      You must not have heard about that, which isn't surprising since the Chomskyite lunatic community isn't big on announcing corrections. After all, being a conspiracy nut means never having to say you're sorry.

    7. Re:Holy pipelines by Aardpig · · Score: 0, Insightful

      Whats your fucking point?

      A stab in the dark here, but maybe he's pointing out that Bush has got where he is today via ill-gotten gains. Consider this: had Prescott Bush not profited greatly from business with the Nazis, would George Snr. have been able to buy a place at Yale for George Jnr.? Would George Snr. have been able to get George Jnr. out of the Vietnam war and into the Texas Air Guard (where Dubya proceeded to go AWOL for 2 years)? Would George Snr. have been able to buy the presidency for George Jnr.?

      Oh, the irony of it all: that the neo-cons, who are rabidly Zionist, have as their figurehead a man who got where he is today because his grandpappy profited from the extermination of the Jews.

      --
      Tubal-Cain smokes the white owl.
    8. Re:Holy pipelines by shokk · · Score: 1

      Errr...there are entire continents that are ill-gotten. Surely the relatively measly gains the Bush family is alleged to have pale by that. Believe it or not doing ill-gotten things seems to be a common human trait. It's the people who behave nicely that are out of the ordinary. What's with them anyway?

      --
      "Beware of he who would deny you access to information, for in his heart, he dreams himself your master."
    9. Re:Holy pipelines by wwest4 · · Score: 3, Offtopic

      it's still on the agenda.

      a trans-afghan pipeline has been encouraged by the us for years preceding the latest invasion of the country. it may never be built, but it is still being pushed by the US. There has been news trickling in fairly steadily in the past two months about this. eg from times of india jan 12

      the kazakhs HAVE a good deal of oil/gas - it needs to get south and west. maybe you're referring to the BTC pipeline project that replaced the first trans-afghan pipeline plan.

      the idea put forth by the "conspiracy nuts" is that the US had an interest in occupying the region because their presence means they can fund and participate in the installation of new export infrastructure (like the BTC, in which US-based Unocal is involved). The war in Afghanistan meant bases in neighboring countries like Uzbekistan, Pakistan, and Kyrgyzstan, which allows for a permanent regional presence.

      it doesn't really matter where the pipeline runs -the US couldn't have participated as easily if it hadn't established a presence.

      maybe the conspiracy nuts should hold off on the apologies after unocal donates a portion of their profits to the poverty and war-stricken afghan people, or towards the all-too-modest $160 million reconstruction plan. (To put this into perspective: this doesn't even approach the size of the deficits some of the US' state budgets run).

    10. Re:Holy pipelines by Otter · · Score: 0, Offtopic
      maybe you're referring to the BTC pipeline project that replaced the first trans-afghan pipeline plan.

      You mean the "IT'S ALL ABOUT OIL!!" pipeline, the one Unocal is involved in, the "giant ass pipeline" the guy I responded to was talking about? Yeah, I might be referring to that.

      Look, it's over two years later, there's no pipeline. There might be a different pipeline someday? Well, that's a win-win scenario, isn't it? Afghanistan becomes a less impoverished wasteland and the people who fought to keep the Taliban in place get to keep their cognitive dissonance running for a few more years. I'm certainly not going to insist that no industrial development may happen in the region because it might feed the sort of idiocy the AC who responded to me laid out nicely.

      maybe the conspiracy nuts should hold off on the apologies after unocal donates a portion of their profits to the poverty and war-stricken afghan people.

      That makes absolutely no sense whatsoever. And who is giving out mod points on this thread, anyway? The article is about CPU development, for chrissake.

    11. Re:Holy pipelines by wwest4 · · Score: 1

      > That makes absolutely no sense whatsoever. And
      > who is giving out mod points on this thread,
      > anyway? The article is about CPU development,
      > for chrissake.

      typo. they should hold off UNTIL after unocal...

      i agree, the mods are off as usual, this is offtopic. it doesn't matter, mod points don't mean much, especially when they're so obviously misallocated. (at least to me, i browse at -1).

      i hope you're right and afghanistan does benefit someday, but saying it's "win-win" - that's like saying what doesn't kill you makes you stronger. is it ethical for me to dust someone proper until he cries and then tell him that it's win-win because he gets insurance money AND i made him my bitch?

      two years isn't really a long time - they've been pitching the pipeline for over 5 years. and like i said, the location of the pipeline doesn't make a huge difference - pipelines in that region would have to span several countries - the point is that the pipelines are coming sooner than later (when Russia or China would have eventually financed them) because the US has an established presence in the region... and the US has such a presence in the wake of the invasion and subsequent occupation. That's why people smell an ulterior motive, and I think common sense dictates that since most of the current administration is connected to oil and energy interests, it doesn't take a conspiracy nut to smell an ulterior motive.

    12. Re:Holy pipelines by mikeabbott420 · · Score: 3, Interesting

      Could we explain to people the difference between megahertz and performance by comparing it to cars? Sure, the Intel xxx does yyy, but that's a 4 (IPC) cylinder that does yyy rpm vs. an 8 (IPC) that does zzz rpm but more horsepower. Megahertz = rpm, IPS = horsepower. If the general public understood that megahertz was rpm, not horsepower, Intel's talented engineers could build great things, freed from the marketing department's focus on rpm.

      --
      This program was made possible by a grant from the Ultra-Humanite, and viewers like you.
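
The rpm/horsepower analogy above boils down to throughput = clock rate x instructions per clock. A minimal sketch of that arithmetic follows; the function name and both example chips are made up for illustration and are not measurements of any real processor.

```python
def instructions_per_second(clock_hz: float, ipc: float) -> float:
    """Throughput = clock rate (the "rpm") times instructions per clock."""
    return clock_hz * ipc


# Hypothetical chips, chosen only to illustrate the point.
speed_demon = instructions_per_second(3.0e9, 1.0)  # high clock, low IPC
brainiac = instructions_per_second(2.0e9, 1.6)     # lower clock, higher IPC

print(f"speed demon: {speed_demon / 1e9:.1f} G instr/s")
print(f"brainiac:    {brainiac / 1e9:.1f} G instr/s")
```

With these invented numbers the lower-clocked chip comes out ahead, which is the whole point of the analogy: the clock figure alone says nothing about the work done per tick.
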
    13. Re:Holy pipelines by Anonymous Coward · · Score: 0

      So we should ignore these scumbags as they aren't as bad as other, previous scumbags???

      Enjoy another term of rampant corporatism asshole.

    14. Re:Holy pipelines by Orion442 · · Score: 0

      giant ass pipeline

      That sounds like gay slang for something I really don't want to know about...

    15. Re:Holy pipelines by Ubermoz · · Score: 1

      Haven't AMD already done this? They started by disguising the clock speed with 3200+ style naming. And now with the 64FX 52 they've got rid of any reference to megahertz (or rpm as you so naffly put it)

    16. Re:Holy pipelines by Anonymous Coward · · Score: 0

      The fact that he confused Iraq with Afghanistan is what *made* it funny. Pay attention. You're the dumbass - the moderator was just doing a good job.

    17. Re:Holy pipelines by shotfeel · · Score: 1

      I'd make it simpler.

      The engine on my weed-eater can go 10,000 rpm. The engine on my car only 6000. Which one delivers more power?

    18. Re:Holy pipelines by Darth23 · · Score: 1

      " With all these pipelines you'd think intel was Bush and Prescott was Afghanistan." *rimshot*

      --

      -------- In Soviet Russia, "Soviet Russia" sigs hate Slashdot.

    19. Re:Holy pipelines by Short+Circuit · · Score: 1

      Well, my AMD chainsaw certainly doesn't get caught on little things like branches...

      And my AMD sedan doesn't stall much at all.

  2. Bang for your buck by ObviousGuy · · Score: 5, Funny

    Northwood was really unsatisfying. I found that for the money, it was too short with too few stages. While gameplay was fine, the lack of stages simply made the cost not worth it for me.

    2 stars.

    --
    I have been pwned because my /. password was too easy to guess.
    1. Re:Bang for your buck by Anonymous Coward · · Score: 0

      I believe I speak for most people here when I say, WTF are you talking about? :)

    2. Re:Bang for your buck by johnnorthwood · · Score: 3, Funny

      Hey... I have had many ladies say I was satisfying.

    3. Re:Bang for your buck by back_pages · · Score: 1
      no mod points but man that was funny

      if I was sober enough to change my sig i tw oudl say "something about drinkng and mod points"

    4. Re:Bang for your buck by Anonymous Coward · · Score: 2, Funny

      Yup, you served them food quickly and got their order right every time. That's not a small feat for a fast food worker though.. You wear that employee of the week badge with honour.

  3. Why? by homeobocks · · Score: 0, Redundant

    Longer pipeline + equal number of errors == more data need to be redone == slower!

    --
    MOUNT TAPE U1439 ON B3, NO RING
    1. Re:Why? by mabinogi · · Score: 1

      you forgot, * higher clock speed == faster.
      and + more clock speed headroom == faster again later.

      --
      Advanced users are users too!
    2. Re:Why? by Anonymous Coward · · Score: 0

      Holy shit dude, I heard Intel is hiring. Better get round there.

    3. Re:Why? by homeobocks · · Score: 1

      I'm not saying that the processor as a whole is slower, I'm just saying that their action is threatening to cripple the processor in the MHz race (because most people only care about the MHz).

      --
      MOUNT TAPE U1439 ON B3, NO RING
    4. Re:Why? by Anonymous Coward · · Score: 0

      No it doesn't. The clock keeps going up and down at 4 GHz regardless of how often the branch prediction misses. That's all anyone cares about, and if they can keep the frequency going up faster than the IPC is going down, it's all that matters.

    5. Re:Why? by slash-tard · · Score: 1

      Yes you are correct.

      But if intel can have a 20 stage chip that peaks at 3 ghz for example and does 30 mflops of work or have a 30 stage chip that peaks at 6 ghz and does 50 mflops of work then the 6ghz is better.

      These are just example numbers. Intel pays its chip designers good money and they aren't stupid. They may favor one strategy over another and now have competition from AMD and IBM, but I'm pretty sure it will be a decent chip.

    6. Re:Why? by AvitarX · · Score: 0

      Except that the strategy is working.

      The P4's were the fastest x86 for a while.

      solving a problem with a different approach is not wrong, it is just different.

      Branch mispredictions are rare (3 in 1000 I think, though it may be in 10,000)

      so if it stalls 20 or 30 cycles it still has 333 cycles to gain them back.

      all things being equal (they aren't though) I would take double the speed with triple the pipeline.

      if they can add 33% to the processor speed with a 50% increase in pipeline it's not too bad. But when it allows them to go even faster it can be a big plus.

      the P4 is supposed to be capable of 10GHz. that allows for a lot of wasted cycles while still being really fricken fast.

      --
      Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
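
The trade-off argued in the comment above can be put into a back-of-the-envelope model: each instruction costs its base cycles plus the misprediction rate times the flush penalty. This is only a sketch. The 0.003 rate and the 20- and 30-cycle penalties come from the comment; the base IPC of 1.0 and the 3 GHz and 4 GHz clocks are assumptions picked for illustration, and the function name is made up.

```python
def effective_ips(clock_hz, base_ipc, mispredicts_per_instr, penalty_cycles):
    """Instructions per second once branch-mispredict flushes are charged.

    Average cycles per instruction = 1/base_ipc + rate * penalty.
    """
    cycles_per_instr = 1.0 / base_ipc + mispredicts_per_instr * penalty_cycles
    return clock_hz / cycles_per_instr


RATE = 0.003  # "3 in 1000" from the comment, treated here as per instruction

short_pipe = effective_ips(3.0e9, 1.0, RATE, 20)  # assumed 3 GHz, 20-cycle flush
long_pipe = effective_ips(4.0e9, 1.0, RATE, 30)   # assumed 4 GHz, 30-cycle flush
same_clock = effective_ips(3.0e9, 1.0, RATE, 30)  # longer pipe, no clock gain

print(f"20-cycle penalty @ 3 GHz: {short_pipe / 1e9:.2f} G instr/s")
print(f"30-cycle penalty @ 4 GHz: {long_pipe / 1e9:.2f} G instr/s")
print(f"30-cycle penalty @ 3 GHz: {same_clock / 1e9:.2f} G instr/s")
```

Under these assumptions the deeper pipeline costs a few percent at equal clock and only wins if the clock actually goes up, which matches the argument in the comment. A higher misprediction rate would shift the balance quickly.
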
    7. Re:Why? by phorm · · Score: 4, Interesting

      Which basically means, Intel can release a CPU with a higher MHZ rating for those that fall for such things.

      In reality the CPU will be somewhat faster than current ones due to the higher clock, but much less efficient.

      Why not just dump MHZ as a rating altogether? Wouldn't FLOPS-based (Floating-point Operations Per Second) or something similar be a better measurement? Maybe how far a simple program can compute PI in a second? We should really be looking at an operational-based measurement rather than a clock-based one.

    8. Re:Why? by Wanderer2 · · Score: 5, Interesting
      Why not just dump MHZ as a rating altogether?

      Didn't AMD try to organise this and recently concede it wasn't going to happen?

      As long as any metric favours one particular manufacturer, the rest will try to replace it with a new one. The result will be more FUD and more confused users ("I've finally worked out what GHz are and you tell me I have to look at the number of flops?!?")

      </Pessimist>

      --
      I say we take-off and slashdot the site from orbit... it's the only way to be sure
    9. Re:Why? by Master+Bait · · Score: 1
      It all depends on your benchmarks. If you benchmark something that does simple math on a large set of streaming data, then the CPU will shine... and that proves that the length of pipelines doesn't matter!

      --
      "Only in their dreams can men truly be free 'twas always thus, and always thus will be."
      --Tom Schulman
    10. Re:Why? by Witsu · · Score: 1

      Is/Will Intel improve the branch prediction capability in Prescott to try to counterbalance the effect of a longer pipeline?
      Assuming not, how much effect will the 13 new SSE3 instructions have on performance?
      Could Prescott be slower than Northwood at the same clock frequency?
      If so, by the time the Pentium 5 comes around, it will probably be doing negative work per clock cycle

    11. Re:Why? by Sivar · · Score: 4, Interesting

      More clockspeed = more sales. 95% of computer users (or is it 94%, with recent improvements in public education) believe in the MHz Myth mentioned on the front page.
      The MHz myth is the belief that the OneTrue measure of CPU performance is clockspeed. A 2GHz CPU is twice as fast as a 1GHz CPU. A 4GHz CPU is twice as fast as a 2GHz CPU.

      While it may not seem common to many of us, if you speak with a large number of average people about computer performance, you will quickly want to kill yourself. Or them. Or both.

      This isn't the fault of the general public, as Intel's marketing machine takes advantage of this common belief. Intel Pentium IV processors are some of the highest clocked processors in the world, and they benefit from everyone that thinks this somehow matters.

      --
      Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
    12. Re:Why? by Anonymous Coward · · Score: 0

      Because no one pays attention to benchmark comparisons. In fact I can't think of a single site that still uses them to produce gay little bar graphs. Oh wait....

      Large on chip caches, better branch prediction, and optimizing compilers can make a lot out of those extra stages. If the CPU finds it has to crap itself every once in a while, it's probably not a horribly big deal, since there is a pretty good chance the user is moving on to something else, and would at least tolerate it, if they even noticed.

      CPU's are like cars. People who don't know better buy into models within brands. Those who do know better know MHz for what it is. Those who think they know better troll slashdot with more "BSD died on the Itanic" posts. Me, I buy a chip for a motherboard I like. I've found that's the best way for me to maximize my performance/hassle metric. (Hassle being the product of irritability and time, time being the sum of time spent working for the money to pay for it, and spent screwing around with it)

    13. Re:Why? by nelsonal · · Score: 1

      Why is it that we are willing to make very big decisions on so little information? A lot of people are heavily influenced by the horsepower number on their car (or even just the displacement volume of the engine), which is as bad as or worse than just looking at the clock of the processor. It doesn't take a whole lot of research to get a torque curve for most performance-oriented cars, so why are we willing to make a decision on so little info?

      --
      Degaussing scares the bad magnetism out of the monitor and fills it with good karma.
    14. Re:Why? by ProtonMotiveForce · · Score: 2, Insightful

      Out here in "Reality World", as I like to call it, it _does_ matter. You see - performance is performance, whether it comes via IPC or high clock speed.

      Until the Athlon64/Opterons AMD had no answer to the P4. They just couldn't quite keep up. And you people harped on the same thing "Ooh, it's a marketing gimmick!".

      You want a marketing gimmick? How about selling a 64-bit CPU to people who have like 512M of memory. There's your gimmick.

    15. Re:Why? by GerryGilmore · · Score: 1

      You want a marketing gimmick? How about selling a 64-bit CPU to people who have like 512M of memory. There's your gimmick.

      Damn! Now where *did* I leave my mod points now that I *really* need them!

    16. Re:Why? by GerryGilmore · · Score: 2, Interesting

      Before you run off blaming the evil Marketing demons, let me ask you this.....what readily quantifiable measure would you use instead to compare systems for the broad range of users and applications - all other things being the same? (memory, disk, etc.)

      Imperfect a measure as it may be, it's a hell of a lot easier to relate to and compare than "how many FPS of Quake3 can I get?" or "how quickly can it compile the 2.6 kernel?"

    17. Re:Why? by Anonymous Coward · · Score: 0

      i don't *know*. why don't you *ask* cum-taco

    18. Re:Why? by Sivar · · Score: 4, Interesting

      " Out here in "Reality World", as I like to call it, it _does_ matter. You see - performance is performance, whether it comes via IPC or high clock speed."

      Yes, high clockspeed "speed demon" chips can and often do outperform high-IPC "brainiac" chips. Whether the final performance of the fastest Pentium IVs ends up being as high or even higher than the fastest competitor does not change the fact that Intel has made no effort to dispel the MHz myth--and it IS a myth--and has in fact encouraged it.
      I said nothing of final performance figures. I was stating that the marketing gimmick is that MHz is an accurate measure of speed, which it is not--even between different revisions of Intel's own Pentium IV core, let alone in comparison to their competitors.

      "Until the Athlon64/Opterons AMD had no answer to the P4. They just couldn't quite keep up. And you people harped on the same thing "Ooh, it's a marketing gimmick!"."

      Athlons and Pentium IVs have been leapfrogging each-other for years. If you believe that 32-bit Athlons were never competitive with Pentium IVs, you are quite mistaken. I would be happy to help you research the issue.

      You want a marketing gimmick? How about selling a 64-bit CPU to people who have like 512M of memory. There's your gimmick.

      You may not be aware of this, but it is actually an intelligent idea to fix problems before they become problems.
      --LBA-48 was introduced before more than a tiny fraction of people had hard drives that were larger than the 128GB limit. Is it a marketing gimmick that LBA-48 supports multi-petabyte drives? (2^48-1 512 byte sectors).

      --Serial ATA, and even ATA100 were introduced long before any hard disk drive could possibly approach 100MB/sec sustained transfer rate. Even today's world's fastest hard drive, the Fujitsu MAS3735, cannot quite reach 80MB/sec. Did you know, however, that the same situation occurred with ATA66, ATA33, ATA16, etc.? Perhaps engineers should have waited until the performance barriers were making drive upgrades pointless before introducing faster means of communication? After all, "no hard drive could possibly even approach 33MB/sec" --1995.

      The same applies to 64-bit processors.
      The average Dell comes with what, 256MB RAM? Probably 512MB now? That is 1/8 of the "4 GB barrier" of 32-bit pointers. Actually, that barrier is either 1.5GB, 2GB, or 3GB depending on your operating system.
      Now, let's think: Have you ever seen the average amount of RAM in a system double? I seem to remember 4MB being "plenty" and 16MB being "wasteful and ridiculous". I seem to remember 32MB being the standard, and anything over 128MB was an unwise waste of money.
      Do you think that maybe, possibly, that pattern might repeat? Perhaps--since it has happened every few years for decades--the average amount of RAM in a system might increase? Applications might want more than 4GB of address space? Quake 5 may require 6GB RAM minimum (16GB recommended)?

      In case you were not aware, the 64-bit mode of the Athlon64 provides real performance benefits, whether software cares about the extra address space or not. Many algorithms, particularly encryption, data management, HL math, high precision math, media en/decoding, and compression can make use of the larger register size.
      The fact that there are double the number of GPRs (that stands for "General Purpose Register" Ohhh, ahhh) and that the amount of data that one can fit into those GPRs has quadrupled, helps ALL software that is more than a 20-line assembly language experiment. Hell, even having 16GPRs (twice as many as previous x86 chips), the AMD64 architecture is still considered register-starved. Look at the PowerPC, the IA64, the AXP, the UltraSPARC, and just about any other mainstream high-performance processor architecture.
      You may want to look at the reviews from reputable publications showing substantial performance gains from 64-bit Opteron software, including software that could not care less if you have >4GB of memory. Hint: Tom's Hardware is not on that list.

      Is a 10%-30% performance boost a gimmick?

      --
      Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
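
The limits quoted in the comment above follow directly from the field widths. A quick check, assuming 512-byte sectors as the comment does (the helper name is made up for this sketch):

```python
SECTOR_BYTES = 512  # sector size assumed in the comment

GIB = 2 ** 30
PIB = 2 ** 50
EIB = 2 ** 60


def lba_capacity_bytes(address_bits: int) -> int:
    """Largest drive addressable with an LBA field of the given width."""
    return (2 ** address_bits) * SECTOR_BYTES


print(lba_capacity_bytes(28) // GIB, "GiB with 28-bit LBA")
print(lba_capacity_bytes(48) // PIB, "PiB with 48-bit LBA")
print(2 ** 32 // GIB, "GiB of address space with 32-bit pointers")
# Upper bound in principle; real chips may implement fewer address bits.
print(2 ** 64 // EIB, "EiB of address space with 64-bit pointers")
```

The 28-bit case is where the old 128GB drive limit comes from, and the 32-bit pointer case is the "4 GB barrier" before the operating system takes its share.
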
    19. Re:Why? by Sivar · · Score: 3, Informative

      Before you run off blaming the evil Marketing demons, let me ask you this.....what readily quantifiable measure would you use instead to compare systems for the broad range of users and applications - all other things being the same? (memory, disk, etc.)

      Imperfect a measure as it may be, it's a hell of a lot easier to relate to and compare than "how many FPS of Quake3 can I get?" or "how quickly can it compile the 2.6 kernel?"


      That very question has long been a topic of heated debate. Years ago, AMD launched an initiative to create a nonbiased (so they say), general purpose universal benchmark. It never went anywhere as far as I know.
      Overall, Winbench 'XX is a good benchmark because it shows actual performance in real-world applications (albeit somewhat old ones). For games, the only reliable means of benchmarking is to test those individual games, or at least assume similar performance across many games that use the same game engine. The game industry is converging because of the extreme difficulty of developing truly sophisticated 3D graphics engines. I predict that within 5 years, there will be at most 3-5 major game engines used by 90% of high-budget games. A general benchmark of these 3-5 engines (or however many there turn out to be) could be used, either taking their average and giving an overall "gaming score", or predicting the performance of the many games based on each engine based on extensive benchmarking of a few titles using each.

      Server benchmarking is not an issue, because those involved in the tests often know what they are doing.

      As far as unix benchmarking, well, that is a major pain in the ass. That certainly does not mean that we should rely on clockspeed, or god forbid on BogoMIPS. A standard benchmark based on the compilation time of a certain version of BASh was proposed not too long ago. Because many Unix geeks are developers, this would not be a bad start. As for pure CPU tests, perhaps a mix of BZip2, large-scale encryption, and ... other things might be good. As with any benchmark, there are always caveats and special conditions involved. If one simply averages the scores of many benchmarks, things happen such as one candidate doing ridiculously well on one (possibly unimportant) part of the benchmark, thus throwing the average way out of kilter.

      Benchmarking is a science, an art, and a rather large pain in the ass.

      Your point is well taken though.

      --
      Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
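
The averaging caveat above is easy to demonstrate. In this sketch the scores are invented relative speedups (1.0 = same as a reference machine), and the geometric mean is offered as one common alternative to the plain average, not as something the comment itself proposes.

```python
import math


def arithmetic_mean(scores):
    """Plain average: sum divided by count."""
    return sum(scores) / len(scores)


def geometric_mean(scores):
    """n-th root of the product, computed via logs; scores must be positive."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Invented relative scores: level on three tests, 10x on one narrow test.
scores = [1.0, 1.0, 1.0, 10.0]

print(f"arithmetic mean: {arithmetic_mean(scores):.2f}")
print(f"geometric mean:  {geometric_mean(scores):.2f}")
```

Here the plain average reports 3.25 even though the machine is no faster on three of the four tests, while the geometric mean reports about 1.78. Neither number replaces looking at the individual results.
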
    20. Re:Why? by pastafazou · · Score: 2, Interesting

      The problem in killing the myth is the dominance Intel has in the processor market. The average Joe is force fed "Intel inside" everywhere he looks, and the sales people in most stores don't bother to explain the differences between different architectures (or they just don't know). Intel has capitalized on this by pushing their architecture heavily towards higher clock speeds, at the cost of many other efficiencies. It's simply MHz & GHz that everyone mentions. AMD, IBM, Apple, Sun, Motorola etc should start pushing something else that can be realistically measured. Maybe someone can do the conversion from clock speeds and GigaFlops to horsepower and Torque? Start talking in powertool talk, and a huge chunk of the population will suddenly start to understand a bit better.

    21. Re:Why? by egomaniac · · Score: 1

      Why not just dump MHZ as a rating altogether? Wouldn't FLOPS-based (Floating-point Operations Per Second) or something similar be a better measurement? Maybe how far a simple program can compute PI in a second? We should really be looking at an operational-based measurement rather than a clock-based one.

      Unfortunately, these things are very, very easy to fake. One quarter after you switched over to a PI-digit-computation metric, Intel and AMD would both have the most heavily-optimized-for-PI-computation processors that you've ever seen. They might suck for everything else, but hot damn they'd compute PI fast.

      Plus, then you get into the huge gray area of "well, this processor can compute PI unbelievably fast when you use a good compiler with the right settings, but gcc with default settings produces lousy code for it...". What compiler do you use to compile the benchmark? Do you have to use the default settings for every option? How can you be sure that you're seeing actual differences between the processors, instead of just differences in how well the compiler optimizes for each processor? For that matter, maybe the source code itself is biased toward a particular processor -- just restructuring it so that the loop is a little tighter (and thus fits into L1 cache) might boost the performance on one processor by 5x, but make absolutely no difference on another processor.

      Benchmarking is very much a black art despite the huge number of research papers published on this subject. In short, there is no simple and reliable standard for judging a system's performance today. With no good objective standards to cling to, MHz are a simple and easy number to go by, and so that's what the market does.

      --
      ZFS: because love is never having to say fsck
    22. Re:Why? by confused+one · · Score: 1
      most heavily-optimized-for-PI-computation processors

      why bother optimizing to compute it... just store it internally out to 128 digits and when the processor identifies someone trying to compute Pi, spit out the stored value. Done.

    23. Re:Why? by planckscale · · Score: 1
      I like SuperPi as a good processor benchmark. It simply calculates Pi over and over. It does calculation of pi up to 33.55 million digits. Dual processors at 1ghz will of course be faster than one processor at 1ghz. Gaming benchmarks are okay but the video card has a lot to do with fps - not just the CPU.

      --
      Namaste
  4. Size of pipeline by odeee · · Score: 4, Funny

    It's not the size of your pipeline that counts... it's how you use it.

    1. Re:Size of pipeline by Anonymous Coward · · Score: 0

      It's not the size of your pipeline that counts... it's how you use it.

      I don't suppose then you're going to tell all of us who are aware of the performance shortcomings of the P4 that we're simply using it wrong...

      Face it. Intel's been doing whatever they can to sacrifice performance to get raw clock speed increases. It's the numbers that sell the chips, not whether or not they're actually efficient anymore.

    2. Re:Size of pipeline by sfraggle · · Score: 2, Funny

      I hear Prescott packs quite a punch.

      --
      were you expecting to see a sig here? perhaps you'd rather see the inside of an ambulance!
    3. Re:Size of pipeline by Anonymous Coward · · Score: 0

      repeat after me.

      I am a moron.

      I am a moron.

      Thank you

    4. Re:Size of pipeline by Hoser+McMoose · · Score: 4, Interesting

      Ironically enough, that's quite accurate for processors!

      A 6-stage pipeline with terrible branch prediction and all sorts of holes in it isn't going to do any good at all, while a 30 stage pipeline with great branch prediction (and the P4 does have great branch prediction) and few bubbles or holes (improved SMT, aka hyperthreading, is supposed to help here) will do wonders.

      Of course, the real question is not how long the total pipeline is, but the branch mispredict penalty. It should be noted that the "Northwood" P4 has a 28-stage pipeline, but only a 20-stage mispredict penalty. If the "Prescott" has a 30-stage pipeline with a 22-stage mispredict penalty, it isn't exactly a huge change.

  5. Re:Hmmm. by Rosyna · · Score: 1

    Isn't that what caused the Great Chicago Fire of 1871?

  6. I guess the home market rules... by ghostis · · Score: 4, Interesting

    I work at an engineering firm. The deep pipelines in the current P4 perform so poorly with general number crunching (e.g. matlab) we have almost completely switched to Athlons and are seriously considering Opteron.

    -ghostis

    --


    Computer Science is all about trying to find the right wrench to bang in the right screw. -T.Cumbo?
    1. Re:I guess the home market rules... by Silvers · · Score: 0, Flamebait

      If you were to use SSE2 you would see an incredible performance boost.

    2. Re:I guess the home market rules... by woodhouse · · Score: 1

      If you care about performance at all, why on earth are you using matlab?

    3. Re:I guess the home market rules... by LehiNephi · · Score: 4, Insightful

      I see this as a huge opportunity for AMD. They rate their processors based on how many times faster than a Duron 1 GHz runs. Thus, an AthlonXP3000+ runs three times as fast.

      However, Intel rates their chips by clockspeed, and with the less-efficient pipeline, a 3 GHz P4 is not three times as fast as a 1GHz P3.

      Thus, as chips get faster, AMD's chips will get better performance, not only cycle-for-cycle, but even rating-for-rating!

      --
      Help find a cure for cancer. Join the [H]orde
    4. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      Yes, I work in an engineering firm also. We find the excess pipelines of the P4 are so bad for cache performance for CAD and numerical system dynamics simulations that we have gone back to using our Amigas.

    5. Re:I guess the home market rules... by Aardpig · · Score: 4, Insightful

      If you were to use SSE2 you would see an incredible performance boost.

      I doubt it, I really do. Present-day x86 chips aren't limited by their FP processing speed, the real problem is memory latency and bandwidth. For instance, my 1.8 GHz P4 regularly performs in excess of 1 Gflops when running benchmark tests for the ATLAS BLAS. However, these benchmarks are specifically designed to fit in cache, to have predictable branching, etc etc.

      Unfortunately, in real-world situations cache thrashing is difficult to avoid, and accurate branch prediction is a highly non-trivial affair. When a prediction turns out to be wrong, the cost of refilling a stalled pipeline increases in proportion to the pipeline length. The ever-lengthening pipelines of P4 chips mean that, although their FP performance may r0x0r, the overhead of stalls makes production code run like treacle.

      --
      Tubal-Cain smokes the white owl.
    6. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      Because they include application development time in their performance estimate?

      "The MATLAB code took 8 hours longer to run, but we finished writing it a week before we finished the C version"

    7. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      Have you patched all the known bugs with MATLAB on the P4 (Sparse matrix slowness, slowness with NaNs, poorly optimized .mex files)?

    8. Re:I guess the home market rules... by EulerX07 · · Score: 3, Interesting

      Matlab can hardly be beat in speed when you need to produce custom software to crunch huge matrices full of numbers. You can have a GUI designed, working, put some code quickly together that can grab data from any txt format, run mathematical formulas on that data. Then you can do any operations you want on the matrices that are in memory and easily accessible. Want to throw your data into a chart? A few minutes of coding and you've got the perfect chart on there.

      Back in my days of internship at the Canadian Space Agency, I'd program multiple custom apps to pre-process the data before it was fed to the mainframes of a contractor for finite element analysis. Matlab is the tool to use for anybody involved in scientific projects. Yes, your code in C will run much faster, but it'll take significantly longer to get it up and running.

      If you run a lot of loops and it's really bogging the performance down, you can program just those sections of code in C and compile with matlab libraries to be able to use it in Matlab like the native commands. I did one piece of code that took a finite element file and created the 3d model in matlab. Took 20 minutes to run the code in matlab, 3.45 seconds once I had compiled the tough part of the code in C.

      In the end it's all about using the right tool, and for engineering/matlab, Matlab is excellent.

    9. Re:I guess the home market rules... by Mr.+Frilly · · Score: 2, Interesting

      Just another (single) data point to add, for the image reconstruction software I use routinely, I get these performances:

      intel pentium IV, 3.2 GHz: 5.0 minutes
      athlon XP, 1.533 GHz: 5.7 minutes
      intel pentium III 733 MHz: 8.1 minutes

      From the PIII to the PIV, a 340% increase in processor speed, I get 60% increase in performance...
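
      The percentages in this comment can be reproduced from the three quoted data points. This is only a sketch of the arithmetic; it treats performance as inversely proportional to run time and uses the Pentium III as the baseline, as the comment does.

```python
# Figures quoted in the comment above: (clock in MHz, run time in minutes).
runs = {
    "Pentium III 733 MHz": (733.0, 8.1),
    "Athlon XP 1.533 GHz": (1533.0, 5.7),
    "Pentium 4 3.2 GHz": (3200.0, 5.0),
}

def percent_increase(new, old):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0

base_clock, base_time = runs["Pentium III 733 MHz"]
results = {}
for name, (clock, minutes) in runs.items():
    clock_gain = percent_increase(clock, base_clock)
    # A shorter run time means higher performance, so the ratio is inverted.
    perf_gain = percent_increase(base_time, minutes)
    results[name] = (clock_gain, perf_gain)
    print(f"{name}: clock +{clock_gain:.0f}%, performance +{perf_gain:.0f}%")
```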

    10. Re:I guess the home market rules... by Cecil · · Score: 2, Interesting

      The deep pipelines in the P4 perform poorly, period. Even when running simple desktop apps on a Windows machine, I notice my P4-2.5GHz w/1GB RAM at work often jerks around or lags, while my Athlon 1900XP+ w/256MB RAM at home works like lightning. Obviously processor is not the whole story, but I think that under typical, multi-tasking usage, the deep pipelines are even more painful than benchmarks suggest.

      Disclaimer: I am not an EE, so I could very well be full of shit.

    11. Re:I guess the home market rules... by Jeff+DeMaagd · · Score: 1

      I am fairly sure that you aren't the only one to come across this. I don't think it is the deep pipelines...

      IIRC, the P4 has only one full standard FP unit (vs Athlon's & Opteron's 3 FP units?), the majority of the floating point was expected to be done by SSE2 optimized code but it kind of backfires a little if the number crunching isn't done by SSE2 code.

    12. Re:I guess the home market rules... by Anonymous Coward · · Score: 1, Informative

      They rate their processors based on how many times faster than a Duron 1 GHz runs. Thus, an AthlonXP3000+ runs three times as fast.

      Where the hell do you get that from? The "quanti" speed rating is supposed to compare directly to the P4.

    13. Re:I guess the home market rules... by MoronGames · · Score: 1

      No, the AMD PR rating shows how fast their processors run compared to the original Athlons. An Athlon XP 3000+ runs as fast as a 3GHz Athlon would.

      --
      hey!
    14. Re:I guess the home market rules... by woodhouse · · Score: 2, Interesting

      Each to their own I suppose. I admit I don't have much experience with Matlab (I'm planning on keeping it that way). As a college project, we were told to use matlab for a computer vision task. I tried everything to optimise it, followed all the guidelines on vectorising code and not using loops, and eventually found that the only way to do it was to write the critical code in C, as you suggest (this improved the speed by a factor of 100). In the end, there was almost no advantage from having used matlab and I would have been better to just write the whole thing in C.

      What baffles me the most is that people use it for image processing, of all things. Surely if performance is important anywhere, it's here? It doesn't help that Matlab 6.5 runs on a Java back end.

    15. Re:I guess the home market rules... by timeOday · · Score: 4, Informative

      No, surely AMD will simply change their metric to match whatever Intel is putting out. IMHO there's no way AMD will label something 4000 when it's faster than a PV 4400. That defeats the *whole point* of not using the real clock speed in the first place.

    16. Re:I guess the home market rules... by Laser+Lou · · Score: 3, Insightful

      However, Intel rates their chips by clockspeed, and with the less-efficient pipeline, a 3 GHz P4 is not three times as fast as a 1GHz P3

      I don't have hard data on this, but doesn't the impact of the pipeline depend on how the software it runs is compiled? If the object code is compiled to reduce branches, the longer pipeline should drastically speed up processing. That would theoretically make a 3GHz P4 MORE than three times as fast as a 1GHz P3.

      --
      No data, no cry
    17. Re:I guess the home market rules... by Silvers · · Score: 1

      I am well aware of cache and branch mispredict penalties.

      However, look at some data for non-SSE apps that have been patched to support SSE. The difference is incredible. We're talking >30% for heavily optimized applications.

    18. Re:I guess the home market rules... by RainbowSix · · Score: 1

      Note to mods: parent is clearly wrong. How did this get +5? As others have stated, the AMD rating is an estimation of how fast their processor is compared to an Intel Pentium 4 running at the PR speed in megahertz.

      --
      --------
      It's OK to be social, just don't tell anyone about it.
    19. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      Yeah, but when it's equally fast as 4200 it might make sense.. You definitely wouldn't want to make it so that AMD 4000 is slower than Intel 4000.

    20. Re:I guess the home market rules... by buysse · · Score: 2, Interesting

      I thought that SSE and MMX both had significantly lower precision than standard IEEE floating point ops. If I'm wrong, please correct me, but if it is lower precision, it makes it useless for Real Work(tm).

      --
      -30-
    21. Re:I guess the home market rules... by Stigmata669 · · Score: 1, Insightful

      Not to burst your bubble, but those XXXX+ numbers are just a marketing scam to compete in the mhz myth against Intel P4s. the XP1800+ wasn't called that cause it's 1.8 times faster than a 1ghz duron, it's the 1800+ to compete with the P4 1.8 ghz proc.

      --
      Yawn.
    22. Re:I guess the home market rules... by Mauvious · · Score: 1

      No, the Athlon XP only performs marginally faster than the original Athlons, since basically the only addition is SSE. The Athlon XP 1600+ (at 1.4GHz) performed on par with the Athlon 1.4GHz. However, you should note that the original Athlons were up against PIIIs and were targeted as such. The new performance rating was to help boost the marketability of the Athlon against the PIVs, which were clocking at very high speeds but with marginally better or lower performance than the lower-clocked Athlons. However, it should be noted that during the time the Athlon XPs were introduced, so was DDR memory, which helped with the bandwidth issues. That said, it's possible that Athlon XP 3000+ 512KB cache and 333MHz DDR > Athlon 3000MHz 256KB cache and 133MHz SDRAM. But the same would definitely not be said if you used 333MHz DDR with the 3000MHz Athlon, I bet.

    23. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      Actually, I thought it was a Java front end GUI, but that the engine was native code. Otherwise, I assume that I would have heard howling from users about speed dropping. If critical portions are native code through JNI, performance would be fine, even with primarily Java back end.

    24. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      Honestly bro, check for spyware. Ad-Aware and Spybot S&D. Most of the Gator-ware will cause poor perceived performance.

    25. Re:I guess the home market rules... by nelsonal · · Score: 1

      The quick answer is that a longer pipeline lets a processor boost its clock speed significantly. However, if you mispredict, you have to go back and search for the right data (if it's not in cache you have to go to RAM, and if it's not in RAM, to the HD); then it takes roughly one cycle for each stage in the pipeline and you're back in action. I've heard branch prediction has an upper 90% hit rate, but that still means a few percent of the time you have to wait, in this case 30 cycles to accomplish the next step.
      In general Intel's design philosophy has been to trade everything for a faster clock speed, even if it doesn't actually increase the overall system performance. I think it was on average about 40% that you lost going from P3 to PIV. Consider the Pentium M (it's a PIII done on a modern process with some optimizations for power). Intel doesn't like to mention it but I think they run at or under 2 GHz and are quite competitive with the best PIVs out there.

      --
      Degaussing scares the bad magnetism out of the monitor and fills it with good karma.
    26. Re:I guess the home market rules... by be-fan · · Score: 1

      You should try it. The P4's x87 FPUs are phenomenally weak, much moreso than the P6-era CPUs. And memory latency and bandwidth aren't a huge problem, because the P4 has one of the fastest single-processor memory busses you can find. 6.4GB/sec is nothing to sneeze at!

      --
      A deep unwavering belief is a sure sign you're missing something...
    27. Re:I guess the home market rules... by be-fan · · Score: 2, Informative

      SSE does standard IEEE754 single or double precision math. The Pentium 4's SSE2 unit (actually its FPU, but that's a detail) can handle 4 single-precision or 2 double-precision operations per cycle.

      --
      A deep unwavering belief is a sure sign you're missing something...
    28. Re:I guess the home market rules... by tomstdenis · · Score: 5, Interesting

      It isn't just branches though. For example, a 32x32=>64 multiplication on the P4 can take up to 14 cycles [iirc] whereas on the Athlon it's 6 cycles. So for example,

      MUL EAX,EBX [DIMMMM]
      ADD ECX,EAX [_D___IE]

      So in total takes seven cycles.

      The same code on the P4 would take at least 15 cycles. What's worse is consider

      MUL EAX,EBX [DIMMMM_]
      ADD ECX,EBX [_DIE___]
      INC ESI [_DIE___]
      DEC EBP [__DIE__]
      ADD EBX,EDX [__D__IE]

      Again this takes seven cycles, especially since instructions 1 and 2 can both start in cycle two in pipes 1/2.

      Compare that to the P4 which only has two ALU pipes [one of which is now stalled for 14 cycles for the MUL to finish].

      Tom

      --
      Someday, I'll have a real sig.
    29. Re:I guess the home market rules... by tomstdenis · · Score: 2

      My second example is slightly off. It would be

      MUL EAX,EBX [DIMMMM__]
      ADD ECX,EBX [_DIE____]
      INC ESI [_DIE____]
      DEC EBP [____DIE_]
      ADD EBX,EDX [____D_IE]

      [use a fixed-width font to read that...] for eight cycles not seven.

      [Where D = decode, I = issue, E = execute]

      Tom

      --
      Someday, I'll have a real sig.
    30. Re:I guess the home market rules... by tomstdenis · · Score: 3, Informative

      MMX doesn't do FP [it's int only].

      Both SSE and 3DNOW use formats the normal FPU can read so I'd say it's standard [hint: you can assign an array of two well aligned floats to a 3dnow 64-bit word and use it].

      SSE supports both double/float precision [as another poster pointed out]. Heck even the Athlon supports SSE [though I wouldn't use it. Hint: SSE reg == 128-bits and the Athlon CPU can only perform up to 64-bits of read per cycle...]

      Tom

      --
      Someday, I'll have a real sig.
    31. Re:I guess the home market rules... by zenyu · · Score: 3, Informative

      I thought that SSE and MMX both had significantly lower precision than standard IEEE floating point ops. If I'm wrong, please correct me, but if it is lower precision, it makes it useless for Real Work(tm).

      It performs precise math by default. You can only use 32 or 64 bit floats; the "long double" 80 bit floats are not supported. But this often isn't a problem. You can also turn off denormals, and interrupts on bad math (divide-by-zero type stuff). Turning those off hasn't given me any performance boost, but I still consider these things features not bugs. There are some low precision operations available, but no compiler I know of uses them unless you ask for 'em. I do in some cases but then I know what I'm getting.

      A math person may give you a better answer than me. I'm a graphics person, a field where SSE2 is a godsend compared to the stack based floating point units that came before.

    32. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      That's a load of crap. SSE is useful only in very limited cases where there is more math than memory access. Here's a real-world example from a well-known product.

      discreet worked closely with Intel to optimize the scanline renderer for 3dsmax, and the result was a little checkbox in the rendering options to "Enable SSE." Rendering images or animations can take minutes, hours, or days, so every little bit of speed helps.

      So, how did Intel do with their SSE optimizations?

      In the end, turning on that checkbox actually slows down the renderer by nearly 5%, and thus it is turned off by default!

    33. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      I call bullshit. Show me the data! The Discreet renderer may very well run slower but it is probably due to other factors than SSE. 3DSMAX is a shit of an application anyway.

      I have personally seen SSE enabled code rip the hell out of non-SSE code.

    34. Re:I guess the home market rules... by Eccles · · Score: 1

      Altivec, incidentally, the Mac "equivalent", is purely 32 bit. Otherwise, I might be playing with Altivec ops these days myself.

      --
      Ooh, a sarcasm detector. Oh, that's a real useful invention.
    35. Re:I guess the home market rules... by wmansir · · Score: 2, Insightful

      Don't you see, that is the entire point of moving to a longer pipeline: to inflate the MHZ.

      Intel don't care if a Prescott 4.0GHZ is twice as fast as a Pentium 4 2.0 GHZ. Just as a P4 2.0GHZ is not twice as fast as a PIII 1.0GHZ. They just want to get to 4.0GHZ.

      Intel doesn't care if AMD's 4000+ is actually faster than their 4000MHZ part, they just want to have a 4000MHZ part to market before AMD.

    36. Re:I guess the home market rules... by Saint+Stephen · · Score: 1

      Why don't you just use the profiling tools to make your code run faster? The DirectX people do it all the time, Intel's tools will profile your code and show you what to fix. Just like programming for SMP, you should program properly for P4. I know it's a bitch, but that's life.

      Code to fix == stalled pipelines. Surely there are general purpose math libraries optimized for P4 by now?

    37. Re:I guess the home market rules... by EvilTwinSkippy · · Score: 1
      There are limits to how much you can route around the brain-deadedness of a design.

      Having to deal with a 30+ stage pipeline makes the compiler VERY complicated, and it also magnifies any errata.

      Another thing to consider is that the processor is continually interrupted by I/O tasks on a desktop PC. It has to drop what it's doing, respond to the interrupt, and then re-load the state from where it was at. Simple things like disk access and network packets generate interrupts.

      A P4 would work great in a supercomputer. And that's about the only application it would be good for. Well, decent at. A PPC would eat it for lunch because it has a few extra ALU's and FPU's.

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    38. Re:I guess the home market rules... by Fulcrum+of+Evil · · Score: 1

      And memory latency and bandwidth aren't a huge problem, because the P4 has one of the fastest single-processor memory busses you can find. 6.4GB/sec is nothing to sneeze at!

      Are they still limited to 3 decodes/clock?

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    39. Re:I guess the home market rules... by Saint+Stephen · · Score: 1

      Well, I'm no compiler geek, but from my work on scaling up C++ apps on Win32, the goal is to do as much work in your quantum before the O/S tears down the registers to handle the interrupt, &c. You're going to get switched every quantum anyway, but don't artificially do it, then you are "doing the theoretical maximum". This is what the few apps that scale up on Win32 to multiple CPUs (SQL Server is the only example I know of) do -- the term is "compute-bound", or "cpu-bound", rather than "memory-bound" or "io-bound". Mpeg encoding is another CPU-bound task.

      Plus you would use at least a dual CPU so that 1 CPU basically gets I/O detail, and N-1 cpus do work.

      Some other poster can amplify this -- I'm basically parroting what the guys on the NT perf team (for whom I was a consultant) told me.

    40. Re:I guess the home market rules... by dasmegabyte · · Score: 1

      Your machine would not "jerk" or "lag" because of poor performance on the chip. Even if the machine were taking twice as long to process data than your Athlon, the time in between cycles is so infinitesimally small that there is NO WAY this is causing your slow downs.

      More likely, your work machine has a) a slow hard drive b) a lot of network traffic, which is tying up your system bus c) poorly setup virtual memory. Try setting it manually to a minimum AND maximum of twice your ram, or 2048 meg. this will prevent it from ever resizing your virtual mem, also allowing you to defragment the paging file and move it closer to the inner rim of your hard disk. Just doing this doubled the speed of my compilations.

      --
      Hey freaks: now you're ju
    41. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      Aaaaah...CHOO!

      http://www.apple.com/g5processor/architecture.html

    42. Re:I guess the home market rules... by be-fan · · Score: 2, Informative

      The decode bandwidth is a single x86 instruction per clock, but that's not a huge problem because of the trace cache. The issue bandwidth is three u-Ops per cycle, but this isn't a huge limitation because the P4 is a relatively narrow architecture. It's only got two ALUs and two FPUs compared to an Athlon's three ALUs and three FPUs.

      --
      A deep unwavering belief is a sure sign you're missing something...
    43. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      There you have it! Indisputable proof that Intel's P4 is slower than AMD's Athlon. Slashdot reader 'Cecil' has noticed 'jerks' and 'lags' on his P4, but not on his Athlon! And what's more, even though he does not have an engineering background, he knows more about the optimal pipeline depth than 1000's of PhD Intel engineers! Fantastic!

    44. Re:I guess the home market rules... by WuphonsReach · · Score: 1

      The deep pipelines in the P4 perform poorly, period. Even when running simple desktop apps on a Windows machine, I notice my P4-2.5GHz w/1GB RAM at work often jerks around or lags, while my Athlon 1900XP+ w/256MB RAM at home works like lightning. Obviously processor is not the whole story, but I think that under typical, multi-tasking usage, the deep pipelines are even more painful than benchmarks suggest.

      Um, no...

      A far simpler explanation is that you're using a different motherboard with different hardware drivers. Some chipsets are "smoother" than others and it (mostly) has nothing to do with the CPU.

      I have half a dozen AMD systems; some motherboards run like silk when under load, others are quite jerky/laggy. (I still prefer my AMD systems over Intel, partly because by supporting the competition I make sure that Intel can't rest on their laurels or unilaterally implement restrictive technology.)

      --
      Wolde you bothe eate your cake, and have your cake?
    45. Re:I guess the home market rules... by Dr.+Zowie · · Score: 1
      I posted this somewhere else -- but the problem with Matlab and all the other vectorized languages is that they work in the wrong order to preserve cache. Ideally you want to arrange your instructions so that your list of operations is in the innermost loop, and pixels are in a larger loop (because looping over pixels usually requires you to access RAM, while each pixel's working memory can usually fit in cache). But the list of operations is handled by the slow interpreter, rather than fast compiled code -- so you're hosed.

      One day someone will write a properly optimized vector language. PDL started going that way (with lazy evaluation and queued dataflow) but it hasn't materialized.
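
      The loop-ordering point can be sketched in a few lines. The two functions below compute the same result; only the nesting order differs. The operation list is hypothetical, and in Python itself the interpreter overhead swamps any cache effect, so this shows the access pattern only, not a benchmark.

```python
def op_at_a_time(pixels, ops):
    """Vectorized-language order: each operation sweeps the whole image."""
    data = list(pixels)
    for op in ops:                    # outer loop: operations
        data = [op(x) for x in data]  # inner loop: every pixel, new temporary
    return data

def pixel_at_a_time(pixels, ops):
    """Cache-friendly order: all operations run on one pixel before moving on."""
    out = []
    for x in pixels:                  # outer loop: pixels
        for op in ops:                # inner loop: operations
            x = op(x)
        out.append(x)
    return out

# Hypothetical per-pixel pipeline: scale, offset, square.
ops = [lambda x: x * 2.0, lambda x: x + 1.0, lambda x: x * x]
pixels = [float(i) for i in range(8)]

same = op_at_a_time(pixels, ops) == pixel_at_a_time(pixels, ops)
print("results match:", same)
```

      In the first form every pass touches the full array, so a large image is streamed through memory once per operation. In the second form each pixel is loaded once and all the work happens while it is still close to the processor.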

    46. Re:I guess the home market rules... by Hoser+McMoose · · Score: 1

      The idea of a longer pipeline is that it makes it easier to clock the chip to higher speeds. This is obviously successful for the P4. The highest Intel was able to push the P6 core of the Pentium 3, when using a 180nm manufacturing process, was 1.13GHz. Even then they had to recall their first attempts at 1.13GHz chips and it took them nearly a year to get them working.

      On the exact same 180nm manufacturing process Intel was able to crank out 2.0GHz P4s with no trouble at all.

      As for 3 times the performance, have you checked out the SPEC numbers? Intel's 1.0B GHz PIII processor managed 457 CINT2000 and 310 CFP2000, while the P4 3.0C GHz managed 1265 CINT and 1229 CFP. So the P4 is actually 2.77 times faster at integer code and 3.96 times faster at floating point code when compared to the PIII. Ok, maybe not a 100% fair test since different compilers were used, but even if you used the newest compilers for the old PIII's you would still get more than a 3 times speed up in floating point code for a 3GHz P4 vs. a 1GHz PIII.
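
      A short sketch for checking the ratios from the scores quoted in this comment. The per-clock figure is an added illustration: it divides each speedup by the 3x clock ratio, so a value above 1.0 means more work per cycle than the PIII and a value below 1.0 means less.

```python
# SPEC2000 scores as quoted in the comment above.
piii_1000 = {"CINT2000": 457, "CFP2000": 310}
p4_3000 = {"CINT2000": 1265, "CFP2000": 1229}
CLOCK_RATIO = 3.0  # 3.0 GHz vs 1.0 GHz

ratios = {suite: p4_3000[suite] / piii_1000[suite] for suite in piii_1000}
for suite, ratio in ratios.items():
    per_clock = ratio / CLOCK_RATIO
    print(f"{suite}: {ratio:.2f}x overall, {per_clock:.2f}x per clock")
```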

    47. Re:I guess the home market rules... by Lars+T. · · Score: 1

      I'm sure the Intel compiler does everything it can to prevent or delay branches, but the old 20-stage P4 already taxes that ability close to its limits.

      --

      Lars T.

      To the guy who modded me down from perfect to terrible Karma - Apple haters still suck

    48. Re:I guess the home market rules... by Nurf · · Score: 2, Informative

      Note to mods: parent is clearly wrong. How did this get +5? As others have stated, the AMD rating is an estimation of how fast their processor is compared to an Intel Pentium 4 running at the PR speed in megahertz.

      No. You are clearly wrong. The PR rating is relative to an AMD Thunderbird Core. If you don't know what you are talking about, you should just shut up. Here is a link and here is another.

      Intel are shouting about megahertz because it's all they have. For most real world applications (i.e. not encoding video) the Pentium 4 cores are abysmally inefficient. Anything that is branch heavy (such as a compiler, for example) is a complete nightmare for a P4.

      For that matter, I'm writing a video encoder in my spare time, and the AMD chips are still a better match for the sort of stuff I am doing.

      --
      ---
    49. Re:I guess the home market rules... by gjm11 · · Score: 5, Funny

      "DIMMMM / DIE / DIE / DIE / D_IE" ... You aren't an employee of Rambus Inc. by any chance?

    50. Re:I guess the home market rules... by JamesP · · Score: 2, Informative

      Present-day x86 chips aren't limited by their FP processing speed

      The problem with x87 is not speed. It uses an antiquated programming model, using a stack. So you have to shuffle things in the stack to make it work, and this takes a lot of time.

      SSE2, OTOH, is very easy and fast. 2 calculations at the same time, and in the format A+B=C

      --
      how long until /. fixes commenting on Chrome?
    51. Re:I guess the home market rules... by kimmo · · Score: 1

      Well, the SSE didn't/doesn't do much unless properly used. It wasn't/isn't in even most of the current software, not to mention SW few years back. The real difference between "classic"/XP athlons is the hardware data prefetch in XPs.

    52. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      I would.

      3dnow is indeed slightly faster, but you'll eventually get problems with register shortage. For example, an alternating set of pfadds and pfmuls has 4 cycle latency but throughput of 2. This means that you'll have to have 8 things going on in parallel.
      8 things, 8 registers.. I think I could have some use for free slots too.
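
      The "8 things" figure is latency multiplied by issue rate. A minimal sketch, reading "throughput of 2" as two instructions issued per cycle (an interpretation of the comment, not a figure taken from a manual):

```python
def independent_ops_needed(latency_cycles, issue_per_cycle):
    """Operations that must be in flight to keep the execution units busy."""
    return latency_cycles * issue_per_cycle

needed = independent_ops_needed(4, 2)
MMX_REGISTERS = 8  # 3DNow! reuses the eight 64-bit MMX registers

print(f"in flight: {needed}, registers available: {MMX_REGISTERS}")
print(f"spare registers: {MMX_REGISTERS - needed}")
```

      With every register holding a result that is still in flight, nothing is left over for constants or temporaries, which is the shortage the comment describes.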

    53. Re:I guess the home market rules... by tomstdenis · · Score: 1

      Perhaps but 3dnow instructions are mostly directpath. They won't stall all three decoders. So while SSE lets you process twice as much data it incurs at least two additional cycles of penalties (the manual states it can be up to 4 cycles!!!).

      The trick would be to mix the appropriate amount of ALU code [e.g. pointer updates, counters, whatnot] between 3dnow instructions.

      Tom

      --
      Someday, I'll have a real sig.
    54. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      Lucky I wasn't drinking anything at the moment..

    55. Re:I guess the home market rules... by Zak3056 · · Score: 1

      Disclaimer: I am not an EE, so I could very well be full of shit.

      I'm not an EE either, but yes, you are full of shit. "Typical desktop apps" leave your CPU spending most of its time in an idle loop. Blaming "poorly performing deep pipelines" for whatever problem you're having is like assuming that the missing TYPE-R stickers on your Honda Civic are responsible for the strange sound your engine is making.

      --
      What part of "shall not be infringed" is so hard to understand?
    56. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      The decoding part is right and unfortunate. Luckily athlon64 fixes that.
      But 4 /additional/ cycles? Never seen that. In fact I've managed to get very closely the same performance out of both 3dnow and SSE code in trivial stuff (e.g scale&translate at ~1 element/cycle).

    57. Re:I guess the home market rules... by Darren+Winsper · · Score: 1

      It's not really a scam, since it's normally pretty accurate. Thus, people have a quick-and-dirty way of getting a rough comparison. AMD do have to deal with dumb consumers, who seem to think that a higher MHz means it's faster than any other processor of less MHz.

      I'm quite surprised Apple stuck with GHz, since the G5 does really well against P4s with a 50-60% higher clock speed.

    58. Re:I guess the home market rules... by Anonymous Coward · · Score: 0

      It's not a matter of how much memory bandwidth you have. It's how you use that bandwidth.

      The original P4s had as much or more memory bandwidth than the equivalent Athlons. However, Athlons did more work, clock cycle for clock cycle, because all that memory bandwidth was spent refilling the cache rather than delivering instructions for processing. Most of that was because the pipeline depth was optimized for hyperthreading with extra arithmetic units but they canceled the extra arithmetic units and decreased the cache size due to what I believe were economic reasons.

    59. Re:I guess the home market rules... by cheezedawg · · Score: 1

      but I think that under typical, multi-tasking usage, the deep pipelines are even more painful than benchmarks suggest.

      Ok- let's say that you get really unlucky and you miss 100 branch predictions in a row. If each branch miss costs you 30 cycles at 2.5 GHz, those 3000 cycles cost you about 1.2 microseconds. You are not going to notice anything under several milliseconds. That's roughly 3 orders of magnitude different. In other words, that has nothing to do with your CPU.

      Intel's pipeline depth was a conscious design choice. In fact, they have done research and concluded that the current P4 pipeline depth is not deep enough (source source). According to this research, the optimal pipeline depth for x86 is between 40 and 50 stages. There are tradeoffs to a deeper pipeline, but rest assured that Intel has considered them in their design decisions...
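
      The back-of-the-envelope estimate in this comment can be worked through from its stated inputs (100 mispredicts, 30 cycles each, 2.5 GHz). The one-millisecond threshold for a noticeable delay is an assumption added here for comparison.

```python
MISPREDICTS = 100
PENALTY_CYCLES = 30
CLOCK_HZ = 2.5e9
NOTICEABLE_SECONDS = 1e-3  # assumed threshold: about a millisecond

stall_cycles = MISPREDICTS * PENALTY_CYCLES
stall_seconds = stall_cycles / CLOCK_HZ

print(f"{stall_cycles} cycles = {stall_seconds * 1e6:.1f} microseconds")
print(f"noticeable delay is {NOTICEABLE_SECONDS / stall_seconds:.0f}x longer")
```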

      --
      "The defense of freedom requires the advance of freedom" - George W Bush
    60. Re:I guess the home market rules... by Endive4Ever · · Score: 1

      Well, I suppose Apple could have tried to keep the clock speed of their system a trade secret, but someone would have eventually figured it out.

      Although you probably mean 'GHz' on their marketing materials.

      Me, I've got things like static 12-bit Harris 6100 processors that run the PDP-8 instruction set. Since they're all static registers inside, I can run them at .05 Hz if I wish. Or just sit there at the table by the circuit board using a knife switch to clock through the instructions at an arbitrary rate.

      You try that with any of the fancy-schmancy new-fangled processors and they stall, with their dynamic registers.

      --
      ---
    61. Re:I guess the home market rules... by ImprovOmega · · Score: 1

      Having personally implemented several algorithms in SSE on the job, I can attest to a halving of the time that portion of the code takes versus regular optimized C code (executed on the FPU). This, however, is operating on four 32-bit floats at a time, so memory bandwidth is still a limiting factor.

    62. Re:I guess the home market rules... by ElliotLee · · Score: 1

      Then isn't it a good thing (tm) that Intel is choosing to use MHz/GHz (clock speed) to boost the performance of its processors? Imagine if both AMD and Intel decided to drop the MHz myth. What would you compare to? There would be no reference point, no way to name the performance of the processors (unless artificial or arbitrary, or something completely new altogether). Hmm! With no more MHz, what would that new reference point be?

    63. Re:I guess the home market rules... by EvilTwinSkippy · · Score: 1
      I am a computer guy.

      MPEG is a memory bound task. It's also a disk bound task. I can't think of many computers that can keep gigabytes of data in RAM, let alone in the registers of the processor. The hardest part about playing DVDs was getting the information across the IDE bus in real time. OK, and before we had 266MHz memory busses, getting the data through to the processor in real time.

      SQL is slaved to the network card. It too is disk and memory dependent. Sure it will try to cache as much of the stuff stored on disk in memory, but RAM is finite and fleeting. Plus it has to store the data somewhere.

      Now SQL can scale to multiple processors because you are almost never running only one query at a time. If 2 or more queries are running, each processor can do one in parallel.

      As far as "optimizing" your C++ code, don't bother. Yes, there are inefficient algorithms out there that should be avoided. In general, the compiler takes care of the optimizations for you. The compiler also takes care of grouping code together, and the OS controls when it's swapped in and out of the processor. About the only thing I've seen come out of developers writing "optimal" code is undecipherable code.

      And no, it didn't really run any better.

      You have to have an intimate understanding of your processor architecture before you can make decisions about quanta. And then, you are generally coding in assembler.

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    64. Re:I guess the home market rules... by akuma(x86) · · Score: 1

      Unfortunately, in real-world situations cache thrashing is difficult to avoid, and accurate branch prediction is a highly non-trivial affair. When a prediction turns out to be wrong, the cost of refilling a stalled pipeline increases in proportion to the pipeline length. The ever-lengthening pipelines of P4 chips means that, although its FP performance may r0x0r, the overhead of stalls makes production code run like treacle.

      The cost of refilling the pipe is measured in time and not pipe-stages. If you're running a processor with 5 stages of misprediction penalty at 1 GHz and another processor with 10 stages of misprediction penalty at 2 GHz, then they will refill their pipes in the same amount of time.

      The penalty of the P4 mispredicts relative to the penalty of Athlon mispredicts is NOT the ratio of pipe-stages because the P4 runs at a much higher clock by design.
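
      The point can be put in a few lines. This is a minimal sketch that assumes each stage of misprediction penalty costs exactly one clock cycle; the function name is made up for illustration, and the two cases are the 5-stage/1 GHz and 10-stage/2 GHz processors from the example above.

```python
def refill_time_ns(penalty_stages, clock_ghz):
    """Time to refill the pipe after a mispredict, in nanoseconds.

    Assumes one clock cycle per stage of misprediction penalty.
    A clock of N GHz completes N cycles per nanosecond.
    """
    return penalty_stages / clock_ghz


# The two processors from the example: same refill time despite 2x the stages.
shallow_slow = refill_time_ns(penalty_stages=5, clock_ghz=1.0)
deep_fast = refill_time_ns(penalty_stages=10, clock_ghz=2.0)

print(shallow_slow, deep_fast)
```

      Doubling the stage count only doubles the time penalty if the clock stays the same, which is not the design point of a deeper pipeline.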

      On another note - Scientific FP code tends to be heavily loop based which means that branch prediction is very accurate and longer pipes don't hurt as much. If your memory access pattern is predictable, prefetching can solve many problems with the cache.

    65. Re:I guess the home market rules... by Anonymous Coward · · Score: 0
      Why did you write it in C? Assembly has much higher performance.

      High level languages are very useful. Matlab is a higher level language than C. I have no problem getting the same performance from Matlab as C, since most of my problems spend 99% of the CPU time doing FFTs and other matrix operations. Our Matlab code is much easier to read, write, and fix than C. A skilled programmer will rewrite a slow Matlab function in C, keeping the rest of the setup code in Matlab. I've never had to do this (for performance reasons), but it's the same trick I use in C if it's too slow vs assembly.

      I guess my point is, that you never really learned to use Matlab properly, so you really don't know what you are talking about.

      BTW, when I say Matlab, I really mean Octave.

  7. History repeats itself..... by Selecter · · Score: 5, Interesting
    I guess Intel's short term game plan is to keep the MHz game going yet again until they can get something going on the 64 bit front worth having.

    I suspect AMD and even Apple are going to shrink Intel's bragging rights in that same time frame unless Intel gets their act together. From AMD's recent earnings report it sure seems somebody is buying Athlon 64s.

    Intel blew it when they made the decision to let 32 bits ride for another 2 to 3 years. They look like old fuddy-duddies now. It's AMD and Apple via IBM that have the cool shit.

    1. Re:History repeats itself..... by Slack3r78 · · Score: 1

      Er, where are you getting your data? Virtually every benchmark I've seen has the Athlon 64s thrashing the P4EE in 32 bit mode in everything except video encoding, something the P4 has traditionally been very good at.

    2. Re:History repeats itself..... by dpilot · · Score: 4, Insightful

      Intel has backed themselves into a bit of a corner, in the process of repeating history. With Itanium, they've proven that they're more concerned with their own strategies than they are with delivering solutions to their customers. But they've sunk so much money and image into Itanium that they can't back out, yet. No doubt there's someone inside the company, probably a wild duck, working on the right time to jump ship and how to spin it.

      In the meantime, Intel has the one-two bait and switch with P4-Celeron and the true P4. If they didn't have a TON of money and market clout, they'd be in big doo-doo right about now. As it is, AMD is the one in big doo-doo, not because they have the lesser product, but because of Intel's clout.

      Listen to any computer commercial, and they pretty much all have those 5 co-advertising tones at the end. That's monopoly power, that's market clout. (If I were in charge, the antitrust penalty would ratchet up every time those tones sounded.)

      Maybe Intel blew it, but they'll survive.

      --
      The living have better things to do than to continue hating the dead.
    3. Re:History repeats itself..... by Pieroxy · · Score: 4, Insightful

      Dude, it's the same with any innovation. You have to wait for the software to follow. Why are you making a big fuss out of it? When they introduced P4 with their new architecture, tests showed that it wasn't all that much faster than a good old P3. Then compilers and software in general adapted and it became faster.

      Same with the P3, the P2, the Pentium, the 486, 386, 286 (Even though no one adapted to this shit) and the 086. So yes, history repeats itself, and it is for good (at least on this one).

    4. Re:History repeats itself..... by Anonymous Coward · · Score: 0

      not true. Windows suffered with 286 influence for a long long time. far ptrs???? ugh

    5. Re:History repeats itself..... by Toraz+Chryx · · Score: 1

      Actually, it's pretty much only DivX encoding that the P4 wins big on; for other codecs, it's either a wash or the Athlon pulling ahead.

    6. Re:History repeats itself..... by mrm677 · · Score: 1

      Why don't you have a look at the latest SPEC benchmark scores? The numbers speak for themselves. Itanium scores are very impressive. I was once a doubter too.

    7. Re:History repeats itself..... by Anonymous Coward · · Score: 0

      Actually, the Intel "song" is 4 notes long, not 5.

    8. Re:History repeats itself..... by jtosburn · · Score: 1

      You don't know your history so well.

      When the 386 was introduced it was, clock-for-clock, twice as fast as the 286, no recompiling necessary. But it helped...the move to 32 bit code had begun. When the 486 was introduced, it smoked the 386, once again by being just about twice as fast as its predecessor, again clock-cycle for clock-cycle. Without recompiling. But it helped, now that an FPU was standard (ignoring the crippled 486SX).

      The Pentium was the first to really require recompiling to fully take advantage of the new architecture, but even without that it was not quite twice as fast as the 486. Intel claimed that an additional 30% or so gain (IIRC) could be had by recompiling, but no one really bothered. Why lose the legacy market? And remember that being optimized for the Pentium was originally one of Mandrake's reasons for being.

      Well, then the Pentium Pro (later devolved to the Pentium 2) was introduced, and the increase broke stride a bit. Faster but maybe by only 1.5x or so. The P3 wasn't a new architecture so much as a new marketing exercise and a chance to unveil SSE, which provided some flashy performance increases in Photoshop, but that's about it. But that did require a recompile.

      Keeping score? The Pentium 4 was the first new Intel architecture that didn't increase performance across the board, clock-for-clock. Of course its clock cycles are faster, so it is faster, but no real drama in it. And your claim that, "Then compilers and software in general adapted and it became faster" is baloney. So little software is specifically optimized for the P4 as to be negligible outside of specific vertical markets. And, even with optimized code, clock-for-clock, the P3 is still faster.

      Whether revisionism or ignorance, you're wrong.

      The parent post's comment about history repeating itself was regarding the intro of the P4, not any generation previous. Hence the criticism then and now of Intel for going for the marketing "Gee Whiz!" effect of maximum gigahertz, rather than actual superlative performance.

    9. Re:History repeats itself..... by ProtonMotiveForce · · Score: 0, Troll

      What a load of shit. If you were in charge, we'd be in Soviet America right now.

      Anybody that is successful would become a monopoly. Microsoft - monopoly (though there are about a dozen viable alternatives to Microsoft). McDonalds (they do have a monopoly on the BigMac, you know). Intel (though AMD is very good competition and there are other competitors in the market).

      Just admit you're a fanboy and be done with it.

    10. Re:History repeats itself..... by mixmasta · · Score: 1

      Right, the itanium is a nice machine, but I wouldn't want one at home.

      --
      #6495ED - cornflower blue
    11. Re:History repeats itself..... by Selecter · · Score: 1

      OK, I'm a fanboy. That wasn't so hard. ;>

    12. Re:History repeats itself..... by Jerf · · Score: 4, Insightful

      Maybe Intel blew it, but they'll survive.

      We don't want them to die. We want them to pass through it and come out an older and wiser company, less inclined to pull shit it has learned the hard way it can't get away with, no matter how big it is.

      Compare the IBM of 2004 to the IBM of 1984.

      If Intel were to "die", the resulting market would have lost the wisdom that Intel is likely to learn over the next couple of years, barring some technical miracle.

    13. Re:History repeats itself..... by parlyboy · · Score: 1

      Maybe Intel blew it, but they'll survive.

      Remember that Intel as a company is doing just fine, thank you very much. In particular, the new Centrino line is without any serious competitors in the laptop space--its specs are impressive, and it's making Intel money hand over fist.

    14. Re:History repeats itself..... by dasmegabyte · · Score: 1

      Uh, Intel is facing pretty extreme pressure from AMD, IBM, Via and in some sectors from Motorola and even Transmeta.

      That's NOT a monopoly.

      Just because they have a large share of the marketplace doesn't make them a monopolistic entity. And just because they put more money into advertising than into R&D doesn't make them an anti-trust suit waiting to happen. Nike's been doing this for years, and they've yet to be sued for unfair marketing practices by Reebok, Adidas and New Balance (nothing else will TOUCH my duck feet).

      Intel has made very few great decisions in the past ten years, and all of them were related to marketing. Great marketing and cunning FUD have made them the market leader. And when AMD gets around to spinning some of their own...watch the fuck out.

      --
      Hey freaks: now you're ju
    15. Re:History repeats itself..... by Loki_1929 · · Score: 1

      "even with optimized code, clock-for-clock, the P3 is still faster"

      This may not be true in specific applications that support both SSE-2 and hyperthreading. Though I don't have a specific head-to-head matchup, one particular program I might expect to see higher clock-for-clock performance on the P4 than the P3 is Cinebench 2003.

      This merely shows that under absolutely perfect circumstances, the P4 can perform well. What it also shows is that the P4 requires those perfect conditions to go much of anywhere.

      You also forgot to mention that there was no "086". :)

      --
      -- "Government is the great fiction through which everybody endeavors to live at the expense of everybody else."
    16. Re:History repeats itself..... by PD · · Score: 1

      When the 386 was introduced it was, clock-for-clock, twice as fast as the 286, no recompiling necessary.

      That actually wasn't true. The 386 might have been able to move 4 bytes at a time rather than two bytes, but not many apps could use that kind of speedup. And, if your app benefitted from that kind of speedup, you were better running it on an Amiga or Atari that had a hardware blitter chip.

      Besides, the 386 ran DOS mainly, and therefore was in 16 bit mode. The 286 and the 386 were very comparable at the same clock speeds running DOS. By the end of their lives, you could get a 20 MHz 286 chip from Harris, or a 40 MHz 386 chip from AMD I think. It was all about clock speed.

    17. Re:History repeats itself..... by Anonymous Coward · · Score: 0

      Intel has backed themselves into a bit of a corner

      Is a bit of a corner anything like an edge?

    18. Re:History repeats itself..... by Zork+the+Almighty · · Score: 1

      When the 386 was introduced it was, clock-for-clock, twice as fast as the 286, no recompiling necessary.

      No, quite the opposite, actually.

      --

      In Soviet America the banks rob you!
    19. Re:History repeats itself..... by vrt3 · · Score: 1

      Can you be a little more specific? Nowhere on that page do I see something that contradicts that statement. I also don't see anything that supports it, though.

      --
      This sig under construction. Please check back later.
    20. Re:History repeats itself..... by ratamacue · · Score: 1
      Listen to any computer commercial, and they pretty much all have those 5 co-advertising tones at the end. That's monopoly power, that's market clout.

      Market influence, sure. Monopoly power? Give me a break. Last I heard, Intel has about 75% market share (correct me if I'm wrong). That's 25 points short of my definition of monopoly. What exactly is your definition of monopoly?

    21. Re:History repeats itself..... by dpilot · · Score: 1

      I agree with you 100%.

      I just hope AMD (and others) survive, too.

      --
      The living have better things to do than to continue hating the dead.
    22. Re:History repeats itself..... by dpilot · · Score: 1

      Actually, when a company becomes too focused on its own plans, things can unravel faster than you'd believe possible. They may be raking money in hand-over-fist, but they're spending it only slightly less fast. (The difference is, obviously, Profit!)

      When a customer base shifts revenue can drop quickly, but expenses don't. Someone else mentioned IBM - it happens.

      I don't expect Intel's difficulties to be as spectacular. Microsoft may be another story.

      --
      The living have better things to do than to continue hating the dead.
    23. Re:History repeats itself..... by dpilot · · Score: 1

      In the desktop market, Intel wields monopoly power. Margins in the PC marketplace are so thin that the 'rewards' for Intel and Microsoft loyalty can make the difference between profit and loss. In other places, I've seen that monopoly power can be exerted with market shares down in the 70% range.

      But you're right, in that Intel doesn't have monopoly share in other markets.

      --
      The living have better things to do than to continue hating the dead.
    24. Re:History repeats itself..... by dpilot · · Score: 1

      (reference forgotten)
      I've read that monopoly power can be exerted with market shares in the 70% range. Take a look at the desktop PC marketplace, for a moment. The profit margins are soooo thin that the 'loyalty rewards' from Intel (and Microsoft) can make the difference between profit and loss.

      Because of the low profitability, the desktop PC marketplace is unusually sensitive to supplier monopolies, and permits such low 'monopoly share'.

      --
      The living have better things to do than to continue hating the dead.
    25. Re:History repeats itself..... by dpilot · · Score: 1

      I didn't say Itanium wasn't capable. I have three objections to it.

      1: For the sheer amount of money expended on it, I would have expected better. For billions of dollars and the better part of a decade of focused development, they should have been able to come up with something better than this.

      2: Itanium is *the most proprietary* architecture on the planet, bar none. The IP is in a separate company, licensed to Intel and HP. This is to exclude it from any cross-license agreements Intel and HP may already have.

      3: People have been saying for years that X86 is ugly, and we need to move away from it. If we really want to do so, Itanium isn't a particularly clean architecture to move to. Not only that, but the compilers were, and will likely continue to be, a sonofagun to write. Moreover, it's anticipated that in order to get further performance gains, they're going to dilute the 'architectural purity' of their EPIC (VLIW) to add OOO features. Perhaps necessary, but it's going to end up being another bastard architecture like the X86. Why move from ugly to ugly?

      What's happened to X86 reminds me somewhat of "The Soul of a New Machine" (Tracy Kidder) where new versions clean the architecture as they extend it, and the old warts become minor, seldom-used features for backward compatibility.

      --
      The living have better things to do than to continue hating the dead.
    26. Re:History repeats itself..... by Anonymous Coward · · Score: 0

      Indeed there was an 086; the first PCs used an 8088, later moving on to the 8086. There was even an 80186; the Tandy 2000 is the only machine I've heard of using it, though.

    27. Re:History repeats itself..... by jtosburn · · Score: 1

      This may not be true in specific applications that support both SSE-2 and hyperthreading. Though I don't have a specific head-to-head matchup, one particular program I might expect to see higher clock-for-clock performance on the P4 than the P3 is Cinebench 2003 [aceshardware.com].

      Which is exactly what I said re. very specific, vertical market apps. I wasn't trying to diss the P4, so much as point out that it's a big break with how Intel went about processor design previously. It's not so much what's better, but that they're different.

      You also forgot to mention that there was no "086". :)

      And the 80186 ;)

    28. Re:History repeats itself..... by jtosburn · · Score: 1

      Maybe my memory is fuzzy. OTOH, neither your link nor a quick Google search turns up anything substantive.

      However, your argument re. 386s running mostly DOS is questionable...Windows 3.0 could take partial advantage of the 386, and Win3.1 could, after a year, utilize Microsoft's brand spanking new win32s addition. And DOS 5.0 could access the greater memory allowed by the 386, so that's in the timeframe. Games sure as shit used all they could get, often making you use boot disks that provided an environment optimized for them (or at least not running anything else you might care to run).

      Intel made 286s up to 25 MHz, and 386s only up to 33 MHz. And I'll stick by my recollection that the 386 was substantially faster out of the gate.

    29. Re:History repeats itself..... by PD · · Score: 1
      Windows 3.0 could take partial advantage of the 386, it is true. That was the ability of the 386 to address more than 1 meg of RAM. But, 32 bit code was thunked down to 16 bits, because underlying all of Windows was DOS.

      Intel made 286s up to 16.5 MHz, not 25 MHz. Siemens licensed the 80286 and pushed it to 20 MHz. Harris got all the way to 25 MHz.

      The average instruction timing for the 80386 was about 3 clocks per instruction, and for the 80286, it was about 5 clocks per instruction. But, that's averaged over the entire instruction set as listed in the manual. They also used the best timings available to the CPU. For example, an integer ADD instruction takes 2 clocks on a 386 when you add two registers. But when you add a register and a memory location, the ADD instruction takes 7 clock cycles. Real world code produced by compilers doesn't use the instruction set uniformly, and real world code results in a lot of the slower usages of the instructions. The result was that the 80286 kept up very well with the 80386.
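
      To see why the instruction mix matters, here is a rough sketch of a weighted average. The 2-clock and 7-clock ADD timings are the ones quoted above; the 50/50 mix and the function name are made up for illustration and are not measurements of any real program.

```python
def average_clocks(mix):
    """Weighted average clocks per instruction.

    mix is a list of (fraction_of_instructions, clocks) pairs;
    the fractions must sum to 1.
    """
    total = sum(fraction for fraction, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * clocks for fraction, clocks in mix)


# 80386 ADD timings quoted in the comment.
ADD_REG_REG = 2  # clocks, register + register
ADD_REG_MEM = 7  # clocks, register + memory location

# Best case: every ADD uses two registers.
best_case = average_clocks([(1.0, ADD_REG_REG)])

# Hypothetical mix: half of the ADDs touch memory.
half_memory = average_clocks([(0.5, ADD_REG_REG), (0.5, ADD_REG_MEM)])

print(best_case, half_memory)
```

      With half of the ADDs going to memory, the average is more than double the best-case figure, which is the gap between manual timings and compiler output described above.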

      Take a look at:
      http://www.intersil.com/data/an/an111.pdf

      Here's a quote I found. It's really hard to find instruction set timing comparisons between the 80286 and the 80386. It basically says that the 80286 can run the code in 15 clocks, and the 80386 can run the code between 11 and 16 clocks. The variation on the 80386 is probably because the 386 started introducing pipelines in the processor. Obviously, the technique wasn't that effective for increasing speed, other than clock speed.


      The following statements produce the same results, but take between 74 and 81 clocks on the 8088 or 8086 processors. The same statements take 15 clocks on the 80286 and between 11 and 16 clocks on the 80386. (For a discussion about instruction timings, see "A Word on Instruction Timings" in the Introduction.)

      mov bl, 2 ; Multiply byte in AL by 2
      mul bl
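
      Put as same-clock ratios, the quoted clock counts look like this. A small sketch: the clock ranges are the ones in the quote above, and the helper name is made up for illustration.

```python
# (min_clocks, max_clocks) for the two-instruction sequence, as quoted above.
CLOCKS = {
    "8088/8086": (74, 81),
    "80286": (15, 15),
    "80386": (11, 16),
}


def speedup_range(baseline, other):
    """Return (worst, best) speedup of `other` over `baseline` at the same clock speed.

    The worst case pairs the baseline's fastest run with the other's slowest;
    the best case pairs the baseline's slowest run with the other's fastest.
    """
    base_min, base_max = baseline
    other_min, other_max = other
    return base_min / other_max, base_max / other_min


worst, best = speedup_range(CLOCKS["80286"], CLOCKS["80386"])
print(f"386 vs 286 at the same clock: {worst:.2f}x to {best:.2f}x")
```

      A ratio below 1 means the 386 is slower than the 286 on this sequence in its worst case.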


      And, I found these things in a 80386 programming reference manual. One is in the section about the differences between an 80386 and an 8086. The second one is about the 80386 vs. the 80286.


      14.7 Differences From 8086

      In general, the 80386 in real-address mode will correctly execute ROM-based
      software designed for the 8086, 8088, 80186, and 80188. Following is a list
      of the minor differences between 8086 execution on the 80386 and on an 8086.

      1. Instruction clock counts.

      The 80386 takes fewer clocks for most instructions than the 8086/8088.
      The areas most likely to be affected are:

      * Delays required by I/O devices between I/O operations.

      * Assumed delays with 8086/8088 operating in parallel with an 8087.


      The part about the 286 doesn't specify anything about the instruction set.


      14.8 Differences From 80286 Real-Address Mode

      The few differences that exist between 80386 real-address mode and 80286
      real-address mode are not likely to affect any existing 80286 programs
      except possibly the system initialization procedures.


      So, I hope I've met the burden of proof. I've shown that 386 computers running Windows 3.0 were running DOS underneath, that they had to thunk down to 16-bit code for all the operating system stuff, and that from the instruction set timing information (with references) an 80286 and an 80386 running 16-bit code at the same clock speed were very close in performance.

    30. Re:History repeats itself..... by Anonymous Coward · · Score: 0

      When they introduced P4 with their new architecture, tests showed that it wasn't all that much faster than a good old P3. Then compilers and software in general adapted and it became faster.

      No, no, NO! The current crop of P4s have not had any compilers or software adapted to them! There have been significant internal hardware changes in the P4 that brought this about.

      For one quick example, consider all the Photoshop benchmarks out there that compare exactly the same version of Photoshop on different model P4s (google for it). The software has not changed; the hardware has changed!

    31. Re:History repeats itself..... by Zork+the+Almighty · · Score: 1

      The CPUs are listed approximately in order of introduction. Granted, most of it is anecdotal, but differences in processor performance were so much more pronounced back then.

      386SX-16
      The 386 was a huge advance but you'd never know it from one of these - they were usually out-performed by the better 286s.

      386SX-20
      These were a little faster than an SX-16. But then, so was custard. We vividly remember how they seemed to be slower than a 286-20 too...

      286-25
      We only ever saw two of these. They flew! We had one in the shop for a while, and used to ask friends to guess what was inside it without looking. Most people thought it was a 386DX-40.

      --

      In Soviet America the banks rob you!
    32. Re:History repeats itself..... by jtosburn · · Score: 1

      You did great! More time than I might have spent on it actually, but good info. It's been a few years! My only gripe has to do with the 32 bit vs 16 bit thunking merely because something was running over DOS. Many, many applications of the time sidestepped DOS in order to get better performance, and thus didn't necessarily thunk to 16 bits for all operations. The win32s API is one example, and another is the Western Digital 32bit hard drive driver released at a similar time.

      And yes, the 286 and 386 may have been close in performance, but my point remains: that the P4 was the first Intel processor to actually run slower, clock-for-clock.

      As for who took the 286 to 25 MHz, I bow to your research; one of the links I had read made the claim that Intel did so, but they could have been total chuckleheads:). Maybe that makes me one too, for taking their word for it.

  8. So What ? by El+Cabri · · Score: 4, Interesting

    I'm kind of tired of the perpetual whining of armchair hardware designers. So the happy few at Intel, highly paid architects with 30 years' experience in the industry and a hundred published scientific papers, decide that the next-gen chip will have more stages, and they have to be called morons? How do you know better? Hasn't Intel produced the fastest chips on the market with each and every micro-architectural generation? Long pipelines = costly branch mispredicts, whoooaah, you're so bright, why don't YOU have the job leading the Prescott team? Branches can be predicted. Long pipelines can improve throughput. Microprocessors are all about trade-offs. Let the pros do the work and go back to playing Quake.

    1. Re:So What ? by fredmosby · · Score: 2, Insightful

      I agree with the argument you are trying to make. But it would probably work better if you were less condescending.

    2. Re:So What ? by Anonymous Coward · · Score: 0

      Hey, everyone has to start small. I bet if the now highly-paid architects had just shut up and gone back to playing Pong when they were young and inexperienced, they wouldn't be highly-paid today. And it's not like it has never happened that a bright young mind enters some big company and shows the old farts how things are really done in the modern times.

    3. Re:So What ? by Anonymous Coward · · Score: 0

      Hmmm, someone at Intel doesn't take kindly to criticism.

    4. Re:So What ? by addaon · · Score: 5, Insightful

      Right, Intel always has had the fastest chip, if you ignore things like Alpha, Athlon, Opteron, Power, PowerPC, and others.

      And of course, Intel's motivations are entirely performance, or at least price/performance, not marketing.

      The fact that every other company has made a different design decision and has made better chips as a result is just an illusion foisted on us by those who think their own thoughts.

      --

      I've had this sig for three days.
    5. Re:So What ? by afidel · · Score: 3, Insightful

      Intel's engineers didn't decide the direction of the processor. The whole direction of Intel's desktop line has been controlled by marketing concerns since the initial stages of development on the P4. The engineers got to do as they wished with the Itanium, but unfortunately they went too far the other way and completely forgot about marketing concerns like running legacy code.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    6. Re:So What ? by Anonymous Coward · · Score: 0

      Yeah... like they say, a little bit of knowledge is a dangerous thing.

      I love the line "As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls."

    7. Re:So What ? by Anonymous Coward · · Score: 0

      Don't tell me you didn't know that!

    8. Re:So What ? by harlows_monkeys · · Score: 3, Informative
      Right, Intel always has had the fastest chip, if you ignore things like Alpha, Athlon, Opteron, Power, PowerPC, and others

      Intel P4 and Xeon beat 4 of the 5 you name on SPEC.

    9. Re:So What ? by stevesliva · · Score: 5, Funny
      I'm kind of tired of you armchair OS coders. So the happy few, highly paid Microsoft employees, 20 years experience in copying IBM, thousands of stock options in Redmond decide the next gen OS will have some wack FS and they have to be called morons? How do you know better? Hasn't Microsoft produced the best selling OS on the market for 15 years? Why don't YOU have the job leading the Longhorn team?

      Oh. Yeah... LINUX.

      Nevermind-- go back to writing the best OS there is.

      --
      Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
    10. Re:So What ? by jcbphi · · Score: 1

      I like the cut of your jib, sir.

      -J

    11. Re:So What ? by drinkypoo · · Score: 2, Interesting
      Obviously branches cannot always be predicted, and Intel has traditionally (not a long tradition, OoO is relatively new, but still) been poor at it. Witness the amazing slowness of the P4 compared to the P3, clock for clock. Some of those pipeline stages in the current P4 are already there for signal propagation; I suspect more of them in this core will be so-called "Drive" stages in which the CPU is doing nothing but waiting for signal propagation.

      Intel has the fastest chips (by a fine RCH), but AMD has consistently produced the best price:performance ratio, and since the K6 faded over the horizon, AMD has got its act together WRT chipsets and compatibility, to the point where there is no longer any reason to get Intel over AMD. AMD has realized that since CPUs are usually doing many things at once, it is better to be broad than deep.

      Intel is going to have to do something really spectacular soon or continue to lose market share to AMD. Personally I hope they blow it, because I'm so much happier with Athlons than with any Intel CPU. AMD's only black mark is the K6, which until the K6/3 has only 24 bit FPU, and as such has many compatibility problems. Of course, if you're running Linux, you'll never see them, so the faster K6s are not useless yet. (Cobalt Raq3 owners rejoice.)

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    12. Re:So What ? by TheUnknown · · Score: 1

      If I remember correctly, as far as 386 were concerned, Intel didn't make the faster one. I am pretty sure AMD did.

    13. Re:So What ? by Anonymous Coward · · Score: 0

      Hasn't intel produced the fastest chips on the market with each and every micro-architectural generation ?

      No, they haven't.

    14. Re:So What ? by Anonymous Coward · · Score: 0

      Didn't Athlon XP increase stages c/f Athlon? Nobody complained there.

    15. Re:So What ? by adrianbaugh · · Score: 4, Insightful

      We're supposed to be impressed by Intel's latest and greatest chip beating Alphas that aren't even produced anymore?
      I'm not wishing to knock Intel but it seems that these days whoever has the newest fabrication plant wins. Intel brings out a new line of chips: they're faster. So AMD brings out a new line of chips later on: bang! they're faster still. And so the merry dance goes on.
      Of course, this is all to the consumer's good as it means there's far more competition. But as far as the consumer is really concerned it doesn't matter so much who currently has the fastest chip as whose chip currently offers the best value while still being "fast enough". For my money that's been AMD for a while now.

      --
      "'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
      - JRR Tolkien.
    16. Re:So What ? by Anonymous Coward · · Score: 0

      Didn't Athlon XP increase stages c/f Athlon? Nobody complained there.

      No, that didn't happen.

      The Opteron has two more stages than the Athlon XP if I recall correctly, but the main change from the Thunderbird Athlon to the XP was data prefetching.

    17. Re:So What ? by nomadic · · Score: 1

      Nah, I think he got the right tone. It needed to be said.

    18. Re:So What ? by lakeland · · Score: 1
      Intel P4 and Xeon beat 4 of the 5 you name on SPEC.


      They do? The project I'm working on runs faster on a "AMD Athlon(tm) MP 2000+" than it does on a "Intel(R) Pentium(R) 4 CPU 2.80GHz". Just normal code, in a number of different languages.

    19. Re:So What ? by Anonymous Coward · · Score: 0

      sounds like SOMEONE bought a p4. athlon is a way better processor. i work with both and the p4 hangs doing things a lot more than the athlon machines. it even hangs more than the p3 machines.

      also, dual athlon == 1000$ 2proc, mobo, ram. 2 xeons (sorry no p4 dual :() 2000$+

      its scary how brand loyal people are in this day and age.

    20. Re:So What ? by bhtooefr · · Score: 2, Informative

      There wasn't much difference on IPC, but AMD did make a 386DX/40, whereas Intel only made a 386DX/33. 8088 was identical IPC and clock (4.77MHz, Intel design, Intel and AMD build), but 80286 wasn't on clock (was on IPC) (6 to 25MHz, Intel design, Intel (6-12MHz), AMD (6-20MHz), Harris (6-25MHz) build).

    21. Re:So What ? by addaon · · Score: 1

      If your p4, p3, or athlon hang, at all, something is wrong. Hardware should never, in this day and age, cause hangs, unless you're using your computer in a harsh environment (inside your microwave, perhaps?).

      --

      I've had this sig for three days.
    22. Re:So What ? by EmagGeek · · Score: 2, Insightful

      A brief history of microprocessor development:

      The company I work for invented the first 16-bit microprocessor EVER, the CP1600 (ok, to be fair, it was a joint effort between us and a partner company), which was released in late 1974, when Intel was a scant 6 years old and PC meant "Pissing Clear." Intel was still a long 4 years away from introducing the 8086, which was only an 8-bit CPU anyway.

      Nobody ever talks about the CP1600 because it was not oriented toward "personal" computers. After all, why the hell would anyone want their own computer? The CP1600 was designed and later integrated into Honeywell's TDC2000 distributed process control system, the very first distributed digital process control system.

      Chances are, the gas that is sitting in your car was refined using a TDC2000 or descendant control system, so the CP1600 lives on in all of us just about every day.

      Intel just got lucky with marketing, and it was the old consortium, LIM, that made the PC a reality. Those of you who were born before the 80's probably remember first hand what LIM was, but I'll leave it to exercise for you newbies to find out. You'll be amazed at who used to be bedfellows...

    23. Re:So What ? by VeeCee · · Score: 1

      You know what, a game of Quake sounds like a great idea, thanks!

    24. Re:So What ? by PlazMan · · Score: 2, Interesting

      How about some whining from a real hardware designer?

      I used to work at Intel designing micros, and I can assure you that there are several highly-qualified and brilliant people in the microprocessor architecture and design teams. Unfortunately, Intel management directed them to trade performance for MHz about seven years ago and now they're finally paying for that foolishness. Lots of really good people have either left the company or drifted away from the project teams to the labs.

      Most of the people that I know who work or worked on the Prescott team say that it was probably the worst managed project ever at Intel. Take two (rival) divisions and tell them to work together, combine that with a design-by-committee mentality, and throw in a completely unreasonable schedule (imagine being in "crunch mode" for 2 years straight).

      Intel has succeeded in staying ahead by virtue of brute force. They have the resources to make diving save after diving save. The manufacturing and process engineers are unbelievably resourceful. The Northwood team has saved their bacon for the past two years as Prescott has missed deadline after deadline. It will be interesting to see if the behemoth can change its course and use its huge amount of engineering talent more efficiently in the future.

    25. Re:So What ? by stevesliva · · Score: 1

      You're completely correct. My mistake.

      --
      Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
    26. Re:So What ? by Anonymous Coward · · Score: 0

      Funny thing is, if AMD had chosen the deep pipeline architecture and sold them just as cheap as they do their current CPUs, all the fanbois would still be touting the same message. The #1 reason why people like the Athlons is the price. It had little to do with performance until the Athlon64s came out.

    27. Re:So What ? by dasmegabyte · · Score: 1

      go back to writing the best OS there is.

      Slashdot people wrote the Amiga Workbench?!?!

      I never knew!

      --
      Hey freaks: now you're ju
    28. Re:So What ? by dasmegabyte · · Score: 1

      AMD also made quad clock multiplier (aka DX4) 486 chips. They went faster than 120 MHz in some cases...granted, they didn't have the FPU performance nor the memory bandwidth nor the 32 bit performance of the Pentiums, but when it came to REALLY WORKING in a DOS based environment, a 486 DX4/120 was a smokin' chip. I got most of my freshman level CS work done on one of those (then graduated to a K6 233...in fact, I've only owned one Intel based system, a dual Celeron 366 oc'd to 550).

      --
      Hey freaks: now you're ju
    29. Re:So What ? by mabinogi · · Score: 1

      proof?
      On either claim?

      --
      Advanced users are users too!
    30. Re:So What ? by mabinogi · · Score: 1

      The thing is that Linux is _NOT_ written by Slashdot reading armchair critics...

      There may be a few Linux kernel hackers reading slashdot...but the vast majority of people commenting on stories on slashdot really don't know anything...about any of the topics, so the trick is to wade through the crap from those that take yesterday's headline as gospel today to find the good posts from those that really do know what they're talking about.
      Chances are, if there are a lot of early posts repeating the same thing in a slightly hysterical manner ("Intel's doing this for marketing purposes", "Windows is always insecure and always will be", "Nintendo's new game console will fail, it's Virtual Boy 2"), they're all uninformed...not necessarily wrong...but very likely uninformed

      --
      Advanced users are users too!
    31. Re:So What ? by Hoser+McMoose · · Score: 1

      The Opteron increased the number of total stages but didn't increase the branch misprediction penalty.

      There's no word yet on whether or not the branch misprediction penalty has changed on the "Prescott" vs. the "Northwood" P4. The "Northwood" has a 28-stage pipeline with a 20-stage branch misprediction penalty. This "30+" stage pipeline of the "Prescott" could refer to either number.

    32. Re:So What ? by bhtooefr · · Score: 1

      YHBT by Intel, AMD, AND Cyrix.

      DX: x1 clock multiplier, bus speed = CPU speed
      25, 33, 40 (AMD only) MHz
      DX2: x2 clock multiplier, bus speed * 2 = CPU speed
      50 (25 FSB), 66 (33 FSB), 80 (40 FSB - AMD, Cyrix) MHz
      DX4: x2.5 (only on paper) or x3 clock multiplier, bus speed * (2.5|3) = CPU speed
      75 (25 FSB), 100 (33 FSB), 120 (40 FSB - AMD, Cyrix) MHz
      5x86 (AMD): x4 clock multiplier, bus speed * 4 = CPU speed
      100 (25 FSB), 133 (33 FSB) MHz (AFAIK on these)

      Cyrix also made a 5x86, however, it actually had some Pentium instructions (the MediaGX was a faster 5x86 with an integrated memory controller, video card, and sound card). Also, I'd rather have a DX4/100 than a 5x86/100 or a DX4/120 over a 5x86/133 (however, my board would have to be REALLY good to take a 40MHz FSB CPU - the 120 was the best of those, however).
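
      The multiplier arithmetic in the list above can be sketched in a few lines of Python. This is only an illustration: `core_clock_mhz` is a made-up helper name, and treating the nominal "33" bus as 100/3 MHz is an assumption about how the marketing numbers were rounded.

```python
def core_clock_mhz(bus_mhz, multiplier):
    """Core clock = bus (FSB) clock * clock multiplier."""
    return bus_mhz * multiplier

# Assumption: the nominal "33 MHz" bus is really 33.33 MHz.
BUS_33 = 100 / 3

# Part names and multipliers taken from the figures quoted in the post.
parts = [
    ("DX2/66", BUS_33, 2),
    ("DX4/100", BUS_33, 3),
    ("DX4/120", 40, 3),
    ("5x86/133", BUS_33, 4),
]

for name, bus, mult in parts:
    print(f"{name}: {bus:.2f} MHz x {mult} = {core_clock_mhz(bus, mult):.1f} MHz")
```

      The same core clock can come from very different bus speeds, which is why a 3x40 part and a 4x33 part are not interchangeable on every board.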

    33. Re:So What ? by SpinyManiac · · Score: 1

      Let's not forget the IBM Blue Lightning chips.
      486-33 based. BL3 was 99MHz, BL4 was 132MHz.

      I've no idea if they were actually launched though...

      --
      It's never too late to have a happy childhood.
    34. Re:So What ? by drsmithy · · Score: 1
      AMD also made quad clock multiplier (aka DX4) 486 chips.

      The DX4 chips, despite their name, were actually clock tripled, not quadrupled. So a DX4/75 was 3x25, DX4/100 was 3x33 and DX4/120 was 3x40. Only AMD made the 120MHz chips, but Intel had 75 and 100MHz parts as well. Most Intel DX4/100s would overclock to 120MHz and some (very) rare DX4/100 and 120 CPUs would overclock to 150MHz (I recall only ever seeing two or three, out of a *lot*, that managed to achieve this feat - of course, this was back before massive heatsink+fan combos and water cooling).

      A DX4 was only marginally slower clock for clock than a Pentium, unless the code was optimised/recompiled for the Pentium CPU, which improved its speed significantly. However, the DX4 was significantly cheaper and hence more popular (until the Pentiums got to 133MHz+).

    35. Re:So What ? by drsmithy · · Score: 1
      Intel was still a long 4 years away from introducing the 8086, which was only an 8-bit CPU anyway.

      The 8086 was a 16 bit CPU.

      Those of you who were born before the 80's probably remember first hand what LIM was, but I'll leave it to exercise for you newbies to find out. You'll be amazed at who used to be bedfellows...

      Lotus, Intel, Microsoft. I can only ever remember the TLA being associated with the expanded memory spec, but presumably there was more?

      The only people who would be surprised are probably the ones who think IBM is a warm-fuzzy corporation because they're supporting Linux ;).

    36. Re:So What ? by glsunder · · Score: 1

      "the K6/3 has only 24 bit FPU"

      The only thing I can imagine is that you, or someone else has confused 24 vs 32 bit color graphics with the FPU. If so, they're not related.

    37. Re:So What ? by Endive4Ever · · Score: 1

      Well, for most consumer purposes, the HP desktop machines with 400 MHz Pentium II processors that I bought at auction last month for $15 each were 'the best value.' For people who don't hang out at used hardware auctions, 'the best value' is letting nimrod dorks buy the 'latest, greatest' bullshit and buying their old boxes on eBay.

      And that used gear is almost never running an AMD part, because fanboy gear is vastly outnumbered by the mainstream gear that gets decommissioned and sold off in cubicle-at-a-time quantities.

      --
      ---
    38. Re:So What ? by Endive4Ever · · Score: 1

      'Write' is a misnomer. The best software projects start out with a design, not a bunch of hackers slinging code, with a vague sense of how it should work, who actually refuse to pay for a printed copy of the POSIX standard.

      I'm not talking about the Amiga Workbench in the preceding paragraph, in case someone misinterprets it that way.

      --
      ---
  9. Wait and see by ill_mango · · Score: 1

    Just because the early Northwoods proved to be slower than the PIIIs doesn't mean the Prescott will be slower than the Northwood. Intel may very well have devised a way to yield better branch predictions or something of the sort. I definitely won't buy one right away, but I wouldn't be surprised if they don't have the problems of the earlier Northwoods.

    1. Re:Wait and see by WoTG · · Score: 1

      Well, I'm sort of on the fence on this too. I'll believe it when I see it.

      I doubt we'll see any big improvements from branch prediction, really, how much better than 80-90% can you get?

      Perhaps they'll play around with cache sizes, or hyper threading will really mature.

    2. Re:Wait and see by Anonymous Coward · · Score: 0

      You do mean Willamettes and not Northwoods. The first Northwoods generally did well against PIIIs, it was the Willamette core that was murdered in the benchmarks.

  10. Silly intel by g-to-the-o-to-the-g · · Score: 0, Troll

    It seems like Intel is continuing to make poor decisions. Their Itanium processor is lame, the P4 is ridiculously overpriced, and now they're planning on making more expensive yet less powerful processors. I personally suspect Intel is an 80% marketing, 20% product sort of company. I think I'll stick to AMD for now.

    1. Re:Silly intel by homeobocks · · Score: 1

      Silly intel . . . pipelines are for AMD!

      --
      MOUNT TAPE U1439 ON B3, NO RING
    2. Re:Silly intel by Anonymous Coward · · Score: 0

      The Itanium is far from lame... The architecture is a lot better.. you try programming x86 assembler or debugging x86 applications thru gdb compared to debugging on a RISC machine like an Itanium or SPARC and you tell me which one is easier to work with...

      I'd have no problem with a desktop machine based on the Itanium, if all, or most, of the applications (and the OS, of course) I were to run are compiled to run natively. At least you get away from the mess that is x86. This was the whole idea behind the Itanium, to shed the x86 mess off of a CPU architecture.

      For IA-64 native applications, the performance is excellent.. and surprisingly enough, I read a couple weeks ago that an Itanium 1.4 runs 32-bit code on average at the speed of a 1.5 GHz Xeon... which is 1/2 the speed of the latest and greatest Xeon (3.2 GHz), but clock for clock, it's actually better.

    3. Re:Silly intel by Anonymous Coward · · Score: 0
      The Itanium is far from lame... The architecture is a lot better

      Dude, i havent laughed like that in *weeks*. thanks.

      I'd rather play with ia432. sparcv9 gives me wood, ia64 makes my puppies cry. Intel deserves every bit of the fiery painful death it's acting out for trashing alpha. If sun dumps sparc in favor of x86-64, they'll be an even more entertaining uncontrolled orbital re-entry. If only that wouldn't leave us resorting to 286-LOADALL hack levels of creativity trying to get shit done with our computers...

    4. Re:Silly intel by cujo_1111 · · Score: 1

      Ridiculously overpriced?

      I recently bought an Intel P4 2.8GHz for only $15 more than an Athlon 2800XP. I don't call that ridiculously overpriced.

      If AMD could pull their thumbs out of their arses and get a 300mm fab up and running, they would be able to undercut Intel by a huge margin and grab more market share.

      Maybe then you could afford to buy the big heatsink required for the Athlons :)

      --
      If I point out that you are incorrect, making me a foe does not make you any more correct.
    5. Re:Silly intel by Anonymous Coward · · Score: 0
      I recently bought an Intel P4 2.8GHz for only $15 more than an Athlon 2800XP. I don't call that ridiculously overpriced.

      I call bullshit.

      A P4 2.8 is $50 more than an XP 2800 (don't believe me? Check pricewatch or pricegrabber for yourself). When that is a 40% increase, I would tend to call it overpriced.
      Maybe then you could afford to buy the big heatsink required for the Athlons

      Maybe you could join us in the year 2004; P4s are hotter than both Athlon XPs and 64s. Prescott is rumored to be even hotter.
    6. Re:Silly intel by toddestan · · Score: 2, Informative

      AMD's higher end is a bit pricey, but that's to be expected. Intel can't compete in the mid range, and is getting totally killed on the low end. An Athlon XP 2200 is around $60. That's less expensive than the slowest P4 - the 1.4GHz. It's even cheaper than the lowly Celeron 2.2GHz. In that sense, Intel is way overpriced. By the way, Intel's latest chips run just as hot as their AMD counterparts. The days of the cool running PIII are over.

    7. Re:Silly intel by cujo_1111 · · Score: 1

      I call bullshit.

      No bullshit. Maybe living in Australia makes AMD prices artificially high, but I am not going to order a CPU over the net from some dodgy store on PriceWatch.

      I am really happy with my new system so it matters not now. BTW the system is: P4 2.8/1GB DDR400/Gigabyte 8IPE100Pro2/GexCube 9600XT. It runs a treat.

      My next purchase is going to be a laptop/notebook. Show me a cool running Athlon laptop that runs Photoshop really well for doing digital image stuff out on the road and I might consider it. At present the Centrino package is my preference, the only prob is the wireless shitfight... Damn Intel and their incompetence sometimes.

      --
      If I point out that you are incorrect, making me a foe does not make you any more correct.
    8. Re:Silly intel by StarWreck · · Score: 1
      Show me a cool running Athlon laptop that runs Photoshop really well for doing digital image stuff out on the road and I might consider it.
      AMD's answer to Centrino is this. With an ultra-low-power Athlon 1600+ you get very similar speed and very similar battery life, plus all the "Centrino" extras like Wi-Fi. Plus it weighs about 2 pounds less.
      --
      ... and in the DRM, bind them.
    9. Re:Silly intel by Hoser+McMoose · · Score: 3, Insightful

      WTF? Please, just have a look at some IA-64 assembly code! It's NOT pretty, especially if you want it to go fast. You've got to do the whole explicitly parallel thing, manually packing together independent instructions according to what pipelines you want to run them in.

      Itanium is NOT a RISC machine like Sparc, not in the least. Sparc is much more closely related to x86 than it is to IA-64. The Itanium is a VLIW chip, or EPIC in Intel-speak. It's a whole different animal altogether.

      FWIW, here's a brief article where Intel talks about implementing a bubble-sort in IA-64 assembly vs. the original C. In particular, they start with the code that the Intel C compiler generates and optimize it. Their final, optimized version of the algorithm is on page 5, and it's anything but easy.

    10. Re:Silly intel by Hoser+McMoose · · Score: 1

      AMD's mobile chips are nice, not quite as fast as the Pentium M and slightly higher power consumption (they're pretty close though), but cheaper. However.. umm.. "weighs about 2 pounds less"? 2 pounds less than what exactly? Dell's Inspiron 300m notebook with a Pentium M/Centrino setup weighs only 3.0 pounds, 1.3 pounds less than the notebook you linked to. On the other hand, the Inspiron 8600, using the very same Pentium M processor, weighs in at a hefty 6.9lbs.

      The processor has basically ZERO connection with the weight of the notebook. The battery, the chassis and the screen all factor in here, but the processor? Those things are only a few grams!

  11. Pipeline stalls by k4_pacific · · Score: 4, Interesting

    When the processor branches, all the partially executed instructions in the pipeline are lost.

    They could minimize this by creating two different conditional branch instructions for each condition. One for cases where the programmer expects the branch to occur most of the time, and one for where the branching rarely occurs. They could then optimize the pipeline behavior for each case. If it's a 'likely branch' instruction, it could start fetching commands from the branch. If it's an 'unlikely branch' instruction, it could prefetch the next instructions after the branch.

    This would work well in loops where every time but the last, the processor branches back to the top.
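
    The loop case above can be sketched as a toy count of wrong guesses. This is a minimal sketch, not a model of any real CPU: the function names are invented for illustration, and a "hint" here simply means every execution of the branch is predicted the same way.

```python
def mispredictions(outcomes, hint_taken):
    """Count wrong guesses when every execution is predicted as `hint_taken`."""
    return sum(1 for taken in outcomes if taken != hint_taken)

def loop_branch_outcomes(iterations):
    """The backward branch at the bottom of a loop: taken every time but the last."""
    return [True] * (iterations - 1) + [False]

outcomes = loop_branch_outcomes(100)
print("hinted likely:  ", mispredictions(outcomes, hint_taken=True))
print("hinted unlikely:", mispredictions(outcomes, hint_taken=False))
```

    With the 'likely' hint the only wrong guess is the final fall-through out of the loop; with the wrong hint, nearly every iteration would flush the pipeline.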

    --
    Unknown host pong.
    1. Re:Pipeline stalls by ParisTG · · Score: 1

      This is mostly what branch prediction accomplishes, except that it bases its prediction on historical data.

      Besides that, the P4 (and other processors) already supports prefixes on branch instructions, which tell the processor how likely the branch is to be taken.

    2. Re:Pipeline stalls by learza · · Score: 1
      An 'unlikely' conditional branch could be useful for error checking, where the error is unlikely to occur, and performance is irrelevant if it does. Takes the mantra of "don't optimise for errors" a bit further.

      Of course, this would yield only a small performance gain since who checks for errors in performance critical code anyway? ;)

    3. Re:Pipeline stalls by bmorris · · Score: 2, Interesting

      Read up on predication. http://www.geek.com/procspec/features/itanium They do some cool stuff with it in Itanium.

    4. Re:Pipeline stalls by jmv · · Score: 1

      There is currently something like that. When the processor has to guess the branch with no prior data, it'll assume backward branches (loops) are taken and forward branches aren't. With this in mind, I think you can control the prediction behaviour by inverting the "then" and the "else".
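
      The "backward taken, forward not taken" rule described above fits in one comparison. A hedged sketch: the function name and the addresses are hypothetical, picked only to show the two cases.

```python
def static_predict_taken(branch_addr, target_addr):
    """Guess 'taken' only when the branch jumps backward (target is earlier)."""
    return target_addr < branch_addr

# Hypothetical (branch address, target address) pairs.
loop_branch = (0x1040, 0x1000)   # jumps back to the top of a loop
error_check = (0x1010, 0x1080)   # skips forward over an error handler

print("loop branch predicted taken:", static_predict_taken(*loop_branch))
print("error check predicted taken:", static_predict_taken(*error_check))
```

      This is why swapping the "then" and "else" bodies can matter: it changes which path ends up as the forward jump.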

    5. Re:Pipeline stalls by qbwiz · · Score: 2, Informative

      This was already implemented on the PowerPC 601 and 603 (and possibly others, my book is getting rather old). Additionally, the Alpha 21064 and 21064a processors could optionally guess a branch as taken if it went back (loops), and not taken if it went forward (ifs).
      Most processors nowadays use dynamic prediction, basing current predictions upon whether earlier branches were taken or not taken. The branch unit on the P4 predicts with an accuracy of about 95%.
      One more interesting way of doing it is to try executing both paths at the same time, and throwing out the one that is incorrect. This requires a lot more logic (although pentium 4's already include "hyperthreading", and this is somewhat similar), and with such high accuracies probably would actually be much worse than the current way of executing.
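
      For a feel of how dynamic prediction works, here is a minimal sketch of a 2-bit saturating counter, a textbook scheme. It is not claimed to be what the P4 actually uses, and the branch history below is made up.

```python
class TwoBitPredictor:
    """Saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, clamped to 0..3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

def accuracy(outcomes, predictor):
    """Fraction of branches guessed correctly, training as we go."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)

# A 10-iteration inner loop entered 50 times: taken 9 times, then falls through.
history = ([True] * 9 + [False]) * 50
print(f"accuracy: {accuracy(history, TwoBitPredictor()):.0%}")
```

      Because one not-taken result only nudges the counter, the predictor still guesses "taken" when the loop is entered again, so it misses once per pass instead of twice.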

      --
      Ewige Blumenkraft.
    6. Re:Pipeline stalls by Valar · · Score: 1

      Well, if it is fast, who cares if it works? :/

    7. Re:Pipeline stalls by batura · · Score: 1

      Holy shit, you wouldn't be talking about branch prediction, would you?

    8. Re:Pipeline stalls by KFK+-+Wildcat · · Score: 1

      That's done already. There are sophisticated algorithms at work to calculate branch prediction (for instance, a simple algorithm is to use the last branch result, as branches are most often loops of some sort) and fetch the next instruction according to that prediction. If the prediction was wrong, then the entire CPU has to stall a few cycles to flush the pipeline.
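
      The "use the last branch result" idea can be sketched like this. It is a toy model: the function name and the branch history are assumptions, and each miss stands for one pipeline flush.

```python
def last_outcome_mispredictions(outcomes, initial_guess=True):
    """Predict that each branch does whatever it did the previous time."""
    guess = initial_guess
    misses = 0
    for taken in outcomes:
        if guess != taken:
            misses += 1
        guess = taken  # remember the most recent result
    return misses

# Loop branch: taken 9 times, then not taken, repeated for 3 passes.
history = ([True] * 9 + [False]) * 3
print("pipeline flushes:", last_outcome_mispredictions(history))
```

      Multiply the miss count by the misprediction penalty in cycles to get the stall cost, which is where a deeper pipeline hurts.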

    9. Re:Pipeline stalls by Anonymous Coward · · Score: 0

      Performance critical code is already tested that it works. There's no need to test in each inner loop. Try "man assert" some day.

    10. Re:Pipeline stalls by Valar · · Score: 1

      Oh, sounds like someone just learned C. *pity clap*

  12. It;'s not that it'll be slower... by Lothsahn · · Score: 5, Informative

    It'll most likely be slower per clock cycle.

    What this means, is that it will take a faster clock cycle (4GHz, for instance) to do the same amount of processing as the Northwood core. However, increasing the pipeline should allow Intel engineers to achieve higher clock speeds, as the longest transistor path will likely be shorter (faster switching times).

    In essence, Intel is attempting to increase the speed of their CPUs by focusing on increasing the clock speed (P4), while AMD is focusing on increasing the amount of calculations per clock cycle (Hammer).

    Of course, there are a lot of more complex tradeoffs that factor in (ie. branch prediction). I highly recommend reading a computer architecture book if you're at all interested. It's really fascinating stuff.
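
    The clock-versus-work-per-clock trade-off boils down to throughput = instructions per cycle x clock rate. A rough sketch with made-up numbers (the IPC values and the 3.2 GHz baseline are assumptions for illustration, not measurements of any real chip):

```python
def instructions_per_second(ipc, clock_ghz):
    """Throughput = instructions per cycle * cycles per second."""
    return ipc * clock_ghz * 1e9

def break_even_clock_ghz(old_ipc, old_clock_ghz, new_ipc):
    """Clock the new core needs just to match the old core's throughput."""
    return old_ipc * old_clock_ghz / new_ipc

# Hypothetical: a 3.2 GHz core at IPC 1.0 versus a deeper pipeline at IPC 0.8.
needed = break_even_clock_ghz(old_ipc=1.0, old_clock_ghz=3.2, new_ipc=0.8)
print(f"deeper pipeline must reach about {needed:.1f} GHz to break even")
```

    Anything the deeper design clocks above that break-even point is the actual gain.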

    --
    -=Lothsahn=-
    1. Re:It;'s not that it'll be slower... by BigKato · · Score: 1

      Can you recommend any good books for a total noob to computer architecture who is not in the computer industry? Just curious.

      --
      So we beat on, boats against the current, borne back ceaselessly into the past.
    2. Re:It;'s not that it'll be slower... by edrugtrader · · Score: 5, Funny
      I highly recommend reading a computer architecture book if you're at all interested. It's really fascinating stuff.


      dude, i don't even read the articles.
      --
      MARIJUANA, SHROOMS, X: ONLINE?! - E
    3. Re:It;'s not that it'll be slower... by philthedrill · · Score: 3, Interesting

      It'll most likely be slower per clock cycle.

      Yes, I agree. My guess is that they're trying to achieve higher absolute performance. What surprises me is that this is still considered a P4 core, since adding pipeline stages (even 1 stage) is a very non-trivial task.

      This'll also kill the benefits of reduced power consumption of 90 nm technology (increase in area from the additional pipeline registers, increase in frequency), which is important in server design. An argument about the benefits of having a trace cache is the reduction in power consumption since you can remove some decoders (x86 decoders are horribly complex, yet having enough to feed the rest of the processor is critical for high performance). The P4 only has one x86 decoder (plus the uROM) and is able to perform well in general.

      It'll be interesting to see the power consumption numbers (average and max) as well as the die size. Also, I wonder how AMD's CPU rating system will change as a result of this.

    4. Re:It;'s not that it'll be slower... by edrugtrader · · Score: 1

      to let the sarcasmless individuals like yourself make an ass out of themselves.

      --
      MARIJUANA, SHROOMS, X: ONLINE?! - E
    5. Re:It;'s not that it'll be slower... by h2odragon · · Score: 0, Troll

      right. see something people are enjoying, possibly even learning from; and piss on it. Aren't you proud? aren't you clever? ... would you like a cookie?

    6. Re:It;'s not that it'll be slower... by maraist · · Score: 1

      However, increasing the pipeline should allow Intel engineers to achieve higher clock speeds, as the longest transistor path will likely be shorter (faster switching times).

      While this is possible, don't forget that adding more stages means adding more inter-stage buffers.. These buffers are pure wasted processing / switching time.

      Since each instruction now requires passing through more total buffers, the amount of work performed each clock tick is even further reduced (compared to simply dividing the problem by the new ratio of stages).
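
      A back-of-the-envelope model of that buffer overhead: assume the logic splits evenly across stages and every stage adds a fixed latch delay. The 5 ns and 0.05 ns figures are invented purely to show the shape of the curve.

```python
def cycle_time_ns(logic_delay_ns, stages, latch_overhead_ns):
    """Each stage gets an equal slice of the logic plus one latch delay."""
    return logic_delay_ns / stages + latch_overhead_ns

def instruction_latency_ns(logic_delay_ns, stages, latch_overhead_ns):
    """Time for one instruction to traverse the whole pipeline."""
    return stages * cycle_time_ns(logic_delay_ns, stages, latch_overhead_ns)

# Illustrative values only: 5 ns of total logic, 0.05 ns per latch.
for n in (20, 30):
    t = cycle_time_ns(5.0, n, 0.05)
    latency = instruction_latency_ns(5.0, n, 0.05)
    print(f"{n} stages: cycle {t:.3f} ns, clock {1 / t:.2f} GHz, latency {latency:.2f} ns")
```

      Under these assumptions, going from 20 to 30 stages raises the clock by less than the 1.5x stage ratio, while the time for a single instruction to get through the pipe goes up.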

      Next, there is the classic trade-off of keeping those accursed pipelines full.. I'm not aware of us achieving pipeline saturation yet, so adding more stages merely means an even more sparse pipe. The only situation where more stages == more throughput is if most of the stages are busy.

      And finally there's data-dependencies.. With a move to C++ languages (much of microsoft) and perl/java/CLI VM's running amuck, the number of instructions until a conditional branch / data-dependent-lookup is declining, and thus the performance hit of longer pipelines is sky-rocketing. Sure you can create a contrived nested loop benchmark that can fill those pipes, but that's not very practical.

      IF the pipelines were fully decoupled (non interlocking), and the virtual threading (Xeon) worked well with it, then I could see these extra stages being of use. Otherwise, pure marketing.

      --
      -Michael
    7. Re:It;'s not that it'll be slower... by Hoser+McMoose · · Score: 1

      The Prescott doubles the size of the L1 data cache (from 8K to 16K), increases the size of the trace cache (from 12K uops to 16K uops if my memory serves me correctly), doubles the size of the L2 cache (from 512K to 1024K), eliminates the integer multiply penalties and probably implements a barrel shifter.

      It also makes improvements to the TLB and the load and store buffers, it beefs up the SMT (aka "hyperthreading") performance and adds a few extra SSE instructions. Ohh, and it has more rename registers and can have more instructions in flight at any given time.

      All of the above changes are designed to INCREASE the per-clock performance of the processor. Most of these won't make a big difference, but the bigger caches definitely will help, and help a lot in many applications. A longer branch misprediction penalty will hurt performance a bit, and there are likely a few other changes made that can hurt performance every now and then.

      In short, whether the chip is faster or slower will depend a lot on the exact application being used. My guess, after looking at all the data I can find on the chip, is that the Prescott will be about 5% faster, clock for clock, than the Northwood. This is, of course, just a guess. We'll have to wait for another two or three weeks to find out just how it performs.

    8. Re:It;'s not that it'll be slower... by Hoser+McMoose · · Score: 1

      As you correctly guess, the Prescott is a VERY major redesign of the P4 core. This article has pictures of both a Prescott and a Northwood die, and they are VERY different (side note: I'm not sure I buy his whole Prescott = 64-bit thing, but the article does have some useful data about the two chips).

      In any case, while the power consumption numbers are all over the place, we do have some firm die-size numbers for the Prescott. If you look at Sandpile's P4 page, you can see that the Prescott is listed as having a 112mm^2 die (90nm node) with ~125M transistors. For comparison, the Northwood has a 131mm^2 die (130nm node) with 55M transistors.

      A large chunk of the extra transistor budget is going towards the extra cache (the extra 512KB of L2 cache is a good 30M transistors all on its own), and cache transistors pack in tightly, so the die size/# of transistors ratio ends up being a bit lower than you might expect from just a straight process shrink.
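
      The density arithmetic can be checked with a few lines of Python. This is only a sketch using the die sizes and transistor counts quoted above; the "ideal shrink" assumes area scales with the square of the process node, which is a simplification:

```python
def density(transistors_millions, die_mm2):
    """Transistors per square millimetre, in millions."""
    return transistors_millions / die_mm2

# Figures quoted in the post (from Sandpile's P4 page).
northwood = density(55, 131)    # 130nm node
prescott = density(125, 112)    # 90nm node

# Simplified ideal scaling for a 130nm -> 90nm shrink: linear dimensions
# scale by 90/130, so the same circuit would fit in (90/130)**2 of the area.
ideal_density_gain = (130 / 90) ** 2

actual_density_gain = prescott / northwood

print(f"Northwood: {northwood:.2f} M transistors/mm^2")
print(f"Prescott:  {prescott:.2f} M transistors/mm^2")
print(f"ideal shrink gain: {ideal_density_gain:.2f}x, actual: {actual_density_gain:.2f}x")
```

      The actual gain comes out above what the shrink alone would give, which fits the point about densely packed cache transistors.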

    9. Re:It;'s not that it'll be slower... by dnoyeb · · Score: 1

      Another analogy would be

      Intel is shooting for high horsepower, while AMD is shooting for high torque.

      This suggests that while Intel will have high speeds on their CPUs, the temperature under load vs. no-load will be very similar. But the AMD temperatures should vary wildly under load vs. no-load.

      And another facet is that the Intels will benefit hardly at all from overclocking, while the AMD should benefit nicely.

      This leads me to a strange issue. What kind of spacing will there be between each Intel CPU? It's going to require huge spacing to show a performance difference, but at such high GHz to begin with this should be hard. But AMD can have low GHz, and smaller spacing will produce a nicer performance difference.

      AMD will have a larger line of chips than Intel.

      Is my logic off anywhere in here?

    10. Re:It;'s not that it'll be slower... by Anonymous Coward · · Score: 0

      And on top of that, it's tough to balance out the design so that each pipeline stage has the same worst case delay. Each stage has a worst case delay, and ideally you want to have all stages have essentially the same worst case delay. Since you have to wait for the worst case delay of the slowest stage, you might as well have the rest of the stages taking just about as long to complete, otherwise they would just be sitting idle instead of doing work while waiting for the slowest pipe stage to finish.

      In general, I think this gets tougher with more stages, since instead of balancing 28 stages (not exactly easy), you're now balancing 34 of them - about 20% more work for your engineers/timing tool.

      Finally, to the parent that mentioned transistor switching times as a critical path : for at least a generation of CPUs (maybe even 2 or 3) it has been interconnect delay instead of transistor switching speed which tends to dominate the critical path of designs.
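
      The stage-balancing point can be shown with a toy model. This is a rough sketch: the stage delays are made-up picosecond values, and a real design also has latch overhead and clock skew that are ignored here:

```python
def clock_period(stage_delays_ps):
    """The clock has to be slow enough for the slowest stage."""
    return max(stage_delays_ps)

def idle_fraction(stage_delays_ps):
    """Fraction of total stage-time spent waiting on the slowest stage."""
    period = clock_period(stage_delays_ps)
    used = sum(stage_delays_ps)
    return 1 - used / (period * len(stage_delays_ps))

# Hypothetical 4-stage pipelines with the same total logic delay (1000 ps).
balanced = [250, 250, 250, 250]
unbalanced = [150, 200, 400, 250]

for name, stages in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(f"{name}: period {clock_period(stages)} ps, "
          f"idle {idle_fraction(stages):.1%}")
```

      Same amount of logic in both cases, but the unbalanced pipeline has to run at the 400 ps stage's pace, so the other stages sit idle for a good part of every cycle.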

    11. Re:It;'s not that it'll be slower... by edrugtrader · · Score: 1

      i'm very poroud, indeed.

      --
      MARIJUANA, SHROOMS, X: ONLINE?! - E
  13. Intel bit by their own tricks? by lambadomy · · Score: 4, Interesting

    Assume for a second that Intel's P4 design was really meant to boost GHz numbers easily (to guarantee victory in the GHz war if not the performance war). If so, is the Prescott design now due to having to keep up with themselves? Obviously they could design a chip that is "faster" but runs at a lower clock speed than the P4s, but they've pushed the GHz number so much that now they're kind of hamstrung in their design options.

    1. Re:Intel bit by their own tricks? by Breakfast+Pants · · Score: 1

      Yep look at the difference between the mobile pentium 4 and the pentium 4 m(centrino's processor). Way lower clock speeds.

      --

      --

      WHO ATE MY BREAKFAST PANTS?
    2. Re:Intel bit by their own tricks? by bhtooefr · · Score: 2, Informative

      Mobile Pentium 4: Cooled-down P4
      Pentium 4-M: Redesigned cooler yet P4
      Pentium M (Centrino): Redesigned Pentium III to take advantage of modern technology (400MHz bus, SSE2, etc.) and be cooler yet.
      Celeron M: Pentium M failure/economic bin. Half the cache.

    3. Re:Intel bit by their own tricks? by Anonymous Coward · · Score: 0

      > Assume for a second that Intels P4 design was really meant to boost GHz numbers easily

      The Inq also reported that a major design goal of the Prescott is to be CHEAP to make. Quite frankly, Intel's main market (corporate desktops) really doesn't care about raw performance.

      This could be a big problem for AMD because while the Hammer is a great chip, apparently it's fairly expensive to produce.

      Of course, you might not see the difference in retail pricing, but that's why Intel makes money and AMD doesn't.

  14. What? by Anonymous Coward · · Score: 0

    The Prescott will be faster than the Northwood clock for clock. The choice to make the CPU with a longer pipeline was obviously an engineering choice and not a marketing one.

  15. Re-read the article the reg is GUESSING 30 by uarch · · Score: 5, Informative

    Re-read the Register article. It's not the Intel guy who said 30 stages, it's the Register who is guessing. They're assuming that since it went from 10 to 20 before it'll go from 20 to 30 now. It's not likely to end up being more than a few extra stages.

    1. Re:Re-read the article the reg is GUESSING 30 by Anonymous Coward · · Score: 0

      Wrong, samples are out to the reviewers and it's 11. Not 1, 2 or 10, 11.

  16. Slower than Northwood? by StarCat76 · · Score: 4, Interesting

    Although the Prescott core will have a longer pipeline, it will probably end up performing a bit better clock-per-clock against Northwood. This is due to a couple of reasons. Firstly, Prescott has 1 MB of on-die L2 cache. That's a good bit, and one could see how the P4 was helped by the 2M L3 cache in the P4 "EE". Secondly, the new P4 will have improved hyperthreading. It will also have somewhat improved branch prediction and implements PNI (Prescott New Instructions), which will require a recompile to help things out. All in all, I see the Prescott as being just as fast or faster per clock as Northwood, mostly due to the doubled L2 cache.

    1. Re:Slower than Northwood? by ameoba · · Score: 1

      "one could see how the P4 was helped by the 2M L3 cache in the P4 "EE"."

      Umm... not much at all, really.

      --
      my sig's at the bottom of the page.
  17. Low-power consumption devices by johnthorensen · · Score: 4, Interesting

    So, since Prescott has approximately a 30 stage pipeline, I guess Intel has decided to continue to ignore the low-power consumption market, leaving it open to people like VIA and Transmeta. This is really disappointing to a lot of folks in the embedded markets, who would really like to see Intel ship something with significant horsepower that doesn't require a heatsink with the mass of a black hole to keep running.

    Word has it that VIA is readying a new x86 processor to their line that supposedly has P3-class FPU performance while maintaining the same levels of poser consumption as its predecessors. It is expected that this processor may actually have a big win in front of it for DirecTV boxes. With the extra CPU horsepower, it should be exciting to see what nifty features come out of this, especially considering most set-top CPUs generally just act as "traffic cops" for the data moving between ASICs. If they're really making the move to this class of processor, perhaps they've got more in mind.

    --JT

    1. Re:Low-power consumption devices by johnthorensen · · Score: 1

      Sorry, should have read, "while maintaining the same levels of *power* consumption as its predecessors"

    2. Re:Low-power consumption devices by Wesley+Felter · · Score: 3, Insightful

      Hello, Pentium M?

    3. Re:Low-power consumption devices by Pyro226 · · Score: 2, Interesting

      ...would really like to see Intel ship something with significant horsepower that doesn't require a heatsink with the mass of a black hole to keep running.

      Aside from the whole Earth getting sucked into oblivion thing, a black hole would make an excellent heat sink. I mean, not even light can escape its gravity - heat wouldn't stand a chance.

      --
      This message is encrypted with Quad ROT-13 to protect the author's copyright under the DMCA.
    4. Re:Low-power consumption devices by ceallaigh · · Score: 1

      You have heard of the XScale, formerly known as the DEC StrongARM? ARM is quite popular in embedded development. http://www.intel.com/design/pca/applicationsprocessors/index.htm

    5. Re:Low-power consumption devices by ottffssent · · Score: 2, Funny

      > ...the same levels of poser consumption...

      Think what that would do for the world! Poser-powered PCs? They'd absolutely *FLY* off the shelves. e=mc^2 says I could stop worrying about the electric bills and heat the house with computers. One poser a decade would more than do it.

      Utility computing my arse! What we really want is computing *without* using utilities, and this is it, folks, the real deal. Buy your poserPC today! ;)

    6. Re:Low-power consumption devices by Sivar · · Score: 1

      In addition to the Pentium-M (low power consumption, low clockspeed, high performance) mentioned in above posts, Intel still sells ULV (Ultra Low Voltage) Pentium III's.
      Either one of the two above options would be sufficient on their own, but Intel also largely owns the ARM architecture (not x86 compatible, but VERY low power consumption).

      VIA processors to date have been crap, and Transmeta's have been great on a per-watt metric, but otherwise suck. Both are soon to release new products which will hopefully be competitive.

      It is AMD that has a weak low-power portfolio, having only the Geode processor (correct me if I am mistaken)--sufficient to power terminals and perhaps small routers, but little else.

      --
      Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
    7. Re:Low-power consumption devices by Anonymous Coward · · Score: 0

      Intel do not own ARM. They acquired StrongARM, an ARM implementation, when they purchased DEC. They rebranded this as XScale and have continued development. And this is the kind of processor you should be talking about if you are talking low power embedded. Pentium M is for laptops that need compatibility with x86. Who still needs that? Embedded systems can definitely do without the burden.

    8. Re:Low-power consumption devices by Rob+Simpson · · Score: 1

      Right! I'll just go buy a Pentium M and plug it into my socket 478 motherboard! Oh wait, last time I checked, I could only buy it as a laptop (usually as a "Centrino"). Unless things have changed? Any links to a desktop system, or components that can *easily* be placed in one, would be appreciated.

    9. Re:Low-power consumption devices by Anonymous Coward · · Score: 0

      Where did you get that the IBM article suggested more than 20 pipeline stages???

      IBM abstract:

      The impact of pipeline length on both the power and performance of a microprocessor is explored both theoretically and by simulation. A theory is presented for a wide range of power/performance metrics, BIPS^m/W. The theory shows that the more important power is to the metric, the shorter the optimum pipeline length that results. For typical parameters neither BIPS/W nor BIPS^2/W yield an optimum, i.e., a non-pipelined design is optimal. For BIPS^3/W the optimum, averaged over all 55 workloads studied, occurs at a 22.5 FO4 design point, a 7 stage pipeline, but this value is highly dependent on the assumed growth in latch count with pipeline depth.

      As dynamic power grows, the optimal design point shifts to shorter pipelines. Clock gating pushes the optimum to deeper pipelines. Surprisingly, as leakage power grows, the optimum is also found to shift to deeper pipelines. The optimum pipeline depth varies for different classes of workloads: SPEC95 and SPEC2000 integer applications, traditional (legacy) database and on-line transaction processing applications, modern (e.g. web) applications, and floating point applications.

      AFAICS they say about 7, which would increase with clock speed somewhat (assuming the same speed of memory interface). It seems Athlon64 with its 12 stage pipeline is optimal in that sense.
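
      For anyone unfamiliar with the metric family in the abstract, here is a tiny Python sketch of how the exponent changes which design "wins". The two design points are invented for illustration and have nothing to do with the paper's data:

```python
def power_performance_metric(bips, watts, m):
    """BIPS**m / W, the family of metrics named in the abstract."""
    return bips ** m / watts

# Hypothetical design points: (name, BIPS, watts). Not from the paper.
designs = [
    ("shallow", 1.0, 10.0),   # slower, frugal
    ("deep",    1.6, 30.0),   # faster, power hungry
]

def best_design(m):
    """Name of the design that maximises BIPS**m / W."""
    return max(designs, key=lambda d: power_performance_metric(d[1], d[2], m))[0]

for m in (1, 2, 3):
    print(f"m={m}: best design is {best_design(m)}")
```

      With these made-up numbers the frugal design wins for m=1 and m=2, and the faster one only wins once performance is cubed, which is the general shape of the argument in the abstract.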

    10. Re:Low-power consumption devices by Hoser+McMoose · · Score: 1

      VIA's chips offer some nice performance on a per-watt metric. Same power consumption as Transmeta but higher performance. They are also DIRT-CHEAP, which is something that Transmeta hasn't caught on to yet and the reason why VIA continues to gain market share while Transmeta continues to lose market share.

      As for AMD, they have a few chips. First off there are the AthlonXP-M chips, the best of which have only just slightly higher power consumption (less than 10% higher) and slightly lower performance than the Pentium M, but they sell for a lot less. They also have their Alchemy line of MIPS chips which compete with Intel's StrongARM/XScale chips, albeit not all that successfully (performance and power consumption of the two are very similar, but ARM seems to be really beating out MIPS in most markets).

      The Geode is a brand new addition, really just an old Cyrix design that has been sold and resold. It's pretty much unchanged from the old MediaGX that it started its life as 6+ years ago. It remains to be seen what AMD will do with it. Hopefully they have a few tricks up their sleeve.

    11. Re:Low-power consumption devices by forgotmypassword · · Score: 1

      We will have to suspend the chip in the center of a toroidal black hole.

      That way the chip won't fall in too.

    12. Re:Low-power consumption devices by confused+one · · Score: 1

      Actually you can buy it; it's just not mainstream. You need to look at embedded architecture boards used in industrial applications. You'll find boards and Pentium M's in various form factors.

    13. Re:Low-power consumption devices by Luminous+Coward · · Score: 1
      I'll just go buy a Pentium M and plug it into my socket 478 motherboard!
      (As far as I can tell, the Pentium M has 479 pins.) Here is a good starting point.
    14. Re:Low-power consumption devices by Rob+Simpson · · Score: 1

      Thanks! The LS855 looks like the only one that's a conventional motherboard (not mini-ITX, but with AGP/PCI slots and multiple memory slots) but no sites seem to sell it or offer prices.

  18. compilers by Mieckowski · · Score: 4, Informative

    I suppose that this makes having a good compiler a little more important. Compiling the same program for a G4 on a compiler other than GCC gave me a 100% speed boost. I don't know if branch mis-prediction came into play, but it had a conditional in its inner loop (it displayed the mandelbrot set).

    1. Re:compilers by addaon · · Score: 1

      Were you appropriately tuning GCC's (nearly infinite) number of optimization flags?

      If you did make an effort to optimize with GCC, I'd be very interested in seeing the program, and knowing what compiler you ended up with... I'm in the mood to add some more optimizations to gcc ppc, and I finally have a G4 to profile with, instead of having to chud everything.

      --

      I've had this sig for three days.
    2. Re:compilers by kc8apf · · Score: 1

      Put on OS X, CHUD includes one of the best tools for profiling, Shark.

      --
      kc8apf
    3. Re:compilers by addaon · · Score: 1

      As I mentioned, I use chud... and it's great for profiling, but being able to run the code for real is nice, too.

      --

      I've had this sig for three days.
    4. Re:compilers by AArmadillo · · Score: 1

      GCC is useful in the fact that it has been ported to so many systems, but it has always produced rather slow machine code -- most likely because it has to support so many architectures it can't super-optimize for just one.

  19. Sounds Like Marketing by Anonymous Coward · · Score: 4, Interesting

    It sounds like Intel has totally given up on efficiency, and has the Marketing department doing processor requirements now... (has to clock to xGHZ!)

    I've been working with Dual Opterons for a few months now, and have been very impressed as to their speed, heat dissipation, and bang for the buck.

    A large data transformation job (really doing a scrape of a mainframe report for data) on the order of 1.1GB processed much faster on an IBM E325 Dual Opteron 2.0ghz running 32bit Windows (ack) than my Dual 2.4ghz Xeon (w/HT) running Windows (double ack)....

    Yeah- it's not a benchmark, but it is real world performance.

    1. Re:Sounds Like Marketing by be-fan · · Score: 1, Interesting

      P4s aren't designed for efficiency, but raw performance. The long pipeline is an engineering decision. Consider, what market really pushes CPU performance? In the consumer arena, it's games and media applications. They are "streaming" (predictable branching) type applications, and the pipeline latency has a lower cost than the benefit of the higher clock-speed.

      So comparing a 2.0GHz Opteron to a 2.4GHz Xeon is not a fair comparison. You have to do a price/performance comparison. A 2.0GHz Opteron costs $700. A 2.4GHz Xeon costs $200. A 3.0GHz Xeon will be more comparable to the 2.0GHz Opteron and costs about $500. The Opteron is faster than the fastest Xeon, but on a price/performance standpoint, the Xeon is still competitive.

      --
      A deep unwavering belief is a sure sign you're missing something...
    2. Re:Sounds Like Marketing by ProtonMotiveForce · · Score: 2, Insightful

      I'm confused. How is it marketing only when you produce a faster chip?

      That's like saying a gold medal winner in the olympics only ran for the monetary value of the gold they receive. It's actually quite freaking stupid.

      The chip is faster. There are many ways to get faster chips that generally boil down to high IPC or high clock. Why do you nimwits insist on bleating that if you go the high clock route you're only catering to marketing?

    3. Re:Sounds Like Marketing by Anonymous Coward · · Score: 0

      Consider, what market really pushes CPU performance?

      The idiot market. Those who think 2 Ghz = 2x1Ghz (CTOs anyone?) Those who don't understand pipelining, branch prediction, invalidation.

      I didn't mention earlier, but I also have "whitebox" Opteron 240/1.4ghz boxes (sub $200 procs), also beat the Dual Xeon 2.4ghz in the same process (not by nearly the same margin though). They just aren't as exciting as the IBM e325. The memory bandwidth (Dual Channel 333mhz per proc, 2GB per proc) leaves the Dual Xeon with DDR wanting.

      Where the real improvement comes though, is with installation of Suse Enterprise Linux for AMD64. The problem is in migrating to it from a 99% Windoze shop... (And my confidence in Windoze/64 being a smooth road isn't quite there).

  20. Prescott vs. Northwood - Insides exposed by metlin · · Score: 2, Informative

    I had found an interesting article exposing the innards of the 775 pin Prescott -- see it here

    (Credit: Got it off The Register from this article)

  21. Myth? by The+Bungi · · Score: 5, Funny
    Alizarin Erythrosin writes "Further contributing to the MHz Myth ...

    Let me guess - 'Alizarin Erythrosin' is Cupertinus Elvish for 'Mac User', right?

    1. Re:Myth? by Anonymous Coward · · Score: 0

      Yes, myth. A 33 MHz Cray can eat your 3 GHz computer for breakfast.

    2. Re:Myth? by Alizarin+Erythrosin · · Score: 1

      Actually, I'm an AMD fan-boy in a way. And I don't use a Mac.

      The name is a combination of 2 pH indicators, Alizarin Yellow, and Erythrosin Blue (I think it's blue). I saw it in my chem book back in high school and thought it sounded cool together.

      --
      There are only 10 kinds of people in this world... those who understand binary and those who don't
  22. ummm... by circletimessquare · · Score: 2, Funny

    As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls.

    no, i didn't know that

    --
    intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
    1. Re:ummm... by glwtta · · Score: 4, Funny

      Are you most of us?

      --
      sic transit gloria mundi
    2. Re:ummm... by addaon · · Score: 2, Funny

      I are.

      --

      I've had this sig for three days.
    3. Re:ummm... by shplorb · · Score: 1

      no, i didn't know that

      The fact that C:\> appears in your signature indicates why you didn't know that.

  23. Pipelines != Math Performance by TubeSteak · · Score: 3, Interesting
    My understanding was that AMD has 3 FPUs to Intel's 2. Oh, and AMD has 3 AGUs (integer units) compared to Intel's 2+2 (two of them also do other things). Anyways, most users, @ the GHz speeds this proc is coming in at, will never notice the difference. For the people who care, they'll figure out what the proc can and cannot do... then use it accordingly. Unless you guys really want to run windows, why not compare the Opteron to a Dually Mac? After all, the PowerPC is really good at number crunching.

    How come your computer takes seconds to multiply two 400 digit #s, but ages to factor them?

    --
    [Fuck Beta]
    o0t!
    1. Re:Pipelines != Math Performance by tomstdenis · · Score: 5, Interesting

      More specifically the Athlon has three ALU/IEU pipeline pairs, 1 FADD, 1 FMUL and 1 FLOAD pipeline [e.g. you can't do 3 FP muls at once].

      The decoder can send up to three instructions into the pipeline per cycle. Actually that's only for directpath instructions [e.g. simple ALU/FP]. Vector instructions stall all three decoders.

      The ALU scheduler is fairly strong but it does have several weaknesses. From the manual I can't see that it can resolve dependencies from other pipelines. For instance,

      ADD EAX,EBX [DIE ]
      ADD EBX,EAX [D IE ]
      ADD ECX,EBX [D IE] - critical path
      INC ESI [ DIE ]

      D == decode, I == issue, E == execute [p. 227 of the athlon opt manual].

      So the fourth instruction will always start on the second cycle despite the fact that ALU1/2 are blocked.
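
      To make the "critical path" label concrete, here is a small Python sketch that walks the four instructions above and computes how long a chain of register dependencies ends at each one. It only tracks read-after-write dependencies between general registers; flags, issue ports and the actual Athlon scheduler are deliberately not modelled, so this says nothing about real cycle counts:

```python
# Each entry: (text, registers read, registers written).
# ADD dst,src reads both operands and writes dst; INC reads and writes its operand.
program = [
    ("ADD EAX,EBX", {"EAX", "EBX"}, {"EAX"}),
    ("ADD EBX,EAX", {"EBX", "EAX"}, {"EBX"}),
    ("ADD ECX,EBX", {"ECX", "EBX"}, {"ECX"}),
    ("INC ESI",     {"ESI"},        {"ESI"}),
]

def dependency_depths(instrs):
    """Length of the read-after-write chain ending at each instruction.

    A depth of 1 means the instruction depends on nothing earlier in the list.
    """
    last_writer_depth = {}
    depths = []
    for _text, reads, writes in instrs:
        depth = 1 + max((last_writer_depth.get(r, 0) for r in reads), default=0)
        depths.append(depth)
        for w in writes:
            last_writer_depth[w] = depth
    return depths

for (text, _r, _w), depth in zip(program, dependency_depths(program)):
    print(f"{text:12s} chain depth {depth}")
```

      The three ADDs form a chain of depth 3, while INC ESI has depth 1, so in pure dataflow terms it is free to run alongside the first ADD.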

      Similarly the Athlon memory ports are a bit weak. There are read/write buffers but you still can only issue two reads or one write per cycle which is annoying.

      However, the strength of the Athlon ALU over the P4 ALU is that for the most part it can keep all three pipelines busy even if they are blocked at some stage [e.g. it can decode/issue even if blocked]. It doesn't say in the documentation but I could swear the Athlon can cross-pipe things too. Cuz sometimes I can mess the order of ops [e.g. create a dependency] and it executes in the same time regardless.

      Anyways, yeah it's all about the 3 ALUs and a decent scheduler. Something the P4 does not have.

      Tom

      --
      Someday, I'll have a real sig.
    2. Re:Pipelines != Math Performance by bhtooefr · · Score: 2, Informative

      I would, but I could just get the PCWorld Athlon FX-51@2.2GHz (almost identical to an Opteron 148) vs. 2xOpteron 246 (2.0GHz) vs. Athlon 64 3200+ vs. P4 3.2 vs. 1.8 G5 vs 2x2.0 G5 benchmarks, and see that in all benchmarks except Photoshop (on the dual G5), Quake III on the A64 and O246 (probably the SMP), and Word on the O246, the x86 CPUs *MURDERED* the Macs. Yes, even the P4. BTW, the AMD CPUs did well against the P4, except in the Quake III and Word benchmarks (Intel optimized code, maybe - Q3 is definitely Intel-optimized, but WORD?)

    3. Re:Pipelines != Math Performance by Anonymous Coward · · Score: 3, Insightful

      Ok, so they benched Premiere 6, Photoshop 7, Microsoft Word, and Quake 3.

      Please tell me you have at least the 2 brain cells required to know that this benchmark is far from accurate.

      Anyone who does ANY form of editing on a Mac won't touch Premiere 6 with a 100-foot pole. Why? Because Final Cut Pro smashes it to little tiny pieces you could use to flavor your coffee.

      Microsoft Word? Tell me you're kidding. The benchmark was doing search-and-replaces. This is dependent on so many things ranging from hard disk caches to Microsoft's optimizations that it's almost not funny.

      And Quake 3. Almost entirely dependent on the graphics card and the drivers written for it.

      Nothing to see here, move along.

      (yes, I know I shouldn't feed the trolls)

    4. Re:Pipelines != Math Performance by Anonymous Coward · · Score: 0

      The decoder can send upto three instructions into the pipeline per cycle. Actually that's only for directpath instructions [e.g. simple ALU/FP]. Vector instructions stall all three decoders.

      Yup. E.g. splitting movps -> movlps+movhps does indeed make a performance gain.

      So the fourth instruction will always start on the second cycle despite the fact that ALU1/2 are blocked.

      Not sure I got this.. You mean it cannot schedule stuff to happen before the last instruction scheduled on a given pipeline? Well, this is what manual/compiler scheduling is for;)

      Similarly the Athlon memory ports are a bit weak. There are read/write buffers but you still can only issue two reads or one write per cycle which is annoying.

      This is true. You need to keep constants in registers to get the optimal speed.. for example GCC doesn't realize this in SSE code and it often produces about twice as slow code as hand optimized assembly.

      However, the strength of the Athlon ALU over the P4 ALU is that for the most part it can keep all three pipelines busy even if they are blocked at some stage [e.g. it can decode/issue even if blocked]. It doesn't say in the documentation but I could swear the Athlon can cross-pipe things too. Cuz sometimes I can mess the order of ops [e.g. create a dependecy] and it executes in the same time regardless.

      Possibly, but for integer stuff there are so many resources that it hardly matters, and when there's a single right pipeline (like fmul) it doesn't make much sense, so detecting it is hard.

      One thing you didn't come up with is that athlon is relatively sensitive to aligning. Instructions that span 16-byte lines don't always get decoded optimally. Exact details of this are hard to find as well..

    5. Re:Pipelines != Math Performance by tomstdenis · · Score: 2, Interesting

      "Vector instructions stall all three decoders.

      Yup. E.g. splitting movps -> movlps+movhps does indeed make a performace gain."

      I meant VectorPath instructions like DIV, LGDT, etc... ;-)

      They stall all three decoders. As for alignment the trick is to pack as many instructions into 8-byte aligned windows. According to the manual it fetches 24-byte windows and performs one [or two I forget... PDF is so far away] of scan/early decoding.

      So the trick is to organize your code so that each 8-byte segment has as many directpath instructions in it. That will minimize the decode latency [depending on the instructions may minimize issue/execute latency].

      The problem though is most ALU opcodes are at least two bytes [except for things like INC/DEC] and worse yet things like

      00000000 89D8 mov eax,ebx
      00000002 8B00 mov eax,[eax]
      00000004 8B0418 mov eax,[eax+ebx]
      00000007 A100040000 mov eax,[0x400]
      0000000C 8B8000040000 mov eax,[eax+0x400]

      So really offsets/constants are horrible [the last two instructions are 5 and 6 bytes each].

      If you have to step through arrays I think the idea would be to use the middle, e.g.

      00000012 03040B add eax,[ebx+ecx]
      00000015 81C100040000 add ecx,0x400
      0000001B 03040B add eax,[ebx+ecx]
      0000001E 81C100040000 add ecx,0x400

      Which takes 18 bytes. [four windows]. Another trick is to use a register for the step size...

      00000024 BA00040000 mov edx,0x400
      00000029 03040B add eax,[ebx+ecx]
      0000002C 01D1 add ecx,edx
      0000002E 03040B add eax,[ebx+ecx]
      00000031 01D1 add ecx,edx

      [15 bytes, 3 windows, ignore stalls.... ;-)]
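
      The byte and window counts can be checked mechanically from the encodings in the listings. A minimal Python sketch follows; treating the decoder as looking at 8-byte aligned windows is a simplification taken from the description above, not a model of the real fetch hardware:

```python
def footprint(start, encodings, window=8):
    """Return (total bytes, number of aligned windows touched).

    start     -- address of the first instruction
    encodings -- hex strings of the instruction bytes, in order
    """
    size = sum(len(bytes.fromhex(e)) for e in encodings)
    first = start // window
    last = (start + size - 1) // window
    return size, last - first + 1

# Encodings and start addresses copied from the two array-stepping listings.
immediate_step = ["03040B", "81C100040000", "03040B", "81C100040000"]
register_step = ["BA00040000", "03040B", "01D1", "03040B", "01D1"]

print("immediate step:", footprint(0x12, immediate_step))
print("register step: ", footprint(0x24, register_step))
```

      The window count depends on where the code starts: the same immediate-step sequence placed at 0x17 instead of 0x12 would straddle one more window, which is the reason alignment matters here.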

      Tom

      --
      Someday, I'll have a real sig.
    6. Re:Pipelines != Math Performance by cbreaker · · Score: 1

      (yes, I know I shouldn't feed the trolls)

      Me neither, AC!

      Premiere has gotten a lot better since version 3, ya know. FCPro is very nice, professional, etc. But for 99% of the editing out there, Premiere 6 or Premiere Pro handles your work exceptionally well. If you've got some Premiere compatible RT gear, you're rockin.

      Adobe makes good software, and with every release it gets a lot better. They listen to their customers, they have open forums on their web site, and they add the features, UI enhancements, and lots of other goodies based on what the customers want.

      I will be the first to admit that Premiere 7 is lacking in some areas (speed in some cases) but it's not a toy, it's a professional NLE and costs a fraction of what the big names charge.

      So don't dis my Premiere! =)

      --
      - It's not the Macs I hate. It's Digg users. -
    7. Re:Pipelines != Math Performance by SiMac · · Score: 1

      Word is definitely a different codebase on both platforms. Premiere is definitely running in Classic mode. This is definitely a pathetic excuse for a benchmark, when some machines have RAID and others don't.

    8. Re:Pipelines != Math Performance by daVinci1980 · · Score: 1

      Well, I don't switch because I don't consider Photoshop a game. No matter how much you mac-heads believe it. ;)

      --
      I currently have no clever signature witicism to add here.
    9. Re:Pipelines != Math Performance by Anonymous Coward · · Score: 0

      SSE Vector instructions are vectorpath;)

    10. Re:Pipelines != Math Performance by yabos · · Score: 1

      Adobe hasn't updated Premiere on the Mac in ages. Hence the reason it's so slow and unoptimized.

    11. Re:Pipelines != Math Performance by TubeSteak · · Score: 1
      ummm... yea! what you said! Tx for spelling out my point.
      Now, assuming the P4 line (and beyond) actually managed to beef up their math crunching powers (with more/better XXUs), would the longer pipeline help in crunching big numbers? Especially considering the massive quad-pumped bus speeds?

      The question @ the end of my original post wasn't a sig. Really: how come a puter can multiply two 400 digit #s in seconds, but takes forever to factor them?

      --
      [Fuck Beta]
      o0t!
    12. Re:Pipelines != Math Performance by tomstdenis · · Score: 0

      "The question @ the end of my original post wasn't a sig. Really: how come a puter can multiply two 400 digit #s in seconds, but takes forever to factor them?"

      If you knew the answer to that you'd be world famous.

      Tom

      --
      Someday, I'll have a real sig.
    13. Re:Pipelines != Math Performance by Anonymous Coward · · Score: 0

      Multiplication is easy and straightforward. Factoring is not. You have to locate the lowest valid factor each time before performing each division on the remainder.

    14. Re:Pipelines != Math Performance by tomstdenis · · Score: 0

      That makes no sense whatsoever.

      Trial factoring amounts to starting from the square root and trial factors from there. Which grows into the Fermat sieve [requires fewer hard ops but takes the same asymptotic time] then into -rho methods then into quadratic sieves and then into field sieves...

      However, as nice and cool as all that shit is none of it accounts for a proof of how hard factoring is. It is quite possible that tomorrow a trivial algorithm for factoring is found that renders all IFP systems insecure. It's not likely to happen but it is certainly not impossible and in fact more likely to happen as time goes on.
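
      For the curious, here is a small Python sketch of the two simplest approaches next to a plain multiplication. The primes are tiny toy values chosen for illustration; the sketch only shows that the naive methods do far more work than the multiply, it proves nothing about how hard factoring really is:

```python
from math import isqrt

def trial_division(n):
    """Smallest nontrivial factor of n, or n itself if n is prime.

    Worst case is on the order of sqrt(n) divisions.
    """
    if n % 2 == 0:
        return 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f
        f += 2
    return n

def fermat_factor(n):
    """Fermat's method for odd n: find a >= ceil(sqrt(n)) with a*a - n a perfect square."""
    a = isqrt(n)
    if a * a < n:
        a += 1
    while True:
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:
            return a - b, a + b
        a += 1

p, q = 10007, 10009      # toy primes, nowhere near 400 digits
n = p * q                # the multiplication is a single operation

print("n =", n)
print("trial division finds:", trial_division(n))
print("Fermat finds:", fermat_factor(n))
```

      Fermat's method is quick here only because the two factors are close together; with factors far apart it crawls just like trial division does.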

      Tom

      --
      Someday, I'll have a real sig.
    15. Re:Pipelines != Math Performance by Anonymous Coward · · Score: 0

      u r teh suck.

    16. Re:Pipelines != Math Performance by tomstdenis · · Score: 0

      yeah and you fail at the internet.

      Take that.

      --
      Someday, I'll have a real sig.
    17. Re:Pipelines != Math Performance by cbreaker · · Score: 1

      Ohh shoo.

      Premiere Pro (7) only very recently came out for Windows; before that it was version 6, just like the Mac. I honestly don't know why there's not a newer Mac version out there yet, but whatever.

      Plus, they ran benchmarks with the same version, so your point is moot. In fact, Adobe has even released core updates for the new G5; technically it should be faster since it's 64-bit code in there, huh?

      --
      - It's not the Macs I hate. It's Digg users. -
  24. grammar error by Anonymous Coward · · Score: 0, Flamebait

    will have a longer pipeline then Northwood
    it should be "than", not "then"...fix it people

  25. Doesn't matter to me... by TitusC3v5 · · Score: 4, Insightful

    ...since my next computer is going to house a G5.

    Personally I'm tired of trying to keep up with the GHz war between AMD and Intel. With our current technology, the only areas really pushing processing speeds are gaming and video/image applications (that I'm aware of). My grandmother doesn't need a P5 4 GHz to check her email, and neither do I if I simply want to write a paper.

    --
    And the masses cried out, "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0!"
    1. Re:Doesn't matter to me... by Quarters · · Score: 1

      If you really believe that, and all you are doing is writing a paper, then why aren't you getting a more affordable G4 iMac or laptop?

    2. Re:Doesn't matter to me... by TitusC3v5 · · Score: 1

      Actually, I already have a low-end iBook. It's basically my entry into the Mac arena (love it so far... OS X is great), and I want to get used to the Mac world before I go all out with the G5. :-)

      --
      And the masses cried out, "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0!"
    3. Re:Doesn't matter to me... by Anonymous Coward · · Score: 0

      Personally I'm tired of trying to keep up with the GHz war between AMD and Intel.

      I got tired of it years ago, back when they had the K6 and K6-2 vs the P2/P3 or whatever. Since then I have purchased AMD only. The way I determine what to buy is to find the best price point in MHz vs $ (or respective currency). For instance, according to pricewatch.com, right now the best AMD chip to purchase (for general usage) is the Athlon XP 2000-2400+. There are factors not taken into account, such as having to price out the other parts, or building it yourself. But unless you are really craving an Apple, you can't go wrong with AMD.

    4. Re:Doesn't matter to me... by Anonymous Coward · · Score: 0

      neither do I if I simply want to write a paper.

      Might I suggest an electronic typewriter.

    5. Re:Doesn't matter to me... by rhuntley12 · · Score: 1

      You don't need the speed? Don't get a fast computer then. Me, I'll continue having a cutting edge desktop and laptop to play my games. You can thank gamers that you are not still on a P1.

    6. Re:Doesn't matter to me... by dj245 · · Score: 1
      Doesn't matter to me......since my next computer is going to house a G5.

      That makes absolutely no sense at all. The G5 is widely touted by Apple enthusiasts as being the fastest thing for "video/image applications". I know that it isn't, Joe the computer guy knows that it isn't, but it is comparable. And you're saying that you want to avoid the fastest computers out there because you don't need them, and get a G5? Do you hear yourself? Either you think G5s suck donkey for computing power, or you have no idea what kind of loads word processing and e-mail actually put on a machine.

      I would go with an Athlon XP 1800 or so with 256 MB of RAM. Performance like that is more than adequate; I even use an 1800 (loaded with hard drives) for MPEG-2 encoding, DVD authoring, and burning. You can build a box like that for less than $400 (minus the 6 hard drives I have). It regularly takes half or a third of the time encoding MPEG-2 files that my 2 GHz P4 laptop takes. I don't have benchmarking conditions, but still... 2 hours instead of 4 or 6 is a hefty time savings. The cheapest G5s are ridiculously overpriced, and run in the thousands.

      --
      Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
    7. Re:Doesn't matter to me... by Anonymous Coward · · Score: 0
      ...since my next computer is going to house a G5.

      Going with the chip that powers the #3 most powerful supercomputer on the planet? You're a wise one. :^)

      Personally I'm tired of trying to keep up with the GHz war between AMD and Intel.

      I prefer the term "processor speed circle jerk" but that's not nice in polite company like Slashdot.

  26. Have you met these engineers? by downix · · Score: 1

    A friend of mine's husband works for Intel. In fact, he was in the FPU division last time I checked.

    This man I wouldn't sign on to design me a doghouse!

    His checkbook was a horrid mess, he got basic math wrong.

    Yet, he is designing critical areas of Intel's high-end chips?

    --
    Karma Whoring for Fun and Profit.
    1. Re:Have you met these engineers? by adrianbaugh · · Score: 1

      Lots of good mathematicians are rubbish at arithmetic. That may be a stereotype but I know several very good mathematicians who fit it well. In any case I suspect designing chips has as much to do with spatial awareness as arithmetic.

      --
      "'I pass the test,' she said. 'I will diminish, and go into the West, and remain Galadriel.'"
      - JRR Tolkien.
    2. Re:Have you met these engineers? by jabberjaw · · Score: 1

      ... he got basic math wrong.
      Was it wrong, or just sloppy? A great many physicists I know and study under are extremely sloppy with basic math, mainly because they can get away with it as long as it is not being published. This does not mean that they are not competent, just that they are sloppy. If the basic math was wrong altogether that is another story, but sloppy math is quite common in certain fields.

    3. Re:Have you met these engineers? by Crazy+Eight · · Score: 1

      Bertrand Russell's wife once had to leave the man very explicit instructions on how to make tea. I'm not talking about some colorful Martha Stewart-esque tip about how to save time or make it perfect. I'm talking about directions on how to boil water. Seriously, that level of detail.

  27. Re:OPEN YOUR EYES by Chuck+Bucket · · Score: 0, Offtopic

    yep, that would be _way_ less. of course Iraq *needs* democrazy so it's for their own good! after all, it's what we pay taxes for...

    CB

  28. There's a difference by Anonymous Coward · · Score: 0

    It's the difference between killing during war and a mafia hit.

  29. Scientific work on optimal pipeline depth by Wesley+Felter · · Score: 5, Informative

    In case anyone wants some hard facts:

    A. Hartstein and Thomas R. Puzak (IBM): The Optimum Pipeline Depth for a Microprocessor, ISCA 2002.

    M.S. Hrishikesh, Norman P. Jouppi, Keith I. Farkas, Doug Burger, Stephen W. Keckler, Premkishore Shivakumar (UT Austin, Compaq): The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays, ISCA 2002.

    Eric Sprangle, Doug Carmean (Intel): Increasing Processor Performance by Implementing Deeper Pipelines, ISCA 2002.

    A. Hartstein and Thomas R. Puzak (IBM): Optimum Power/Performance Pipeline Depth, MICRO 2003.

    What all these papers have in common is that they find that increasing the pipeline depth past 20 stages increases performance.

    1. Re:Scientific work on optimal pipeline depth by Hawkxor · · Score: 1

      More specifically, from reading these articles it seems that while increasing pipeline depth from 1 to 20 does decrease performance, there's a kind of negative parabola effect: for some reason, once you hit 20, increasing the pipeline depth has the opposite effect. So with Prescott, we should be seeing the best of both worlds, with more pipeline depth (apparently a good thing) and better clock speed and everything else.

      Of course, I'm no expert so maybe the results of these links sway in Intel's favor only because of Access bias.

    2. Re:Scientific work on optimal pipeline depth by mrm677 · · Score: 1

      Oh, you say that all these researchers say that increasing the pipeline depth past 20 stages increases performance?! How can that be so when so many slashdotters claim that Intel is big, bad, and evil by trying to trick everybody buying an inferior chip?

      Hey armchair architects, have a look at the SPEC benchmarks. Intel knows what they are doing...even with Itanium.

    3. Re:Scientific work on optimal pipeline depth by salimma · · Score: 1
      they find that increasing the pipeline depth past 20 stages increases performance.

      While that means Prescott would be a decent CPU, it just highlights how the P4 design was driven mostly by marketing... of course these research papers were published post-P4, with the benefit of hindsight.
      --
      Michel
      Fedora Project Contribut
    4. Re:Scientific work on optimal pipeline depth by Rufus211 · · Score: 1

      because we know that SPEC and stuff like 3dmark is the reason we buy computers. In fact, that's all I ever use my computer for, to make myself feel good about having the fastest one.

    5. Re:Scientific work on optimal pipeline depth by -tji · · Score: 3, Interesting


      > What all these papers have in common is that they find that increasing the pipeline depth past 20 stages increases performance.

      Is that a typo, or am I misinterpreting the papers you linked above?

      In all but the Intel paper, it looked to me like they were saying the optimal pipeline depth was somewhere between 6 and 20 (depending on workload).

      In the introduction of the Intel paper, it says "Focusing on single stream performance". So, basically they are focusing on artificial benchmark performance.

    6. Re:Scientific work on optimal pipeline depth by Hoser+McMoose · · Score: 1

      Have a look at the list of applications that make up SPEC CINT2000 sometime. I think you'll find that many of those applications ARE what people use in their every day computing.. err, at least if they run *nix. The tests include compiling with GCC, compression with GZip and BZip2 and running Perl. There is also a database test and a couple Place and Route tests, perhaps not normal workloads but certainly things that some people buy faster computers for.

      CFP2000 tests are not nearly as common, they are mostly scientific computing tests. Some fluid dynamics stuff, neural net simulations and equation solving libraries of various types, though Mesa (software OpenGL) does get thrown in there.

    7. Re:Scientific work on optimal pipeline depth by bmoore · · Score: 2, Informative

      In addition to these, there is a paper coming out in the next ISPASS conference from some researchers at Notre Dame which looks at the effects of increasing the pipeline depth on the memory subsystem. It turns out that as you crank up the pipeline depth, you decrease the amount of "work" that can be done in a single cycle (obviously). The papers from ISCA fail to fully take the memory subsystem into consideration.

      Now, for the most part, Comp. Sci and Eng. majors assume L1 caches to have 1-cycle latencies. Most current "real" processors do NOT have 1-cycle latencies, because it takes too long to access a cache of any useful size. As the pipeline depth increases, it gets much more difficult to have large L1 (or L2) caches.

      Using the cache design simulator Cacti, we were able to get data on the approximate maximum cache size based on pipeline depth (yes, this is fab-tech independent; check the paper for details). For example, if you consider a 5-cycle L1 delay (this is for a hit, not a miss), the maximum cache size you can get for a 10-stage pipeline is 512KB (as 256K instruction and 256K data), for a 15-stage it is 128K, for a 20-stage it is 32K, and for a 25-stage it would be 4K!

      We simulated up to a 50-stage pipeline (the Intel paper above claims that a 50-stage pipeline is best), and the fastest cache we could simulate at that speed takes 8 cycles to read from the L1. This is for a 4K cache! (2K instruction, 2K data).

      As anybody who has studied Computer Architecture before knows, caches need size to be effective. There are going to be some serious memory issues with these deeply-pipelined processors!

    8. Re:Scientific work on optimal pipeline depth by Cyno · · Score: 1

      If increasing the pipeline depth past 20 stages increases performance then why not just increase it to 100 or 1000 stages? Why wait? It will obviously be faster, right?

      I think there's more to this than the number of stages in the pipeline.
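
      One way to see why "just add stages" cannot win forever is a toy model. This is a deliberate simplification and not the formula from any of the papers cited above; every parameter value is invented. It assumes the total logic delay is split evenly across p stages, each stage pays a fixed latch overhead, and each hazard (mispredict, stall) costs a number of cycles proportional to p.

```python
from math import sqrt


def time_per_instruction(p, logic_delay=100.0, latch_overhead=2.5, hazard_rate=0.04):
    """Toy model of average time per instruction for a p-stage pipeline.

    cycle time             = logic_delay / p + latch_overhead
    cycles per instruction = 1 + hazard_rate * p
    All parameters are hypothetical and in arbitrary time units.
    """
    if p < 1:
        raise ValueError("a pipeline needs at least one stage")
    cycle_time = logic_delay / p + latch_overhead
    cycles_per_instruction = 1.0 + hazard_rate * p
    return cycle_time * cycles_per_instruction


def optimal_depth(logic_delay=100.0, latch_overhead=2.5, hazard_rate=0.04):
    """Depth that minimises the toy model above.

    Expanding the product and setting the derivative with respect to p to
    zero gives p = sqrt(logic_delay / (latch_overhead * hazard_rate)).
    """
    return sqrt(logic_delay / (latch_overhead * hazard_rate))


if __name__ == "__main__":
    for depth in (5, 10, 20, 30, 50, 100, 1000):
        print(depth, round(time_per_instruction(depth), 2))
    print("best depth in this toy model:", round(optimal_depth(), 1))
```

      In this sketch the clock keeps getting faster as p grows, but the latch overhead never shrinks and the hazard penalty keeps growing, so time per instruction bottoms out and then climbs again. Where the minimum lands depends entirely on the made-up constants, which is presumably why the real papers disagree about the number.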

    9. Re:Scientific work on optimal pipeline depth by akuma(x86) · · Score: 1


      In the introduction of the Intel paper, it says "Focusing on single stream performance". So, basically they are focusing on artificial benchmark performance.


      Ummm no...

      At Intel we simulate real benchmarks - things like Unreal Tournament and Adobe Acrobat.

      Single stream performance refers to a single thread executing in the processor core at once. Of COURSE you can have a context switch and go to another OS thread.

      Dual stream (Hyperthreading/SMT) has different performance characteristics for pipeline depth. SMT refers to having 2 or more simultaneous threads in the pipeline - simultaneously...at the same time...

      Not speaking for Intel...

  30. Most of us know by scrote-ma-hote · · Score: 2, Funny
    As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls
    Yeah, um who here actually knew that. I'm struggling to believe it's anywhere near 1/2. I'm sure a poll would clear this up.
    1. Re:Most of us know by Anonymous Coward · · Score: 0

      Next /. poll: Did you know a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls?
      Yes
      No
      It's how you use it
      I want CowboyNeal's pipeline

  31. Extended pipelines can be faster.. by Anonymous Coward · · Score: 1, Informative

    Read "Understanding Pipelining and Superscalar Execution" http://slashdot.org/articles/02/12/19/1810214.shtml?tid=137 . Extended pipelines _can_ improve performance. However, the compiler _needs_ to understand how to take advantage of it. Otherwise, you could end up with slower code.

  32. Re:OPEN YOUR EYES by Chuck+Bucket · · Score: 0, Offtopic

    oh yeah, a site that talks about COLUMBO visiting Clinton? Yep, that does seem like a relevant site! But it's on the internet, so it must be true!

    Please go back to watching Fox news now, catch ya later!

    CB

  33. Obvious by Anonymous Coward · · Score: 0

    The Intel Fanboy Handbook. It's similar to The Republican Scumbags Handbook(called The Republican Handbook for short).

  34. One-off number crunching... by Goonie · · Score: 3, Interesting
    In some situations, this kind of number-crunching is done with a custom program that is only run a few times. In such situations hacking something together in Matlab is quicker to get up and running than a full-blown C++ or, god forbid, FORTRAN program.

    Programmer time is much more expensive than faster machines.

    --

    Any sufficiently advanced technology is indistinguishable from a rigged demo
    --Andy Finkel (J. Klass?)
  35. Redundundundant! by irhtfp · · Score: 1
    As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls.

    Well, like... Duh!

    Geez! Who doesn't know that?!?! Why even mention it.

    --
    I've made up my mind and now I've got to lie in it.
  36. 4-stage pipeline by mosb1000 · · Score: 3, Funny

    Gosh, I'm feeling really left behind, my G4 400 only has 4 stages in its pipeline. At least it's built on a .22 micron process as opposed to the Pentium's measly .13 micron process. Yes, that was a joak

  37. MOD PARENT UP by Slowping · · Score: 1

    Thanks for the references. At least it'll look like I'm working, though I'll still be procrastinating.

    --
    (\(\
    (^.^)
    (")")
    *beware the cute-bunny virus
  38. It doesn't take much experience to notice flaws. by qortra · · Score: 2, Insightful

    I've not helped to design an operating system or really any part of an operating system, but I can damn well tell you that Windows ME was a shitty OS. It doesn't take any experience for me to tell this; I can determine this by simple observation.

    When the tire of my car explodes in an open road, it would not take much expertise on my part to diagnose it as a problem with my tire (they really aren't supposed to explode). And, when it happens to many other people with the same tire, it wouldn't take any expertise on my part to determine that it is probably a flaw in that tire design.

    If indeed long pipelines make non-predictable/chaotic software cause more mispredicts, and I notice that those applications do indeed run more slowly (or fail to see a speed improvement) on a new, more expensive, Intel processor, then I can assume without expertise that the design of the processor is not fitting for those applications.

    Also, when Intel's experienced engineers make a design decision, it might not be with the purpose of speed. In fact, I think few decisions there are. Intel, like Microsoft, is a marketing company. They like big numbers because they attract customers. Customers don't necessarily want really fast matlab, they want to be able to say "4 Ghz" because it makes them feel special.

    So, please don't be frustrated with people for making simple, astute observations. Intel engineers (with over 30 years' experience) don't necessarily have our best interests in mind.

  39. hmmm... by rebelcool · · Score: 2, Informative

    Generally one of the best processor architecture books out there is Computer Organization and Design. It does assume an amount of digital logic design (flipflops, clock, multiplexors and other basics) though it does have an appendix which briefly glosses over those. Honestly, to really "get" it you need an education in it.

    --

    -

    1. Re:hmmm... by geekee · · Score: 4, Informative

      Yes. Hennessy and Patterson (or in reverse, I have Stanford bias :-)) is the bible of computer architecture. They invented the RISC processor independently at Stanford and Berkeley. Their processors evolved into MIPS and SPARC.

      --
      Vote for Pedro
  40. Is this the right move? by Zebra_X · · Score: 4, Interesting

    Intel has shown no real interest in joining the 64-bit fray. Indeed, they don't have much choice. To release a 64/32-bit chip at this point would truly create an Itanic out of the Itanium. Microsoft would have more or less wasted its time producing low-volume products such as SQL Server 64 and XP 64 (different from XP 64-bit Extended, which has yet to be released). Another consequence of such a shift in strategy would be that a number of people who invested in the Itanic platform would be the proud owners of an all but useless, but very expensive, hardware platform.

    Most real world tests point to AMD chips being faster. The Int and Floating Point Tests still belong to the P4 3.2, but the P4 is having to pass the 1st place trophy to AMD when it comes to games and office productivity.

    And then there is price. For $320 you can get $700 worth of Intel performance. Mind you this is the AMD64 running in 32-bit mode.

    It would appear that all that is really needed to justify mass-market adoption is a consumer OS; that would be Windows XP 64-Bit Extended, currently in beta. The only delay there is that the .NET Framework is not 64-bit ready. We can probably expect its release with VS.NET Whidbey, a.k.a. .NET 2.0.

    After that - we just need to see some AMD adoption in the mainstream pc builders.

  41. Do you know what you're talking about ? by vlad_petric · · Score: 4, Interesting
    Matlab is mostly loops. Loops generate branches with high predictability, and as a consequence deep pipelining won't incur much performance loss. Furthermore there's a lot of parallelism in those loops, and the out-of-order execution engine is quite good at exploiting it (i.e. hiding the long latency of FP ops by overlapping them).

    It's much more likely the size of the L2 cache is affecting you (i.e. your working set does not fit into P4's L2 cache but it does in Barton's).

    If you don't believe me, try the demo version of the Intel VTune performance analyzer on Matlab running one of your programs.

    How well your caches perform is probably the most important thing for a processor today, as the speed of the main memory is a couple of orders of magnitude under the speed of the processor. It takes a couple of hundred cycles to service an L2 miss, while a long FP operation takes at most 20 cycles.
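
    The arithmetic behind that claim can be sketched with the usual average-access-time formula. The hit rates and the L1/L2 latencies below are invented for illustration; only the "couple of hundred cycles" figure for an L2 miss comes from the paragraph above.

```python
def average_access_cycles(l1_hit_rate, l2_hit_rate,
                          l1_cycles=2, l2_cycles=10, memory_cycles=200):
    """Average cycles per memory access for a two-level cache hierarchy.

    Every access pays the L1 latency; L1 misses also pay the L2 latency;
    L2 misses additionally pay the main-memory latency.
    Latencies and hit rates are hypothetical placeholders.
    """
    for rate in (l1_hit_rate, l2_hit_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("hit rates must be between 0 and 1")
    l1_miss_rate = 1.0 - l1_hit_rate
    l2_miss_rate = 1.0 - l2_hit_rate
    return l1_cycles + l1_miss_rate * (l2_cycles + l2_miss_rate * memory_cycles)


if __name__ == "__main__":
    # Working set fits in L2 versus spills out of it (made-up hit rates).
    print("fits in L2:  ", average_access_cycles(0.90, 0.99))
    print("spills of L2:", average_access_cycles(0.90, 0.50))
```

    With these placeholder numbers, a working set that stops fitting in L2 costs several times more cycles per access on average, which can easily swamp anything a 20-cycle FP op or a branch mispredict costs.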

    --

    The Raven

    1. Re:Do you know what you're talking about ? by Luminous+Coward · · Score: 1
      It's much more likely the size of the L2 cache is affecting you (i.e. your working set does not fit into P4's L2 cache but it does in Barton's).
      Barton and Northwood have the same amount of L2 cache (512 KB). However, Barton's L1 data cache (64 KB) is indeed much larger than Northwood's (only 8 KB).
      It takes a couple of hundred cycles to service an L2 miss
      Several high-end x86 processors (e.g. the Xeon) sport an L3 cache ;-)
  42. What you meant to say was... by eyegone · · Score: 1

    It's not the length of your pipeline; it's the thickness.

    --
    "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
  43. Effective pipeline by jmv · · Score: 3, Interesting

    I read somewhere that on the P4, when an instruction is already in the L1 cache, the pipeline gets shortened. That's because the L1 instruction cache stores pre-decoded instructions (micro-ops). This means that when the instruction is reached again, the decoding (and branch prediction?) steps are already done, shortening the pipeline. When the instruction is not in cache, there's already a big hit anyway. With that in mind, we'll need to see whether the extra pipeline stages in Prescott will still be there when the instruction is in the L1.

    1. Re:Effective pipeline by addaon · · Score: 1

      It's extremely unlikely that Prescott will need any additional stages in decode. It's feasible, but not likely, that it will need a delay stage between decode and issue, but there's already one around there. So these extra stages are probably all visible even to instructions already in the icache.

      --

      I've had this sig for three days.
    2. Re:Effective pipeline by Rufus211 · · Score: 1
      You are sort of correct. It's actually that when it's not in cache the pipeline gets *lengthened*. From this post over at Ace's:
      P4 Willamette and Northwood have a 20 stage pipeline when seen from the trace cache. However, including the decode path which stores its results in the trace cache, the pipeline is 28 stages.
    3. Re:Effective pipeline by glsunder · · Score: 1

      "P4 Willamette and Northwood have a 20 stage pipeline when seen from the trace cache. However, including the decode path which stores its results in the trace cache, the pipeline is 28 stages."

      Going from 20 to 30 stages sounded extreme. Maybe it's simply an addition of 2 stages -- going from 28 to 30. This wouldn't hurt performance much at all on stalls.

  44. what is "processor speed"? by rebelcool · · Score: 3, Informative
    Are you referring to "clock speed" perhaps? Clock speed is only one part of what determines performance, along with about a dozen other things I can think of.

    No processor, barring a complete architecture change (in which case its a different processor entirely) will double its performance simply by doubling the clock speed.

    It really depends on how you define performance too and what your software is doing. Doing heavy I/O? Processor has little to nothing to do with I/O - it just hands it off to the bus and I/O controllers to take care of and then does something else while waiting for the interrupt.
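
    A rough way to put numbers on this is an Amdahl-style estimate: only the part of the run time that is actually bound by the core speeds up with the clock. The function name and the example fractions are assumptions for illustration, not measurements of any real chip.

```python
def speedup_from_clock(clock_ratio, cpu_bound_fraction):
    """Overall speedup when only the CPU-bound share of run time scales
    with clock speed (memory and I/O waits are assumed unchanged)."""
    if clock_ratio <= 0:
        raise ValueError("clock_ratio must be positive")
    if not 0.0 <= cpu_bound_fraction <= 1.0:
        raise ValueError("cpu_bound_fraction must be between 0 and 1")
    scaled_part = cpu_bound_fraction / clock_ratio
    unscaled_part = 1.0 - cpu_bound_fraction
    return 1.0 / (scaled_part + unscaled_part)


if __name__ == "__main__":
    for fraction in (1.0, 0.8, 0.5, 0.1):
        print(fraction, round(speedup_from_clock(2.0, fraction), 2))
```

    Only a workload that never waits on memory or I/O gets the full 2x from a doubled clock in this model; anything that spends half its time waiting gets about a third more.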

    --

    -

    1. Re:what is "processor speed"? by addaon · · Score: 1

      Many PICs and other processors with memory entirely onboard (or even just static memory offboard that has a maximum speed higher than the processor) will show a linear relationship between performance and clock speed.

      For high end processors, of course, you are correct.

      --

      I've had this sig for three days.
    2. Re:what is "processor speed"? by Liebre · · Score: 1


      Processor speed is what sells to the masses, to the regular user. So the strategy is: Increase the speed no matter what so we can market it.

  45. This is just for marketing by Anonymous Coward · · Score: 0

    Even if we know that this could end up making the processor slower, it doesn't really matter, because Joe Idiot sees 4.0 GHz and thinks "Boy, this is the fastest thing out there". I have a friend just like that, who is somewhat more into computers than most but still not totally involved. Back when the Athlon XP came out, it was faster than a P4 clocked a couple of hundred megahertz above the XP's clock rate, and the benchmarks proved this, as did real-world use. Well, he didn't believe me; he thinks that the clock rate is everything, and bigger is better to him. He probably still thinks this, but I don't argue because it's a waste of my time. I doubt many of us will be able to change other people's minds about it either. Let's face it, most of the people in the world buy from Dell, IBM, Gateway, etc., and that's what they're selling, so that's what people are getting.

    The only thing really holding AMD64 processors back is the lack of a supported 64-bit Windows edition. Sure, for those of us that run Linux/*BSD it's great, but since the majority runs Windows, sales won't increase dramatically until XP 64-bit comes out, so people will be buying Intel until then.

  46. Pass the Crack Pipe, Please... by Anonymous Coward · · Score: 1, Interesting

    Yeah, we all know that Q3 and Microsoft Word are the best methods of testing platform-independent CPU (note: not GPU or GPU driver) performance... You should really lay off the crack pipe. For someone who wants to know how number crunching compares on either platform, Q3 and Word (and Photoshop) aren't going to tell them squat about how something like Matlab will perform. Q3 is going to tell you more about the state of GPU tech and GPU drivers than about integer ops, and MS Word is obviously going to be better (supported) on MS Windows than on Apple anything. Photoshop is only relevant to people who work a lot with Photoshop, like desktop publishers. That PCWorld benchmark is the most worthless piece of garbage that somehow gets linked to each time people bring up performance comparisons between x86 and PPC, even though it has no bearing on the performance of the processes being discussed.

    1. Re:Pass the Crack Pipe, Please... by dasmegabyte · · Score: 1

      I'd like to see cross-platform performance benchmarks in crunching audio, making MPEG-2 video (and not using some hacked codec, but an actual fucking product from a company that produces reference quality video. I don't want to hear about how much better DivX 5.0.1.2.1 encodes Simpsons episodes than the Sorenson codec). Or how well they render scenes in Blender. Or how many files, web pages, or streaming media packets they can serve per decade. Or anything that, you know, mattered.

      Honestly, MS Word performance? Macintosh word is so different from the PC version it's ridiculous. And I've never heard about anybody requiring a high end workstation to run Photoshop filters slightly faster...shit, I was editing a 4'x8' 300 dpi banner on my damn g3 600 today...it was slow, but not $3000 worth of slow.

      Anyway...can you see why a magazine called PCWorld, which sells its subscriptions to the type of xenophobic consultants who still think that the VESA local bus was a pretty neat idea, would publish benchmarks that created an artificial boost for x86 performance? I knew you could.

      --
      Hey freaks: now you're ju
    2. Re:Pass the Crack Pipe, Please... by bhtooefr · · Score: 1

      Three words: real world benchmarks. These are some of the things that these systems are going to be used for. If Word performance sucks on a Mac because of poor optimization, then it sucks, as people are going to end up using Word on either platform. Also, it wasn't just PCWorld who did the tests (or so they claim). MacWorld (the Mac version of PCWorld) helped develop the tests (as PCWorldBench wouldn't run on a Mac, after all), and also handled running them on the Mac. However, if you want (say) AnandTech to put a G5 2x2.0GHz up against an identically configured Opteron 246 2x2.0GHz, go for it. Just be sure to donate to the cause.

    3. Re:Pass the Crack Pipe, Please... by dasmegabyte · · Score: 1

      It's not "poor optimization," Word for Mac is an entirely different fucking program. Programmed by different people using a different codebase with a markedly different interface, and some differing features. Therefore, it's as useful to do any sort of test comparing them as it is compare Word to OpenOffice.

      And for what it's worth, nobody gives a shit about Word "performance." I used Office X on my 300 MHz G3 with no problems, no lag, no nothing. The G5 is definitely faster by a factor of at least 10. Therefore, any "benchmarks" are measuring imperceptible differences in usability. If you have fewer imperceptible slowdowns than I do, why should I care?

      --
      Hey freaks: now you're ju
  47. Re:Linux Cost Tax Payers at least $410M...nothing by Anonymous Coward · · Score: 0

    First paragraph of linked story contradicts your whole argument:

    SANTA CLARA, CA -- NVIDIA Corporation (Nasdaq: NVDA), the worldwide leader in visual processing solutions, yesterday announced that NASA is using its technology to reconstruct Martian terrain from transmitted rover data in photorealistic virtual reality under the Linux operating system, allowing scientists to explore Mars in 3D as if they were actually moving freely on the planet's surface.

    it's not controlling the rover, it's for reconstructing data sent back.

  48. Smart Business Move by m3j00 · · Score: 2, Insightful

    Intel is trying to move chips. One way to improve your sales is to drum up higher GHz for the uninformed masses. If you can do this while still producing competitive chips, you will outsell a similarly performing chip that runs 700 MHz or so slower than yours.

  49. memory by Anonymous Coward · · Score: 0

    and yet memory access latency isn't decreasing at the same rate. Intel should consider adding a pair of independent memory controllers to the CPU, or at the very least, to the MCH, and, with the move to 64-bit pointers and math, doubling the cache line size to 128 bytes.

  50. Re:Hmmm. by bhtooefr · · Score: 1

    AMD vs. Intel:
    Barton AXP: 75W (power-cutoff overheat protection)
    Northwood P4: 90W (underclock overheat protection)

    Clawhammer A64: 85W ("Cool & Quiet" - underclock)
    Prescott P4: 100+W (underclock)

    I think AMD's heat management isn't the problem.

  51. thats branch prediction... by rebelcool · · Score: 2, Informative
    and is now common. These days it usually works by maintaining a history table of past branch behavior. Generally if you've had a lot of branches before, you're in a loop, and statistically are likely to stay in the loop.
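
    A minimal sketch of the history-table idea, using the textbook scheme of 2-bit saturating counters indexed by branch address. This is an illustration only, not a description of what any particular Intel or AMD chip does; the table size, the initial counter value and the trace format are assumptions.

```python
def predict_branches(trace, table_size=16):
    """Count correct predictions over a branch trace.

    trace is an iterable of (branch_address, taken) pairs, taken being a bool.
    Each table slot holds a 2-bit saturating counter:
    0 or 1 predicts "not taken", 2 or 3 predicts "taken".
    """
    counters = [1] * table_size  # start every slot at weakly not-taken
    correct = 0
    for address, taken in trace:
        slot = address % table_size
        predicted_taken = counters[slot] >= 2
        if predicted_taken == taken:
            correct += 1
        # Nudge the counter toward what actually happened, saturating at 0 and 3.
        if taken:
            counters[slot] = min(3, counters[slot] + 1)
        else:
            counters[slot] = max(0, counters[slot] - 1)
    return correct


def loop_trace(address, iterations):
    """Trace of a loop-closing branch: taken every time except the final exit."""
    return [(address, True)] * (iterations - 1) + [(address, False)]


if __name__ == "__main__":
    trace = loop_trace(0x40, 100)
    print(predict_branches(trace), "of", len(trace), "predicted correctly")
```

    In this sketch a 100-iteration loop is mispredicted only twice on its first run (once while the counter warms up, once at the exit), which is the statistical bet described above.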

    You can also go back and "fix" instructions to an extent (and not in all cases) while in the pipeline in case of incorrect branching. x86 sort of sucks for this though because of the variable length instructions.

    A lot of computer science is based on those kind of statistics. You see it in memory management as well. Most data structures are created and quickly destroyed. But those that aren't tend to stay around for a very long time and not point to quickly created and destroyed ones.

    --

    -

  52. Marketing influence on engineering by alphorn · · Score: 1

    It is reasonable to assume that pipeline length is influenced by marketing. More MHz means a competitive advantage, even if it yields no extra CPU power. Athlon 64 shows that a 2.2 GHz 12-stage pipeline can be equivalent to the 3.2 GHz 20-stage pipeline of a P4 (the AMD model naming then tries to hide the lower MHz numbers).

    This demonstrates that performance can be achieved in different ways, and intel went for "speed demon" - throw MHz at the problem. There are good reasons for this, but for example the "brainiacs" with lower MHz usually have lower power consumption. Maybe a more brainiac approach was opposed by marketing? They could never have pushed through a clearly inferior solution though.

    Finally, I don't think that the Prescott pipeline will be quite as long as 30 stages, but time will tell. The architectural differences between Prescott and Northwood (the current P4) will be a lot smaller than those between Northwood and the P3. Therefore I expect that the number of apps that will run slower on a Prescott (which, of course, needs a somewhat higher clock frequency) will be very small. So move along guys, there's nothing to see.

    1. Re:Marketing influence on engineering by Cecil · · Score: 1

      Maybe a more brainiac approach was opposed by marketing? They could never have pushed through a clearly inferior solution though.

      Hahaha. Clearly you have never seen a marketing department.

  53. yep by rebelcool · · Score: 4, Insightful
    MIPS is a nice architecture to learn. Clean and simple. Useful, too, if you get into game design (sony uses MIPS based chips in the playstations)

    Stay away from x86 if you're just starting out...

    --

    -

    1. Re:yep by globalar · · Score: 1

      In fact, the Xbox is the only popular console in recent memory that has an x86 chip. The Nintendo 64 and Playstation series (as mentioned), have all sported MIPS architectures. The Dreamcast and previous SEGA consoles were (Hitachi SH)RISC.

      MIPS is used in routers (especially high-end, like from Cisco), broadband/wireless adaptors, TV's, and TV set-tops.

      MIPS is commonly taught to CS / programming students to introduce them to assembly and applied architecture concepts. A common and short book for the beginner is "Introduction to RISC Assembly Language Programming" by John Waldron. It uses the SPIM emulator.

  54. Personally.... by HotNeedleOfInquiry · · Score: 1

    I thought his tone was perfect.

    --
    "Eve of Destruction", it's not just for old hippies anymore...
  55. Re:Linux Cost Tax Payers at least $410M...nothing by bhtooefr · · Score: 1

    They could have gone with... VxWorks

    And they did, dipshit.

  56. Care to Expound? by Anonymous Coward · · Score: 0

    AMD's only black mark is the K6, which until the K6/3 has only 24 bit FPU, and as such has many compatibility problems. Of course, if you're running linux, you'll never see them, so the faster K6s are not useless yet. (Cobalt Raq3 owners rejoice.)

    I've got a bunch of old K6s at home: 450s overclocked to 500 in Asus TX97-LE's [Intel TX chipset] and laptop 550's in Epox MVP-3G5 motherboards [Via Apollo MVP-3 chipset]. Care to describe which K6s in combination with which software packages have "compatibility problems"?

    At the moment, I'm most interested in LabView 7.0 and Java 1.4.2 running on Windows 2000 servers, although I may soon upgrade to Visual Studio .NET on Windows 2003 [still in conjunction with LabView 7.0].

    I'm doing some pretty heavy duty math, and I better know if I should expect some lousy round off error [or whatever].

    Anyway, thanks for any insight you can provide!

    1. Re:Care to Expound? by drinkypoo · · Score: 1
      I would guess that any modern operating system's math libraries are smart enough to know that the K6/2's FPU has only 24 bits of FPU. Back in the day it was a serious problem though, and you needed special drivers (Cards which were an issue at the time included the Matrox Mystique, for example, and the NVidia Riva 128 and TNT) because they expected 32 bit responses.

      The K6 is a great chip, it's actually RISC internally for example. When you optimize for the K6 (say, using GCC 3.2 with the -march=k6, k6-2, or k6-3 flag) it flies, especially since it has 128kB of L1 cache (64k each instruction and data.)

      I would expect that the K6/2's FPU is not even used when an application demands a 32 bit FP result, which should hurt performance considerably. K6/3 does not have this problem.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:Care to Expound? by mosel-saar-ruwer · · Score: 0

      Thanks!

      Here's what I've discovered: If you're on a Windows platform, invoke Start | Run | winmsd.exe, then click on System Summary to get the Family, Model, and Stepping of your K6. [It beats the hell out of unscrewing the case, prying off the heat sink, and reading it directly from the processor.] Anyway, it looks like we have [and these are PDF documents, by the way]

      Family 5, Model 7 == K6
      http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/21641.pdf

      Family 5, Model 8 == K6-2
      http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/21641.pdf

      Family 5, Model 9 == K6-III
      http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22473.pdf

      I discovered that I'm running a combination of Model 8's and Model 13's [the laptop K6-III+ that I slapped in the EPoX motherboard] here at home, so I'll have to remember to use the Model 13 for my calculations. [Actually, I can't find any official info on the Model 13, so I'll have to hope that it received the 32 bit FPU.]

      Anyway, thanks again!

  57. Hey! I gotta git me one of those! by Anonymous Coward · · Score: 0

    I'd eat hot grits for a chip that has "poser consumption."

  58. Why bother with x86... by $criptah · · Score: 1

    I do not understand Intel or other companies that do not try to develop anything besides x86. Let's face it, the architecture is flawed at its root. There are several issues that have been there from its very beginning (that is a topic for another forum), and instead of coming up with something new, Intel tries to patch its products with more crap.

    When Apple realized that MOS Technologies' clone of the 6800 was not the best solution, the architecture was replaced with a new one that better suited Apple's goals. Why can't Intel retire x86 and move up to something new? If you think about it, they can make a good chunk of money by coming up with a processor that can put AMD and Apple next to nothing. If they want to compete with this CPU, they better have good branch prediction.

    1. Re:Why bother with x86... by Indy1 · · Score: 2, Insightful

      x86 is old and flawed, but it has such a base of OSes and apps for it that it's not funny. Look at Itanium. There are hardly any programs available for it, and it's hugely expensive. In order to jump to anything new, you need Uncle Bill to port Windows to whatever you're coming up with, and we all know how fast and effective M$ is at doing such a complicated task (i.e. not very fast or effective at all). Sure, Linux and the BSDs can be ported without a huge amount of work, but a CPU manufacturer can't survive without a Windows base.

      --
      Lawyers, MBA's, RIAA? A jedi fears not these things!
    2. Re:Why bother with x86... by F2F · · Score: 1

      they tried. it's itanium... ... itanic in some circles.

      HAND

    3. Re:Why bother with x86... by Anonymous Coward · · Score: 0

      Intel gave up on x86 long ago. I believe the 486 was the last x86 processor, though it could be later (I know P2 was not x86). The x86 instructions are broken up in a hardware emulation layer. This has actually turned out to be a good thing for performance AND compatibility.

    4. Re:Why bother with x86... by nelsonal · · Score: 1

      Developing a modern computer CPU (no licensing or embedded stuff) requires about $1-2 billion. You have two choices with your investment: make an elegant solution that has zero initial software compatibility, or work with an archaic, slipshod architecture that will likely be compatible with virtually all consumer software written in the last decade. Which one are you going to bet your billion on? Intel's trying the first choice with Itanium, in enterprise software, which has a much better record for diversity of hardware supported, and so far their only sales have come from their co-developer and a little house that does something with graphics and was tottering near bankruptcy three years ago.

      --
      Degaussing scares the bad magnetism out of the monitor and fills it with good karma.
    5. Re:Why bother with x86... by AvitarX · · Score: 1

      Why don't you give a short list of flaws.

      I am too lazy to read long detailed articles, that lead to many others being read to understand them.

      And all I hear on /. is that Alpha is the only real processor.

      People complain about x86, but then people say x86-64 cleans it up.

      Then I hear Itanium called Itanic, and too difficult to make a good compiler for.

      Then I hear RISC is too CISCy now.

      so please enlighten me on the huge flaws of x86 that make it so useless.

      even though it does quite well in the benchmarks and spanks in price performance?

      --
      Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
    6. Re:Why bother with x86... by Anonymous Coward · · Score: 0

      OK, the short list is this:

      1. lack of general purpose registers
      2. inconsistent, arbitrary instruction set
      3. long pipelines exacerbate stalls

      Fortunately, register renaming and out-of-order execution pretty much make #1 not matter too much. Sure, it's a bit of a pain to write assembler for it, and the logic in writing a compiler is trickier, but that doesn't matter to end users. And it turns out that fewer registers make context switches that much faster.

      The typical RISC architecture has every instruction being the same size (32 bits), while x86 instructions vary in length: even relatively common ones can be just a single byte, with others being as long as 12 bytes. As such, an x86 instruction decoder is a big complicated mess, but it turns out that throwing more transistors at the problem eliminates any potential slowdown. As a matter of fact, all those short instructions reduce the memory bandwidth and instruction cache space that code needs. And by using the I-cache to cache predecoded instructions (called traces), you don't even need to worry about the decoder that often, so #2 isn't so important either.

      The more transistors in any one path on the CPU, the slower your clock frequency must be. So if most paths are 30 gates deep, but one is 50 gates deep, you must limit the clock so that it will always wait for that 50-gate path. If you can split it up into two paths with 25 gates, it will add an extra cycle to your pipeline, but you can increase your frequency enough that it doesn't matter. So what do you do when a branch is mispredicted? Execute instructions from one of the threads that isn't stalled while the pipeline fills up again. Thus, hyperthreading eliminates #3 as a major source of slowdowns, unless you're mainly concerned with a single thread of computation.
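
      To put rough numbers on the gate-depth argument, here is a small sketch. The 20 ps gate delay and the stage lists are made-up values for illustration, and latch/register overhead between stages is ignored, so treat the result as an upper bound.

```python
def max_clock_ghz(stage_depths, gate_delay_ps):
    """Clock limit set by the deepest stage (the critical path), in GHz."""
    critical_path_ps = max(stage_depths) * gate_delay_ps
    return 1000.0 / critical_path_ps  # a 1000 ps period equals 1 GHz


GATE_DELAY_PS = 20  # hypothetical delay per gate

# Most stages are 30 gates deep, but one is 50 gates deep.
before = [30, 30, 50, 30]
# Split the 50-gate stage into two 25-gate stages: one extra pipeline stage.
after = [30, 30, 25, 25, 30]

clock_before = max_clock_ghz(before, GATE_DELAY_PS)
clock_after = max_clock_ghz(after, GATE_DELAY_PS)
print(clock_before, clock_after, clock_after / clock_before)
```

      With these assumed numbers the split costs one extra stage and lifts the clock limit from the 50-gate path to the 30-gate paths, roughly a 1.67x increase.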

      Intel has essentially proven that if you throw enough engineers and enough transistors at the problem, you can make any architecture perform at the top of the list.

      aQazaQa

    7. Re:Why bother with x86... by Anonymous Coward · · Score: 0

      Wow, it really is an unpopular idea if you need to hide behind anonymity to post in its defense.

    8. Re:Why bother with x86... by Hoser+McMoose · · Score: 1

      Hmm, I might just be rehashing some of the things you've already heard, but...

      Here's a short list of some of the main problems with x86:

      1. Too few general purpose registers and some restrictions on how they can be used.

      2. Stack-based FPU can be a real pain in the ass and you spend a lot of time and effort shuffling data around.

      3. Some people don't like x86 assembly.

      4. x86 decoders are very complicated and generally kind of messy.

      Now, for #1, first off there are lots of renamed registers. Rather than moving data in and out of regular registers, the processors instead move them from the regular visible registers into the hidden rename registers and back again (kinda oversimplified explanation there, but that's more or less how it works). This eliminates the penalty of moving the data to and from cache all the time, but you still need to handle extra load and store instructions that wouldn't be necessary if you had more real registers. There are also some restrictions on what registers can be used for which instructions, so they aren't truly "general purpose" registers.

      AMD64 (aka x86-64) helps here in two ways. First they double the number of GPRs from 8 to 16, and secondly it eliminates the restrictions on what registers can be used for different instructions.

      For #2, the stack-based x87 FPU is a bit of a problem, and there isn't really anything that can be done to easily fix it. So instead it's been replaced. SSE may have started as just a vector engine, but SSE2 is really a full-fledged FPU replacement. For AMD64 FPU code AMD is recommending that everyone use SSE2 exclusively.

      For #3, x86 assembly? Well that isn't nearly the problem that it used to be simply because so few people do assembly anymore. Generally speaking processor optimizations have become sufficiently complex and compilers sufficiently smart that you usually don't buy much in terms of performance when you code in assembly. Also some people really do like x86 assembly, so I guess it's just what you get used to (I personally am not a big fan of it).

      The x86 decoders, #4 on my list, are always going to be tricky, but Intel has a rather nifty solution in their trace cache. Rather than saving pre-decoded instruction like is normally done with an L1 I-cache, Intel instead stores post-decoded instructions in their trace cache. They still need a decoder, but since 90%+ of their instructions are coming from the trace cache they can use a much simpler/cheaper decoder.

      As for the other stuff. RISC and CISC, in my mind, are pretty darn close to the same thing these days. Sure you've got more instructions with CISC chips and more visible registers with RISC chips, but with the decline in assembly programming and with renamed registers, these are quickly disappearing as differences. Both chips end up decoding instructions internally anyway and you end up with very similar execution paths in the end. Basically on the inside they are the same, just dressed up differently.

      Let's see here. Alpha the only real processor? Alpha was nice, they did a bang-up job designing it and then DEC managed to do a bang-up job driving the company into the ground and taking the processor with it. Interestingly, the Alpha was designed from the ground up to clock to high speeds. They took the idea that high clock speeds were the way to go to get performance, yet now if you read this thread people are complaining about Intel doing the same thing.

      Itanium is the Itanic? Well now that one is a complex issue and perhaps best left for another thread. Suffice it to say though that the Itanium line of processors have yet to find the right market niche in which to live, despite a LARGE sum of money that Intel has invested. In my mind the issue is that Intel is pushing a chip to a problem that demands a system as a solution. No matter how good the chip is, Intel still needs the infrastructure there to support it. They need HP and SGI to provide the systems and support and they nee

  59. Dilbert Marketing by stuffedmonkey · · Score: 2, Interesting

    This is the end result of engineering driven marketing... When you relentlessly try to make the chip with the "most megahertz", you lose focus. AMD and Apple/IBM have started to pull away in quality, in terms of actual work done per clock cycle. While it's true that the average Joe or PHB might not know any better, you can only continue on so long...

  60. GRAMMAR! by Anonymous Coward · · Score: 0

    "Further contributing to the MHz Myth, The Register and ZDNet are reporting that the new P4 core, codenamed Prescott, will have a longer pipeline then Northwood."

    Should be "than"

    1. Re:GRAMMAR! by homeobocks · · Score: 0

      Wow! I can't believe your score on that is really zero! Insightful, funny, useful, on-topic! You should get a 5 on that! To lighten things up, How many [intel|amd|ibm] CPUs does it take to push a bit in the registry?



      33. One to hold the bit still, and 32 to push the registry.

      --
      MOUNT TAPE U1439 ON B3, NO RING
    2. Re:GRAMMAR! by scottgfx · · Score: 1

      Quote: "then Northwood."

      Perhaps a NEW Northwood comes out AFTER we're done laughing at Prescott. :)

      I don't have a sig you insensitive cod!

      --
      It's mandatory to wash your hands before returning to the land of Dairy Queen.
  61. Re:BUSH LIED, SOLDIERS DIED by Chuck+Bucket · · Score: 0, Offtopic

    Fair enough, I appreciate your candor.

    CB

  62. Branch prediction by alphorn · · Score: 1

    "Branch likely" has been done - and deprecated - e.g. by MIPS.

    You underestimate how much effort goes into branch prediction already. Modern predictors even notice that a certain branch is taken exactly every 4th time. Any hinting that can be provided at compile time - especially if it's done without profile feedback - would not add much value to this.
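
    As an illustration of how a predictor can pick up an "every 4th time" pattern, here is a toy two-level scheme: a shift register of recent outcomes selects one of several 2-bit counters. This is the textbook local-history idea for a single branch, assumed here for the sake of example; it is not a description of any shipping predictor, and the names are invented.

```python
class LocalHistoryPredictor:
    """One branch: the last N outcomes select a 2-bit saturating counter."""

    def __init__(self, history_bits=4):
        self.history_mask = (1 << history_bits) - 1
        self.history = 0  # most recent outcome sits in the low bit
        self.counters = [1] * (1 << history_bits)

    def predict(self):
        return self.counters[self.history] >= 2

    def update(self, taken):
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


# Branch that is taken exactly every 4th time.
outcomes = [i % 4 == 3 for i in range(400)]

predictor = LocalHistoryPredictor()
results = []
for taken in outcomes:
    results.append(predictor.predict() == taken)
    predictor.update(taken)

mispredictions = results.count(False)
print(mispredictions)
```

    After a short warm-up the pattern is predicted without further misses, whereas a lone 2-bit counter would settle on "not taken" and miss every 4th branch.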

    1. Re:Branch prediction by Elladan · · Score: 1

      MIPS "branch likely" instructions don't really have anything to do with branches being likely. They're a weird hack which basically instructs the CPU that it can ignore an instruction in the branch delay slot if a branch is taken (normally, the instruction must be executed).

      This was deprecated because... well... It's almost entirely useless. It's essentially there just to simplify things for the compiler and allow it to make a few micro-optimizations.

      Actual "branch likely" instructions that mean what they say are available on modern CPU's and are used extensively, for example in the Linux Kernel.

  63. Re:WHEN CLINTON LIED, NO ONE DIED by Chuck+Bucket · · Score: 0, Offtopic

    Er, the topic is about Intel, I honestly think that there are more important issues we should be addressing. My posts here are simply a small ripple in the ocean that is /. with all of its goatse.cx, gnnas and fps posts.

    still, I appreciate your candor, and will drop the commentary for now.

    CB

  64. Completely Wrong by Anonymous Coward · · Score: 1, Interesting

    --Wow, I can't believe this got modded as 'Insightful'. 3000+ is a performance rating that is designed to show the CPU performs equivalently to a P4-3Ghz.

    If you look at some actual benchmarks, you will see that the P4 3.06 is actually better in some cases than an AthlonXP3000+ (note this is the 2.167Ghz Barton in the graph)

    SpecFP
    SpecInt

    Additionally, the data shows that a 3Ghz P4 is in fact MORE than 3x faster at SpecFP than a 1Ghz P3. Perhaps you should inform yourself a little before posting FUD.

  65. Watching... by Goalie_Ca · · Score: 1

    Intel getting closer and closer to registering every single gate

    --

    ----
    Go canucks, habs, and sens!
  66. A note about pipeline stages by Anonymous Coward · · Score: 2, Interesting

    The reasons that Intel has for increasing the # of pipeline stages seem, to me, more for marketing than actual performance.

    By increasing the # of stages (say, to do less work per stage), they're able to minimize interconnect delay (among other things), and therefore bump up the processor speed.

    It doesn't mean they'll be able to do more -- in fact, they're doing less per stage, just at a faster rate. (Whereas I suspect the Athlons are doing more per stage, and that's why we're seeing 2GHz Athlons tying or beating 3.2GHz Pentiums.)

    Marketing-wise, it'll be a win for Intel. Performance-wise (due to pipeline stalls), these changes will demand that Intel keep bumping up chip performance or else lose out to AMD. Of course, we all know which of these two criteria is the most important to the bottom line.

  67. Another possiblity... by Anonymous Coward · · Score: 0

    ...is that your work computer is so loaded with crap [MS Office, MS Voice Analysis, Exchange/Groupwise/Lotus Notes client, fax client, Active Directory/Novell Directory Services/iPlanet authentication client, Attachmate host emulation software - and we haven't even begun to mention the specialized packages] that it takes forever and a day just to pull up any single program [probably from swapped memory, even if it's been "loaded" once already].

    Sometimes my jaw just drops when I see how much crap gets loaded on some of these business computers.

    [And it also drops when I see how much spyware gets loaded on the typical home user's computer...]

  68. Technical discussion by Rufus211 · · Score: 4, Informative

    For those into the technical side of this type of stuff and a heck of a lot higher S/N ratio, check out the Ace's Hardware forum. There's a large thread going on over there talking about the rumors and what it would actually mean.

  69. true by rebelcool · · Score: 1

    I should never say "no" or "nothing" when it comes to chip architecture... somewhere in the world there is a family of niche devices that do things in weird ways for their own reasons.

    --

    -

    1. Re:true by addaon · · Score: 1

      Very, very true. Personally, I'm a stack machine fan... but everyone should really know about microcontrollers, stack machines, transputers, bitslice processors, and them what-are-they-calleds that switch threads every cycle to eliminate the cost of latency without cache.

      --

      I've had this sig for three days.
    2. Re:true by Endive4Ever · · Score: 1

      somewhere in the world there is a family of niche devices that do things in weird ways for their own reasons.

      It's a 'weird way' for a chip's I/O bus to run at the same clock rate as the central processor?

      Goodness, times have changed!

      --
      ---
  70. Uh. by juuri · · Score: 1

    The only reason Apple went with PowerPC is because MOT completely blew it on the 680x0 line by being extremely late, with horrible yields and crazy prices. 68040s and 68060s were good procs that would have been great had they come out a year earlier.

    The same thing MOT went on to do with PowerPC.

    --
    --- I do not moderate.
  71. Mod parent up: Is this true? by Anonymous Coward · · Score: 0

    I thought that SSE and MMX both had significantly lower precision than standard IEEE floating point ops. If I'm wrong, please correct me, but if it is lower precision, it makes it useless for Real Work(tm).

    Is this true? [I'm pretty sure it's true of Altivec, but that's the first I'd heard of it for SSE/MMX].

  72. No official numbers eh? by Mercid · · Score: 1

    I thought it was known that they added 11, for a total of 31 stages. Ouch. It's looking more and more like one could stay with a highly overclocked Northwood while waiting for an Athlon 64 with all the new toys like PCI-X and some DDR-2.

  73. Poll options by hayden · · Score: 1

    1) Yes, I knew that.
    2) I only read slashdot to karma whore.
    3) I've heard Cowboy Neal has a longer pipeline

    --
    Nerd: Derogatory term typically directed at anybody with a lower Slashdot ID than you.
  74. what, are you an expert? by mrm677 · · Score: 4, Insightful

    "As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls"

    Get off your high horse. Intel architects aren't dummies. Itanium benchmarks are starting to whoop some serious ass and the P4 and Athlon have been neck-and-neck for years. I'm sure Prescott will perform very well.

    I can get into all kinds of architecture speak as to why your simplistic notions of mispredictions and pipeline stalls might not be so terrible. Who knows? Maybe Intel will execute both paths of a branch? They've already got partial instruction replay to make squashes much less expensive. With deep speculation, a big instruction window, good bypassing capabilities, and effective non-blocking caches, "pipeline stalls" are not an issue due to branch mispredictions. The bigger issue is memory latency/bandwidth and Intel has always done well with that. A branch misprediction can be easily tolerated...an L2 cache miss can't.

    1. Re:what, are you an expert? by EvilTwinSkippy · · Score: 1
      You have to remember that a garden variety PC is a very unpredictable environment. You have network packets coming in, mouse events, keyboard presses, USB chatter, DMA access; every event generates an interrupt that requires the processor to stop what it's doing and start the pipeline over again.

      Yes, it will be very useful for calculating seti@home and missile simulations, namely because the task is absolutely predictable and requires very little disk or network I/O. Everything else slogs it down. It's like a Hummer in Tokyo. Sure it can go off-road, but where are you going to park it?

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    2. Re:what, are you an expert? by SuiteSisterMary · · Score: 1

      Yes, but a garden variety PC is probably stalling due to accessing the disk, waiting for packets to come in, or the user sitting there not doing anything. They're not going to notice the difference between a P2-450 and a P4-4ghz.

      --
      Vintage computer games and RPG books available. Email me if you're interested.
  75. IBM's doing it too... by mnemonic_ · · Score: 1
    Instead of the original G4's 4-stage or G4+ chip's 7-stage integer pipeline, the PowerPC 970 follows the superpipelined approach of the 20-stage Pentium 4 with a 16-stage integer pipeline -- 21 stages for floating-point instructions, as many as 25 stages for single-instruction-multiple-data (SIMD) multimedia instructions.

    Look! IBM's perpetuating the MHz myth!

    Guys, there's more to CPU architecture than what Apple's advertising department claims (or at least used to claim). I don't think anyone would doubt the PPC970/G5's superiority to the G4 performance-price wise (or has Apple somehow made a terrible mistake? ha), and yet it has a far longer pipeline than the G4. Perhaps there is more to pipeline size than trying to achieve a higher clock in exchange for less computation per cycle?

    Or perhaps the only "megahertz myth" is Apple's vast simplification of modern CPU technology?
  76. Re:So What? - Speaking of Papers... by the_ed_dawg · · Score: 1
    So the happy few, highly paid architects, 30 years-experience in the industry, hundred-published scientific papers at Intel decide that the next gen chip will have more stages and they have to be called morons ? How do you know better? ... Let the pros do the work and go back playing Quake.
    w00t.

    For those of l33t h4x0r5 out there who are IEEE members, check out "Increasing Processor Performance by Implementing Deeper Pipelines" by Eric Sprangle and Doug Carmean of Intel's Pentium Processor Architecture Group in 2002. (Sorry about not having a link. I'm not at a location with IEEE Xplore access.)

    In short, the paper describes how creating a deeper pipeline and increasing L2 cache can improve performance by 35-90% over a 2-GHz P4. This improvement is not dependent on process, so one may anticipate a similar improvement based upon the new process, although hard data is not available to me at this time.

    The paper acknowledges branch misprediction as the leading cause of performance degradation and includes the penalty in the above mentioned statistics.

    If there's anything I've learned about computer architecture, it's that there are always more factors than you know what to do with. Got a problem with branch penalty? Make a more accurate predictor. In the meantime, you increase throughput with a longer pipeline. Why? Because everything else gets the boost. The golden rule of architecture: MAKE COMMON CASES FASTER!!!
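
    A back-of-the-envelope model of that trade-off, as a sketch only. Every number below (clock speeds, branch fraction, misprediction rates, penalty cycles) is a hypothetical value chosen for illustration; none of it comes from the paper or from Intel.

```python
def instructions_per_ns(clock_ghz, base_cpi, branch_fraction,
                        mispredict_rate, penalty_cycles):
    """Throughput when each mispredicted branch costs penalty_cycles extra."""
    cpi = base_cpi + branch_fraction * mispredict_rate * penalty_cycles
    return clock_ghz / cpi


# Shorter pipeline: lower clock, cheaper mispredictions.
short_pipe = instructions_per_ns(3.0, 1.0, 0.2, 0.05, 20)
# Deeper pipeline: higher clock, costlier mispredictions.
deep_pipe = instructions_per_ns(4.0, 1.0, 0.2, 0.05, 30)
# Same deep pipeline paired with a more accurate predictor.
deep_pipe_better_bp = instructions_per_ns(4.0, 1.0, 0.2, 0.03, 30)

print(short_pipe, deep_pipe, deep_pipe_better_bp)
```

    Under these assumptions the deeper pipeline still comes out ahead despite the larger penalty, and a better predictor widens the gap. Change the inputs (a smaller clock gain, branchier code) and the answer can flip, which is the "more factors than you know what to do with" point.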

    --
    There are two types of people: those prepared for the zombie apocalypse and those who will be eaten.
  77. "As most of us know..." -- riiiight by nazgul000 · · Score: 2, Insightful

    "As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls."

    Sigh... most of the people I know cannot place the planets of the Solar System in their correct order. What a rarefied realm we inhabit here...

    1. Re:"As most of us know..." -- riiiight by gerardrj · · Score: 1

      While I know what you mean, there are indeed at least two "orders" for the planets of our solar system. Neptune and Pluto tend to swap out for the title of farthest from time to time. If memory serves, Pluto is currently farther from the Sun than Neptune.

      --
      Article X: The powers not delegated... by the Constitution...are reserved...to the people
    2. Re:"As most of us know..." -- riiiight by Anonymous Coward · · Score: 0

      +1 Pedant :)

  78. It won't be - look at the cache... by Chordonblue · · Score: 1

    As the P4 EE proves, more cache is really important to the performance of the P4. Take a look at the specs of all but one of the new Prescotts and you'll see that they come with 1 MB of cache instead of the Northwood standard of 512K.

    That should at least allow Prescott to be on par with if not exceed the performance of Northwood. That said, I wouldn't expect it to be faster in everything. Those extra stages will hurt for certain functions no matter WHAT the cache.

    --
    "...Well, there's egg and bacon; egg sausage and bacon; egg and spam; egg bacon and spam; egg bacon sausage and spam..."
  79. Hyper Threading may be the answer. by chobee · · Score: 1

    While in the past, longer pipelines did cause slowdowns, Intel may be able to lengthen the pipelines by leveraging Hyper Threading (HT). http://intel.com/business/bss/products/hyperthreading/overview.htm It's my understanding that HT uses the gaps in the pipeline as a second virtual CPU. -Cho

  80. Not Duron, not P4 - Athlon XP by Namarrgon · · Score: 1
    AMD have always denied that the PR number is a comparison to the P4 (though it does seem to work out pretty close on average).

    They claim it's supposed to compare against the 'Thunderbird' model Athlon (the one that topped out at 1.4 GHz). Not the Duron.

    Most people will keep matching it against the P4 regardless, of course. Will this continue to hold true against the Prescott (allowing AMD to hike up their PR numbers by a goodly amount), or will they stick to their supposed guns?

    More info here.

    --
    Why would anyone engrave "Elbereth"?
    1. Re:Not Duron, not P4 - Athlon XP by FictionPimp · · Score: 0

      Which P4 do people match the XP 3000+ with: the P4 3.06 with the 533 FSB, or the P4 3 gig with the 800 FSB? Owning all three, I can say that my P4 3 gig with the 800 FSB is faster for everything I do.

  81. intel's design philosophy is sound by violently_ill · · Score: 1

    intel is not just lengthening the pipeline for marketing reasons, they're doing it to open up another front in the battle for more performance. by lengthening the pipeline, they reap the benefits of higher clockspeeds at the cost of costlier branch mispredictions, etc. however, all this means is that things like trace cache and improved branch prediction are more effective at improving performance. in other words, if both amd and intel were to devote the same resources to improving branch prediction and minimizing the penalties for branch misprediction, amd would see less of a performance gain. occasionally, this will lead to oddities such as the p3 outperforming willamette on a clock-for-clock basis, but overall the philosophy is sound.

  82. Summary of article by utahjazz · · Score: 2, Funny

    A. Hartstein and Thomas R. Puzak (IBM): The Optimum Pipeline Depth for a Microprocessor [colorado.edu], ISCA 2002.

    Let me guess...42?

    1. Re:Summary of article by Anonymous Coward · · Score: 0

      If only I could type (as opposed to hum) the opening synth part from the Level 42 track "Lessons in Love". That's the first thing I think of when I see the number 42 - not the song, just those first 3 notes. Sorta synth brass stab type sounding thing.

      To make my point, that Strong Bad guitar bit just wouldn't work without audio - it would just look like he was typing a bunch of gibberish for the guitar sounds. So if I just typed something like "bwaah bwaahh bwaahh" you probably wouldn't make the connection to the Level 42 track. Oh, and while I'm thinking about Strong Bad, do you think his rhythm guitar part is from the Heart song Barracuda? :)

      And as to this Northwood thing - I never made this association until just now after reading some of the other posts here - and now I can't help but equate the name with a certain porn star :/ Not sure I would want to buy a Northwood CPU now.

  83. Don't forget Prescott's larger L1/L2 cache sizes by Anonymous Coward · · Score: 1, Interesting
    It'll most likely be slower per clock cycle. What this means is that it will take a faster clock (4GHz, for instance) to do the same amount of processing as the Northwood core.

    Prescott will have 16KB of L1 cache (Northwood has 8KB) and 1024KB of L2 cache (Northwood has 512KB). These changes will most likely increase the performance per clock cycle.

    Maybe the larger cache sizes will "make up" for the longer pipeline. I won't criticize Intel until I see benchmarks of 3.4GHz Northwood vs 3.4GHz Prescott.
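
    For anyone who wants to play with the numbers while waiting for benchmarks, a minimal sketch of the break-even arithmetic follows. The `relative_ipc` figure is purely hypothetical (nobody outside Intel has published one); the function only restates "throughput = work per clock x clock".

```python
def break_even_clock_ghz(reference_clock_ghz, relative_ipc):
    """Clock the new core needs to match the reference core's throughput.

    relative_ipc is the new core's work per cycle divided by the reference
    core's work per cycle (an assumed input, not a measured one).
    """
    if relative_ipc <= 0:
        raise ValueError("relative_ipc must be positive")
    return reference_clock_ghz / relative_ipc


if __name__ == "__main__":
    # Hypothetical: the new core does 85% of the work per clock.
    print(round(break_even_clock_ghz(3.4, 0.85), 2))
```

    If the bigger caches push `relative_ipc` back to 1.0 or above, the break-even clock is no higher than the reference clock, which is exactly why the same-clock benchmark is the one to wait for.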

  84. Cough *shill* cough. by ProtonMotiveForce · · Score: 1

    So how long have you been an AMD fanboy/stock holder/employee? This is complete crap. The P4 performs very well in matlab or mathematica.

    You're simply full of shit and yes, it's obvious.

  85. P4 vs AMD. by mesmartyoudumb · · Score: 0

    Let me start out by saying I am a die-hard AMD fan. However, in the two most important processor measurements, MFLOPS (Million FLoating-point OPerations per Second) and MIPS (Million Instructions Per Second), the P4 excels.

    In fact, even after overclocking my Barton 2500+ from 1.87 GHz to 2.45 GHz (the 3200+ is 2.2 GHz, mind you), I STILL can't touch a stock Pentium 3.2 GHz with the 800 MHz FSB... or a 2.4 clocked to 3.2 :-)

    People say that a longer pipeline is going to cause a lot of problems and be slow, but this is not true for 3 simple reasons: more advanced Hyper-Threading, faster MHz, and larger caches. Honestly, any 1 of those 3 solutions would solve the problems, but Intel will be introducing all 3 of them, with the most obvious improvement coming from clock speed.

    For single-processor* number crunching, especially code that has been optimized with SSE2 goodness, you just can't beat the Pentium 4.

    If you want to go dual proc or more for science... well, then there's nothing like an AMD Opteron.

    I'm not going to upgrade my proc until Intel introduces the "socket T" and AMD introduces the socket 939 (I suggest you guys wait too), and both of them are running well and have proven to be highly overclockable.

    --
    "Comedy's a dead art form. Now tragedy, that's funny."
  86. Hello, McFly? by ProtonMotiveForce · · Score: 1

    It's called the Pentium-M line. They perform very well per clock (based on PIII to some degree) and are low consumption.

  87. Matlab, Schmatlab, I want to write some code! by Latent+Heat · · Score: 4, Informative
    Matlab is to the academic-scientific-engineering world what Visual Basic is to the accounting-business-data processing world.

    Your EE or ME or ChemE full professor as a grad student could have written a FORTRAN program to compute some stuff and write output to a numeric text file or perhaps draw some plots using a subroutine library. You are probably thinking that anyone who can't sling together C programs using VI to draw graphics straight to X is a luser, but I am talking about pretty technically savvy people who don't have time to spend on this stuff and who employ armies of Engineering majors from foreign lands who are not up on this stuff either.

    My own take is that if a particular numerical calculation can be easily programmed by some package, it must not be on the cutting edge of research because someone has already done it. Besides, if your software package is really deep, most of the effort goes into the architecture and the data flows and into graphics, and the RAD bit is only simplifying a tiny part of what you are spending your time on. A high-power scientific data visualization is really a video game, and how many video games are implemented in Matlab?

    But what Perl is to text processing, Python is to collections, and VB is to slinging together a GUI, Matlab is to numerics (what used to be FORTRAN libraries) -- it may not have the best algorithms, but it has a lot of algorithms -- it has a semi-decent scripting language, and it has some facility with producing plots from your computations and other data.

    Now that's the thing -- if you are doing matrix operations or using some canned function (most likely C under the hood), Matlab is as fast as fast can be. The minute you start looping in Matlab, it is interpreted and the speeds are in the Python range.

    Before you knock it completely, it has very good integration with Java modules -- more seamless than with C modules. While Java may be pokey for its GUI, for tight numeric loops the JIT is almost as fast as C -- no joke, a person should consider writing numeric extensions to Matlab in Java of all things, especially on Windows where they tweaked up Java 1.4.2_03. And how many scripting languages (OK, Jython) have this level of Java integration?

    But as a scripting language, Matlab has its shortcomings. It started out as a matrix calculator and has had features grafted on in a hodge-podge Visual Basic 6.0 kind of way. In terms of its data type restrictions and fubar scoping rules and brain-dead object extensions, I don't think, as they say, it scales very well.

    My other peeve is that it is proprietary, and while Math Works is not Microsoft, I worry if engineering schools, emphasizing use of "commercial packages students will use in the real world when they graduate" (as opposed to professors dinking around with their homebrew software for use in instruction), are becoming trade schools shilling for the big software houses. I don't have a lot of experience with it, but in place of Matlab we should be using stuff like Python and the Python NumPy extension -- Open Source alternative, comparable performance, C extensions for speed, but much more Turing complete, consistent, and scalable.

    And where is Matlab 6.5 using Java internally? Try doing a Files Open to start editing a Matlab script (M-file) with the Matlab editor window. One potato, two potato, three potato, and the window comes up. Now what language has that kind of GUI lag, I wonder what it could be?

    1. Re:Matlab, Schmatlab, I want to write some code! by dasmegabyte · · Score: 2, Insightful

      The reason Java GUIs are pokey for the most part is that people have been SPOILED by OOP. If you create a New window every time, then yes, it'll be slow, because Java has to basically learn how to make the window in the given OS, lay it out, and populate it, all before it can display it (as opposed to VB/.NET, which apply very sneaky, often exasperating hints on how to make windows).

      Really, the New window should be made once, the optimizations saved in the assembly cache, and the same window used for subsequent calls. Some of the faster, non-Sun VMs do this kind of thing whether you tell them to or not.

      --
      Hey freaks: now you're ju
    2. Re:Matlab, Schmatlab, I want to write some code! by biostatman · · Score: 2, Insightful

      My other peeve is that it is proprietary

      You should try R. Free as in beer + speech, high level scripting, can link in compiled low level code (C, FORTRAN, maybe even Java), good graphics output, good matrix handling, lots of 3rd party extensions (most GPL'd). Not good for symbolic mathematics, though. Used heavily in the statistical community and actively developed by some very smart people.

      --
      For the love of $DEITY, loose != not win!!!!!
    3. Re:Matlab, Schmatlab, I want to write some code! by Dr.+Zowie · · Score: 2, Insightful
      Unfortunately, Matlab is still a category killer for certain kinds of pipelining. But the various open-source data analysis languages are coming on strong. Perl Data Language, Numeric Python, Octave, R -- they're all worth a look, though at least the first three fit the IDL niche a little better than the MatLab one. I'm not as familiar with R as I probably should be.

      Unfortunately, all of 'em (including MatLab) suck if you're working with chunks of data that are bigger than your cache, because you end up pumping stuff out over the main bus.

    4. Re:Matlab, Schmatlab, I want to write some code! by Anonymous Coward · · Score: 0

      Matlab is great for quick n dirty calculations.

      But here's a tip for you: avoid loops if possible. For example, if you're running a Monte Carlo, instead of looping over each sampled event, generate all the events at once. As you say, matlab is loaded with efficient matrix operation algorithms. It will even do a lot of memoization, for example if you are interpolating to make a pdf that you sample from.

      Ok, some of that won't make sense unless you've done that particular problem, but believe me, as long as you figure out ways to use matlab as more of a matrix calculator, it can actually be pretty speedy. I wouldn't use it for anything heavy duty, but it sure beats the hell out of Excel. (Yes, I know some people who actually use Excel for nontrivial calculations! It's quite frightening)

    5. Re:Matlab, Schmatlab, I want to write some code! by forgotmypassword · · Score: 1

      R is a clone of S, which is primarily a statistics program. Is R really appropriate?

      And the syntax is ... uhg ... like S

    6. Re:Matlab, Schmatlab, I want to write some code! by Abcd1234 · · Score: 1

      Whoa, dude... that's almost like compiling the Java to machine code just in time for the CPU to execute it, and then reusing it later! Hey, you should totally write that up... you could call it "JIT", for Just In Time compiling! Yeah! This is a great idea! You're a genius!

    7. Re:Matlab, Schmatlab, I want to write some code! by sco08y · · Score: 1

      If it's in a cache it's not being compiled "just in time." It's being compiled beforehand and cached.

    8. Re:Matlab, Schmatlab, I want to write some code! by Abcd1234 · · Score: 1

      Someone needs to learn about how JIT compiling VMs work... A JIT-based JVM doesn't compile the bytecodes into machine code until it encounters them for the first time (hence "just in time"), and then it caches the results for reuse later. Doing anything else would be exceedingly stupid (you've already done the work to compile the bytecodes... why on earth wouldn't you reuse the results??).
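
      If it helps, the compile-on-first-use-then-cache idea can be sketched in a few lines. This is a toy model only: `ToyJit`, `code_cache` and `compile_count` are names invented for the illustration, "compiling" is faked by passing a Python function through, and no real JVM is structured this simply.

```python
class ToyJit:
    """Toy model of a JIT: compile a method the first time it runs, then reuse it."""

    def __init__(self):
        self.code_cache = {}      # method name -> "compiled" callable
        self.compile_count = 0    # how many times the expensive step ran

    def _compile(self, name, bytecode):
        self.compile_count += 1   # stand-in for the costly translation step
        return bytecode           # stand-in for generated machine code

    def call(self, name, bytecode, *args):
        compiled = self.code_cache.get(name)
        if compiled is None:      # first encounter: compile just in time
            compiled = self._compile(name, bytecode)
            self.code_cache[name] = compiled
        return compiled(*args)    # later calls skip straight to here


if __name__ == "__main__":
    jit = ToyJit()
    square = lambda x: x * x      # hypothetical "bytecode" for one method
    print([jit.call("square", square, n) for n in range(5)], jit.compile_count)
```

      The point of the toy is that "just in time" describes when the first compile happens, and the cache describes what happens on every call after that. The two are not in conflict.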

  88. Most Of Us by The+Patient · · Score: 0
    Quoth Whatsisname:
    As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls.

    Damn skippy most of us know that. The receptionist, the janitor and my Honda mechanic all pointed that out today. =)

  89. heh... by rebelcool · · Score: 1

    I do not understand Intel nor other companies that do not try to develop anything besides x86

    There are lots of architectures in use besides x86 from many different companies...macs don't use x86, nor do most embedded devices or really anything besides intel (and their clones) PCs. On the other hand, there are millions of PCs out there in a variety of applications and not just as desktops.

    instead of coming up with something new, Intel tries to patch its products with more crap.

    Backwards compatibility is why Intel has stayed in business. This is why x86 is still around. Moving the sheer number of programs written for x86 to another architecture would cost hundreds of billions. It isn't just a matter of recompiling - many programs, especially older ones, rely on certain features and issues that are found only in x86.

    When Apple realized that MOS Technology's clone of the 6800 was not the best solution, the architecture was replaced with a new one that better suited Apple's goals.

    Apple has nowhere near the market share as intel does and their hardware has been tailored to a different crowd - mainly the home and graphic arts niche. x86 is everywhere - not just under your desk. There are billions of processes relying on it. To dump it outright would wreak havoc.

  90. Consider the Pentium M by brucmack · · Score: 1

    The Pentium M is a processor that runs at a slower clock speed than Pentium 4s with roughly equivalent performance. So Intel at least changed tactics in the mobile field.

    I think the bottom line is performance... if they can keep reasonably close while continuing on the current path, they won't change. If some future chip ends up being horribly slow compared to others, they'll have to switch.

    Now, there is another aspect that could make all this irrelevant... it is rumoured that the successor to Prescott might be multi-cored, i.e. multiple processors on one die. That would certainly give them a way to market a lower clock speed, if they have two or more CPUs in their chip.

  91. More details on Intel's processor by rice_burners_suck · · Score: 5, Funny
    Intel today announced its new 1024-hexabit microprocessor architecture technology. Named the Quantium, Intel's new processor core boasts powerful new technologies which will enable governments to better manage the rights (or lack thereof) of their subjects.

    The Quantium has the following new features:

    • Intel (r) LightSpeed (tm) technology breaks the processing pipeline into 299,792,458 discrete steps. As there is no internal clock within the processor, all operations occur at the speed of light. Hence, one "cycle" represents the absolute cosmic measure unit of time and all operations occur in one cycle. While this will not increase the processor's performance--indeed, it will pale in comparison to that of the ancient 80286 processor of old folklore--the faster internal clock speed is expected to increase Intel's sales by 0.000001% within 180 quarters.
    • Intel (r) SingleAtom (tm) technology squeezes the entire processor into a single atom by modifying the universe at the M-theory level. Individual strings compose modified quarks and other subatomic structures, which combine to form a very heavy atom, one with approximately the same weight as 1 million protons. As the matter is extremely dense, the radioactive decay, combined with the gravity generated by itself causes the configuration of the subatomic particles to remain bonded at the subatomic level while realigning a nearly infinite number of times every second. This realignment constitutes the execution of instructions within the SingleAtom (tm) processor.
    • 893,378,665,113 new operations have been added since the previous model, bringing the new total to over 18 googolplexes of instructions. All SCO intellectual property can be programmed in a single instruction, increasing SCO revenues. Corporations will have to pay $799 per processor instruction executed, or face serious legal action.
    • RAM has been deprecated. 4 billion exabytes of internal general-use registers allow software to make more efficient data access, providing a more compelling Internet experience over a 28k modem connection.
  92. Indeed by Anonymous Coward · · Score: 0

    Those awful common people don't share your interests, so they must be morons. I mean, obviously there is nothing more to human life than computer processors and astronomy.

  93. Lab to market lag time: 4 years by Anonymous Coward · · Score: 2, Interesting

    I knew they were up to something when this mail appeared on the linux-kernel mailing list in 2000. 4.3 GHz, indeed!

  94. Article Text by Anonymous Coward · · Score: 0

    The site seems semi-slashdotted. Half the time you'll get a "too many users" error.

    By the way - women's breasts kick ass!

    I decided to test Arctic Silver 5, Arctic Silver 3, OCZ Ultra II Premium Silver Compound, and CompUSA Silver Thermal Grease. This test was not conducted to test performance, but rather to determine if these compounds have Silver as an ingredient.

    All Testing was done twice, once on a jeweler's acid free 'Black stone', and the test was repeated on paper. The testing solution was Nitric acid and Muriatic acid that was pre-mixed professionally.

    The tests produced some very disturbing results:

    OCZ Ultra II Premium Silver compound and the CompUSA Silver Thermal Grease has ZERO silver in it.

    The testing solution stayed orange - if it had any silver in it, the acids would turn varying degrees of red, depending on the purity of the silver present. OCZ claims that OCZ Ultra II Premium Silver compound is, "Made with 99.9% pure micronized silver, Over 70% silver content by weight".

    I cannot concur and my tests conclusively show that there is Zero micronized silver present, and Zero silver content by weight.

    Arctic Silver 3 and Arctic Silver 5 were also tested and both produced a blood red color, indicating 90% - 100% purity of Silver in both Arctic Silver 3 and Arctic Silver 5. Arctic Silver's claim of, "Contains 99.9% pure silver" by my testing is accurate and of the compounds tested, only Arctic Silver products produced results showing that Silver is in fact present.

    The tubes in the picture below from left to right, Arctic Silver 5, Arctic Silver 3, OCZ Ultra II Premium Silver Compound and CompUSA Silver Thermal Grease.

    In picture 3 below, from left to right is Arctic Silver 5, Arctic Silver 3 and OCZ Ultra II Premium Silver Compound. The compounds were placed on the paper and the acid was placed on the compound undisturbed. Notice how the acid drop placed on the OCZ Ultra II Premium Silver Compound remains orange, indicating zero silver present:

    When you go into a jewelry store and buy a sterling silver or a fine silver necklace, you expect the jewelry to be made of sterling or fine silver. The same should apply to silver thermal pastes - if the silver paste has no silver in it and the manufacturer says it does, that is misleading.

    Based on my testing, I can not recommend OCZ Ultra II Premium Silver Compound or CompUSA Silver Thermal Grease, as they are both misleading products with zero silver in them. If you want a product that actually has silver as an ingredient, Arctic Silver 3, Arctic Silver 5 or Arctic Silver Adhesive tested OK.

    Ed Note: Silversinksam's conclusions have been verified by an independent testing laboratory - details will follow in Part 2 of this article.

  95. And eventually... by localman · · Score: 1

    I look forward to the Pentium X that will have an infinite pipeline, infinite clockspeed, and get nothing done at each stage!

  96. More misinformation -- for "MHz Myth" fans by 0x0d0a · · Score: 3, Informative

    You have to remember that a garden variety PC is a very unpredictable environment. You have network packets coming in, mouse events, keyboard presses, USB chatter, DMA access; every event generates an interrupt that requires the processor to stop what it's doing, and start the pipeline over again.

    It's nothing personal, but articles like this one, as well as posts like this, drive me absolutely batty with the amount of incorrect ideas propagated. It's not that one particular person is misinformed -- it's just that the amount of generally bogus information is silly.

    First off, at some point, as far as I can tell, a bunch of people read Maximum PC or somesuch consumer "PC enthusiast" magazines, and read about "The Megahertz Myth". Maybe Ars Technica ran the story that started all this. Heck if I know. All that the original author was trying to do was point out that people shouldn't judge processors strictly by clock speed.

    Boy, did they ever create a monster. Somehow, a bunch of folks managed to get the idea that Intel was pulling this as some sort of PR job to deliberately trick people into buying their processors. For Chrissake, this is such an incredibly stupid idea. The OEMs have purchasers that know what they're buying. Not only are they not going to just sit down and look at benchmarks, they're going to have a bunch of test machines built when deciding what to go with. That and business considerations outweigh any "MHz rating". The OEM market just plain doesn't care. The only people getting excited about the "MHz Myth" are the "PC enthusiasts", a tiny, tiny sliver of a group when it comes to dollar value. If the sort of "PC enthusiast" riffraff really think that they constitute any kind of a significant market to Intel -- enough for Intel to *redesign their entire processor*, using a longer pipeline and higher clock rate, around getting them to purchase a computer, they are vastly overestimating their own importance in the universe.

    When Intel makes the decision about a new processor, it's a pretty safe bet that they don't run out and say "Gee, how would Joe Assmunch in Marketing like us to structure this thing?" They have many, many PhDs in chip and circuit design who have many competing ideas about what the best designs would be. They run many, many simulations before even thinking about deciding on major design decisions.

    The "PC enthusiast" folks who think that Intel has taken this path to trick those people that buy from Dell, and that, ho ho ho, *they* are smart enough to see through the trick are ridiculous. If Intel wanted a high clock rate to put on stickers, they could jack the thing through the sky, run at 10GHz, then demux data and only accept data at a lower rate into the various units. Some of the units would move to even more instructions per cycle.

    The *current* poster is talking about *keyboard* and *mouse* events? "USB chatter"? Those don't even show up on the *radar*. You roll that mouse, send your 200 Hz interrupts, and you worry about 200 measly mispredictions per second? Just blowing away the page table cache during process switches (which runs at 100 Hz on Linux 2.4 x86 by default) already dwarfs any misprediction performance hit from the said devices, and folks frequently bump it up by an order of magnitude or so and don't see any measurable performance hit -- on Pentium IIs.
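
    For scale, here is the arithmetic as a tiny sketch. The 30-cycle flush penalty and the 3 GHz clock are assumptions picked for illustration (roughly one cycle per stage of the rumored pipeline), and this counts only the pipeline refill, not the time spent in the interrupt handler itself.

```python
def flush_overhead_fraction(events_per_second, penalty_cycles, clock_hz):
    """Fraction of all available cycles spent refilling the pipeline."""
    return events_per_second * penalty_cycles / clock_hz


if __name__ == "__main__":
    # Assumed: 200 mouse interrupts/s, 30-cycle refill, 3 GHz clock.
    fraction = flush_overhead_fraction(200, 30, 3e9)
    print(f"{fraction:.1e} of all cycles")
```

    Under those assumptions the refill cost is a few millionths of the machine's cycles. Bump the event rate or the penalty by an order of magnitude and it is still far below anything a user could notice.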

    As for DMA, the entire point of DMA is so that the processor *isn't* running code from the host. It can continue on in its own happy little world while a co-processor pokes at the memory bus.

    You might see significant branch misprediction issues with an inner loop with a branch statement that flicks back and forth just about every loop or so to screw over the branch caching. And "significant" is still pretty minor. The compilers hint to the CPU whether a branch is likely to be taken...it's not as if there's this massive, awful mistake that all the chip designers in the world are making that Joe I-Built-My-Own-Computer-

    1. Re:More misinformation -- for "MHz Myth" fans by 0x0d0a · · Score: 2, Informative

      Errata for the above -- "Some of the units would move to even more instructions per cycle." should be "Some of the units would move to even more cycles per instruction."

    2. Re:More misinformation -- for "MHz Myth" fans by Kjella · · Score: 1

      Somehow, a bunch of folks managed to get the idea that Intel was pulling this as some sort of PR job to deliberately trick people into buying their processors. For Chrissake, this is such an incredibly stupid idea. The OEMs have purchasers that know what they're buying. Not only are they not going to just sit down and look at benchmarks, they're going to have a bunch of test machines built when deciding what to go with. That and business considerations outweigh any "MHz rating". The OEM market just plain doesn't care.

      Actually, the OEM market is all about what the end user wants. If they want an X GHz CPU, but without the RAM to use it efficiently so it'll have to swap, they will have it (Seen that). If they want USB 2.0 because it's fast, they'll rebrand USB 1.1 as 2.0. If they'll purchase a GFX card because 256mb is more than 128mb, or a CPU because 64 bit is wider than 32 bit the OEMs will supply.

      The purchasers are looking to buy stuff that sells - not what should have sold. Because buyers aren't looking for advice, they're looking for price. Very few will actually notice the difference - only that it's much faster than their last system. That they didn't get the most bang for the buck is irrelevant, the customer is happy because he found a "good deal". That's what gets them sold.

      Kjella

      --
      Live today, because you never know what tomorrow brings
    3. Re:More misinformation -- for "MHz Myth" fans by fbg111 · · Score: 1

      Well, there are different kinds of end-users. There are corporations whose buying decisions are guided by informed, educated Sys/Net Admins who don't fall for the "bigger numbers are better" marketing schtick, and there are also companies whose acquisitions departments do fall for that. There are small businesses run by tech-savvy folks, and small businesses that aren't and can't afford a dedicated, knowledgeable sysadmin. There are also individuals and home users, some of whom buy the 256mb GPU instead of the 128mb without knowing they won't use that extra 128mb, and others who don't. What 0x0d0a was referring to (I think) are the informed sysadmins at large corporations who buy most of Dell's corporate product and know their stuff well enough to actually get the best deal for their money and requirements.

      --
      Flying is easy, just throw yourself at the ground and miss. -Douglas Adams
    4. Re:More misinformation -- for "MHz Myth" fans by gbrayut · · Score: 1

      PREACH IT BROTHA!!!

    5. Re:More misinformation -- for "MHz Myth" fans by EvilTwinSkippy · · Score: 1
      For the record, chip design WAS an area of my expertise, and I have had PhDs confirm my suspicions that Intel's new stuff was bogus. Indeed, I've done the simulations myself.

      Come on, this is the same company that is pitching 'centrino' as a new platform. It's nothing more than integrating 802.11 wireless into the chipset.

      This is the same company that produced the 386SX. It was a 16-bit chip with a 32-bit bus. They later got it right with the 386 DX, and then continued to sell the crippled chip. They had such a success with the SX/DX thing they tried it with the 486. The 486SX was a DX chip that they etched the FPU out of. Then you would later buy the '487' co-processor, which was simply a 486 chip that deactivated the original chip and took over.

      Oh yeah, and let's not forget the whole Celeron cache size debacle, nor the fact that the PII and PIII are essentially the same. The Xeon is a PIII with extra cache. Ooooooo.

      The only reason Intel needs a predictor is because the chip moves so much faster than the RAM. The machine spends more time guessing what it should do than actually doing it. Why? To dance around the fact that most people would never see the difference in performance between an 800MHz machine and a 4GHz machine.

      It is all about separating you from your almighty dollar, friend.

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
  97. Vectorized languages... careful with your cache! by Dr.+Zowie · · Score: 1
    Are you sure you're testing the chip and not the memory bus? Matlab, IDL, PDL, Numeric Python, and (for all I know) APL all have the same general problem that you try to `vectorize' your instructions to minimize your use of the slow interpreter -- but vectorized order is usually the wrong way to use your machine.

    If you have a bunch of steps to do on each of a million pixels, your best bet is to do them all on the current pixel, then advance to the next one -- which keeps everything in cache. But the vectorized languages tend to do the first step to all million pixels, then do the second step to all million pixels, etc. That swaps everything out to RAM every time, so you're running at the main bus fetch/write rate, not at the CPU's clock speed.
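
    The two orders can be sketched like this. It is an illustration of access order only: the function names and the three toy steps are invented here, and plain Python will not show the cache effect itself, since interpreter overhead swamps it. The effect shows up when the inner work is compiled code and the array is bigger than cache.

```python
def step_at_a_time(pixels, steps):
    """'Vectorized' order: apply each step to every pixel before the next step."""
    data = list(pixels)
    for step in steps:
        # One full pass over the whole array per step; a large array gets
        # streamed through memory once for every step.
        data = [step(value) for value in data]
    return data


def pixel_at_a_time(pixels, steps):
    """Fused order: finish every step on one pixel before touching the next."""
    out = []
    for value in pixels:
        for step in steps:
            value = step(value)   # the working value stays close to the CPU
        out.append(value)
    return out


if __name__ == "__main__":
    steps = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]  # toy steps
    pixels = range(8)
    print(step_at_a_time(pixels, steps) == pixel_at_a_time(pixels, steps))
```

    Both orders give the same answer; the difference is that the first one walks the whole array once per step while the second walks it once in total.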

  98. Why your post is BS by 0x0d0a · · Score: 1

    I'm kind of tired of you armchair OS coders. So the happy few, highly paid Microsoft employees, 20 years experience in copying IBM, thousands of stock options in Redmond decide the next gen OS will have some wack FS and they have to be called morons? How do you know better? Hasn't Microsoft produced the best selling OS on the market for 15 years? Why don't YOU have the job leading the Longhorn team?

    Oh. Yeah... LINUX.


    There's a rather crucial difference here. You are very much not Linus. As a matter of fact, I will happily bet that you have never submitted a patch to Linux. Neither have most of the people on Slashdot vicariously living through Linus's triumphs.

    Linus might, in fact, be a not unreasonable critic, to some degree, of the pipeline length being discussed in the article. He, unlike most Slashdotters, is actually familiar with (a) high level system architecture, (b) the x86 instruction set that's being used here, and (c) probably, by virtue of enough low level work, (plus, he may have potentially done work in the field) at least something about the code gcc and other compilers are spitting out.

    Here's some metric of how far most Slashdotters are from being qualified to comment on this: I'm probably reasonably knowledgeable on the issues involved relative to the bulk of Slashdotters, using only the other posts as a watermark for what other Slashdotters know.

    One assignment I had, back at Carnegie Mellon as an undergrad in a CS class, was to design a very simple processor. The assignment would have made a CE laugh out loud -- we didn't have to worry about gate delay or initial states or inductance or anything that's an issue in the real world. We got to ignore all timing problems. We could split signals as much as we wanted. It was strictly a logical processor. That processor is probably more than most Slashdotters have built.

    I have a friend in grad school who is a CE. He laughed his ass off when he heard about the assignment. He, however, is a grad student. He hasn't published for years, he doesn't have much experience, and he has a *lot* to learn. He designs chips on a pretty regular basis.

    If *he* came along and said "those people at Intel are idiots", *he'd* be laughed down by a PhD in the field. Why? Because he simply hasn't done research in branch prediction or any of the related issues, and isn't remotely qualified. Of course, *he* wouldn't be out trying to call out the Intel engineers because he's aware of how competent they are.

    As a matter of fact, anyone who hasn't either gotten into serious compiler design research (not just "I wrote a compiler once for a class") or whatever areas are relevant to the CE chip work probably isn't qualified to criticize the Intel engineers on design decisions. You aren't seeing a lot of those in this article. Why? Because they have a fair amount of respect for the Intel folks and know enough to avoid making damn fools out of themselves.

    Folks who are end users -- software engineers, system builders, etc -- are really qualified to judge the processor on price and performance as a black box, and not much else. I include myself in there, and I've read a number of research papers on the thing in the past. And, frankly, the Intel folks have done a pretty good job if you measure the product as a black box from a performance standpoint. If you want to complain, complain about the price. Don't try to say "Well, that P4 sure is good, but what the Intel engineers really messed up on is that HyperThreading stuff. Boy, if they only understood PCI, then they never would have done anything like it." I see way too many completely and utterly uninformed posts, and it propagates.

    1. Re:Why your post is BS by stevesliva · · Score: 1
      As a matter of fact, I will happily bet that you have never submitted a patch to Linux.
      Nope, but I have implemented a pipeline in a 90nm process technology. Not on a processor at 10GHz or whatever Intel's aiming for, though. I can truly tell you that that frequency must be horrifyingly difficult to achieve.

      The irony here is that I work in VLSI design. My post was obviously sarcastic, but I'm seriously pointing out the fact that people outside the industry behemoths can and do participate in cutting edge research and design. Yes, usually it's folks who are working day-to-day on this stuff that know best exactly how many pipeline stages an Intel processor needs, but trying to claim that no one outside Intel can question such a decision is akin to saying no one outside Microsoft can question OS design choices made in Redmond. We both know that's BS, whether I've ever looked at Linux source or not.

      --
      Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
  99. WTF are you talking about? by Hoser+McMoose · · Score: 1

    Ok, someone please mod the parent -1 clueless!

    First off, the P4 has about the best branch predictor in the business. The only processor that had a better branch predictor was AMD's K6 (and the NexGen chip from which the design originated).

    As for the "slowness" of the P4 vs. the PIII, in many applications the P4 is actually FASTER, clock for clock, than the P3. This was even true back in the days of the "Willamette" P4, and especially true for the "Northwood" P4. Plus the P4 core clocked nearly twice as fast on an identical fab process (2.0GHz vs. 1.13GHz, the top speeds produced for both cores at a 180nm node), so who cares about clock for clock! Dollar for dollar the P4 was WAY faster.

    And broad is better than deep because the CPU is doing many things at once? First off, the CPU can only handle ONE task at any given time unless you support SMT. Intel supports SMT ("hyperthreading" in Intel-speak), AMD does not at this time. But more to the point, both broad and deep give you certain advantages and disadvantages with regards to keeping your execution units full, but both requirements end up being pretty similar in the end, i.e. you need instructions that do not depend on the results of previous instructions.
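    A toy sketch of that last point, assuming a made-up machine where every instruction takes one cycle and up to `width` instructions can issue per cycle (the function name and the model are illustrative only, not how any real P4 or Athlon scheduler works). A chain of dependent instructions gains nothing from extra width, while independent ones do:

```python
def cycles_to_finish(deps, width):
    """Count cycles for a toy machine issuing up to `width` instructions per cycle.

    deps[i] lists the instructions whose results instruction i needs.
    Every instruction takes one cycle (an assumption of this toy model).
    """
    done_at = {}  # instruction index -> cycle in which it completed
    cycle = 0
    while len(done_at) < len(deps):
        cycle += 1
        issued = 0
        for i, needs in enumerate(deps):
            if i in done_at or issued == width:
                continue
            # Ready only if every input was produced in an earlier cycle.
            if all(done_at.get(d, cycle) < cycle for d in needs):
                done_at[i] = cycle
                issued += 1
    return cycle


# Eight instructions, each needing the previous result (hypothetical workload).
chain = [[]] + [[i - 1] for i in range(1, 8)]
# Eight instructions with no dependencies at all.
independent = [[] for _ in range(8)]

print("dependent chain, width 4:", cycles_to_finish(chain, 4))
print("independent ops, width 4:", cycles_to_finish(independent, 4))
```

    The same dependency limit applies to a deep pipeline: stages sit idle when the next instruction has to wait for a result that is still in flight.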

    Now in your third paragraph things start getting really sketchy. First off, AMD's highest market share numbers ever were back in the late 386/early 486 days when they hit 30%. With the K6-2 AMD managed up to about 20% market share. Now they're sitting at about 15-17%, depending on who you ask, and they've been there for a while. The Q4 reports from Intel and AMD tend to suggest that Intel actually gained market share from AMD over the past 3 months, but that AMD's average selling price went up a lot more (so their total revenue increased more than Intel's).

    Of course, the part about AMD's 24-bit FPU must be where the crack really kicked in for the original poster. The K6, like EVERY OTHER FRIGGING X87 FPU EVER PRODUCED, had an 80-bit floating point unit! The K6-III did not make any changes to the FPU (though the K6-2 made a very minor change to how it handled the FXCH instruction, which mainly helped performance in Quake). These chips also did not have any serious compatibility problems, though there were the standard few errata that you get with any modern x86 processor (whether made by Intel, AMD, VIA or anyone else). I can only remember one erratum in the K6 line of chips that ever really caused problems, and that was fixed pretty early on with a new stepping of the original K6. There were also timing loop bugs that caused some problems, but that was the result of dumb-ass software; the hardware performed exactly as expected.

    1. Re:WTF are you talking about? by Kazymyr · · Score: 1

      As for the "slowness" of the P4 vs. the PIII, in many applications the P4 is actually FASTER, clock for clock, than the P3. This was even true back in the days of the "Willamette" P4, and especially true for the "Northwood" P4. Plus the P4 core clocked nearly twice as fast on an identical fab process (2.0GHz vs. 1.13GHz, the top speeds produced for both cores at a 180nm node), so who cares about clock for clock! Dollar for dollar the P4 was WAY faster.

      As it happens, I have 2 machines sitting here right now: a P3-1.4 and a P4 "Willamette" 1.8

      Care to guess which is faster? Right. The P3 beats the P4 by about 25% both in benchmarks and real life applications.

      Not to mention that the P4 cost in its day way more than the P3 did. Dollar for dollar, the P3 was a way better bargain than the P4 was.
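
      For what it's worth, here is the clock-for-clock arithmetic behind that, taking the 25% figure above at face value (it is one pair of machines, so treat it as an anecdote; the function name is made up for illustration):

```python
def per_clock_ratio(speedup, clock_a_ghz, clock_b_ghz):
    """Work per clock of chip A relative to chip B.

    `speedup` is how many times faster A is than B overall,
    so A's per-clock advantage is speedup scaled by the clock ratio.
    """
    return speedup * clock_b_ghz / clock_a_ghz


# P3 at 1.4 GHz is about 1.25x as fast as the P4 at 1.8 GHz (figures from this post).
ratio = per_clock_ratio(1.25, 1.4, 1.8)
print(f"P3 work per clock relative to P4: {ratio:.2f}x")
```

      So on these two boxes the P3 gets roughly 60% more done per clock, which is the opposite of the parent's claim for Willamette.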

      --
      I hadn't known there were so many idiots in the world until I started using the Internet -Stanislaw Lem
  100. Re:So What? - Speaking of Papers... by Anonymous Coward · · Score: 0

    (Sorry about not having a link. I'm not at a location with IEEE Xplore access.)

    No problem. Neither are we...

  101. hmmm ... power by Anonymous Coward · · Score: 0

    So once again Intel goes for the petrol guzzler rather than efficient design? It makes me sad to see that a current 3.2 GHz P4 can consume 100W vs. 12W for a 1 GHz G4 PowerPC. I'd rather have a couple of G4s at 12W each and save energy. In fact I've seen a few 1GHz chips around 7-10W, so why not use 2 or 4 of them instead of one behemoth? Efficiency and simplicity in design are surely better than wasted energy for the sake of it.
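
    A quick sketch of that arithmetic, using only the wattage and clock figures quoted above. Clock per watt is a crude proxy and says nothing about real performance per watt (that is the whole MHz myth), so the helper names here are purely illustrative:

```python
def clock_per_watt(mhz, watts):
    """Clock rate delivered per watt; a crude proxy, not a performance measure."""
    return mhz / watts


def total_watts(chips, watts_each):
    """Combined power draw of several identical chips."""
    return chips * watts_each


p4 = clock_per_watt(3200, 100)   # 3.2 GHz P4 at 100 W, as quoted in the post
g4 = clock_per_watt(1000, 12)    # 1 GHz G4 at 12 W, as quoted in the post
print(f"P4: {p4:.1f} MHz/W, G4: {g4:.1f} MHz/W")
print(f"Four G4s draw {total_watts(4, 12)} W against 100 W for one P4")
```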

  102. Geek Ideology by Anonymous Coward · · Score: 0

    "Further contributing to the MHz Myth..."

    Nothing like a load of bias to fill out an introductory clause.

  103. Branch Prediction by RAMMS+EIN · · Score: 1

    Wasn't Prescott also the CPU for which branch prediction was going to be moved into the compiler (i.e. the compiler has to generate code that tells the CPU which way the branch will probably go)?

    I reckon a compiler could do a much better job predicting branches (it can do much more analysis than would be worth doing at run time).

    If all this is true, longer pipelines may have a less severe effect than they have had up to now.

    IANAE(xpert), though.

    --
    Please correct me if I got my facts wrong.
  104. Long pipeline is not only for Clockspeed... by JollyFinn · · Score: 1

    First, I want to debunk some myths about branch misprediction stalls. Ever since the Pentium Pro, all the stalls in the pipeline combined have more than doubled the CPI. That's a result of OoO execution: when you have a large OoO window, any single type of stall means less to you. Your execution engine probably still holds some instructions from BEFORE the branch that were stalled when the branch executed, so the cost of a mispredict is not mathematically equal to the pipeline depth. The second point is that a larger OoO window means a LOT more work, so the P4 needs more stages to handle its large OoO window, and that window is a BIG part of the long pipeline. But the benefits of a large OoO window are more related to memory latency: it gives the memory subsystem more known addresses to handle in the memory pipeline, which means better scalability when increasing core execution capabilities [either width or depth (i.e. clock), it doesn't matter].
    Now, the P4 shines at dealing with memory latencies, and that's more important than just a branch misprediction. Also, Prescott has a superior branch predictor, so the reduction in mispredicted branches offsets the higher misprediction penalty. The L1 and L2 caches are doubled, which means Prescott's integer performance improves because critical code paths go out to the L2 cache less often. And the bigger L2 cache reduces the time spent waiting on memory, so I'd say Prescott should do just fine on the IPC side of things as well.
    Overall, the P4's approach isn't as good as an integrated memory controller, but it tolerates latency a lot better than the PIII or Athlon designs. And overall, when most of the time is spent waiting on memory references, the narrow-and-deep approach seems slightly better. [Narrow reduces the cost of increased reordering capability. Don't bring up the G5 here; they patented a clever trick and are RISC, for Christ's sake, so they are wide and deep for improved reordering.]
    The thing I'd want from AMD would be, more or less, one additional pipeline stage for better reordering capabilities.
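
    A back-of-the-envelope sketch of that trade-off, using the textbook stall model CPI = base + branch stalls + memory stalls. Every number below is hypothetical and picked only to illustrate the argument; none of them are measured Northwood or Prescott figures, and the model ignores the OoO overlap described above:

```python
def cpi(base, branch_frac, mispredict_rate, branch_penalty,
        mem_frac, miss_rate, mem_latency):
    """Cycles per instruction under a simple additive stall model."""
    branch_stalls = branch_frac * mispredict_rate * branch_penalty
    memory_stalls = mem_frac * miss_rate * mem_latency
    return base + branch_stalls + memory_stalls


# Hypothetical shorter pipeline: 20-cycle mispredict penalty, 5% mispredicts,
# 1% of memory references going all the way to a 400-cycle main memory.
shorter = cpi(1.0, 0.2, 0.05, 20, 0.3, 0.01, 400)

# Hypothetical longer pipeline: 30-cycle penalty, but a better predictor (4%)
# and doubled caches assumed to halve the miss rate.
longer = cpi(1.0, 0.2, 0.04, 30, 0.3, 0.005, 400)

print(f"shorter pipeline CPI: {shorter:.2f}")
print(f"longer pipeline CPI:  {longer:.2f}")
```

    With numbers in this range the memory term dominates, so a cache improvement can outweigh a longer mispredict penalty. Different assumptions would give a different answer.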

    --
    Emacs is good operating system, but it has one flaw: Its text editor could be better.
  105. If they keep this up by confused+one · · Score: 1

    I'm gonna convert to Power PC machines. RISC. Mmmmmm RISC is good...

  106. DDR came before XP. by SpinyManiac · · Score: 1

    However, it should be noted that during the time the Athlon XPs were introduced, so was DDR memory, which helped with the bandwidth issues.

    Explain my Athlon 1.33GHz with DDR.

    The second generation Athlon (Thunderbird) had DDR support, at 200 or 266MHz.
    AthlonXP is the third generation Athlon, and IIRC the names are based on Thunderbird speeds.

    I may be wrong about the generations, but I know my PC had DDR266.

    --
    It's never too late to have a happy childhood.
    1. Re:DDR came before XP. by Mauvious · · Score: 1

      That's definitely true; when I bought my Athlon 1.33GHz there were motherboards out with DDR. I saved money by recycling my old PC133 SDRAM. What I should have said is that DDR was introduced later on in the life of the original Athlons. There were no DDR motherboards available when I got my Athlon 700. An original Athlon with DDR is marginally below the Athlon XP in performance at the same clock speed, cache and DDR.

  107. And this is why I switched to AMD long ago. by Bruha · · Score: 1

    Not only is price vs. performance very good with AMD; AFAIK AMD is not in the habit of putting out new processors that are crippled vs. their predecessors. And while Intel is focused on keeping 32-bit processors on its roadmaps, AMD is being aggressive with moves to 64-bit computing; even its long-term plans show it making fewer 32-bit processors in the next few years.

    And what's up with Intel adopting the format that Alpha used with their processors?? It was always a royal PITA, and I've never had any issues with the current system. Or are there that many id10ts out there who are just not careful with those pins?

  108. Re:Dilbert Marketing ... started??? by adzoox · · Score: 1
    "AMD and Apple/IBM have started to pull away in quality - in terms of actual work done per clock cycle. While it's true that the average Joe or PHB might not know any better-..."

    Started to pull away in quality??? I would say that Apple has always been ahead in quality (sure there have been quality issues - but most Apple buyers are assholes and attorneys)

    Apple has always been better (easier, faster) at individual tasks such as audio editing, video editing, and overall ease of use. They also FAR surpass ANY PC in terms of longevity ... I can count on my fingers the PCs from 1984 that still work, are still in use, or for that matter are still usable. I see Mac SE's from 1985 almost every day - still working after 19 years!

    The megahertz myth is honestly less about performance as a whole - it is about performance perception.

    Case in point - two computers are taken into a retirement home (an iMac and a similarly priced/styled PC). The residents are told where the internet connection is and told to get onto the internet - the average Mac is up in 17 minutes out of the box - the average PC is never able to get online or goes well beyond an hour. Conclusion - ease of productivity - is perceived speed.

    --
    Yell & scream & rant & rave... it's no use... you need a shaaaave ~ Bugs Bunny
  109. Re:Do you? by gr8_phk · · Score: 2, Insightful
    Thanks for the techno-babble. This guy's company obviously looked at real world performance. Their understanding of the cause may or may not be correct, but their conclusion (switch to AMD) is correct for them because they compared using the application that matters to them.

  110. Nice plug how about.... by gr8_phk · · Score: 2, Informative
    That's a nice plug for Matlab. Since plugs are not being modded off-topic today :-) let me say that I know several people who use GNU Octave instead of Matlab. It does most of the same things, and it's free software. Some use it just at home, and some work at small companies that couldn't afford Matlab. You can write code that works on both, so one guy uses Matlab at work and can run the same stuff on Octave at home.

    1. Re:Nice plug how about.... by Anonymous Coward · · Score: 0

      As far as I could tell, Octave doesn't support arbitrary precision numbers. Has this changed recently?

    2. Re:Nice plug how about.... by Anonymous Coward · · Score: 0

      It wasn't a plug, but the parent I replied to was putting down Matlab and, as he clearly stated later on, he does not use it. I have used it extensively in the past 5 years, and I didn't want to let the bash stand without a counter-point.

  111. avoid loops in matlab by IncohereD · · Score: 1

    But here's a tip for you: avoid loops if possible.

    Amen. I've actually had this emphasized to me in my University statistics class, digital comm classes, and at my internships. Vectorize everything and it's smoking fast. Loop and you're done for.

  112. Of course Prescott is big and overblown! by Zog+The+Undeniable · · Score: 1
    --
    When I am king, you will be first against the wall.
  113. C'mon editors -- THAN by Anonymous Coward · · Score: 0

    "...will have a longer pipeline than Northwood. "

    I know high school was a long time ago, but seriously.

  114. Blah blah blah. by SuiteSisterMary · · Score: 1

    If Intel had cancelled the Pentium Pro based on people whining about how it was different from the Pentium, and didn't stack up, where would the P3 be now?

    --
    Vintage computer games and RPG books available. Email me if you're interested.
  115. Re:Do you? by blair1q · · Score: 1

    Thanks for the digression. This guy's story is obviously anecdotal evidence, and therefore a fallacy.

  116. yawn by Anonymous Coward · · Score: 0

    Yawn. Interpretation of blurb: blah blah blah *game geeks* blah blah blah blah *money money* blah blah. Blah blah blah bah *we can't think of anything new* blah blah blah blah blah blah blah blah blah blah.

  117. 4004 by Anonymous Coward · · Score: 0

    This is all 4004 shit. Who the hell cares?

    PPC. Say it: PPC.

  118. Well, by vlad_petric · · Score: 1
    The Xeon server processors do support an external L3 cache, but the memory still isn't any faster :). It does take no less than 400 cycles to go to memory to service an L2/L3 miss on such a processor.

    The Opterons do slightly better, as the memory controller is on-chip.

    --

    The Raven

    1. Re:Well, by Luminous+Coward · · Score: 1
      I was being facetious...

      You claimed it took a couple of hundred cycles to service an L2 miss. If a third level of cache is available, then it takes less than a hundred cycles to service an L2 miss. (I think the Xeon's L3 cache has a latency of 23 cycles. Is that correct?)
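
      A rough sketch of why both figures can be right, assuming the usual average-latency formula: the cost of an L2 miss is the L3 latency plus the fraction of L3 misses times the trip to memory. The 23 and 400 cycle figures are the ones quoted in this thread (the 23 is unconfirmed); the L3 miss rates are hypothetical:

```python
def l2_miss_cost(l3_latency, l3_miss_rate, memory_latency):
    """Average cycles to service an L2 miss when an L3 sits in front of memory."""
    return l3_latency + l3_miss_rate * memory_latency


# L3 miss rates below are made up for illustration only.
for l3_miss_rate in (0.0, 0.1, 0.2, 1.0):
    cost = l2_miss_cost(23, l3_miss_rate, 400)
    print(f"L3 miss rate {l3_miss_rate:.0%}: about {cost:.0f} cycles per L2 miss")
```

      In this model the average stays under a hundred cycles only while fewer than about one in five L2 misses also miss in L3.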

  119. MOD PARENT UP by WD · · Score: 1

    For once, the correct information is listed. (Rather than just making stuff up, and stating it as fact)

  120. Branch Prediction by tweakt · · Score: 1
    IANAIE (I am not an Intel Engineer), but...

    Um, isn't that handled by branch prediction? The processor knows it has just branched the same way for the last 5,000 iterations then it will correctly predict the branch target each time. The only time it misses in your example would be at the end of the last iteration.
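
    A minimal sketch of that behaviour, assuming a classic 2-bit saturating counter per branch (a textbook scheme; the real P4 predictor is more elaborate, and the function name here is made up):

```python
def count_mispredictions(outcomes, state=2):
    """Simulate one 2-bit saturating counter.

    States 0-1 predict not taken, states 2-3 predict taken.
    `outcomes` is a sequence of booleans: True means the branch was taken.
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move toward "taken" or "not taken", saturating at 3 and 0.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


# A loop's back-edge branch: taken 4,999 times, then falls through on exit.
loop_branch = [True] * 4999 + [False]
print("mispredictions over the loop:", count_mispredictions(loop_branch))
```

    A counter that starts out biased the wrong way would also miss the first couple of iterations, but that is still a handful of misses out of thousands of branches.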

  121. Matlab JIT by Anonymous Coward · · Score: 0

    Matlab now uses the Java JIT to speed up loops.
    I've had some loop-heavy code speed up by a factor of 10. There is also some speedup when m-files are compiled to C and native code in some cases as well. The interpreter is not real fast but it's fast enough, esp. if you do things the 'Matlab way' and heavily vectorize your code. The Mathworks plans to make Matlab ultimately as fast as C or Fortran.

    Matlab works (and in fact I've done my entire physics PhD thesis with it, including instrument control) because it's easy enough to learn, has good graphics/plotting/curvefitting capability, the matrix model maps well onto typical scientific computing needs, and because more sophisticated languages are not necessary for 95% of data analysis and numerical computation. The toolboxes are quite useful as well and it's 'fast enough'. The friendly GUI and help system are a big bonus too, and I work very, very productively under Matlab.

    NumPy, GSL, and Python are great but have a steeper learning curve and aren't widely used in most physics depts.

  122. GREAT POST!!! by gbrayut · · Score: 1

    thanks

  123. Is it possible? 2x15 pipes? by lpq · · Score: 1

    If the 15-stage Athlon beat the 20-stage P4, is it possible for Intel to do two competing pipes that could execute both branches of a True/False if?

    In single decisions, wouldn't that be likely to improve performance dramatically?

    Dunno if it is possible, but just a thought....

    -l