Slashdot Mirror


Can SSE-2 Save the Pentium 4?

Siloh writes "Ace's Hardware has posted a Floating-Point Compiler Performance Analysis which, in a nutshell, tests Intel's most important claim about the Pentium 4: 'It does not reach its full potential with today's software, but with future software (including SSE-2 optimizations) it will outclass the competition.' They test with floating-point benchmarks recompiled with the latest Intel and Microsoft compilers." Basically, another iteration of the question: Can the P4 dethrone the Athlon?

171 comments

  1. Re:Anyone know when M$ VC++ will support SSE2 nati by Anonymous Coward · · Score: 1

    This reminds me of some comments made by the Great Carmack. Games written with SSE or 3DNow! optimizations didn't really benefit from the extra code. What made games go faster was having those optimizations built into the video card's drivers.

  2. Tired... by Anonymous Coward · · Score: 1

    * Of people making line graphs instead of bar graphs
    * Of people who don't understand most of what they write, so they dump in all the data instead of focusing on what matters
    * Of over-verbose hardware sites that make you scan 5 pages before getting to the (rotten) beef
    * Of clueless people who pretend to be surprised when optimizing compilers can get a 240% speedup on very specific code.

    Btw, this sort of shit reminds me of someone:

    "Paul Hsieh, our local assembler guru, analyzed the assembler output of the SSE-2 optimized version of Flops. He pointed out that "some of the loops are not fully vectorized, only the lower half of the XMM octaword is being used." In other words, SSE-2 instructions which normally operate on two double precision floating point numbers are replacing the "normal" x87 instructions and are only working on one floating point number at a time."

    Anyone think that "Paul Hsieh" == "Bob Ababooey" ?

    Cheers,

    --fred
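    Hsieh's observation can be put in numbers. A 128-bit XMM register holds two 64-bit doubles; a packed SSE-2 instruction such as ADDPD works on both, while a scalar one such as ADDSD touches only the low 64 bits. The Python sketch below is a simplified model (an assumption: it counts only the arithmetic instructions and ignores loads, stores, and loop overhead) of how many adds a loop over n doubles needs in each case.

```python
import math

XMM_BITS = 128     # width of an SSE/SSE-2 XMM register
DOUBLE_BITS = 64   # IEEE 754 double precision


def sse2_add_count(n_doubles: int, packed: bool) -> int:
    """Number of SSE-2 add instructions needed to add two arrays of n doubles.

    Simplified model: loads, stores and loop control are ignored.
    packed=True  -> ADDPD, both 64-bit lanes of the register are used
    packed=False -> ADDSD, only the low lane is used ("lower half of the XMM octaword")
    """
    lanes = XMM_BITS // DOUBLE_BITS if packed else 1
    return math.ceil(n_doubles / lanes)


for n in (1000, 7):
    print(n, sse2_add_count(n, packed=False), sse2_add_count(n, packed=True))
```

    Under this model a loop that is only scalar-vectorized issues about twice as many floating-point adds as a fully packed one, which is the gap Hsieh was pointing at.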

    1. Re:Tired... by Galactic+Avenger · · Score: 1

      > Btw, this sort of shit reminds me of someone:
      >
      > "Paul Hsieh, our local assembler guru, analyzed
      > the assembler output of the SSE-2 optimized
      > version of Flops. He pointed out that "some of
      > the loops are not fully vectorized, [...]
      >
      > Anyone think that "Paul Hsieh" == "Bob
      > Ababooey" ?

      Well certainly not me! Who or what is "Bob Abobooey"?

      --
      Paul Hsieh
      http://www.pobox.com/~qed/

  3. Re:Morons by Anonymous Coward · · Score: 1

    It makes no sense to look at flops/GHz. The P4 design, via its longer pipelines, *intentionally* sacrifices flops/GHz so that the chip can run at a higher clock rate.

    The only sensible metric is performance at the available clock speed, which for the P4 is higher than for the Athlon.

    If I had a CPU that achieved 5000 flops/GHz but only ran at 1 MHz, would you want it, or would you want the 1.5 GHz P4?
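    The argument boils down to: delivered performance = per-clock efficiency × clock rate. A minimal Python sketch with the poster's hypothetical 5000 flops/GHz chip at 1 MHz, next to 760 flops/GHz at 1.5 GHz (the per-clock figure implied by the roughly 1140 flops the 1.5 GHz Pentium 4 scores on Flops 8 in this review). Clocks are given in MHz so the arithmetic stays exact.

```python
def delivered_flops(flops_per_ghz: float, clock_mhz: int) -> float:
    """Absolute score = per-clock efficiency (flops/GHz) x clock (GHz)."""
    return flops_per_ghz * clock_mhz / 1000  # MHz -> GHz


print(delivered_flops(5000, 1))     # hypothetical 1 MHz chip: 5.0
print(delivered_flops(760, 1500))   # 1.5 GHz Pentium 4: 1140.0
```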

  4. The problem with Intel by Anonymous Coward · · Score: 5
    SSE-2 will be nice, but the problem with Intel is that they have fallen behind AMD in the CPU wars. Their stock price is only one of many indicators that they have made several bad business decisions in the past few years, and those decisions continue to haunt them and give AMD a leg up on the market. Consider:
    • The RAMBUS mess. They tried to leverage their chip/chipset monopoly to control the RAM market through large investments and contracts with RAMBUS. Now RAMBUS is on the brink of death and Intel has lost.
    • The IA-64 disaster. It's hard to launch a new architecture, and even harder when you keep prices high and don't put enough chips in the hands of developers.
    • The uniprocessor-only P4. Intel spent years perfecting SMP on their earlier processors, and for what? So that AMD could beat them to the punch, running a 1.4Ghz CPU in SMP mode. Intel also embraced the slower-but-cheaper shared memory bus architecture, which is going to kill SMP performance in comparison.
    • Unwise investments. Intel has invested in several dot-coms that are dying or dead already. Intel Capital hasn't been profitable since FY 1999 because they have sunk billions into companies like VA that could never hope to turn a profit.
    Intel still has potential but they will need to get their act together if they want to start competing with AMD again.

    -A former Intel employee

    1. Re:The problem with Intel by VAXman · · Score: 3

      The uniprocessor-only P4. Intel spent years perfecting SMP on their earlier processors, and for what? So that AMD could beat them to the punch, running a 1.4Ghz CPU in SMP mode. Intel also embraced the slower-but-cheaper shared memory bus architecture, which is going to kill SMP performance in comparison.

      You are wrong. The DP-capable P4 (known as Xeon) was launched in May, well before the DP Athlon was released. Moreover, you can buy real dual Xeon systems from Dell, IBM, Compaq, and the like, yet you cannot buy a DP Athlon system from any major vendor, since no major OEMs want it.

    2. Re:The problem with Intel by MrBogus · · Score: 1

      Intel has contracts which require a "second source" for their technology. That's how AMD became a licensed producer of Intel tech to begin with, an agreement that continues even today.

      --

      When I hear the word 'innovation', I reach for my pistol.
    3. Re:The problem with Intel by nekid_singularity · · Score: 1

      Intel wants AMD around because it gives some credibility when they say they are not a monopoly. Say AMD went out of business tomorrow; that would mean Intel would be the only source for high-performance x86 chips, and the government wouldn't like that.

      --
      Numbers 31:17,18 Now kill all the boys. And kill every woman who has slept with a man,but save for yourselves every virg
    4. Re:The problem with Intel by Shortcut+to+CmdrTaco · · Score: 1

      That's true but they are well on their way. When they come out with DDR-supporting chipsets they will be on the right track again. Look out AMD!

  5. Morons by Anonymous Coward · · Score: 5

    Look at the final results:

    [Graph: bestover2.gif, final results]

    Now look at the place where the P4 shows the most improvement over the Athlon: the first data point, Flops 8, with the P4 using the Intel compiler and the Athlon using Microsoft's.

    From the graph, the Pentium 4 clocks in at about 1140 flops while the Athlon gets only 900 flops.

    But wait! We're forgetting something. You're running the Pentium 4 at a faster clock speed! For the love of crumbcake, normalize those values for clock speed, please!

    Pentium 4: 1140 flops / 1.5 GHz = 760 flops/GHz
    Athlon: 900 flops / 1.2 GHz = 750 flops/GHz

    Now things are a bit more fair. Yes, with the absolute latest compiler from the maker of the processor, the Pentium 4 beats the Athlon in one of eight tests by a measly ten flops per gigahertz. With the latest compiler from some big software company, the Athlon beats the Pentium 4 in the other seven categories, hands down.
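    The normalization above is simple enough to check mechanically. A short Python sketch using the scores read off the graph (about 1140 for the 1.5 GHz Pentium 4 and 900 for the 1.2 GHz Athlon); clocks are in MHz so the division stays exact.

```python
def flops_per_ghz(flops: float, clock_mhz: int) -> float:
    """Normalize a benchmark score by clock speed (result in flops per GHz)."""
    return flops * 1000 / clock_mhz


p4 = flops_per_ghz(1140, 1500)      # Pentium 4 at 1.5 GHz
athlon = flops_per_ghz(900, 1200)   # Athlon at 1.2 GHz
print(p4, athlon, p4 - athlon)      # 760.0 750.0 10.0
```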

    Don't believe everything you read.

    1. Re:Morons by cdipierr · · Score: 1

      True, the Athlon is faster on a per-clock basis, but it's a fair comparison to pit the fastest Athlon against the fastest P4, since both are obtainable (although the fastest parts are actually the 1.4 GHz Athlon and the 1.7 GHz P4, so we're comparing one speed grade down or so).

    2. Re:Morons by JoeBuck · · Score: 3

      It is ignorant to argue that you should normalize for clock speed. The Pentium 4's deep pipelines are present precisely so that the chip can be run at a faster clock speed than otherwise.

      With the exact same technology, same fabs, you can't make the Athlon run at the same clock speed as the Pentium 4.

    3. Re:Morons by csbruce · · Score: 2

      the Pentium 4 clocks in at about 1140 flops

      Wow, 1140 flops. With some tight code, my VIC-20 would be competitive with this!

    4. Re:Morons by csbruce · · Score: 2

      It is ignorant to argue that you should normalize for clock speed.

      A better way to normalize would be bang/buck.

    5. Re:Morons by maraist · · Score: 2

      I disagree. This is definitely the case with a 486 to Athlon comparison, but here we're already taking into account the architectural differences (stages per pipe, etc.). Part of the analysis is to measure efficiency. This is especially true in the Pentium 4 / Athlon debate, since we can get 1.4 GHz Athlons. The question is whether to purchase a 1.4 GHz Pentium 4 at significantly higher cost, to say nothing of the added cost of a 1.7 GHz setup.

      The difference is more dramatic between the P5-4 and P5-3, since you max out at about 1 GHz for the P5-3, so there I'd be inclined to believe you. The Athlon, however, is not yet out of steam in its current form. If it can best the P5-4 in 50% of the categories (including legacy apps, i.e. today's), then the value of the P5-4 is limited, even if it can produce top-notch synthetic scores.

      The point is that it is not ignorant to normalize, so long as you look at the peripheral factors. It's like taking the average but also taking the standard deviation: you do find useful information from such numbers.

      -Michael

    6. Re:Morons by randombit · · Score: 1

      A better way to normalize would be bang/buck.

      Really. I read some review of DVD/MPEG-4 encoding (on Tom's, I think) which basically concluded that, all things considered, the Duron 800 was the best CPU if you wanted to do that stuff a lot. I liked that: it's not the fastest, but it was 70 or 80% of the speed of the fastest while costing 1/N as much.

      I mean, if all you care about is having tons and tons of Flops, go buy an Alpha or R10000 (or IA-64 (<g>)). Performance isn't everything, at least not for me.

    7. Re:Morons by frantzen · · Score: 2

      Clock speed is only relevant to marketing droids and those stupid enough to believe them.

      There are processors (UltraSPARC III) where the core pipeline is not clocked (called wave pipelining). There are caches that are double-pumped: they do work on each edge of the clock instead of only latching on one edge.

      And an even clearer fact: different processors do different amounts of work per edge of the clock. If you want a _really_ high clock rate, put only one gate between each latch. That clock rate would be obscene. But half of the work done would be latching the values (assuming you could distribute the clock over so large an area).
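      The latch-overhead point follows from a textbook simplification: if a path of total logic delay T is cut into n stages and every stage adds a fixed latch overhead L, the cycle time is T/n + L. The Python sketch below uses hypothetical numbers in units of one gate delay (T = 20, L = 1) to show the clock-rate gain falling short of n, and half of each cycle going to latching once a stage holds a single gate.

```python
def cycle_time(logic_delay: float, stages: int, latch_overhead: float) -> float:
    """Cycle time when logic_delay is split evenly over `stages` pipeline stages."""
    return logic_delay / stages + latch_overhead


def latch_fraction(logic_delay: float, stages: int, latch_overhead: float) -> float:
    """Fraction of each clock cycle spent in the latch rather than in useful logic."""
    return latch_overhead / cycle_time(logic_delay, stages, latch_overhead)


T, L = 20.0, 1.0  # hypothetical: 20 gate delays of logic, 1 gate delay per latch
for n in (5, 10, 20):
    speedup = cycle_time(T, 1, L) / cycle_time(T, n, L)
    print(n, cycle_time(T, n, L), round(speedup, 2), latch_fraction(T, n, L))
```

      In this model 20 single-gate stages raise the clock only 10.5x over the unpipelined design, not 20x.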

      If you want to normalize anything, normalize over price. Unless you have stupid friends and compete over having the highest clock.

      Oh yeah. Don't bother talking about FLOPS or MIPS. You'll just end up sounding stupid (and you need all the help you can get). Any benchmark not targeted to YOUR specific application is next to worthless.

      Heh, some processors don't even bother to dispatch NOPs. With a little hackery, they could ``execute'' as many NOPs per clock as the depth of their dependency issue window.

    8. Re:Morons by jsse · · Score: 2

      Hey man, be fair, don't just pick the graph that favours your conclusion.

      How about this, this and this?

      Don't believe everything you read.

      Assuming you believe everything on Ace's Hardware, do you believe the graphs above? :D
       _
      /. /    |\/| |\/| |\/| / Run, Bill!

  6. p4 ddr chipsets by vipw · · Score: 1

    the p4 ddr chipset (i845) doesn't perform anywhere near as well as their rambus ones. also, the ddr mode in that chip won't be working for several months. word on the street is that via's upcoming p4 ddr chipset is a pretty good performer, but nothing much has been published on that, via doesn't have a clear license for the bus, and via chipsets are often buggy. so really, they won't be having ddr chipsets any time soon, and chances are that the chipsets will be terrible performers. amd has to watch out for a drop in rdram prices more than anything else.

  7. Re:ehh? Thats dumb by Trepidity · · Score: 2

    Europeans don't build smaller-engine cars to be more energy efficient; they build them because many EU countries tax cars by engine displacement. So they make smaller engines with higher compression ratios, and they end up being about the same in efficiency.

  8. Re:The answer is by RelliK · · Score: 2
    and SMT can take full advantage of the smaller RAMBUS latencies

    Huh? Rambus has much higher latencies than SDRAM. That is why a P3 with PC133 SDRAM outperforms the same P3 with Rambus on most benchmarks. Since you got this part wrong, I take it the rest of your post should be taken with a grain of salt as well.
    ___

    --
    ___
    If you think big enough, you'll never have to do it.
  9. Intel better be careful... by HiredMan · · Score: 2

    The only thing that separates the Itanium from the rest of the pack is its FP performance. If the P4 gets better FP performance, it'll show the results of the multi-year Merced project for the dog they really are.

    The 800 MHz Itanium has the same SpecInt performance as an 800 MHz PIII... if the 1.7 GHz P4 got only 20% faster SpecFP performance, it would match the Itanium in SpecFP to go with its already 50% better SpecInt performance.

    Yeah, I know the Itanium is only at 800 MHz, but Intel needs to keep cranking out P4s to fend off the Athlon; they can't afford NOT to release new chips even if the 2 GHz P4 shames their new "top-o-the-line" server chip.

    Sure the Merced has a better box around it and huge amounts of onboard cache, but given the same surroundings the P4 would make their VERY expensive "server" chip look pretty bad...

    =tkk

  10. Hmm. Maybe i'm missing something, but -- by washort · · Score: 4

    Why wouldn't Intel be doing stuff like putting SSE-2 optimisation code into gcc so that all us hacker-types would have a _reason_ to pick the P4 over the Athlon? I know they have their own compiler but to the best of my knowledge it's not free (or at least it's not in Debian... ;-)
    Just seems odd that they'd pass up the opportunity for something like that. *shrug*

    1. Re:Hmm. Maybe i'm missing something, but -- by csbruce · · Score: 2

      Intel may have good compilers, but they don't give 'em away

      Well, they should, and they should open-source them as well. Intel is primarily in the business of selling processors, not compilers, so getting their P4 performance optimizations into as many third-party compilers as possible should be their top priority.

      Better general compiler support for the P4 would be an effective way to compensate for its hardware inferiority to the Athlon.

    2. Re:Hmm. Maybe i'm missing something, but -- by biohazard99 · · Score: 2

      A little Karma-whoring, swiped from intel's site
      Compatible with Microsoft* Visual C++* and Visual Studio*, the Intel® C++ Compiler is designed from the silicon up to let developers easily take advantage of the performance and features of the latest Intel® architecture, including the Pentium® 4 processor.
      Intel is committed to customer support. See www.intel.com/software/products/prodsupport.htm for further information on product support.

      Windows*NT*/98/2000 Full Product Electronic Delivery $399.00
      Windows*NT*/98/2000 Full Product CD Delivery $499.00
      Windows*NT*/98/2000 Upgrade Product Electronic Delivery $175.00
      Windows*NT*/98/2000 Upgrade Product CD Delivery $275.00
      Intel® Compilers for Linux* Field Test:
      Intel® Compilers for Linux, field test versions, are available for download only. No CDROM versions are available. Not all of the GNU C language extensions, including the GNU inline assembly format, are currently supported and, due to this, one cannot build the Linux kernel with the beta release of the Intel compilers and the initial product release. The C language implementation is compatible with the GNU C compiler, gcc, and one can link C language object files built with gcc to build applications. However, the C++ implementation uses a different object model than the GNU C++ compiler, g++, and due to this, C++ applications cannot use C++ object files compiled by g++. For further details, see the FAQs on the support site. Before using the compiler, we recommend you read Optimizing Applications with the Intel® C++ and Fortran Compilers for Linux to learn about the appropriate optimization switches for your application. You should have received the invitation letter that explains how to get started using the Intel compilers for Linux. All support issues, compiler updates, FAQ's and support information will only be available when you register for an account on the Intel Premier Support site. Please register for a support account at http://support.intel.com/support/go/linux/compilers.htm. To begin the process of downloading...

    3. Re:Hmm. Maybe i'm missing something, but -- by Grishnakh · · Score: 1

      Intel would never do any such thing, because they really don't care about Linux. I work here at Intel, and this is definitely not a pro-OSS environment at all. The concept of making their tools freely available in order to promote processor/hardware sales is completely foreign to this Microsoft-worshipping company and its employees. The same way Microsoft will never make their tools freely available in order to promote sales and market penetration of their OS and other software, Intel would not make their compilers freely available either. The difference that they're missing is that MS has enough market share and a large enough monopoly that they can get away with this, and make a lot of money on compiler sales. No one's going to buy the Intel compiler, certainly not OSS types, and not MS developers either (they already paid for VC++; why would they spend extra for another compiler? They just want to sell software; they don't care about making it run better on one particular processor which isn't even doing well in sales).

    4. Re:Hmm. Maybe i'm missing something, but -- by Chakat · · Score: 5

      Intel's working on a Linux compiler with all of the P4 goodness. Although it's in beta right now, you can bet your sweet butt you're going to pay for it once the program gets out of beta. Intel may have good compilers, but they don't give 'em away.

      --

      If god had intended you to be naked, you would have been born that way.

  11. Re:This appears to be the typical load of slashdot by SEE · · Score: 2

    Yes, odds are it will be an Intel, but AMD is looking like it's going to have 30% market share this year.

    And the difference in processors doesn't change what you can do with the computer (while things like changing OS does). The better analogy here is Dell beating Compaq which beat IBM.

    Even the suits listen when you say "This runs everything the Intel does, as well as the Intel does, for less" enough times.
    Steven E. Ehrbar

  12. Re:.NET to the rescue by PD · · Score: 1

    Wow. This is completely unlike Java. Microsoft is really innovating here. Just think how fast interpreted code could run if you optimize the interpreter. I wonder why Sun hasn't thought of that? I'm going to send them an e-mail right now with my suggestion.

  13. Re:.NET to the rescue by PD · · Score: 1

    Ah, for fuck's sake. My article wasn't a troll. It was either sarchasm, or if your sarchasm detector was broken, I suppose it could pass for a flame.

    But a troll? Come on. (eyes roll)

  14. .NET to the rescue by samael · · Score: 5

    It occurred to me a while back that .NET will affect this immensely.
    Consider: .NET compilers compile to an intermediate language (IL) that isn't actually transformed into machine code until it is run for the first time on the target machine.
    This means that all you have to do to get the most out of your machine is make sure you have the .NET IL->machine code compiler for your specific CPU and all .NET code will be totally optimised for _your_ CPU.

    Of course, this also means that you don't need to recompile to work on any CPU that has the CLR available on it, which makes transferring to IA64 (or any other architecture) a lot easier.
    _____

    1. Re:.NET to the rescue by JohnZed · · Score: 2

      Actually, it's entirely unlike JVMs on the market today, because .NET does not include an interpreter. It always compiles the code natively before running. It's more like a TowerJ or JOVE Java environment.
      --JRZ

    2. Re:.NET to the rescue by mjprobst · · Score: 1
      There's another possible conspiracy attached to this . . .

      They can make _real_ sure that there are instructions on their Pentiums that behave slightly differently, and that the CLR doesn't generate valid or efficient code for AMD processors.

    3. Re:.NET to the rescue by neuneu2K · · Score: 1

      Well, HotSpot compiles only the most-used parts of the code; in my experience it works perfectly well for most apps...

      In fact the real slowness of java came not from the application code but from two things:

      • the loading of Swing (horrible... Swing was loaded by the normal classloader and was bytecode-validated at each load !)
      • and the BAD implementation of AWT...

      Of course for small apps, the loading time is all-important and any interpreter overhead is BAD... but for long-running apps (server side is a perfect case :-) the advantage of dynamic recompilation and "auto-profiling" is GOOD!

      I do not know the details of the .NET CLR, but I hope it has an interpreter (in reality I do not mind, because I doubt that the platform will be ported to any non-Microsoft OS now that the breakup is void); otherwise, dynamic class loading will be much too slow. Self-modifying code would be impossible too (and yes, I have written self-modifying code in Java!)
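      The two strategies in this sub-thread (compile everything before it first runs, as JohnZed describes .NET, versus interpret first and compile only hot code, as HotSpot does) can be modeled with a single threshold. This is a toy Python sketch, not how either VM is implemented: TieredFunction and its parameters are hypothetical names, and the "compiled" path is just another Python function standing in for generated machine code.

```python
from typing import Callable


class TieredFunction:
    """Toy model of JIT tiering (hypothetical; not the API of any real VM).

    Calls go through `interpreted` until `compile_threshold` calls have been made,
    then switch permanently to `compiled`.
      compile_threshold == 0 -> compile before the first run (the .NET model above)
      compile_threshold  > 0 -> HotSpot-style: only code that gets hot is compiled
    """

    def __init__(self, interpreted: Callable, compiled: Callable, compile_threshold: int):
        self.interpreted = interpreted
        self.compiled = compiled
        self.compile_threshold = compile_threshold
        self.calls = 0
        self.interpreted_calls = 0
        self.compilations = 0
        self._native = None  # filled in once the function has been "compiled"

    def __call__(self, *args):
        if self._native is None and self.calls >= self.compile_threshold:
            self._native = self.compiled  # stand-in for emitting machine code
            self.compilations += 1
        self.calls += 1
        if self._native is None:
            self.interpreted_calls += 1
            return self.interpreted(*args)
        return self._native(*args)


def square_slow(x):  # pretend this runs in an interpreter
    return x * x


def square_fast(x):  # pretend this is the generated native code
    return x * x


compile_first = TieredFunction(square_slow, square_fast, compile_threshold=0)
hot_only = TieredFunction(square_slow, square_fast, compile_threshold=3)
for i in range(5):
    compile_first(i)
    hot_only(i)
print(compile_first.interpreted_calls, hot_only.interpreted_calls)  # 0 3
```

      The threshold is the design knob: a higher value avoids compiling code that runs only a few times, at the cost of sending more calls through the slow path first.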

  15. so much for PC's by um...+Lucas · · Score: 1

    Remember when software was labeled "requires IBM or 100% compatible PC"?

    Just in the mainstream, how many variations are we now facing, or soon will be?

    Pentium w/ MMX is the lowest common denominator...
    Intel's SSE instructions
    AMD's 3D-NOW!
    Aren't there separate instructions in the Athlon, like 3D-NOW2, or something?
    Now we're heading towards two different x86 64-bit implementations (yes, IA-64 isn't actually x86 anymore, but since they're bolting an x86 processor onto the silicon as well, it may as well be counted as one)...

    Either developers will continue as they've been doing, writing software for the lowest common denominator, which makes all of Intel's and AMD's attempts to add features to their processors useless efforts that ultimately just cost us more money, since they can't manufacture as many chips per wafer, or else we're going to start seeing "Windows/Pentium 4", "Windows/AMD", "Windows/64-bit AMD" and "Windows/Itanium" sections in CompUSA and such....

    And before the obligatory comment arrives, I'll state that no, I really would not like to compile my own software, which would be possible if everything in the computing world was open source/GPLed/etc...

  16. 32-bit FP or 80-bit FP? High end guys need more by The+Optimizer · · Score: 2

    One thing I don't see mentioned here is what degree of precision that SSE-2 has. I'm guessing that it only works on 32-bit floats.

    The SSE instructions on the P-III operate on 32-bit floats, while the x87 FPU instructions work on 80-bit floats (you can load 32-bit, 64-bit and 80-bit floats into the FPU registers and they are all expanded to 80 bits). Intermediate FPU results are computed/stored with 80-bit values. For SSE I believe (I could be wrong) that everything is 32-bit internally and register-wise.

    For scientific and engineering, 32-bits of floating point (7-8 digits of precision) just doesn't cut it. Most people I know doing that kind of work on a PC (well, both of them) use the FPU but not SSE for that reason. They have apps that take days to perform a single calculation - lots of time for accumulated precision errors to become a factor.
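    The accumulated-error concern is easy to demonstrate. Python floats are IEEE 754 doubles, so the sketch below emulates 32-bit floats by rounding through the standard struct module after every addition (computing in double and rounding to single gives the correctly rounded single-precision sum). Adding 0.1 one hundred thousand times drifts visibly in single precision and barely at all in double; the step size and count are arbitrary choices for illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def repeated_sum(step: float, n: int, single: bool) -> float:
    """Add `step` to a running total n times in single or double precision."""
    if single:
        step = to_float32(step)
    total = 0.0
    for _ in range(n):
        total += step
        if single:
            total = to_float32(total)  # keep only 24 significant bits, like a float
    return total


n = 100_000
exact = 10_000.0  # 0.1 * 100000 in exact arithmetic
err32 = abs(repeated_sum(0.1, n, single=True) - exact)
err64 = abs(repeated_sum(0.1, n, single=False) - exact)
print(f"float32 error: {err32:.6f}   float64 error: {err64:.2e}")
```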

    32-bit floats are currently enough for most 3D-graphics work (at PC resolutions), and those games ^h^h^h^h^h apps are probably a bigger consideration in driving mainstream CPU development. Given that the SSE/2 instructions have multiple math units to perform ops in parallel, there has to be a big transistor savings to have less precision.

    I would bet that the FPU floating point precision on those Sun, Irix, and Alpha boxes is higher than 32-bits.

    -Mp

  17. Re:32-bit FP or 80-bit FP? High end guys need mor by The+Optimizer · · Score: 2

    64 bits, cool. Hey, I said it was a guess. :-)

    For 3d apps that's an interesting trade off: More precision at 2 data items or more throughput at 4 data items.

    That still doesn't invalidate the point about precision for scientific and engineering applications, and understanding that it may be a factor in deciding what systems to run said apps on.
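    The trade-off in numbers: a 128-bit XMM register holds four 32-bit floats or two 64-bit doubles, and the IEEE 754 formats carry 23 and 52 stored fraction bits respectively. This small Python sketch computes lanes per register, machine epsilon, and approximate decimal digits for each; the decimal-digit figure uses the usual (fraction bits + 1) × log10(2) estimate.

```python
import math

XMM_BITS = 128
# IEEE 754 formats: (total bits, stored fraction bits)
FORMATS = {"float32": (32, 23), "float64": (64, 52)}


def simd_tradeoff(name: str) -> tuple[int, float, float]:
    """Return (lanes per XMM register, machine epsilon, ~decimal digits)."""
    bits, frac_bits = FORMATS[name]
    lanes = XMM_BITS // bits
    epsilon = 2.0 ** -frac_bits
    digits = (frac_bits + 1) * math.log10(2)
    return lanes, epsilon, digits


for fmt in FORMATS:
    lanes, eps, digits = simd_tradeoff(fmt)
    print(f"{fmt}: {lanes} per register, epsilon {eps:.3e}, ~{digits:.1f} digits")
```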

    -Mp

  18. Wrong Hardware by BRock97 · · Score: 2

    Actually, yes, I have. At my current place of employment, we use four quad 650 MHz Xeon servers with 2 GB of RAM apiece, each with an Adaptec RAID controller with 128 MB of memory. They grind to a halt and are barely usable, but, probably a lot like your situation, that is what we have to use. Another division has one Sun Enterprise server doing the equivalent work and the thing doesn't break a sweat. It sounds like you, along with some of what we do, are using the wrong hardware for the job. Why use x86 when there is much faster hardware out there for vector crunching?

    Bryan R.

    --

    The price of freedom is eternal vigilance, or $12.50 as seen on eBay.....
  19. It's A Different Thrown Now by BRock97 · · Score: 5

    Why bother? Every iteration of processors that comes out has some special optimization that is required to run at peak performance. If you use one or the other, it gets you a marginal performance boost. Sure the P4 can do magic if you turn on this compile flag, and then disable this other. Who cares? Things are fast enough now that price should be considered the king. Why spend $100 - $200 more for a processor when all it gets you is a few more frames at 1600x1200 in Quake3. Until the P4 comes down in price (and they are making big inroads for this), the Athlon will be king.

    Bryan R.

    --

    The price of freedom is eternal vigilance, or $12.50 as seen on eBay.....
    1. Re:It's A Different Thrown Now by p0six · · Score: 2

      While I'll agree with you that price does make a big difference, don't forget that branding is important too. Intel, with the Pentium (tm), has one of the strongest brands out there, probably on par with big names like Coca-Cola. That is one of the reasons that Intel continues to have a big market share even though Athlons have been higher performance + lower cost.

      So, in a sense, Marketing is King.

    2. Re:It's A Different Thrown Now by Foxxxy · · Score: 1

      I agree... my experience with the P4 during everyday work has been unimpressive... as far as I can see, my Athlon 1.2 destroys the P4 1.3 I have at work in normal day-to-day use, including MP3 encoding and digital video rendering... With the price of a P4, I don't understand why people insist on buying them.

    3. Re:It's A Different Thrown Now by markh1967 · · Score: 1

      Don't count on it being more than you need for very long. Moore's Law still stands true and we'll probably see 5GHz machines as entry level in under four years. I really hope they release a 4.77GHz system. That would be a milestone.

      --
      Input error. Replace user and press any key to continue.
    4. Re:It's A Different Thrown Now by rseuhs · · Score: 1
      That is one of the reasons that Intel continues to have a big market share even though Athlons have been higher performance + lower cost.

      The main reason is that AMD does not have enough fabs to produce for the whole market.

      So in a way, the slowing market is a great chance for AMD...

    5. Re:It's A Different Thrown Now by Derkec · · Score: 1

      My personal bet: By the time SSE2 use becomes widespread enough for it to be important, AMD will have it on all their chips. AMD is bright enough to see when instruction sets have to be implemented to stay competitive.

  20. Thats not the question... by EvilJohn · · Score: 3

    The answer is yes, with SSE-2, it will beat the athlon into the ground. Check http://www.hardocp.com/reviews/cpus/intel/p417 out for more details.

    The real question is the short lifespan of this P4. With Intel going to DDR (thank god) but changing socket types, how viable is a P4 at this point?

    Even gamers think about TCO.

    // EvilJohn
    // Java Geek

    --

    Less Talk, More Beer.
  21. I think you are mistaken by cartman · · Score: 1

    Take a second processor, with more pipelines available for instruction issue. Since it has more pipelines available, it is able to issue more instructions while waiting for the branch to be calculated.

    He was referring to pipeline length, not width. In a 20 stage processor at the same clock rate, it takes longer to fill a pipeline and consequently the branch misprediction penalty is worse.

    Suppose you have two processors, each at the same clock speed. One has a 5-stage pipeline, the second a 20-stage pipeline. Suppose that there is a branch every 6 instructions (which is typical). For every mispredicted branch, the first processor need only throw away 4 instructions, but the second 19. If most branches were mispredicted, it would kill the second processor.
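    The example can be turned into a rough cycles-per-instruction estimate. This Python sketch keeps the same simplified model (a single-issue pipeline that discards stages - 1 instructions on each mispredicted branch, and a branch every 6 instructions); the 10% misprediction rate is a hypothetical value chosen for illustration, not a measured figure for any chip.

```python
def effective_cpi(stages: int, branch_every: int, mispredict_rate: float) -> float:
    """Average cycles per instruction for a single-issue pipeline (simplified model).

    The ideal CPI is 1; each mispredicted branch flushes (stages - 1) instructions.
    """
    penalty = stages - 1
    return 1.0 + (1.0 / branch_every) * mispredict_rate * penalty


for stages in (5, 20):
    cpi = effective_cpi(stages, branch_every=6, mispredict_rate=0.10)
    print(f"{stages:2d} stages: penalty {stages - 1:2d}, CPI {cpi:.3f}")
```

    Under these assumptions the 20-stage design needs roughly a 23% higher clock (about 1.317 / 1.067) just to break even with the 5-stage one.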

    Pipeline length and clock speed are closely related design parameters. Longer pipes allow faster clock rates (because less is done per cycle per stage), but they increase the branch misprediction penalty. Generally there is a "happy compromise" for a processor between pipeline length and clock speed. Most recent chips have found that happy medium to be around 10 stages. The Pentium 4 is unusual in that it has 20 stages, so branch prediction becomes extremely important.

    Long pipelines tend to benefit floating-point code more than integer code, because FP code is more loop-intensive and its branches are therefore more easily predicted. This is why the P4, with its extremely long pipeline, performs poorly on integer code compared to the PIII, but well on FP.

  22. Re:2 things by BilldaCat · · Score: 2

    how is this flamebait? you cannot seriously claim that AMD has overtaken Intel in the average consumer's mind.

    --
    BilldaCat
  23. Strange?? by GroundBounce · · Score: 1

    Why does this matter so much if you're happily running Win2K?

    1. Re:Strange?? by GroundBounce · · Score: 2

      The Intel Linux compiler will be optimized for the P4, so there will be at least one compiler up to the job. It will cost money, but if you are really after top performance, you will probably not let a few hundred dollars stand in the way. It appears Intel is trying to make it compatible with gcc (and eventually g++), so ultimately (though not with the beta) you will be able to link your high-performance modules with the vast array of existing libraries that have been compiled with gcc.

      The interesting thing will be to see how well gcc becomes optimized for the Itanium processor, since Intel's long term plans are really to push this as the future workhorse of high performance computing. Since gcc must start over from scratch with this architecture anyway, maybe it will start out more optimized than gcc for x86, which has had to work with everything from the 386 to the P4.

    2. Re:Strange?? by GroundBounce · · Score: 3

      Of course GCC is available for win2k; however it is very seldom (if ever) used for serious commerical applications. Yes, it is used for porting UNIX applications, and these applications tend to run more slowly than their native windows counterparts. I have nothing against Win2k, and I use Win2K as well as Linux and HP-UX, it's just that the performance of GCC on win32 should be relatively irrelevent to someone who uses Win2k exclusively (and his is sig implies that he uses Windows exclusively), except in the rare circumstance that they are porting a UNIX/Linux app, or are using GCC because it's free in which case they are probably not developing an ultra high performance application.

      On the other hand, GCC *does* matter for Linux. It is true that most apps compiled with GCC run just fine on Linux. But newer x86 processors are clearly becoming more specialized, and there are applications where every drop of performance counts. I do large circuit simulations, and a 10% improvement could mean getting results hours sooner. For Linux to compete seriously in these areas, apps will have to be compiled with a compiler whose results can compete with what's available under Win32.

    3. Re:Strange?? by be-fan · · Score: 2

      Actually, I multi-boot Win2K and Linux. I've been using Linux since Slack 3.5. That should teach you something about looking at the .sig (or the screen name!) rather than the post. As for why I care, I was just curious. I do lots of graphics-type applications, and a good compiler can really speed up matrix processing (which lends itself to pipelining quite well).

      --
      A deep unwavering belief is a sure sign you're missing something...
    4. Re:Strange?? by Diomedes01 · · Score: 1

      Because GCC is certainly available for Win2K; one of its strengths is availability on so many platforms! Sheesh... jumping down someone's throat because of their .sig is pretty lame. At any rate, to answer the original question, GCC's strength is definitely not speed or optimization. I believe the GCC team concentrates on solid support for many different processors, at the expense of speed. I doubt that this will change in the foreseeable future, but honestly, anything I've ever needed to compile with GCC has run just fine. Given the current speed of desktop processors, the difference isn't even noticeable.



      -------

      --
      "To hope's end I rode and to heart's breaking: Now for wrath, now for ruin and a red nightfall!"
    5. Re:Strange?? by Diomedes01 · · Score: 1

      I agree, I would never use GCC for a performance critical application. For every-day userland-type stuff, it's fine. For large-scale data processing it certainly isn't the way to go.

      Regarding the whole sig thing, I can see where you're coming from, but just because he's using Win2K on a desktop doesn't mean he doesn't use GCC for development at work or on personal projects...

      At any rate, there is definitely a need for a more optimized compiler under Linux. Intel releasing their compiler for Linux is a small step in the right direction. Unfortunately, GCC will probably never reach the optimization level of the vendor compilers. I would love to see someone write a specialized x86-optimized Linux compiler; maybe use the parsing code from GCC but redo the code generation. Maybe someone like IBM could get the ball rolling on this to show some real support for commercial applications on Linux.


      -------

      --
      "To hope's end I rode and to heart's breaking: Now for wrath, now for ruin and a red nightfall!"
  24. Re:please.... by toofast · · Score: 2

    Actually, you could spell it as "Athlon" rather than "Athalon", and you would be much more credible.

  25. Re:excuse me but um... by JBv · · Score: 1

    Not really.

    Many people (myself included) use cheap PCs to do number crunching for scientific purposes.

    Normally I use low-end machines, like my home PC (a Linux Duron 900), to develop and test the code I will later run on Alphas.

    I haven't made any calculations, but I suppose that for poor labs with many students, the cost of one Alpha (for example) could finance more than two "lower end" systems, which are also cheaper and easier to maintain and upgrade.

  26. Still a war worth winning by AlpineR · · Score: 1
    Sometimes speed is still king. I recently bought a computer for running floating-point-intensive simulations. A large part of the cost in my research isn't the expense of the hardware but the expense of my time, so I got the fastest system I could put together. I wanted dual processors and preferred a Dell machine, so I was already stuck with Intel CPUs. The only question was whether to go with P-IIIs or spend $1,000 more for dual P4s. All of my searching on the Web showed that P4s are no better than P-IIIs for floating-point calculations, so I went with the dual P-III system. Intel would now be $1,000 richer if I had known that the P4 really could perform much faster.

    By the way, I run Linux and compile with g++. Does anybody know if the GNU compiler does a good job of processor-specific optimizations?

    There are more uses for computers than playing games and reading Slashdot. ;-)

    AlpineR

  27. Re:what about gcc? by SpinyNorman · · Score: 2

    IMO gcc's optimization is generally weak. gcc doesn't have any MMX/SSE/SSE2 support, and even without considering vectorization it produces code that's around 20% slower than the Intel compiler's.

    gcc 3.0 apparently has an entirely new x86 back end, but from comments I've heard it produces code that's around 5% SLOWER than the old back end... It'd be nice to see some comprehensive benchmarks of gcc 2.95 vs 3.0 though.

    There's a very interesting open source SIMD compiler project (mainly focusing on MMX) at Purdue University:

    http://shay.ecn.purdue.edu/~swar/Index.html

  28. Re:This appears to be the typical load of slashdot by SpinyNorman · · Score: 2

    Did you check what's been in all the high street GHz+ computers for the last year? Maybe the P4 is making a showing now (at least it's made it to the TV shopping channels), but for at least a year you couldn't even find a high-end Intel PC at retail, because they didn't have a GHz processor that worked (remember the Intel 1GHz, recalled after about 2 weeks).

    AMD is also kicking Intel's ass in Europe, and is expected to continue gaining worldwide market share (from the current 20%+ to close to 30% by the end of the year).

    Most consumers don't know enough to make a technical decision anyway - they're going to buy what's cheapest or what their college student geek son/daughter advises.

  29. Re:P4 can't dethrone Athlon in Linux by srwalter · · Score: 1

    You forget, however, that Intel's compiler does not support many GCC extensions, specifically the all-important inline asm extension. Without it, the compiler has no chance of compiling the kernel. Not to mention that GCC is the only compiler the kernel supports.

    ==================================================

    --
    Freedom is the freedom to say that 2 + 2 = 4
  30. Re:Uh... Hemos? by oops · · Score: 1
    (Yeah, yeah, I know you meant "iteration." But any computer geek who doesn't know how that term is spelled deserves some ribbing.)

    Spelt. The word you want is spelt.

  31. Re:Uh... Hemos? by oops · · Score: 1
    http://www.dictionary.com

    spelt is the past participle
    spelled is the past tense.

    or at least it was when I did my O-level.

  32. Re:excuse me but um... by Buck2 · · Score: 1

    We just bought twelve 1.4 GHz Athlon machines with 1.5 GB of RAM each for $10k, for neural computations.

    We could have gone with Sun, Irix, or Alpha if we wanted one machine with 2-4 processors.

    I looked into it. It wasn't going to happen.

    --

    As my father lik@(munch munch)... ....
  33. Re:One big problem by chrysalis · · Score: 2

    A stock OpenBSD installation is compiled for the 386. Did you recompile the kernel and the whole source tree with pentium3 optimizations?

    -- Pure FTP server - Upgrade your FTP server to something simple and secure.

  34. Re:32-bit FP or 80-bit FP? High end guys need mor by eric17 · · Score: 1

    One thing I don't see mentioned here is what degree of precision that SSE-2 has. I'm guessing that it only works on 32-bit floats.

    You guessed wrong. SSE2 can operate on two 64-bit floats in parallel.
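    Here is a small model of what "two 64-bit floats in parallel" means. An XMM register is 128 bits wide, so it holds two IEEE 754 doubles. These are 64-bit doubles, not the 80-bit extended format the x87 stack uses internally. The functions below are hypothetical Python stand-ins for the packed (ADDPD) and scalar (ADDSD) forms of the SSE2 add. The scalar form computes only the low lane, which is what you get when a loop is not vectorized.

```python
import math
import struct

XMM_BITS = 128
DOUBLE_BITS = 8 * struct.calcsize("<d")  # IEEE 754 double precision: 64 bits
LANES = XMM_BITS // DOUBLE_BITS          # 2 doubles per XMM register

def addpd(a, b):
    """Model of a packed add (like SSE2 ADDPD): both 64-bit lanes are added."""
    return (a[0] + b[0], a[1] + b[1])

def addsd(a, b):
    """Model of a scalar add (like SSE2 ADDSD): only the low lane is added;
    the high lane of the destination `a` passes through unchanged."""
    return (a[0] + b[0], a[1])

def instructions_needed(n_elements, packed):
    """Add instructions needed to add two arrays of n doubles element-wise."""
    return math.ceil(n_elements / LANES) if packed else n_elements

# Two doubles fill exactly one 16-byte, XMM-sized chunk.
assert len(struct.pack("<2d", 1.5, 2.5)) == XMM_BITS // 8

print(addpd((1.0, 2.0), (3.0, 4.0)))  # (4.0, 6.0)
print(addsd((1.0, 2.0), (3.0, 4.0)))  # (4.0, 2.0)
print(instructions_needed(1000, packed=True),
      instructions_needed(1000, packed=False))  # 500 1000
```

    Halving the instruction count is the best case. Real speedups also depend on memory bandwidth and on whether the compiler actually emits the packed form.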

  35. Intel on the right path? by kgasso · · Score: 1

    It seems Intel is on the right path to giving the Athlon a run for its money... I'm vaguely reminded of how quickly many companies/software developers/etc. picked up support for 3dNow! (likely due to the large number of customers and potential customers with AMD K6-2/K6-3 chips).

    AMD had a fairly large number of developers promising 3dNow! support, and seemed to be doing the "right thing" by helping developers optimize their code.

    It seems Intel has picked up on this, and has made it easy to optimize for SSE-2 with its own compiler plug-in for VC. I'm just curious whether this breaks AMD optimizations.

    This is definitely a move in the right direction for Intel, though. I don't necessarily like it though, because I'm an avid AMD fan. :D

  36. Re:Problems? by be-fan · · Score: 2

    As of yet, Intel's compiler is the only optimizing game in town. Even AMD uses Intel's compiler when giving Athlon benchmarks.

    --
    A deep unwavering belief is a sure sign you're missing something...
  37. Re:More meaningless numbers by be-fan · · Score: 2

    More meaningless blathering about meaningless numbers. This article wasn't TRYING to measure real world performance! Why do you think they used a benchmark that fit entirely in L1 cache? They were simply trying to measure the peak throughput of the floating point units on the Athlon and the P4.

    --
    A deep unwavering belief is a sure sign you're missing something...
  38. Where is GCC in all of this? by be-fan · · Score: 2

    Is it just me, or is everyone talking about which compiler can vectorize code better for cutting edge architectures, while GCC is still trying to get good P6 optimizations? Seriously, though, does anyone know if GCC 3.0 is in any way competitive with the new MS and Intel compilers?

    --
    A deep unwavering belief is a sure sign you're missing something...
  39. Re:excuse me but um... by be-fan · · Score: 3

    I may sound like a troll of sorts or anti Intel, but when it comes to high end scientific engineering does anyone actually use anything outside the realms of Sun, Irix, and Alpha? Although benchmarks claim to show factual information, I've always seen them as a bit biased.
    >>>>>>>>>>>
    Not everyone working on a scientific application is blessed to be in a huge project with infinitely deep pockets. There are tons of college students/projects doing different types of scientific computing, and x86 provides a very good price/performance ratio for these users.

    --
    A deep unwavering belief is a sure sign you're missing something...
  40. Re:P4 can't dethrone Athlon in Linux by be-fan · · Score: 4

    However, Intel's C compiler is in beta for Linux. Thus, developers of apps that need vectorization could simply pony up $500 for a license and compile with that.

    --
    A deep unwavering belief is a sure sign you're missing something...
  41. Re:The answer is by barracg8 · · Score: 4
    • As a result, the Pentium III would outperform the Pentium 4 on some occasions, as the latter tends to lose more instructions when the branch-misprediction rate is too high.
    Your reasons for the P3 outperforming the P4 don't seem to make a lot of sense.

    Take a processor. It hits a branch instruction. While it is working out whether or not to take the branch, it keeps itself busy by executing instructions from one side or the other of the branch. It gets it wrong, so when it realizes this, it throws away a bunch of work it has done. Hence branch misprediction is a Bad Thing.

    Take a second processor, with more pipelines available for instruction issue. It again makes a branch prediction. Since it has more pipelines available, it is able to issue more instructions while waiting for the branch to be calculated. Again it gets it wrong, and since it has been able to issue more instructions from after the branch, more are thrown away when it realizes a misprediction has taken place.

    The point is that while more instructions are thrown away, this is only because more have been issued. Having more pipelines in a new generation therefore does not make that processor run slower than previous versions. The increased branch-misprediction penalty can only diminish the performance gain that the extra pipelines give you; it cannot cause an overall speed decrease, right?
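    One back-of-envelope way to frame the question is a simple stall model in which every mispredicted branch costs a fixed number of cycles. That penalty grows mainly with pipeline depth rather than width, which is the P4's situation with its 20-stage pipeline. All parameter values below are invented for illustration; they are not measured P3 or P4 figures.

```python
def ns_per_instruction(clock_ghz, base_cpi, branch_fraction,
                       mispredict_rate, penalty_cycles):
    """Average time per instruction (ns) under a simple stall model:

        CPI  = base_cpi + branch_fraction * mispredict_rate * penalty_cycles
        time = CPI / clock
    """
    cpi = base_cpi + branch_fraction * mispredict_rate * penalty_cycles
    return cpi / clock_ghz

# Invented workload: 20% of instructions are branches, 10% of those mispredict.
WORKLOAD = dict(base_cpi=1.0, branch_fraction=0.2, mispredict_rate=0.1)

shallow = ns_per_instruction(1.0, penalty_cycles=10, **WORKLOAD)
deep_same_clock = ns_per_instruction(1.0, penalty_cycles=20, **WORKLOAD)
deep_faster_clock = ns_per_instruction(1.5, penalty_cycles=20, **WORKLOAD)

# Clock the deeper design needs just to break even with the shallow one.
break_even_ghz = 1.0 * deep_same_clock / shallow

print(f"{shallow:.3f} {deep_same_clock:.3f} {deep_faster_clock:.3f}")  # 1.200 1.400 0.933
print(f"{break_even_ghz:.3f}")  # 1.167
```

    With these made-up numbers, the deeper pipeline is slower at equal clock and pulls ahead once its clock is roughly 17% higher. A different branch mix or penalty moves that crossover.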

    G.

  42. excuse me but um... by joq · · Score: 3
    They also represent the majority of FPU applications. Most applications contain very few FDIV, but some scientific and engineering applications do.

    I may sound like a troll of sorts or anti Intel, but when it comes to high end scientific engineering does anyone actually use anything outside the realms of Sun, Irix, and Alpha? Although benchmarks claim to show factual information, I've always seen them as a bit biased.

    Typical PIV purchaser in my eyes: gamers and newbies buying preconfigured PCs. What about this end user? Where are the stats for the typical purchaser? Sometimes these benchmarks confuse the average person into thinking the PIV is lowly in comparison to others.

    In this article we will try to answer the following three questions:
    1. How well will the Pentium 4 and Athlon perform with software that is compiled with newer compilers (MS Visual 7.0 and Intel C 5.0.1)?
    2. Can better compilers automatically create SSE-2 optimized code from simple C++ code?
    3. Can Pentium 4 aware compilers boost the Pentium 4's floating-point performance past the strong FPU of the Athlon?

    Again, I may be off my rocker here, but most developers I've met have always customized their own machines (dual processors, other architectures), so again, is it completely unbiased to say the PIV lacks? Confused about this crap =\
    1. Re:excuse me but um... by randombit · · Score: 1

      I fear that in the next decade we will stop using Alphas, if things continue to develop as they are.

      I have some hopes that AMD's SledgeHammer will end up being an Alpha replacement of sorts. In a lot of ways (FPU, bus), the Athlon is like a 32-bit version of the Alpha.

    2. Re:excuse me but um... by jmv · · Score: 2

      I may sound like a troll of sorts or anti Intel, but when it comes to high end scientific engineering does anyone actually use anything outside the realms of Sun, Irix, and Alpha?

      I do. For my master's project, I trained hundreds of neural networks, each taking between an hour and 2 days to train. At my job, we're doing the same kind of stuff on Linux and Solaris PCs. I believe a lot of people do that too. PCs are so cheap compared to the other architectures that they're still the best thing to buy for many types of computation.

      And by the way, training a neural network requires about one division for every several million adds/multiplies.
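      Division this rare is also easy to work around when it does show up in an inner loop. If the divisor is loop-invariant, you can divide once and then multiply by the reciprocal. The sketch below assumes illustrative function names. Compilers generally make this substitution on their own only when the reciprocal is exact (e.g. dividing by 2.0) or when relaxed floating-point rules are enabled, because v * (1/d) can differ from v / d in the last bit.

```python
def scale_by_division(values, divisor):
    """Straightforward version: one floating-point division per element."""
    return [v / divisor for v in values]

def scale_by_reciprocal(values, divisor):
    """Hoisted version: a single division, then one multiply per element.

    Because 1/divisor is rounded before the multiply, each result may
    differ from true division in the last bit.
    """
    inv = 1.0 / divisor
    return [v * inv for v in values]

weights = [0.5, 1.25, -3.0, 7.75]
a = scale_by_division(weights, 3.0)
b = scale_by_reciprocal(weights, 3.0)

# Largest absolute difference between the two versions (tiny, if nonzero).
print(max(abs(x - y) for x, y in zip(a, b)))
```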

    3. Re:excuse me but um... by Doomdark · · Score: 1
      I may sound like a troll of sorts or anti Intel, but when it comes to high end scientific engineering does anyone actually use anything outside the realms of Sun, Irix, and Alpha?

      That trio has traditionally been top of the pack for projects with enough money (I loved our AlphaStations), but as others say, the price/performance ratio favours x86. In future, though, the biggest problem is that of the three, only one seems likely to survive in the medium term. Alpha will be discontinued (see other articles at ./), and it seems to be only a matter of time before the same happens to SGI's stuff (if not to the whole company, much as happened to DEC).

      Also, what about HP-PA? Wasn't it a good performer on benchmarks at some point? Its future looks very bleak too, of course, but I'm just curious whether it was used by scientists (or just by faithful HP customers running legacy accounting applications... that's where I saw it most often).

      --
      I like paying taxes. With them I buy civilization -- Oliver Wendell Holmes
    4. Re:excuse me but um... by cfleming · · Score: 1

      "I may sound like a troll of sorts or anti Intel, but when it comes to high end scientific engineering does anyone actually use anything outside the realms of Sun, Irix, and Alpha? Although benchmarks claim to show factual information, I've always seen them as a bit biased."

      Nobody in the scientific community purchases Sparcs or SGIs for computational work anymore. The Sparc and MIPS boxes that you see crunching today are all older boxes that still work. Today Sparcs and MIPS boxes are slower and more expensive than a reasonable PC. You are just as likely to see simulations done on an Apple as on a brand-new Sparc or Indy, i.e., not much.

      Today everybody buys PCs and Alphas, although Alphas are getting used less and less as the i386 has caught up. I fear that in the next decade we will stop using Alphas, if things continue to develop as they are.

    5. Re:excuse me but um... by PM4RK5 · · Score: 1

      I may sound like a troll of sorts or anti Intel, but when it comes to high end scientific engineering does anyone actually use anything outside the realms of Sun, Irix, and Alpha?

      Recently, the Alpha technology was sold to Intel. Anyway, if Intel continues its we-need-a-new-socket trend, the P4 will soon be outdated (in terms of connectivity). And with AMD's x86-64 technology coming out at the end of this year, from what I understand, it may become the processor of choice for most scientific computing, because it would be yet another low-cost, high-performance solution from AMD. I don't really know, but that's my $0.02.

    6. Re:excuse me but um... by jlmaccal · · Score: 1

      Actually, in some research areas PC-type hardware is rapidly becoming popular at the expense of traditional high-end equipment. I work in a lab that does molecular dynamics simulations, and we recently bought 60 dual 1 GHz P3s. Not only is the price/performance fantastic, but the absolute performance is very good as well. The software we use has SSE and 3DNow optimizations and beats more traditional high-end workstations hands down. The SSE optimizations account for a nearly 100% increase in speed.

  43. Re:Uh... Hemos? by quartz · · Score: 1

    I know he meant "iteration", but the first thing that popped into my mind when I first read the story was, for some reason, "interaction". Hmm.

  44. Re:Intel Compiler costs $???? by HerbieStone · · Score: 1
    I guess it is for the same reason nVidia doesn't release their driver source: it would give the competition insight into how their chip works. So they can't do it.

    That's also the reason why in the long run open-source projects are more efficient than closed-source projects.

  45. Anyone know when M$ VC++ will support SSE2 native by UnknownSoldier · · Score: 2

    Is it just us game programmers using SSE2 or are other apps using it?

    *gets a feeling of PPro all over again*

  46. Re:Thats not the question... (Wrong conclusion) by TheSunborn · · Score: 1
    From the link:

    Notice that the scaling of the scores for both Win2K and Win98SE are very much the same, but we do see better overall scores with Win2K. The 1.33 Athlon outperforms the 1.7GHz Pentium4 at stock speeds, while the 1.53GHz Athlon outperforms the 2.1GHz Pentium4. It is our opinion that the SysMark2000 benchmark reflects the "normal everyday computer user's" plethora of programs used pretty well.

    So no, the P4 does not beat the Athlon.
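    The quoted result can be turned into a rough bound on per-clock efficiency. This assumes the SysMark2000 score is proportional to work done per clock (loosely, IPC) times clock frequency. That ignores memory and chipset differences, so treat it as a sketch rather than a measurement.

```latex
\text{score} \approx \text{IPC} \times f
\;\Rightarrow\;
\text{IPC}_{\text{Athlon}} \cdot 1.33\,\text{GHz} > \text{IPC}_{\text{P4}} \cdot 1.7\,\text{GHz}
\;\Rightarrow\;
\frac{\text{IPC}_{\text{Athlon}}}{\text{IPC}_{\text{P4}}} > \frac{1.7}{1.33} \approx 1.28
\qquad
\text{(overclocked pair: } \tfrac{2.1}{1.53} \approx 1.37\text{)}
```

    Under that assumption, the Athlon was doing at least roughly 28% more work per clock on this benchmark.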
  47. Re:What I thought J__ was supposed to do... by maraist · · Score: 2

    This is an interesting take, but there are other considerations. First, if you read the Findings of Fact against MS, you can easily be led to believe that MS's real purpose was to lock Java to the MS platform by polluting it with MS-only extensions that would proliferate on the net. This practice showed up too often in other areas to ignore its likelihood.

    Next, why would MS want a write-once, run-anywhere development environment for themselves? They're not about to build their drivers and Win32 API in Java, and any apps they build on top of them are pure C++, so all it would take is a simple recompile for the different platforms.

    When Java came out, I don't believe that Alpha-NT was that popular, and SGI-NT was being dropped (not certain about the timing, but it seems about right).

    I agree with you about Win 9x being stepping stones, but I don't think cross-platform support was a big focus for NT. Yeah, they have the Hardware Abstraction Layer, but I suspect it was more for stability and protected code than for true platform independence. I thought it was really just a carry-over from VMS.

    -Michael

  48. Re:The answer is by maraist · · Score: 2

    Yes and no. How do you categorize a new chip? The AMD K5/K6 were roughly in line with the P5, but they were separately designed. It of course comes down to marketing. BUT, what you can look at is the generation of the design. The Pentium introduced (for x86s, anyway) relatively deep pipelines and superscalar (multiple-instruction) issue. The next generation was out-of-order execution (OOE). You may or may not be able to categorize the Pentium 4 as a new generation based on its double-pumped integer units. I think all the other aspects of the P4 are simply augmentations or incorporations of nifty ideas (like caching the decoded ops, which I believe AMD did a while ago).

    I used to call the IA-64 the P7 just so that my lay friends would know what I was talking about. Its VLIW / speculative execution could probably be considered a new generation. But in reality it's a completely separate product, with hardly any basis for comparison with the x86 line.

    I think, however, that I'd recognize SMT / CMP as a next generation label.

    -Michael

  49. Re:Hold on a minute... by maraist · · Score: 2

    Not sure that I'm reading you correctly. My initial impression is that SMT / CMP would hurt cache hits. If you had an app that was single-threaded, then obviously SMT won't help you, but CMP would have you compete for cache space. If you had a multi-threaded app, then yes, their code cache would most likely have less thrashing, but their data stands a good chance of competing for the same space. In single-threaded operation, your cache can afford to risk having 2, 4, 16, etc. memory locations overlap on a cache line, since it's not statistically likely that you'll thrash. But with SMT, various types of applications that require large data sets (such as text-processing web servers) increase the chances of accessing conflicting memory regions. Even with more expensive cache architectures, the likelihood of cache conflicts is still higher with SMT.

    My understanding of the proposed SMT on x86 is that you simply switch to another thread when there's a memory stall. I think SPARCs have done that for a while... What I believe you're referring to is the reduction in the number of times you have to context switch and thereby flush your cache. Though it's true that having fewer distinct processes (even LWP ones) requires fewer context switches, I believe you are not given a time-delta extension simply because you have 2 or more threads associated with a process on an SMT core. Thus, I believe the time-delta is still the same for all processes (minus HW interrupts), and the number of cache flushes per second is the same. Hence, little realized benefit.

    Just for completeness, what I think you do get is fewer memory stalls within your time-delta. Additionally, if each thread is stalling, then you at least have multiple concurrent memory requests, which I believe suits RDRAM well. You could achieve a similar situation by having multiple independent banks of SDRAM (like nVidia's GeForce 3).

    In summary, if anything, cache is the weak link towards multi-core / multi-threading.
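    Here is a toy model of the conflict worry above. It assumes a 16 KB direct-mapped cache with 64-byte lines, an arbitrary geometry that does not match any real CPU. Real caches are set-associative, which softens the effect. Two threads each loop over their own 4 KB buffer while sharing the cache, and whether they thrash depends entirely on whether their buffers map to the same sets.

```python
LINE_SIZE = 64                      # bytes per cache line (assumed)
NUM_LINES = 256                     # 256 * 64 B = 16 KB, direct-mapped (assumed)
CACHE_BYTES = LINE_SIZE * NUM_LINES

def count_misses(addresses):
    """Count misses in a direct-mapped cache: each memory block can live in
    exactly one line, chosen by (block number mod NUM_LINES)."""
    tags = [None] * NUM_LINES
    misses = 0
    for addr in addresses:
        block = addr // LINE_SIZE
        index = block % NUM_LINES
        tag = block // NUM_LINES
        if tags[index] != tag:
            misses += 1
            tags[index] = tag       # evict whatever was in that line
    return misses

def line_addresses(base, size=4096):
    """One address per cache line in a buffer starting at `base`."""
    return list(range(base, base + size, LINE_SIZE))

def interleave(xs, ys):
    """Alternate accesses from two threads sharing one cache."""
    return [addr for pair in zip(xs, ys) for addr in pair]

PASSES = 10
a = line_addresses(0x10000) * PASSES
b_conflict = line_addresses(0x10000 + CACHE_BYTES) * PASSES  # same sets as a
b_disjoint = line_addresses(0x10000 + 4096) * PASSES         # different sets

print(count_misses(a))                          # 64: cold misses only
print(count_misses(interleave(a, b_conflict)))  # 1280: every access misses
print(count_misses(interleave(a, b_disjoint)))  # 128: cold misses only
```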

    --
    -Michael
  50. Re:The answer is by maraist · · Score: 2
    On a side note, I always thought that the P-III only had 3 pipelines, one of which could execute any micro-op, and the other two of which could only execute the simpler micro-ops.


    I believe you're thinking of the number it can "issue", which is separate from the number of [semi-]independent pipes. In the PPro, some instructions (like divide) would lock other pipes or stages within their own pipe. Issuing instructions is expensive, so it's generally accepted that you issue fewer than the number of pipes, but as the P4/Athlon have significantly more pipes than their predecessors, they have increased the number of issued instructions by 1 or so.

    -Michael
  51. Re:More meaningless numbers by Knobby · · Score: 1

    You spelled AltiVec correctly

    The PowerPC camp has something of an advantage over Intel in terms of their SIMD extensions. Namely, the only large company shipping their processors is Apple, and Apple is involved in both the compilers and the OS for the machines they ship. Now how does this help? Well, let's look at OS X. If I develop code that dynamically links to Apple's libraries, and those libraries are written to take advantage of the AltiVec unit, then my app should see an immediate speed-up. I don't need to do anything crazy; I just need Apple to construct their installer to check the processor type and install libraries optimized for the 2-3 processors they support.

    It may also be worth mentioning that this could probably be done with glibc, but it will be more difficult to support optimized libraries for the umpteen million processors out there.

  52. Re:P4 can't dethrone Athlon in Linux by GooberToo · · Score: 1

    Does anyone know if vectorization optimizations are being planned for GCC? Is Intel going to pony up anything here?

  53. The x86/x87 instruction set by neonstz · · Score: 1

    After reading the article, the conclusion seems to be that the x87 instruction set is inefficient. Everyone who has spent some time writing assembly code for the x86 architecture (and other architectures) knows that. Intel has finally moved forward and replaced the FPU instructions with something more sensible. This may or may not be a step in the wrong direction. Think about it: backwards compatibility is one of the biggest reasons the x86 architecture has been so successful. I think it is a bold move from Intel to change this, but on the other hand, Intel is the only company that has the resources and influence to do this. AMD can only join the ride.

    Athlon still gives users the most bang for the buck. Almost everyone I know (including me) has bought (or will buy) an Athlon for their next CPU upgrade. Combined with the RAMBUS mistake, this may be the end of Intel's domination. However, I don't think we'll see the result of this until IA-64 is the standard Intel CPU.

  54. Re:CPU-specific optimisations by robertito · · Score: 1
    Maybe in the future we will see commercial code being distributed in such a way that parts of the code are compiled on the destination machine as the code gets installed. That way the code vendor can test a variety of complier options and not have to ship 42 different binaries for all the different CPUs in use.


    Distribute with source code? You crazy fool!
  55. Marketecture by diablovision · · Score: 3

    It seems Intel may have bet the farm on Marketecture: a 20-stage pipeline to reach multi-gigahertz speeds, double-pumped ALUs that run at twice the core clock speed, a trace cache of recently decoded RISC "micro-ops", and SSE2, 144 new SIMD instructions that are supposed to give incredible performance. Yet the Pentium 4 has trouble against a lower-clocked Athlon in many, many benchmarks.

    Intel is the market leader, but they shouldn't let their marketing team design their chips!

    --
    120 characters isn't enough to explain it.
  56. Re:processors heading in the wrong direction.... by King_TJ · · Score: 1

    Yeah, they are getting more proprietary -- but that's bound to happen, since there are so few players in the CPU manufacturing game.

    If you're Intel and have that large a market-share, why can't you decide that your latest CPU will throw in a few "twists"?

    It won't reach a point where you need a new OS for each chip, though. That's been done already (see DEC Alpha, for a prime example) and the market has rejected it.

    Most of these "processor extensions" such as 3D-Now, MMX, etc. are purely optional. Your code will still run fine if you don't take advantage of the extras.

  57. Re:CPU-specific optimisations by randombit · · Score: 1

    holding their hotspot code and use dynamic library linking to load in the right one after doing a CPU detection routine. This sort of thing is already being done to a certain degree with Windows based games that might support 3dNow, SSE, SSE2, etc.

    Red Hat 7.1 has that for i686 machines, actually (just libc/libm and a few others, but then again, those are the ones that really matter). It's invisible to the programs; it's just some magic in the dynamic linker. Of course, if you're compiling it yourself it's a much simpler job, but then again one can always offer DEBs/RPMs compiled for the specific architecture.
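    The following is a minimal sketch of the detect-then-dispatch pattern described above. It reads the CPU's feature flags once, then binds a function (here, a Python callable standing in for a function pointer) to the best available implementation. Reading /proc/cpuinfo works only on Linux x86. The sse2/3dnow variants are hypothetical placeholders; in a real program they would be separately compiled routines. Red Hat's approach, as described above, does the same thing for whole shared libraries inside the dynamic linker rather than per function.

```python
from pathlib import Path

def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the feature flags Linux reports (e.g. 'sse2', '3dnow'),
    or an empty set if the file is missing or has no 'flags' line."""
    try:
        text = Path(path).read_text()
    except OSError:
        return set()
    for line in text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def dot_generic(a, b):
    """Portable fallback that runs on any CPU."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical placeholders: real versions would use SSE2 / 3DNow! code.
def dot_sse2(a, b):
    return dot_generic(a, b)

def dot_3dnow(a, b):
    return dot_generic(a, b)

# Checked in order of preference; the first supported feature wins.
DISPATCH = [("sse2", dot_sse2), ("3dnow", dot_3dnow)]

def select_dot(flags):
    for feature, impl in DISPATCH:
        if feature in flags:
            return impl
    return dot_generic

# Detect once at startup; callers just use `dot`.
dot = select_dot(read_cpu_flags())
```

    Detection runs once, so the per-call cost is a single indirect call rather than a feature check on every invocation.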

  58. Re:compiler plug-ins by VAXman · · Score: 2

    Does AMD have an optimizing compiler for the Athlon that you can plug into VC++? If so, it should have been included in the tests this article ran.

    No, they do not. AMD uses Intel's compilers for their SPEC scores since it is the best x86 compiler.

  59. Re:In that case... by VAXman · · Score: 2

    In that case, Intel could insert some obfuscated code to detect AMD CPUs into its compilers' output and then run delay loops on AMD CPUs to create a phony lack of benchmark performance.

    You seem to be confused. AMD has the choice of any compiler in the world to use when submitting SPEC benchmarks. They choose to use Intel's because it is the best. If Intel crippled support for AMD processors in its compiler, then AMD would use a different compiler. Of course, if AMD had compiler expertise they would develop their own compilers optimized for their chips. But they don't know how to develop compilers (and that will be quite a performance limiter for x86-64, since they will have to rely on GCC, which has terrible performance).

  60. Re:The answer is by steveha · · Score: 2
    The length of the pipelines is not the main reason that the Pentium4 sucks. The main reason is that the chip is broken in several important ways, such that you need to rearrange your code specially in order to mitigate the broken stuff. This is straight out of the article you cited (great article, I agree!).

    Historically, if you took code for one processor and ran it on a later processor, the later processor would always do a better job of running it than the original. (The major, glaring exception to this was the Pentium Pro, which really sucked unless you optimized the code for it.) This is why Linux distributions such as Debian just optimize for the 386 and call it good -- most of the time, for most of the applications, you won't pick up very much performance by optimizing for a specific chip architecture. (By the way, you should rebuild your kernel with chip-specific optimizations. Your kernel is running all the time, and any savings will add up quickly. Of course, all the CPUs are so fast these days that few of us will really notice any difference even with the kernel.)

    But now the Pentium4 has so much wrong with it, that unless you rearrange the code specially, it chokes and underperforms. The Level 1 cache is actually a cache for decoded instructions, which is cool... but it is only 8K, which is insane! Sure, since the instructions were already decoded, the 8K cache is probably worth a bit more than a simple 8K instruction cache, but the Athlon has a 64K instruction cache! The Pentium4 has all these internal execution units, but it can only feed three of them per clock cycle from the cache, so most of them will be idle in any given clock cycle. And while earlier chips introduced cool features that would make code run really fast (bit-shifting was really fast, and there were special instructions like CMOVE) these all run dog-slow on the Pentium4.

    So, the Pentium4 runs really hot, and needs special cooling and a special power supply. Right now it needs expensive RDRAM. And it needs special optimizations to allow it to run at full speed. Summary: unless you really need its special features, buy an Athlon.

    When does a P4 beat an Athlon? Some specific situations where RDRAM is really appropriate, some specific situations where the SSE features really work (and assuming the code is optimized for it), and that's about it.

    Can a future P4 dethrone the Athlon? Maybe. Intel claims that the P4 is slower, clock-for-clock, than the Athlon for a good reason: because the P4 will reach really high clock speeds really fast. Some breathless press release I read said something about a 10 GHz version of the P4 within four years or so. Let's face it, the P4 can stay as broken as it is and still stomp the Athlon if Intel can really get the P4 going twice as fast or more than the Athlon! But I'll believe it when I see it. The current P4 goes into thermal overload and slows to half-speed if you work it really hard, and dissipates 73 Watts at 1.5 GHz; even with a die shrink I'll bet a 10 GHz P4 would melt itself into a puddle.

    Because the Athlon gets more work done per clock, and is available at clock speeds nearly as high as the P4, the Athlon is better than the P4 across the board. There are a few narrow situations where the P4 is better than the Athlon, but if you check the price/performance ratio the Athlon still wins.

    steveha

    --
    lf(1): it's like ls(1) but sorts filenames by extension, tersely
  61. Compiler vs processor by BradleyUffner · · Score: 3

    After reading the article it looks like Intel is much better at making compilers than it is at making its processors. The article says that the Intel compiler is a "masterpiece", and a work of genius. It looks to me like their compiler is a lot more impressive than their CPU.
    =\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\ =\=\=\

    1. Re:Compiler vs processor by VBL · · Score: 1

      Well for Itanic most of the compiler technology comes from Multiflow via HP.

    2. Re:Compiler vs processor by Courageous · · Score: 1

      "After reading the article it looks like Intel is much better at making compilers then it is at making it's processor."
      ----
      Well, consider what they are doing with Itanic, and you will realize that they absolutely *HAVE* to have a world-class compiler team. Given the way Itanic moves all of the branch-prediction and parallelism work over to the compiler, it would otherwise fall flat on its face.

      C//

  62. What I thought J++ was supposed to do... by alexhmit01 · · Score: 2

    Remember that Chicago/Windows 4.0/Win93/Win95 was designed as a transition OS to get code over to Win32 faster. Also, NT was built as a cross-platform OS because MS didn't want to be dependent on Intel. Remember, everyone thought that x86 was nearly dead at that point.

    I assumed that the idea of J++ was for MS to have their own Java. That would give them tremendous platform independence. You would write "cross-platform" Win32 code, meaning it would run natively on any MS OS. I had always expected that this was why MS bought into Java. An MS version would work on MIPS/PPC/Alpha/x86.

    Given that their RISC compilers were always a gen back, this never materialized. However, shipping a semi-compiled mode would have let them become truly cross-processor. I mean, think of it as Install Shield on crack... or a BSD port...

    Alex

  63. Or Java, or any open source application... by codemonkey_uk · · Score: 2
    Just for the record, all you say also applies to Java (a bytecode VM, generally using a JIT, means per-platform optimisation from a single binary distributable).

    And of course, as soon as GCC can take advantage of whatever the latest CPU gizmo is, everyone who runs an open source OS or application can simply recompile for a performance boost.

    All the more reason, methinks, for the chip vendors to help the open source compiler developers.

    Thad


    1. Re:Or Java, or any open source application... by Doomdark · · Score: 1

      While true, the benefit of Java/.NET over gcc is that most people don't keep recompiling their applications (or get the correct build installed, if there's a choice), whereas JIT recompilation happens transparently, independent of the user's actions. The downside with Java JITs (current ones, at least) is that they do this every time the application runs, which is unfortunate. It would be nice if they cached the end result: save the native code in the first place, then just compare the timestamps of the bytecode files against the native compilations.
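      A minimal Python sketch of that caching idea, assuming a JIT that can hand back its native code as bytes. `load_native`, the `.native` suffix, and the `compile_fn` stand-in for the code generator are all hypothetical; this does not describe how any real JVM works.

```python
import os


def load_native(bytecode_path, cache_dir, compile_fn):
    """Return (native_code, compiled_now) for bytecode_path.

    Hypothetical cache: compile_fn (a stand-in for the JIT's code
    generator) only runs when the saved native file is missing or is
    older than the bytecode it was built from.
    """
    native_path = os.path.join(cache_dir, os.path.basename(bytecode_path) + ".native")
    if (os.path.exists(native_path)
            and os.path.getmtime(native_path) >= os.path.getmtime(bytecode_path)):
        # Cache hit: the bytecode has not changed since it was compiled.
        with open(native_path, "rb") as f:
            return f.read(), False
    # Cache miss or stale entry: compile, then save the result for next time.
    with open(bytecode_path, "rb") as f:
        native = compile_fn(f.read())
    with open(native_path, "wb") as f:
        f.write(native)
    return native, True
```

      Comparing modification times is the cheap check suggested above; a content hash would be sturdier if files can be replaced by copies carrying older timestamps.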

      --
      I like paying taxes. With them I buy civilization -- Oliver Wendell Holmes
  64. Have a look at mprime by blip · · Score: 2

    There will be a new version of the famous mprime. Look at www.mersenne.org. The upcoming v21 includes advanced P4 optimizations. According to their mailing list, the P4 is faster than the Athlon. Cheers! Blip

  65. CPU-specific optimisations by nickovs · · Score: 4
    It seems to me that the tests that were used to give the Flops crown to the Intel CPU are a little biased. Surely for a fair test the Athlon should have been tested with the latest experimental AMD compiler as well.

    As CPU designs get more complex, the compilers need to know more and more about the exact nature of the CPU. Despite the label of binary compatibility given to the CPUs from AMD (and others), those who need to squeeze the best performance out of machines are going to need to run code that is compiled for their specific machine. Despite the best efforts of the open source community, most end users do not want to recompile source, let alone spend time finding obscure /QaxW flags to make the most of the system. Really this should be a job for the OS.

    Maybe in the future we will see commercial code being distributed in such a way that parts of the code are compiled on the destination machine as the code gets installed. That way the code vendor can test a variety of compiler options and not have to ship 42 different binaries for all the different CPUs in use.

    --
    If intelligent life is too complex to evolve on its own, who designed God?
    1. Re:CPU-specific optimisations by geomcbay · · Score: 3

      The most likely solution for the short term is that developers will compile multiple versions of DLLs (or .so's in Linux/UNIX space) holding their hotspot code and use dynamic library linking to load in the right one after doing a CPU detection routine. This sort of thing is already being done to a certain degree with Windows based games that might support 3dNow, SSE, SSE2, etc.
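      The DLL-per-instruction-set approach boils down to a small selection step run once at startup. A rough Python sketch, assuming the CPU feature flags have already been detected (in a real program they would come from CPUID) and using made-up library names:

```python
# Hypothetical builds of the same hotspot code, ordered from most to least
# preferred. The file names are illustrative, not from any real product.
HOTSPOT_BUILDS = [
    ("sse2", "hotspot_sse2.dll"),
    ("sse", "hotspot_sse.dll"),
    ("3dnow", "hotspot_3dnow.dll"),
]
FALLBACK_BUILD = "hotspot_x87.dll"  # plain x86/x87 code that runs everywhere


def pick_hotspot_library(cpu_features):
    """Return the first build whose required feature the CPU reports."""
    for feature, library in HOTSPOT_BUILDS:
        if feature in cpu_features:
            return library
    return FALLBACK_BUILD
```

      The chosen name would then be handed to the platform loader (for example `LoadLibrary` on Windows or `dlopen` on Linux/UNIX). Ranking SSE ahead of 3DNow! is an arbitrary choice here; a real product would order the builds by measured speed on each chip.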

  66. P4 can't dethrone Athlon in Linux by spiro_killglance · · Score: 2
    Because Linux is built with gcc, and linux apps are built with gcc, and gcc doesn't have intel's compiler's SSE2 auto-vectorisation shenanigans.

    By next year, when many programs are SSE2-enabled, AMD's ClawHammer should take back any lead Intel gets, because it uses SSE2 as well.

    1. Re:P4 can't dethrone Athlon in Linux by Hater's+Leaving,+The · · Score: 1

      With Intel in flux (changing RAM architecture etc.), AMD probably do have the time to bring out SSE2-enabled CPUs that are better all-round than the P4, yes. However, AMD have got to get it right first time, and can't afford to release something as broken as the current P4.

      I found the following (the 2nd link particularly) interesting.
      http://www.theregister.co.uk/content/35/15736.html
      which points to
      http://www.emulators.com/pentium4.htm

      THL.
      --
      Keeping /. cynic density high since the fscking Kwhores/trolls arrived.
  67. Re:This is nothing new... by SuiteSisterMary · · Score: 2

    Ah, but if Intel supplies a compiler that does, say, 50 percent of the job for programmers, then sales go up. Sales go up, the programmers do the other 50 percent. Joy, bliss.

    --
    Vintage computer games and RPG books available. Email me if you're interested.
  68. In that case... by yerricde · · Score: 1

    AMD uses Intel compilers for their SPEC scores

    In that case, Intel could insert some obfuscated code to detect AMD CPUs into its compilers' output and then run delay loops on AMD CPUs to create a phony lack of benchmark performance.

    --
    Will I retire or break 10K?
  69. Hotspot optimizations can be dynamically linked. by yerricde · · Score: 1

    or else we're going to start seeing "Windows/Pentium 4", "Windows/AMD", "Windows/64-bit AMD" and "Windows/Itanium" sections in compUSA

    As another user commented above, high-performance consumer applications often put their hotspots into DLLs so that a build optimized for a given microarchitecture can be used. For example, Windows could have nthotp4.dll and nthotk7.dll. And no, *hammer and Itanium would not have their own sections, as app binaries would be shipped for multiple architectures (as was done during the Macintosh computer's transition from 68K to PowerPC processors).

    The other way to do it would be to recompile the software at installation time. For example, ALFS and Gentoo are Linux distributions that come as source; a distro based on ALFS or Gentoo would provide boot floppies for each architecture, a CD with just enough binaries to get the compiler going, and a source CD, and then build everything especially for your processor at installation time.

    --
    Will I retire or break 10K?
  70. Re:Intel should.. by Doomdark · · Score: 1

    Just an idea (which probably is totally wrong); perhaps they view this as an opportunity for competitors like AMD to get their hands on useful compiler technology? And perhaps also they think that compiler implementation might give hints about internal chip design (far-fetched?). Of course it could and may well be just good old-fashioned corporate culture, where giving away stuff is "so dot.com". :-)

    --
    I like paying taxes. With them I buy civilization -- Oliver Wendell Holmes
  71. Re:One big problem by bellings · · Score: 2

    bullshit, quasi-informative troll about BSD elided...

    Wow. You sound like a really smart guy. I bet you can think of all sorts of reasons why BSD is dying. Why don't you share some with the slashdot community?

    --
    Slashdot is jumping the shark. I'm just driving the boat.
  72. Re:honest question... by jmichaelg · · Score: 1

    The benchmark source is at http://www.aceshardware.com/files/benchmarks/flops_dist.zip - download it to your PowerPC, compile it and find out. Or take a look at the Seti@home benchmark, which shows the PowerPC getting a score about 1/8th of the Athlon's (3 vs 25). Of course, they both get trounced by the HP PA-RISC, which for some odd reason Fiorina has decided to drop in favor of IA-64.

  73. Re:Anyone know when M$ VC++ will support SSE2 nati by connorbd · · Score: 2

    So really the only people who need to get excited about this stuff are the driver writers and their brethren who write plugins for things like Photoshop is what you're saying.

    Somehow I'm not shocked.

    /Brian

  74. Re:This appears to be the typical load of slashdot by ostone · · Score: 1

    Replying to myself...
    I'm not saying Intel is the invincible king of processors... all I am saying is that AMD is not really the leader yet. In general people know the word Pentium, and while the number is growing, fewer know the word Athlon and even fewer know that it is a better processor. NO, I'm not an Intel lackey; I just like to think that slashdotters could cope with the idea that the world isn't perfect, and that just because something is better doesn't mean it's winning the market.

    --
    Remove *your pants* to send me email.
  75. This appears to be the typical load of slashdot bs by ostone · · Score: 3

    Do you people pay attention... yes AMD is making better processors... yes people who really know choose AMD over Intel... therefore slashdotters choose AMD over Intel...

    But back to the real world: If you turn on a computer out there in happy fun land (aka "The Real World"(TM)) then odds are it will be running Intel. Linux, your precious kernel, started out with optimized non-portable code for the i386. You geeks keep falling victim to the same trap year after year... just because it's better doesn't mean people will use it. Linux/BSD/Solaris/Irix/SVR4/MacOS/BeOS... is clearly better than Windows when you look at the track record... and yes, in some cases it can be *almost* as easy to use... but Microsoft has been winning the OS war ever since they made a *bad* ripoff of the Macintosh (read Xerox) GUI OS. MacOS was better, more stable, and much cleaner... but Micro$oft had the market share and they won. People listen to money, and Intel is still the processor most people/companies would prefer buying. Hackers are one of the lowest demographics in the computing industry these days, and people (outside of their community) don't pay much attention to them.

    Well, I guess that's it... go ahead, return to your illusion and mod this down.

    --
    Remove *your pants* to send me email.
  76. Re:Hold on a minute... by andr0meda · · Score: 1

    Smaller RAMBUS latencies? It was my understanding that RAMBUS had massive latencies, but really good bandwidth. Did I miss something here?

    No, you're quite right, the latencies stay. What I should have said was that the technology is less dependent on them because of the relatively fewer cache misses, since the pipelines can stay at 20 stages. So overall, the chip can take better advantage of RAMBUS. Sorry for the slip there.

    --
    With great power comes great electricity bills.
  77. Re:The answer is by andr0meda · · Score: 1


    Yep, it's a slip of the mind, sorry. It should be the other way around, of course.

    --
    With great power comes great electricity bills.
  78. Re:The answer is by andr0meda · · Score: 5

    It's due to the fact that the Pentium will flush the entire pipeline on a branch misprediction/pipeline stall. As a result, the Pentium III would outperform the Pentium 4 on some occasions, as the latter tends to lose more instructions when the branch-misprediction rate is too high.

    Rumours have it that the Pentium IV will have Simultaneous Multithreading (SMT) enabled, which lets the processor run any instruction from any thread on any unit at any time. Supposedly this feature was already included in current processor designs but not enabled because the P6-4 is not ready for SMP yet.

    AMD uses On-chip Multiprocessing (CMP) in Sledgehammer, which is basically the same as subdividing the resources of the CPU (registers & units) between the threads. The benefit of this technique is that the design can be kept simpler and the clock can go faster than a similar monolithic chip with the same resources. On the other hand, a lot of resources are wasted if only one thread is running in this setup.

    Needless to say, SMT has some problems too; for example, CMP lends itself much better to branch prediction through Slipstreaming than SMT does. You can find some good reading in this previous slashpost about how Intel and AMD deal with multithreading on their single/multiprocessor designs. To be taken with a grain of salt, of course, but very sharp.

    My point is that if branch prediction in the form of Slipstreaming is implemented (and Jackson Technology seems to be that kind of SMT), the P6-4 problems with the excessive cache flushing are completely over, and SMT can take full advantage of the smaller RAMBUS latencies, easily outperforming a similar CMP setup like AMD has.

    --
    With great power comes great electricity bills.
  79. Re:The answer is by Diomedes01 · · Score: 1
    The point is that while more instructions are thrown away, this is only because more have been issued, and therefore having more pipelines in a new generation does not lead to that processor running slower than previous versions. The increased branch misprediction penalty can only diminish the extra performance that the extra pipelines give you, not lead to an overall speed decrease, right?

    I'm not sure, but I don't think that this is the case. Let's use the numbers that the original poster gave, because I am not familiar with the specifics of the P3/P4 pipelines.
    Pentium 4 - 20 stage pipeline (x9)
    Pentium 3 - 10 stage pipeline (x6)

    So, we have both processors running along, merrily decoding instructions into micro-ops, and executing them. A branch instruction is executed, and the processors both proceed to implement their respective branch prediction algorithms. Upon guessing, they immediately begin executing the branched-to instructions.

    Ok, now assume that the results of the branch calculation are available 2 cycles before the end of the pipeline (via data forwarding). 8 cycles later, the P-III realizes that it guessed wrong, and flushes the pipelines (10 cycles). So, 18 cycles later, the P-III is executing the proper code.

    Now, look at the P-IV. It is at cycle 18 that the branch calculation is available. Whoops, it's wrong! Now we have to flush that 20 stage pipeline. So, 38 cycles later, the P-IV is executing the proper code.

    Keep in mind that this assumes the same branch prediction probability for both processors, which certainly isn't the case. So even though there are MORE pipelines, that just means that more instructions get thrown out.
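    The arithmetic above can be written as a toy formula. This Python sketch follows the poster's own assumptions (the branch resolves two stages before the end of the pipe, and a misprediction costs a full refill); real P-III and P4 penalties differ, so treat it only as a check of the quoted numbers.

```python
def cycles_until_correct_path(depth, resolve_before_end=2):
    """Cycles from a branch entering the pipe until correct-path code runs.

    Toy model: the branch outcome is known `resolve_before_end` stages
    before the end of the pipeline, and a wrong guess then requires
    refilling all `depth` stages.
    """
    resolve_cycle = depth - resolve_before_end
    return resolve_cycle + depth


for name, depth in [("P-III", 10), ("P-IV", 20)]:
    print(name, cycles_until_correct_path(depth))
```

    This reproduces the 18 and 38 cycle figures: doubling the depth roughly doubles the cost of each wrong guess, measured in cycles.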

    I would love someone to check this for accuracy, because it has been a long while since I dealt with CPU architecture. Take everything I say with a grain of salt.

    On a side note, I always thought that the P-III only had 3 pipelines, one of which could execute any micro-op, and the other two of which could only execute the simpler micro-ops.




    --
    "To hope's end I rode and to heart's breaking: Now for wrath, now for ruin and a red nightfall!"
  80. Re:Isn't the Pentium 4 in the P7 family? by 11223 · · Score: 2

    The 80186 was never used widely outside of embedded systems and was simply a slightly extended 8086.

  81. compiler plug-ins by maddogsparky · · Score: 1
    The article states that the Intel compiler can be used as a simple plug-in, taking the place of the default M$ compiler. So by recompiling, any app could use SSE2. SSE2 and non-SSE2 code are both generated by the compiler and the decision of which to use is made at runtime.

    My question is:

    Does AMD have an optimizing compiler for the Athlon that you can plug into VC++? If so, it should have been included in the tests this article ran.

    --
    science is a religion
    1. Re:compiler plug-ins by Narcissus · · Score: 1
      An optimising compiler for the Athlon is not as necessary as one for the P4. The Athlon was designed to optimise code within the CPU (i.e., it was made to work better with current code), as opposed to the P4, which needs code to be optimised by the compiler to run faster.

      This is the major problem with the P4. Admittedly, the article I read this in is a little old, but I'm sure it still applies today. Either way, this article is what made me absolutely certain that I'd chosen the right CPU. You can find it at http://www.emulators.com/pentium4.htm .

      There's a lot of background info on CPUs leading up to the P4 and the Athlon, but it's quite an interesting article...

  82. Re:The answer is by Espen+Skoglund · · Score: 1
    Rumours have it that the Pentium IV will have Simultaneous Multithreading (SMT) enabled, which lets the processor run any instruction from any thread on any unit at any time.

    It is supposedly going to be implemented in the Foster next year (enter "foster" and "smt" into your favourite search engine). I recently also heard this independently from some Intel source, so I guess there is some truth to these rumours.

  83. Here's a good analogy.. by willy_me · · Score: 2
    Take a processor. It hits a branch instruction. While it is working out whether or not to take the branch, it keeps itself busy by executing instructions from one side or the other of the branch. It gets it wrong, so when it realizes this, it throws away a bunch of work it has done. Hence branch misprediction is a Bad thing.

    One thing that you forgot: it takes more time to go back and run the other branch if there is a longer pipeline. Hence, a CPU with a long pipeline will sit there idle as the data makes its way through the pipeline.

    To better visualize how a pipeline works I like to think of this little analogy:

    Have a line of people passing buckets of water from a well to a burning house. Given that every person works at a given speed, it takes them a fixed amount of time to move the water from one person to the next. The more people present, the smaller the distance each of them has to move the water. This allows them to move more buckets in the same amount of time (or operate at a higher frequency, just like the P4). The problem is that it takes longer for the water to actually get to the fire (assuming 20 vs 10 people working at the same frequency). Now let's say there are two different kinds of water (a very hypothetical situation). Should the wrong type of water be sent and arrive at the house, the guy at the house would have to tell the guy at the well to send over the correct type of water. With more guys in between the two, it'll take longer for the correct water to get to the house. While the water is in transit, the guy at the house sits wasting his time.

    So, as you can see, more people increase the potential speed. The speed determines the volume of water being sent. This is great, but if the wrong thing is sent it takes a long time, because the correct "thing" has to travel through the enlarged pipeline.

    A long pipeline is great if you're running code that doesn't have a pile of "branch if" instructions in it. Performing an "add" on every byte in a 4MB file (think Photoshop) will result in very efficient use of the CPU. However, if you're running code with lots of "if then" statements then you run the risk of wasting a great deal of CPU time. This is where a smaller pipeline helps (or should I say doesn't cause as much damage.)
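    To put rough numbers on the bucket-brigade trade-off, here is a Python sketch with made-up parameters: one instruction completes per cycle when nothing goes wrong, each misprediction stalls for the full pipeline depth, and the 20-stage design is assumed to clock 1.5 times faster than the 10-stage one rather than a full 2x. None of these figures describe a real P3 or P4.

```python
def run_time_ns(instructions, mispredicts, depth, clock_ghz):
    """Wall-clock time in nanoseconds under a deliberately simple model.

    Assumptions (illustrative only): one instruction retires per cycle,
    and every branch misprediction costs `depth` extra cycles to refill.
    """
    cycles = instructions + mispredicts * depth
    return cycles / clock_ghz  # cycles / (cycles per ns) = ns


N = 1_000_000  # instructions in the hypothetical workload
for mispredicts in (0, 100_000, 200_000):
    short = run_time_ns(N, mispredicts, depth=10, clock_ghz=1.0)
    deep = run_time_ns(N, mispredicts, depth=20, clock_ghz=1.5)
    print(f"{mispredicts:>7} mispredicts: 10-stage {short:,.0f} ns, 20-stage {deep:,.0f} ns")
```

    With these numbers the deeper pipe wins on straight-line code, the two tie at 100,000 mispredictions, and the shorter pipe wins at 200,000: roughly the "add to every byte of a 4MB file" case versus the "lots of if-then" case.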

    The other big problem with a large pipeline is that it greatly increases the complexity of the chip design. More transistors result in the components of the CPU getting spread further apart, hence you need an even longer pipeline (think of the burning house example: the house just moved an extra block away from the well).

    Overall, chips with shorter pipelines offer far greater efficiency. Look at a G3 PPC CPU. It has a 4 stage pipeline. Because of this it maxes out at 700MHz, but it is faster than a PIII when comparing MHz to MHz. All this and it's a third of the die size and typically uses only 5 Watts. You can also look at the Alpha with its 7 stage pipeline. It might not operate as fast (MHz) as today's P4s or Athlons, but it still offers incredible performance.

    The real advantage of the P4 will come with multimedia-type applications. The problem is that it will quickly max out the memory bandwidth. Now take an Athlon: it might not be quite as good for those same apps, but so long as it can also max out memory bandwidth you're not going to see a difference. As John Carmack said in a recent /. posting, the new G4 is great but the main problem is memory bandwidth. As CPUs double in speed this will become an even greater problem.
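    The bandwidth argument can be sketched as a simple bound: a loop can run no faster than its arithmetic allows, nor faster than memory can deliver its data. In this Python sketch the CPU and memory figures are hypothetical, chosen only to show two chips of different speeds hitting the same memory ceiling.

```python
def loop_time_s(ops, bytes_moved, cpu_gops, mem_gb_per_s):
    """Lower-bound run time: the slower of the compute and memory limits."""
    compute_time = ops / (cpu_gops * 1e9)
    memory_time = bytes_moved / (mem_gb_per_s * 1e9)
    return max(compute_time, memory_time)


# Adding a constant to every byte of a 4 MB buffer: 4M operations and
# 8 MB of traffic (each byte read once and written once).
OPS, TRAFFIC = 4_000_000, 8_000_000
fast = loop_time_s(OPS, TRAFFIC, cpu_gops=3.0, mem_gb_per_s=1.6)
slow = loop_time_s(OPS, TRAFFIC, cpu_gops=1.5, mem_gb_per_s=1.6)
print(fast == slow)  # prints True: both runs are limited by memory
```

    Doubling the CPU figure changes nothing here, because both runs are bound by the 1.6 GB/s memory term; only a compute-heavy loop would show the faster chip's advantage.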

    Willy

  84. This is nothing new... by Ixnert · · Score: 1
    We've all known since the P4 appeared that IF SSE2 is adopted en masse by software developers, it would have a significant advantage over every other chip on the market.

    It's just that that's one heck of an "if", especially as long as sales of the P4 are as weak as they have been so far.

  85. Re:Intel Compiler costs $???? by Grishnakh · · Score: 1

    But Microsoft makes a lot of money selling their compiler and development tools. Intel can do the same (or at least that's their flawed way of thinking...)

  86. Usefulness of P4 by Grishnakh · · Score: 1

    Seems to me that, given its long pipelines, usage of RAMBUS memory, dependence on a special compiler, and subsequent high performance for floating-point computation, the only place the P4 can even hope to get an edge is with gamers. And this is dependent on the game companies compiling with Intel's compiler.

    For everyone else (everyone running office apps, non-gaming apps, or non-MS operating systems), the P4 is a real waste of money.

  87. Re:ehh? Thats dumb by Grishnakh · · Score: 2

    The main reason for this is that there's more to a car than just an engine.

    Having a smaller engine with more HP/Litre allows you to have a smaller (lighter) car. The reduction in torque becomes much less significant since the engine has less mass to accelerate. If GM made a small 4-cylinder engine with only 61 HP/Litre, and put it in a car the size of the S2000, it'd be terribly slow and wouldn't compete too well.

    If Honda made a 5.7L V8 engine, it probably wouldn't scale linearly, but I'm sure they could easily do better than 61 HP/Litre. Why haven't they? They probably 1) aren't interested in large V8-engined cars, or 2) don't feel that it'd be a profitable market segment for them to enter, especially given their reputation for small, lightweight cars. They are planning to make a V8 NSX soon, though, although it'll probably be more like 4.0L.

    In the end, I guess it comes down to which kind of engine you prefer in a car, and in what kind of car: a small car with a small, high-revving engine (but not much torque), or a larger, heavier car with a large, powerful engine which concentrates more on low-end torque. If you like lots of torque, Honda probably isn't the company for you to be buying from.

  88. Re:Ace Hardware??? - John Madden? [ot] by ackthpt · · Score: 1
    If John Madden had written it:

    You'd still be reading it

    There'd be references to ceiling fans or what color bird just went by the window

    It may be in that anyman style, but you'd still be confused about what his final point is

    You'd suspect CBS had bought out ACE's and was trying to mass-market their reviews and articles

    There'd be the familiar Monty Python "Laugh, It's Funny" icon on the header telling you not to take it too seriously.

    --
    All your .sig are belong to us!

    --

    A feeling of having made the same mistake before: Deja Foobar
  89. Re:It's A Different Thrown[sic] Now by ackthpt · · Score: 5
    Agreed, the market is in a slump and people shopping for computers are going to be bargain hunting for some time. Even more rumblings about layoffs at the ever-optimistic Intel, despite yammering on about how the downturn won't affect Intel, how they expect growth, etc.

    Cheap chips rule in a soft market, and AMD has demonstrated the ability to produce wicked-fast chips at cheap prices. This would seem to be the best evidence yet that Intel has lost its way and the bureaucracy is in need of some serious house cleaning.

    Some blunders:

    Tying themselves legally to Rambus

    Talk of discontinuing the P3, their best mover.

    Pushing the 1.13GHz P3 out the door before it was ready and suffering the consequences.

    Slashing prices and subsidizing RDRAM just to move P4 product.

    The P4 may have some advantages, but imagine what it would be like if AMD had rolled it out... um hm.. It would have killed the Athlon alright, assuming the Athlon were Intel's. ;-)

    The truth is out there.

    --
    All your .sig are belong to us!

    --

    A feeling of having made the same mistake before: Deja Foobar
  90. Intel Compiler costs $???? by Arethan · · Score: 4

    Someone mentioned above that the Intel compiler is selling for a couple hundred bucks per license. I've been in the development market for a few years now, and I've used Intel's "optimized" compiler a few times already. It has a few flaws right out of the box; for one, it only works on Windows systems. It can act as a plugin for MS Dev Studio (which I must admit is a pretty slick IDE), but the bottom line is that Intel is charging money for something that they should be trying to GIVE away. If they want a leg up on the market, they should be making it VERY easy for developers to use their compiler when they build their applications. The result would be a lot more stickers on product boxes labeled "Optimized for Intel CPUs", making the CPU decision much easier for newbies.

    "Oh look, all of these games are optimized for Intel chips. They must be good!"

    Better yet, if they want their cpus to get on top of the server market, they should be releasing the source code for their compiler as well. This would let the gcc crew use the optimizations in their compiler creating better/faster *nix software. (Unix being the server platform of choice for more large companies I've worked with than I can shake a stick at. I won't get into why, as that will probably start a small war.)

    Bottom line, make the compiler free, and open the source, and Intel would definitely take off again.

    Until that day, though, I will stick with AMD since they have better prices for equal performance.

  91. Guys!! by 3-State+Bit · · Score: 2

    Everyone take this survey to get Dell to start offering AMD Athlon:
    http://www.dell.com/html/us/segments/dhs/intel_amd_survey.htm?keycode=6Vc00&DGV
    But go to a different page before you paste it in, so that they won't know we're all coming from slashdot. :)
    ~

  92. Problems? by FreeMath · · Score: 1
    What about AMD optimizations? I would assume AMD-optimized code would run faster than any plain x86 code. And what about other compilers? Yes, most programs are done in MSVC++, but I do all of my work (as do most /.ers) with gcc. This only measures CPU performance. With the 100MHz bus and Rambus on the P4, and a 200MHz bus with DDR on the Athlon, the Athlon will give you better overall performance for less $$.

    Corrections (I'm sure there will be) accepted.

    --
    This sig intentionally left blank.
  93. You neglect one key element, though by ToneHog · · Score: 1

    The average consumer doesn't brainlessly make a computer purchase. Most of the time, a consumer will have a friend, coworker, or someone that they can consult with about hardware. If we, as hackers, are the lowest demographic, then our knowledge eventually propagates to these consumers, provided our advocacy is done appropriately.

    Case in point, Napster. We knew about Napster for quite a long time before its popularity ramped up. Word of mouth increased its popularity. Make sense?

    --
    Center bodied, omni-minded.
  94. Indeed... by RareHeintz · · Score: 1
    Well, unless this somehow fixes the problem where the chip slows down to half-speed when you run your computer for any useful length of time, my next box will contain an Athlon.

    OK,
    - B
    --

  95. honest question... by Eslyjah · · Score: 1

    where does powerpc stand in all of this? not trying to start a flame-war...just an honest question.

    1. Re:honest question... by Space+Coyote · · Score: 1
      The PowerPC (specifically the 74x0 aka G4 class, which is what I assume you're interested in) is in a situation similar to the P4's, in that it doesn't show its full potential without a compiler that specifically takes advantage of the floating point capability of the processor.

      There is a vectorizing compiler available for the G4, but it is expensive and (afaik) isn't available for linux. Intel's vectorizing P4 compiler (the one tested in the article) is also quite costly.

      Apple has committed themselves to optimizing gcc for the G4s, since it's what they're using to compile Darwin/OS X. Intel hasn't mentioned supporting GCC, and it seems they would rather work towards being able to re-compile the linux kernel using their own (proprietary and expensive) compiler.

      --
      ___
      Cogito cogito, ergo cogito sum.
  96. impressive by Extimes · · Score: 1

    Well, I thought it was pretty impressive to see the boost, whether or not it will ever be realized in applications. That is some impressive performance.

    --
    I want transparency effects. I want so much transparency, I can see the back of my monitor! http://www.andrew.cmu.edu/
  97. Re:The answer is by jsse · · Score: 1

    Thank you for your reply.

    Aye, I just wrote about the aspects that are not explicitly mentioned in the article I cited.

    Frankly, I really miss the Pentium Pro; it takes the best of the P6 architecture. Oh well. :/
     _
    /. /   |\/| |\/| |\/| / Run, Bill!

  98. Re:Isn't the Pentium 4 in the P7 family? by jsse · · Score: 2

    well....
    in the summer of 2000 it tried to push the aging "P6" architecture too far. The P6 design, or 6th generation of x86 processor which since 1996 has been the heart of all Pentium Pro, Pentium II, Celeron, and Pentium III processors, simply does not scale well above 1 GHz. As the aborted 1.13 GHz Pentium III launch this summer showed, Intel tried to overclock an aging architecture without doing thorough enough testing to make sure it would work. The chip was recalled on the day of the launch, costing Intel, and costing computer manufacturers such as DELL millions of dollars in lost sales as speed conscious users migrated to the faster AMD Athlon.

    From the article I linked before.
     _
    /. /   |\/| |\/| |\/| / Run, Bill!

  99. Re:The answer is by jsse · · Score: 2

    Aye, you are right. That is what I was always wondering... In theory SMT should beat CMP, since CMP loses processing power through its simplistic division of work. However, it turns out to be quite the opposite... Is it true that SMT is not yet enabled in P6-4, or is CMP simply more feasible in practice? :)

    Thanks for the info.
     _
    /. /   |\/| |\/| |\/| / Run, Bill!

  100. Re:The answer is by jsse · · Score: 2

    Sorry, I oversimplified my argument, which may have caused a misunderstanding.

    Typical instructions take more clock cycles to execute on the Pentium 4 (not P4). Longer and more numerous pipelines don't mean more instructions can be fed and executed in one clock cycle. Also, because the Pentium 4 uses a longer pipeline, flow control operations (such as branches, jumps, and calls) are more expensive: the longer the pipeline, the longer it takes to fill it up again.

    (Reminder: this is a very simplified view.) In theory the execution units can process 9 micro-ops per clock cycle, but because of a problem in the cache design, they can only be fed 3 micro-ops per clock cycle.

    Pentium III's decoder can feed up to 3 instructions and 6 micro-ops (4+1+1) to the core per clock cycle.

    Pentium III is like a motorcycle engine in a motorcycle. Pentium 4 is like upgrading the same engine to run a bus.(just ignore it if you think the analogy is wrong ^_^)

    I might have missed some points. Please comment.
     _
    /. /   |\/| |\/| |\/| / Run, Bill!

  101. The answer is by jsse · · Score: 5

    Can the P4 dethrone the Athlon?

    No.

    Let me explain it this way: the Pentium III has 6 10-stage pipelines for out-of-order superscalar execution, while the Pentium 4 (avoid using the short form P4; the Pentium 4 is in the P6 family) has 9 20-stage pipelines.

    More pipelines and more stages sound good, huh? Unfortunately, in some benchmark tests the Pentium III beats the Pentium 4. This is because the Pentium 4 flushes its entire pipeline on a branch misprediction or pipeline stall. As a result, the Pentium III can outperform the Pentium 4 on some occasions, since the latter loses more in-flight instructions when the branch misprediction rate is high.

    The Athlon, on the other hand, only flushes half of its pipelines on average. Intel really needs to fix this fundamental design glitch before it can beat the Athlon.

    If you are very interested in this subject, you can read this article. It explains why Intel cannot give up the Pentium III in favour of the Pentium 4.
     _
    /. /   |\/| |\/| |\/| / Run, Bill!

  102. Re:Short Answer by sir_nas · · Score: 1

    Oooh, I guess I finally found out what the major cause of global warming is, thank you!

  103. Re:Anyone know when M$ VC++ will support SSE2 nati by geomcbay · · Score: 1

    Considering how long it took just to get support for the original SSE and 3DNow! extensions in inline asm, I'm guessing it'll be quite a while yet.

  104. Re:One big problem by geomcbay · · Score: 2
    Your sig is quite funny considering your post is mostly garbage. Did you read the article? Intel chips actually perform fairly poorly on current Microsoft compilers. That's one of the primary arguments of the article. The P4 shines when used with (surprise) Intel's compilers.

    Microsoft's current compilers, while inferior to Intel's on the new Intel processors, are better with Pentium Pro-style architecture than gcc, but that's just because of different development goals (gcc tries to serve everyone; Microsoft can focus on a much more limited set of CPUs). It's not a grand conspiracy or anything.

  105. Re:Intel should.. by geomcbay · · Score: 2
    I'm sure AMD knows Intel's chip designs inside and out, almost as well as Intel knows them, and vice versa. It's mostly patents and such that protect their "innovations" at that level.

    Nevertheless, like I said, I'd be shocked to see Intel open source their compiler...But I wouldn't be shocked (and I think it makes a lot of sense for them to do this) if they started giving away the Win32 binary for free (as in beer). Otherwise the majority of developers are going to keep using Visual C++ and/or Cygwin/gcc and Intel's chips are going to continue to look inferior to AMD's, even if that view is not entirely accurate.

  106. Intel should.. by geomcbay · · Score: 5
    Intel should consider giving their compiler away. Currently they charge hundreds of dollars per license for it. Considering their market in compiler tools is relatively small beans, you'd think it would make more sense for them to just give the compiler away to entice developers to use it and thus wind up with executables that really showcase the next-gen Intel processor's speed.

    I won't even get into the argument about how it might help them to Open Source the thing so that parts of the technology might be rolled into other compilers like gcc, because I just can't imagine that happening anytime soon.

  107. Re:Short Answer by minard · · Score: 1

    What "malfunctions" are we talking about here?
    And what's the deal with "overheating"? Neither of these has been reported, except for some ill-informed commentary on the thermal protection feature. Just to be clear:
    1. If you run an Athlon and a Pentium 4 side by side, running the same application, the numbers I have seen say that the Pentium 4 dissipates less power, not more.
    2. In the event that the temperature limits get exceeded, due to an inadequate thermal solution, the Pentium 4 thermal protection diode causes it to clock throttle to prevent it from blowing up. In the same situation, the Athlon fries itself.
    I have seen many contorted arguments presented by the rabid Intel-haters who post here from time to time, attempting to cast this as somehow a negative for the Pentium 4. Can somebody please explain to me how this can possibly be so? If not, can we please stop posting this kind of misinformed or plain wrong FUD?

  108. Re:Short Answer by minard · · Score: 1

    In the chipset maybe, not the chip. So you have to use a chipset that has the clock throttling feature. I'm reading the datasheets for the Athlon and Pentium 4 (you can get them from the relevant websites).
    Go back to my original point. I wasn't making any point about the Athlon, rather asking how this is somehow a negative for the Pentium 4. So how is it?

  109. I'll believe when I see it. by AnotherBlackHat · · Score: 1
    ... with future software (including SSE-2 optimizations) it will outclass the competition ...
    Future software is better known as vaporware.
    A year from now, we should be seeing 2 gigahertz stuff at the same price.
    The only important benchmarks are the ones we can run today.

  110. More meaningless numbers by bryan1945 · · Score: 4

    In this test A beats B, but in this test B beats A, etc. etc. All these different tests try to measure some specific performance parameter, but as hard as you try to standardize the rest of the equipment to isolate that one parameter, you just can't in the real world. And that is the true test- how well does the entire system run? You could slap a P4 3GHz onto a 33MHz bus (well, not really, but you get the point) and get the equivalent performance of a 3-toed sloth. That or the bus wires will glow.

    As for the SSE extensions, Intel tried this first back with MMX, and Apple is trying it now with AltiVec. Yes, these extensions can help, but only after software is optimized for them. It's not a case of "drop 'em in and watch out!" It takes time to develop.

    Of course, all of this is just marketing. Kinda like the MHz wars. Intel needs some positive press after that oft-quoted test where the P3 trounced the P4.

    --
    Vote monkeys into Congress. They are cheaper and more trustworthy.
  111. Intel Complier's Athlon Optimization by abumarie · · Score: 2
    if (cputype == ATHLON) {
        putz();
        putz();
        putz();
        wait(ALONGTIME);
        putz();
        putz();
        putz();
        do_operation();
        putz();
    }

    Using this processor-specific optimization for the Athlon chips, the Pentium 4 has managed to outrun the Athlon. Intel's compiler cannot realistically be expected to generate optimized code for the Athlon, so any comparison based on their compiler should be treated as highly suspect.

    --


    Sex is hereditary: if your parents didn't have it, chances are good you won't either.
  112. Hold on a minute... by srvivn21 · · Score: 1

    Smaller RAMBUS latencies? It was my understanding that RAMBUS had massive latencies, but really good bandwidth. Did I miss something here?

  113. what about gcc? by s20451 · · Score: 2

    Interesting results! Looks like heavily optimizing one's compiler pays huge dividends in terms of processing power.

    There's an important question though. The article used the MS compilers exclusively, with the best results coming from the Intel plug-ins - since these are apparently the industry standards. However, I'm at a university, and everybody I know is using gcc. We would be very interested in the kind of performance that is displayed here. Does gcc keep rigorously up to date with the most modern CPU technology, or does it lag (and if so, how much)? How long until these optimizations will appear in a release of gcc?

    --
    Toronto-area transit rider? Rate your ride.
  114. One big problem by Jack+Wagner · · Score: 2

    The P4 is only optimized for a Microsoft compiler. It's true. I was doing some consulting work for a major Fortune 100 company, and they were looking into migrating from Windows to OpenBSD. However, after they did some testing they found that their database applications were running 15 - 20% slower on BSD than on Windows. I expected them to be a little slower due to the threading problems with BSD, but not that slow.

    After one full week of testing we found the problem wasn't with BSD at all; it was with the P4 on BSD. It would seem Intel has an enhanced instruction set cache which is only available on Microsoft compilers. This is not a trivial thing to implement, so I doubt the OSS camp will be able to migrate it into their compilers anytime soon.

    --


    Wagner LLC Consulting Co. - Getting it right the first time
  115. Re:2 things by Gord888 · · Score: 1

    Consumers are naturally stupid. They watch TV and buy whichever is hyped more. Seeing as how Intel went on that month-long Pentium 4 commercial blitz, I'd have to say that consumers are probably gonna buy Intel because they don't know any better! Personally, bang-for-buck... I'd buy AMD.

    --
    -=-=- I don't suck... you blow. -=-=-
  116. Isn't the Pentium 4 in the P7 family? by SuperGrut · · Score: 1

    I thought it was: Pentium = P5; Pentium Pro, Pentium II and Pentium III = P6; Pentium 4 = P7. It obviously has a different core than the Pentium III.

    --
    The city is being overrun by a herd of Lucy Liu's.
  117. Re:This appears to be the typical load of slashdot by codehead78 · · Score: 1

    You don't think the nForce coming out for Socket A motherboards first will affect this? Intel isn't something that's tangible to your average consumer; it's just a sticker on the front. If AMD systems are more than just a little faster than Intel systems, people will buy them... and there will be an nForce sticker on the front... next to an AMD sticker. Intel's position is not the same as Microsoft's; anyone can simply buy an AMD instead of an Intel without any hassle.

  118. processors heading in the wrong direction.... by Prolixium · · Score: 2

    I don't get it lately with processors. Why do we need all of this SSE, SSE-2, 3DNow!, MMX stuff? Granted, I don't have a degree in computer engineering (yet), but doesn't it seem that processors are becoming more and more proprietary?

    To me, it seems like we're moving toward a time when there will be different versions of OSes for each processor (myos for Intel / myos for AMD). It's going to be increasingly hard for vendors to write code that is optimized for all processors.

    Anyone else think this way? does this make sense?

  119. Short Answer by GreyOrange · · Score: 3

    If they manage to get their chip working properly, yes; if it keeps on malfunctioning and overheating, no.

    -------------------

    --

    Insert Witty Remark Here ===>____________________________
    1. Re:Short Answer by Valiss · · Score: 2

      Yeah, AND if Intel lowers their prices...

      --

      -Valiss
  120. but... by _avs_007 · · Score: 1

    The original poster didn't do any research, because in every test performed, the 1.7 GHz processor did not throttle back its speed when crunching. The thermal throttling happens when the CPU is about to overheat, not in normal operation or even heavy operation. Just when the cooling fails.

  121. please.... by _avs_007 · · Score: 1

    Do us a favor, and at least compare the correct numbers. Either say 100 MHz bus Athlon vs. 100 MHz bus P4, or 200 MHz Athlon vs. 400 MHz P4... Do not say 200 MHz Athlon vs. 100 MHz P4... You are just making yourself look silly...

  122. ehh? Thats dumb by _avs_007 · · Score: 1

    Normalize for clock speed? Why? That's like comparing engines in the car world by their HP/Litre figures... That's the dumbest comparison I've seen in my life... People always tout that the engine in the S2000 gets 140 HP/Litre, while the LS1 engine gets a measly 61 HP/Litre... Never mind that the LS1 generates 345 HP, and WAAAAY more torque than the 2.0 litre in the S2000... Then Honda enthusiasts say to normalize for engine size, and that's why they use HP/Litre... They say if the S2000 engine were 5.7 litres, it would produce 798 HP... That argument is ridiculous, because just because the 2.0 litre version makes 240 HP does not mean it scales linearly and a 5.7 litre version would produce 798 HP... But more importantly, a 5.7 litre S2000 engine DOES NOT EXIST, so talking about it is silly... Just like a 1.7 GHz Athlon does not exist, so why bother comparing a high-end Athlon to a mid-range P4?

  123. You're missing the point by _avs_007 · · Score: 1

    I was just saying that comparing a non-existent engine to an existing engine for the sake of "fair" is dumb... However, to correct you... A smaller engine with more HP/Litre does not translate to a smaller, lighter car... What? You think those extra camshafts, valves, and chains weigh nothing? You think those extra camshafts don't take up extra space? Then again, you are comparing a 4-banger to a V8... My point is that the Nissan VQ 3.0 24-valve V6, Honda's 3.2L DOHC V6, and Toyota's 3.0 litre DOHC V6 are all physically bigger and weigh more than GM's pushrod 3800 Series II, which displaces 3.8 litres... It displaces more volume, but the engine itself has a smaller footprint, but that's an entirely different thread. Anyway, my main point was that we should be comparing existing things to existing things... Not existing things to vapor...

  124. Give me a break by _avs_007 · · Score: 1

    I'm typing this on a VAIO, which has the crappiest keyboard I've ever seen. I like my thinkpad better. Some of the keys on this keyboard stick...

  125. also... by _avs_007 · · Score: 1

    before I get any flames, I meant to say that some of the keys stick as well... Not that the reason for the misspelling is because the keys stick, though that happens too.... I was in a hurry, as I was typing it as I was working, and got ahead of myself and didn't bother checking as closely as I should've.

  126. SPEC benchs have profiling activated by mecachis · · Score: 1

    I still believe that the P4 is better for scientific applications, which is what SPECfp tries to measure. Nevertheless, something must be considered.
    Both the P4 and Athlon results use profiling (profile-guided compilation) to improve branch prediction. Clearly, the P4 is more sensitive to this technique because of its long pipeline.
    In the real world, nobody compiles with profiling (sad but true). Without profiling, the differences are smaller.

    --
    Never underestimate the power of human stupidity. - Lazarus Long
  127. Crummy gasoline slows things down by standards · · Score: 1

    I know the compute farm we're building where I work is moving to the P4. For now, it's the right choice for us. But for the typical user who doesn't run such a farm with very purpose-built software, the Athlon seems like the right answer.

    For most users, the P4 is currently like a big American V8 running on crummy, watered-down gasoline: it just can't compete with the Athlon, which performs better on plain-old gasoline.

    Until and unless the software manufacturers purposely support the P4, the Athlon will be a strong contender... and the only contender given price/performance.