Slashdot Mirror


Can SSE-2 Save the Pentium 4?

Siloh writes "Ace's Hardware has posted a Floating-Point Compiler Performance Analysis which, in a nutshell, tests Intel's most important claim about the Pentium 4. "It does not reach its full potential with today's software, but with future software (including SSE-2 optimizations) it will outclass the competition". They test with floating-point benchmarks which have been recompiled on the latest Intel and MS compilers." Basically, another iteration of the question: Can the P4 dethrone the Athlon?

26 of 171 comments

  1. The problem with Intel by Anonymous Coward · · Score: 5
    SSE-2 will be nice, but the problem with Intel is that they have fallen behind AMD in the CPU wars. Their stock price is only one of many indicators that they have made several bad business decisions in the past few years, and those decisions continue to haunt them and give AMD a leg up on the market. Consider:
    • The RAMBUS mess. They tried to leverage their chip/chipset monopoly to control the RAM market through large investments and contracts with RAMBUS. Now RAMBUS is on the brink of death and Intel has lost.
    • The IA-64 disaster. It's hard to launch a new architecture, and even harder when you keep prices high and don't put enough chips in the hands of developers.
    • The uniprocessor-only P4. Intel spent years perfecting SMP on their earlier processors, and for what? So that AMD could beat them to the punch, running a 1.4GHz CPU in SMP mode. Intel also embraced the slower-but-cheaper shared memory bus architecture, which is going to kill SMP performance in comparison.
    • Unwise investments. Intel has invested in several dot-coms that are dying or dead already. Intel Capital hasn't been profitable since FY 1999 because they have sunk billions into companies like VA that could never hope to turn a profit.
    Intel still has potential but they will need to get their act together if they want to start competing with AMD again.

    -A former Intel employee

    1. Re:The problem with Intel by VAXman · · Score: 3

      The uniprocessor-only P4. Intel spent years perfecting SMP on their earlier processors, and for what? So that AMD could beat them to the punch, running a 1.4GHz CPU in SMP mode. Intel also embraced the slower-but-cheaper shared memory bus architecture, which is going to kill SMP performance in comparison.

      You are wrong. The DP-capable P4 (known as Xeon) was launched in May, well before the DP Athlon was released. Moreover, you can buy real dual Xeon systems from Dell, IBM, Compaq, and the like, yet you cannot buy a DP Athlon system from any major vendor, since no major OEMs want it.

  2. Morons by Anonymous Coward · · Score: 5

    Look at the final results:

    bestover2.gif

    Now look at the place where the P4 shows the most improvement over the Athlon: the first data point, Flops 8, with the P4 using the Intel compiler and the Athlon using Microsoft's.

    From the graph, the Pentium 4 clocks in at about 1140 flops while the Athlon gets only 900 flops.

    But wait! We're forgetting something. You're running the Pentium 4 at a faster clock speed! For the love of crumbcake, normalize those values for clock speed, please!

    Pentium 4: 1140 flops / 1.5 GHz = 760 flops/GHz
    Athlon: 900 flops / 1.2 GHz = 750 flops/GHz

    Now things are a bit more fair. Yes, with the absolute latest compiler from the maker of the processor, the Pentium 4 beats the Athlon in one of eight tests by a measly ten flops per gigahertz. With the latest compiler from some big software company, the Athlon beats the Pentium 4 in the other seven categories, hands down.

    Don't believe everything you read.
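
For anyone who wants to check the arithmetic, here is a minimal sketch of the per-GHz normalization. The scores and clock speeds are the ones read off the graph in this comment; the function name is only for illustration. It also prints the raw and normalized leads side by side, which is the comparison the comment is really making.

```python
# Per-GHz normalization of the scores quoted in the comment above.
# 1140 and 900 are the values read off the graph; 1.5 and 1.2 are the clocks in GHz.
# The helper name per_ghz is hypothetical, chosen for this sketch.

def per_ghz(score, clock_ghz):
    """Return a benchmark score divided by the clock speed in GHz."""
    return score / clock_ghz

p4 = per_ghz(1140, 1.5)
athlon = per_ghz(900, 1.2)

print(f"Pentium 4: {p4:.0f} per GHz")
print(f"Athlon:    {athlon:.0f} per GHz")
print(f"Raw lead of the P4:        {1140 / 900 - 1:.1%}")
print(f"Normalized lead of the P4: {p4 / athlon - 1:.1%}")
```

Whether normalizing is the right thing to do at all is a separate argument; the sketch only reproduces the division.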

    1. Re:Morons by JoeBuck · · Score: 3

      It is ignorant to argue that you should normalize for clock speed. The Pentium 4's deep pipelines are present precisely so that the chip can be run at a faster clock speed than otherwise.

      With the exact same technology, same fabs, you can't make the Athlon run at the same clock speed as the Pentium 4.

  3. Hmm. Maybe i'm missing something, but -- by washort · · Score: 4

    Why wouldn't Intel be doing stuff like putting SSE-2 optimisation code into gcc so that all us hacker-types would have a _reason_ to pick the P4 over the Athlon? I know they have their own compiler but to the best of my knowledge it's not free (or at least it's not in Debian... ;-)
    Just seems odd that they'd pass up the opportunity for something like that. *shrug*

    1. Re:Hmm. Maybe i'm missing something, but -- by Chakat · · Score: 5

      Intel's working on a Linux compiler with all of the P4 goodness. Although it's in beta right now, you can bet your sweet butt you're going to pay for it once the program gets out of beta. Intel may have good compilers, but they don't give 'em away.

      --

      If god had intended you to be naked, you would have been born that way.

  4. .NET to the rescue by samael · · Score: 5

    It occurred to me a while back that .NET will affect this immensely.
    Consider: .NET compilers compile to an intermediate code level that isn't actually transformed into machine code until the program is run for the first time on the target machine.
    This means that all you have to do to get the most out of your machine is make sure you have the .NET IL->machine code compiler for your specific CPU and all .NET code will be totally optimised for _your_ CPU.

    Of course, this also means that you don't need to recompile to work on any CPU that has the CLR available on it, which makes transferring to IA64 (or any other architecture) a lot easier.
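
A rough sketch of the compile-on-first-run idea, for illustration only. Python source text stands in for the intermediate code and Python's built-in `compile()` stands in for the IL->machine code step; the class name `LazyCompiled` is made up, and none of this is meant to describe how the CLR is actually implemented.

```python
# Illustration of "ship an intermediate form, translate it on first use".
# Assumption: Python source plays the role of the intermediate code, and
# the built-in compile()/exec() pair plays the role of the target-specific
# translator. LazyCompiled is a hypothetical name for this sketch.

class LazyCompiled:
    def __init__(self, name, source):
        self.name = name
        self.source = source      # the shipped, CPU-independent form
        self._function = None     # filled in on the first call
        self.compile_count = 0    # how many times translation happened

    def __call__(self, *args):
        if self._function is None:
            # First call on this machine: translate once and cache the result.
            namespace = {}
            exec(compile(self.source, f"<{self.name}>", "exec"), namespace)
            self._function = namespace[self.name]
            self.compile_count += 1
        return self._function(*args)

dot = LazyCompiled(
    "dot",
    "def dot(a, b):\n    return sum(x * y for x, y in zip(a, b))\n",
)

print(dot([1.0, 2.0], [3.0, 4.0]))   # triggers the one-time translation
print(dot([1.0, 0.0], [0.0, 1.0]))   # reuses the cached result
print("translated", dot.compile_count, "time(s)")
```

The point of the design is that the translation step runs on the machine that executes the code, so it is the only place that needs to know which CPU is present.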
    _____

  5. It's A Different Thrown Now by BRock97 · · Score: 5

    Why bother? Every iteration of processors that comes out has some special optimization that is required to run at peak performance. If you use one or the other, it gets you a marginal performance boost. Sure the P4 can do magic if you turn on this compile flag, and then disable this other. Who cares? Things are fast enough now that price should be considered the king. Why spend $100 - $200 more for a processor when all it gets you is a few more frames at 1600x1200 in Quake3? Until the P4 comes down in price (and they are making big inroads for this), the Athlon will be king.

    Bryan R.

    --

    Bryan R.
    The price of freedom is eternal vigilance, or $12.50 as seen on eBay.....
  6. Thats not the question... by EvilJohn · · Score: 3

    The answer is yes, with SSE-2, it will beat the Athlon into the ground. Check http://www.hardocp.com/reviews/cpus/intel/p417 out for more details.

    The real question is the short lifespan of this P4. With Intel going to DDR (thank god) but changing socket types, how viable is a P4 at this point?

    Even gamers think about TCO.

    // EvilJohn
    // Java Geek

    --

    Less Talk, More Beer.
  7. Re:Strange?? by GroundBounce · · Score: 3

    Of course GCC is available for Win2k; however it is very seldom (if ever) used for serious commercial applications. Yes, it is used for porting UNIX applications, and these applications tend to run more slowly than their native Windows counterparts. I have nothing against Win2k, and I use Win2k as well as Linux and HP-UX, it's just that the performance of GCC on win32 should be relatively irrelevant to someone who uses Win2k exclusively (and his sig implies that he uses Windows exclusively), except in the rare circumstance that they are porting a UNIX/Linux app, or are using GCC because it's free, in which case they are probably not developing an ultra high performance application.

    On the other hand, GCC *does* matter for Linux. It is true that most apps run just fine on Linux compiled with GCC. But clearly newer x86 processors are becoming more specialized and there are applications where every drop of performance counts. I do large circuit simulations, and a 10% improvement could mean getting results hours sooner. For Linux to compete seriously in these areas the apps will have to be compiled with a compiler whose results can compete with what's available under win32.

  8. Re:excuse me but um... by be-fan · · Score: 3

    I may sound like a troll of sorts or anti Intel, but when it comes to high end scientific engineering does anyone actually use anything outside the realms of Sun, Irix, and Alpha? Although benchmarks claim to show factual information, I've always seen them as a bit biased.
    >>>>>>>>>>>
    Not everyone working on a scientific application is blessed to be in a huge project with infinitely deep pockets. There are tons of college students/projects doing different types of scientific computing, and x86 provides a very good price/performance ratio for these users.

    --
    A deep unwavering belief is a sure sign you're missing something...
  9. Re:P4 can't dethrone Athlon in Linux by be-fan · · Score: 4

    However, Intel's C compiler is in Beta for Linux. Thus, apps that need vectorizing could simply pony up $500 for a license and compile with that.

    --
    A deep unwavering belief is a sure sign you're missing something...
  10. Re:The answer is by barracg8 · · Score: 4
    • As a result Pentium III would outperform Pentium 4 on some occasions, as the latter tends to lose more instructions when the branch-misprediction rate is too high.
    Your reasons for the P3 outperforming the P4 don't seem to make a lot of sense.

    Take a processor. It hits a branch instruction. While it is working out whether or not to take the branch, it keeps itself busy by executing instructions from one side or other of the branch. It gets it wrong, so when it realizes this, it throws away a bunch of work it has done. Hence branch misprediction is a Bad Thing.

    Take a second processor, with more pipelines available for instruction issue. Again it makes a branch prediction. Since it has more pipelines available it is able to issue more instructions while waiting for the branch to be calculated. Again it gets it wrong, and since it has been able to issue more instructions from after the branch, more are thrown away when it realizes a misprediction has taken place.

    The point is that while more instructions are thrown away, this is only because more have been issued, and therefore the fact that you have more pipelines in a new generation does not lead to that processor running slower than previous versions. The increased branch misprediction penalty can only diminish the amount of increased performance that the extra pipelines give you, and not lead to an overall speed decrease, right?
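
One way to poke at this question is a toy cost model. The sketch below is only that: every number in it is invented for illustration and none is a measured Pentium III or Pentium 4 figure. It assumes cycles per instruction is the ideal issue cost (1 / issue width) plus the average cycles lost to flushes (branch fraction × misprediction rate × flush penalty), and it keeps issue width and flush penalty as separate knobs, since extra pipelines and extra stages are different things.

```python
# Toy model only. All parameters are hypothetical, chosen to show the shape
# of the trade-off, not to describe any real processor.

def cycles_per_instruction(issue_width, branch_fraction, mispredict_rate, flush_penalty):
    """Ideal issue cost plus the average cycles lost to pipeline flushes."""
    ideal = 1.0 / issue_width
    flush_cost = branch_fraction * mispredict_rate * flush_penalty
    return ideal + flush_cost

def time_per_instruction_ns(cpi, clock_ghz):
    """Convert cycles per instruction into nanoseconds at a given clock."""
    return cpi / clock_ghz

BRANCH_FRACTION = 0.2   # assumption: one instruction in five is a branch

for rate in (0.02, 0.10, 0.30):
    # "short": narrower issue, cheap flush. "deep": wider issue, costly flush.
    short = cycles_per_instruction(2, BRANCH_FRACTION, rate, flush_penalty=10)
    deep = cycles_per_instruction(3, BRANCH_FRACTION, rate, flush_penalty=20)
    print(f"mispredict rate {rate:.0%}: "
          f"short {short:.3f} cycles ({time_per_instruction_ns(short, 1.0):.3f} ns at 1.0 GHz), "
          f"deep {deep:.3f} cycles ({time_per_instruction_ns(deep, 1.5):.3f} ns at 1.5 GHz)")
```

In this model, widening the issue alone never raises the cycle count, which matches the argument above. A longer flush penalty can raise it, though, and whether that shows up as a real slowdown then depends on how much clock speed the deeper pipeline buys.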

    G.

  11. excuse me but um... by joq · · Score: 3
    They also represent the majority of FPU applications. Most applications contain very few FDIV, but some scientific and engineering applications do.

    I may sound like a troll of sorts or anti Intel, but when it comes to high end scientific engineering does anyone actually use anything outside the realms of Sun, Irix, and Alpha? Although benchmarks claim to show factual information, I've always seen them as a bit biased.

    Typical PIV purchaser in my eyes: Gamer, Newbie buying preconfigured PCs. What about this end user? Where are the stats for the typical purchaser? Sometimes these benchmarks confuse the average person into thinking the PIV is lowly in comparison to others.

    In this article we will try to answer the following three questions:
      1. How well will the Pentium 4 and Athlon perform with software that is compiled with newer compilers (MS Visual 7.0 and Intel C 5.0.1)?
      2. Can better compilers automatically create SSE-2 optimized code from simple C++ code?
      3. Can Pentium 4 aware compilers boost the Pentium 4's floating-point performance past the strong FPU of the Athlon?

    Again I may be off my rocker here, but most developers I've met have always customized their own machines, dual processors, other architectures, so again is it completely unbiased to say the PIV lacks? Confusion about this shit =\
  12. Marketecture by diablovision · · Score: 3

    It seems Intel may have bet the farm on Marketecture...20 stage pipeline to reach multiple gigahertz speeds, double pumped ALUs that run at twice core clockspeed, a trace cache of recently decoded RISC "micro-ops", and SSE2, almost 200 new floating point SIMD instructions that are supposed to give incredible performance. Yet the Pentium 4 has trouble against a lower clocked Athlon in many many benchmarks.

    Intel is the market leader, but they shouldn't let their marketing team design their chips!

    --
    120 characters isn't enough to explain it.
  13. Compiler vs processor by BradleyUffner · · Score: 3

    After reading the article it looks like Intel is much better at making compilers than it is at making its processors. The article says that the Intel compiler is a "masterpiece", and a work of genius. It looks to me like their compiler is a lot more impressive than their CPU.
    =\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\=\ =\=\=\

  14. CPU-specific optimisations by nickovs · · Score: 4
    It seems to me that the tests that were used to give the Flops crown to the Intel CPU are a little biased. Surely for a fair test the Athlon should have been tested with the latest experimental AMD compiler as well.

    As CPU designs get more complex the compilers need to know more and more about the exact nature of the CPU. Despite the label of binary compatibility given to the CPUs from AMD (and others), those who need to squeeze the best performance out of machines are going to need to run code that is compiled for their specific machine. Despite the best efforts of the open source community most end users do not want to recompile source, let alone spend time finding obscure /QaxW flags to make the most of the system. Really this should be a job for the OS.

    Maybe in the future we will see commercial code being distributed in such a way that parts of the code are compiled on the destination machine as the code gets installed. That way the code vendor can test a variety of compiler options and not have to ship 42 different binaries for all the different CPUs in use.

    --
    If intelligent life is too complex to evolve on its own, who designed God?
    1. Re:CPU-specific optimisations by geomcbay · · Score: 3

      The most likely solution for the short term is that developers will compile multiple versions of DLLs (or .so's in Linux/UNIX space) holding their hotspot code and use dynamic library linking to load in the right one after doing a CPU detection routine. This sort of thing is already being done to a certain degree with Windows-based games that might support 3DNow!, SSE, SSE2, etc.
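
The "detect, then load the right library" step might look roughly like this. It is a sketch under assumptions: the library file names and the preference order are hypothetical, and the feature list is passed in by hand, where a real program would get it from CPUID or from the operating system.

```python
# Sketch of choosing a CPU-specific build of the hotspot code at load time.
# The .dll names and the ordering below are invented for illustration.

# Most specific build first; the generic build is the fallback.
VARIANTS = [
    ("sse2", "hotspot_sse2.dll"),
    ("sse", "hotspot_sse.dll"),
    ("3dnow", "hotspot_3dnow.dll"),
]
FALLBACK = "hotspot_generic.dll"

def pick_library(cpu_features):
    """Return the first library whose required feature the CPU reports."""
    features = {f.lower() for f in cpu_features}
    for feature, library in VARIANTS:
        if feature in features:
            return library
    return FALLBACK

print(pick_library(["mmx", "sse", "sse2"]))   # a CPU that reports SSE2
print(pick_library(["mmx", "3dnow"]))         # a CPU that reports 3DNow! only
print(pick_library(["mmx"]))                  # neither: use the generic build
```

Only the chosen file then has to be handed to the platform's dynamic loader, so the rest of the application stays a single binary.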

  15. This appears to be the typical load of slashdot bs by ostone · · Score: 3

    Do you people pay attention... yes AMD is making better processors... yes people who really know choose AMD over Intel... therefore slashdotters choose AMD over Intel...

    But back to the real world: If you turn on a computer out there in happy fun land (aka "The Real World"(TM)) then odds are it will be running Intel. Linux, your precious kernel, started out with optimized non-portable code for the i386. You geeks keep falling victim to the same trap year after year... just because it's better doesn't mean people will use it. Linux/BSD/Solaris/Irix/SVR4/MacOS/BeOS... is clearly better than Windows when you look at a track record... and yes in some cases can be *almost* as easy to use... but Microsoft has been winning the OS war ever since they made a *bad* ripoff of the Macintosh (read Xerox) GUI OS. MacOS was better, more stable, and quite a bit cleaner... but Micro$oft had the market share and they won. People listen to money, and Intel is still the processor most people/companies would prefer buying. Hackers are one of the lowest demographics in the computing industry these days and people (outside of their community) don't pay much attention to them.

    Well, I guess that's it... go ahead, return to your illusion and mod this down.

    --
    Remove *your pants* to send me email.
  16. Re:The answer is by andr0meda · · Score: 5

    it's due to the fact that Pentium will flush the entire pipelines during branch-misprediction/pipeline stall. As a result Pentium III would outperform Pentium 4 on some occasions, as the latter tends to lose more instructions when the branch-misprediction rate is too high.

    Rumours have it that Pentium IV will have Simultaneous Multithreading (SMT) enabled, which lets the processor run any instruction from any thread on any unit at any time. Supposedly this feature was already included in current processor designs but not enabled because the P6-4 is not ready for SMP yet.

    AMD uses On-chip Multiprocessing (CMP) in Sledgehammer, which is basically the same as subdividing the resources of the CPU (registers & units) between the threads. The benefit of this technique is that the design can be kept simpler and the clock can go faster than a similar monolithic chip with the same resources. On the other hand, a lot of resources are wasted if only one thread is operational in this setup.

    Needless to say, SMT has some problems too; for example, CMP lends itself much better to branch prediction through Slipstreaming than SMT does. You can find some good reading in this previous slashpost about how Intel and AMD deal with multithreading on their single/multiprocessor designs. To be taken with a bit of salt of course, but very sharp.

    My point is that if branch prediction in the form of Slipstreaming is implemented (and Jackson Technology seems to be that kind of SMT), the P6-4 problems with the excessive cache flushing are completely over, and SMT can take full advantage of the smaller RAMBUS latencies, easily outperforming a similar CMP setup like AMD has.

    --
    With great power comes great electricity bills.
  17. Re:It's A Different Thrown[sic] Now by ackthpt · · Score: 5
    Agreed, the market is in a slump and people shopping for computers are going to be bargain hunting for some time. Even more rumblings about layoffs at the ever-optimistic Intel, despite yammering on about how the downturn won't affect Intel, how they expect growth, etc.

    Cheap chips rule in a soft market and AMD has demonstrated the ability to produce wicked fast chips at cheap prices. This would seem to be the best evidence yet that Intel has lost its way and the bureaucracy is in need of some serious house cleaning.

    Some blunders:

    Tying themselves legally to Rambus

    Talk of discontinuing the P3, their best mover.

    Pushing the 1.13GHz P3 out the door before it was ready and suffering the consequences.

    Slashing prices and subsidizing RDRAM just to move P4 product.

    The P4 may have some advantages, but imagine what it would be like if AMD had rolled it out... um hm.. It would have killed the Athlon alright, assuming the Athlon were Intel's. ;-)

    The truth is out there.

    --
    All your .sig are belong to us!

    --

    A feeling of having made the same mistake before: Deja Foobar
  18. Intel Compiler costs $???? by Arethan · · Score: 4

    Someone mentioned above that the Intel compiler is selling for a couple hundred bucks per license. I've been in the development market for a few years now, and I've used Intel's "optimized" compiler a few times already. It has a few flaws right out of the box, one being that it only works on Windows systems. It can act as a plugin for MS Dev Studio (which I must admit is a pretty slick IDE), but the bottom line is that Intel is charging money for something that they should be trying to GIVE away. If they want a leg up on the market, they should be making it VERY easy for developers to use their compiler when they build their applications. The result would be a lot more stickers on product boxes labeled "Optimized for Intel CPUs", making the CPU decision much easier for newbies.

    "Oh look, all of these games are optimized for Intel chips. They must be good!"

    Better yet, if they want their CPUs to get on top of the server market, they should be releasing the source code for their compiler as well. This would let the gcc crew use the optimizations in their compiler, creating better/faster *nix software. (Unix being the server platform of choice for more large companies I've worked with than I can shake a stick at. I won't get into why, as that will probably start a small war.)

    Bottom line, make the compiler free, and open the source, and Intel would definitely take off again.

    Until that day, though, I will stick with AMD since they have better prices for equal performance.

  19. The answer is by jsse · · Score: 5

    Can the P4 dethrone the Athlon

    No.

    Let me explain it this way: Pentium III has 6 10-stage pipelines for out-of-order superscalar execution, while Pentium 4 (avoid using the short form P4 - Pentium 4 is in the P6 family) has 9 20-stage pipelines.

    More pipelines and more stages sounds good, huh? Unfortunately, in some benchmark tests Pentium III beats Pentium 4; it's due to the fact that Pentium will flush the entire pipelines during branch-misprediction/pipeline stall. As a result Pentium III would outperform Pentium 4 on some occasions, as the latter tends to lose more instructions when the branch-misprediction rate is too high.

    Athlon, on the other hand, only flushes half of its pipelines on average. They really need to fix this fundamental design glitch before they can beat Athlon.

    If you are very interested in this subject you can read this article. You can understand why Intel cannot give up Pentium III in favour of the market of Pentium 4.

  20. Intel should.. by geomcbay · · Score: 5
    Intel should consider giving their compiler away. Currently they charge hundreds of dollars per license for it. Considering their market in compiler tools is relatively small beans, you'd think it would make more sense for them to just give the compiler away to entice developers to use it and thus wind up with executables that really showcase the next-gen Intel processor's speed.

    I won't even get into the argument about how it might help them to Open Source the thing so that parts of the technology might be rolled into other compilers like gcc, because I just can't imagine that happening anytime soon.

  21. More meaningless numbers by bryan1945 · · Score: 4

    In this test A beats B, but in this test B beats A, etc. etc. All these different tests try to measure some specific performance parameter, but as hard as you try to standardize the rest of the equipment to isolate that one parameter, you just can't in the real world. And that is the true test- how well does the entire system run? You could slap a P4 3GHz onto a 33MHz bus (well, not really, but you get the point) and get the equivalent performance of a 3-toed sloth. That or the bus wires will glow.

    As for the SSE extensions, Intel tried this first back with MMX, and Apple is trying it now with AltiVec(sp?). Yes, these extensions can help, but only after software is optimized for them. It's not a case of "drop 'em in and watch out!" It takes time to develop.

    Of course, all of this is just marketing. Kinda like the MHz wars. Intel needs some positive press after that oft-quoted test where the P3 trounced the P4.

    --
    Vote monkeys into Congress. They are cheaper and more trustworthy.
  22. Short Answer by GreyOrange · · Score: 3

    If they manage to get their chip working properly, yes; if it keeps on malfunctioning and overheating, no.

    -------------------

    --

    Insert Witty Remark Here ===>____________________________