Can SSE-2 Save the Pentium 4?

Re:Anyone know when M$ VC++ will support SSE2 nati by Anonymous Coward · 2001-06-28 23:18 · Score: 1

This reminds me of some comments made by the Great Carmack: games written with SSE or 3DNow! optimizations didn't really benefit from the extra code. What made games go faster was having those optimizations built into the video card's drivers.

Tired... by Anonymous Coward · 2001-06-28 23:18 · Score: 1

* Of people making line graphs instead of bar graphs
* Of people who don't understand most of what they write, so they dump all the data instead of focusing on what matters
* Of over-verbose hardware sites that make you scan 5 pages before getting to the (rotten) beef
* Of clueless people who pretend to be surprised when an optimizing compiler gets a 240% speedup on very specific code.

Btw, this sort of shit reminds me of someone:

"Paul Hsieh, our local assembler guru, analyzed the assembler output of the SSE-2 optimized version of Flops. He pointed out that "some of the loops are not fully vectorized, only the lower half of the XMM octaword is being used." In other words, SSE-2 instructions which normally operate on two double precision floating point numbers are replacing the "normal" x87 instructions and are only working on one floating point number at time."

Anyone think that "Paul Hsieh" == "Bob Ababooey" ?

Cheers,

--fred
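Here is a rough Python model of what "only the lower half of the XMM octaword is being used" costs in instruction count. It is a toy, not real SSE-2 code. A 128-bit XMM register holds two 64-bit doubles or four 32-bit floats. A scalar instruction such as ADDSD touches one lane, while a packed one such as ADDPD fills them all. The function names are illustrative, and the model ignores loads, stores, latency and dependencies.

```python
XMM_BITS = 128  # width of an SSE/SSE-2 XMM register


def lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one XMM register."""
    return XMM_BITS // element_bits


def arithmetic_ops(n_elements: int, element_bits: int, packed: bool) -> int:
    """Instructions needed to apply one operation (e.g. an add) to n elements.

    Scalar SSE-2 uses only the low lane, so it needs one instruction per
    element; packed SSE-2 processes a full register's worth per instruction.
    """
    if not packed:
        return n_elements
    return -(-n_elements // lanes(element_bits))  # ceiling division


def vector_add(a, b, element_bits=64, packed=True):
    """Element-wise add that counts simulated instructions issued.

    Returns (result list, number of simulated add instructions).
    """
    step = lanes(element_bits) if packed else 1
    out, issued = [], 0
    for i in range(0, len(a), step):
        out.extend(x + y for x, y in zip(a[i:i + step], b[i:i + step]))
        issued += 1
    return out, issued
```

For example, `arithmetic_ops(1000, 64, packed=False)` gives 1000 while the packed version gives 500. That factor of two is what the "not fully vectorized" loops leave on the table for doubles. For 32-bit floats the packed count drops to 250.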

Re:Tired... by Galactic+Avenger · 2001-06-29 03:39 · Score: 1

> Btw, this sort of shit reminds me of someone:
>
> "Paul Hsieh, our local assembler guru, analyzed
> the assembler output of the SSE-2 optimized
> version of Flops. He pointed out that "some of
> the loops are not fully vectorized, [...]
>
> Anyone think that "Paul Hsieh" == "Bob
> Ababooey" ?

Well, certainly not me! Who or what is "Bob Ababooey"?

--
Paul Hsieh
http://www.pobox.com/~qed/

Re:Morons by Anonymous Coward · 2001-06-29 00:19 · Score: 1

It makes no sense to look at flops/GHz. The P4 design, via its longer pipelines, *intentionally* sacrifices flops/GHz so that the chip can run at a higher clock rate.

The only sensible metric is performance at the available clock speed, which for the P4 is higher than for the Athlon.

If I had a CPU that achieved 5000 flops/GHz but only ran at 1 MHz, would you want it, or would you want the 1.5 GHz P4?

The problem with Intel by Anonymous Coward · 2001-06-28 22:49 · Score: 5

SSE-2 will be nice, but the problem with Intel is that they have fallen behind AMD in the CPU wars. Their stock price is only one of many indicators that they have made several bad business decisions in the past few years, and those decisions continue to haunt them and give AMD a leg up on the market. Consider:

* The RAMBUS mess. They tried to leverage their chip/chipset monopoly to control the RAM market through large investments and contracts with RAMBUS. Now RAMBUS is on the brink of death and Intel has lost.
* The IA-64 disaster. It's hard to launch a new architecture, and even harder when you keep prices high and don't put enough chips in the hands of developers.
* The uniprocessor-only P4. Intel spent years perfecting SMP on their earlier processors, and for what? So that AMD could beat them to the punch, running a 1.4 GHz CPU in SMP mode. Intel also embraced the slower-but-cheaper shared memory bus architecture, which is going to kill SMP performance in comparison.
* Unwise investments. Intel has invested in several dot-coms that are dying or already dead. Intel Capital hasn't been profitable since FY 1999 because they have sunk billions into companies like VA that could never hope to turn a profit.

Intel still has potential but they will need to get their act together if they want to start competing with AMD again.

-A former Intel employee

Re:The problem with Intel by VAXman · 2001-06-29 01:31 · Score: 3

The uniprocessor-only P4. Intel spent years perfecting SMP on their earlier processors, and for what? So that AMD could beat them to the punch, running a 1.4Ghz CPU in SMP mode. Intel also embraced the slower-but-cheaper shared memory bus architecture, which is going to kill SMP performance in comparison.

You are wrong. The DP-capable P4 (the Xeon) launched in May, well before the DP Athlon was released. Moreover, you can buy real dual-Xeon systems from Dell, IBM, Compaq and the like, yet you cannot buy a DP Athlon system from any major vendor, since no major OEMs want it.

Re:The problem with Intel by MrBogus · 2001-06-29 00:45 · Score: 1

Intel has contracts which require a "second source" for their technology. That's how AMD became a licensed producer of Intel tech to begin with, an agreement that continues even today.

--

When I hear the word 'innovation', I reach for my pistol.
Re:The problem with Intel by nekid_singularity · 2001-06-29 00:24 · Score: 1

Intel wants AMD around because it gives them some credibility when they say they are not a monopoly. Say AMD went out of business tomorrow: Intel would be the only source for high-performance x86 chips, and the government wouldn't like that.

--
Numbers 31:17,18 Now kill all the boys. And kill every woman who has slept with a man,but save for yourselves every virg
Re:The problem with Intel by Shortcut+to+CmdrTaco · 2001-06-28 23:09 · Score: 1

That's true but they are well on their way. When they come out with DDR-supporting chipsets they will be on the right track again. Look out AMD!

Morons by Anonymous Coward · 2001-06-28 23:13 · Score: 5

Look at the final results:

[graph: bestover2.gif]

Now look at the place where the P4 shows the most improvement over the Athlon: the first data point, Flops 8, with the P4 using the Intel compiler and the Athlon using Microsoft's.

From the graph, the Pentium 4 clocks in at about 1140 flops while the Athlon gets only 900 flops.

But wait! We're forgetting something. You're running the Pentium 4 at a faster clock speed! For the love of crumbcake, normalize those values for clock speed, please!

Pentium 4: 1140 flops / 1.5 GHz = 760 flops/GHz
Athlon: 900 flops / 1.2 GHz = 750 flops/GHz

Now things are a bit more fair. Yes, with the absolute latest compiler from the maker of the processor, the Pentium 4 beats the Athlon in one of eight tests by a measly ten flops per gigahertz. With the latest compiler from some big software company, the Athlon beats the Pentium 4 in the other seven categories, hands down.

Don't believe everything you read.
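Here is the per-clock arithmetic from this post as a small Python helper. It uses only the post's own figures, read off the Flops 8 graph: about 1140 for the Pentium 4 at 1.5 GHz and about 900 for the Athlon at 1.2 GHz. The inverse function multiplies back by clock speed. That recovers the absolute score, which the replies argue is what actually matters.

```python
def per_ghz(score: float, clock_ghz: float) -> float:
    """Normalize a benchmark score by clock speed (score per GHz)."""
    return score / clock_ghz


def absolute(score_per_ghz: float, clock_ghz: float) -> float:
    """Undo the normalization: per-clock efficiency times clock speed."""
    return score_per_ghz * clock_ghz


# Approximate Flops 8 figures read off the graph in the post.
p4 = per_ghz(1140, 1.5)      # Pentium 4 at 1.5 GHz
athlon = per_ghz(900, 1.2)   # Athlon at 1.2 GHz

print(f"P4:     {p4:.0f} per GHz")
print(f"Athlon: {athlon:.0f} per GHz")
```

Both views come out of the same two numbers. Take the hypothetical 5000-flops/GHz chip running at 1 MHz: `absolute(5000, 0.001)` is only about 5, against 1140 for the P4.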

Re:Morons by cdipierr · 2001-06-28 23:58 · Score: 1

True, the Athlon is faster on a per-clock basis, but it's fair to compare the fastest Athlon against the fastest P4 since both are obtainable (although the fastest parts are actually a 1.4 GHz Athlon and a 1.7 GHz P4, so we're comparing roughly one speed grade down).

Re:Morons by JoeBuck · 2001-06-29 00:01 · Score: 3

It is ignorant to argue that you should normalize for clock speed. The Pentium 4's deep pipelines are present precisely so that the chip can be run at a faster clock speed than otherwise.
With the exact same technology, same fabs, you can't make the Athlon run at the same clock speed as the Pentium 4.
Re:Morons by csbruce · 2001-06-29 02:19 · Score: 2

the Pentium 4 clocks in at about 1140 flops

Wow, 1140 flops. With some tight code, my VIC-20 would be competitive with this!
Re:Morons by csbruce · 2001-06-29 02:22 · Score: 2

It is ignorant to argue that you should normalize for clock speed.

A better way to normalize would be bang/buck.
Re:Morons by maraist · 2001-06-29 04:14 · Score: 2

I disagree. This is definitely the case with a 486 to Athlon comparison, but here we're already taking the architectural differences (stages/pipe, etc.) into account. Part of the analysis is to monitor efficiency. This is especially true in the Pentium 4 / Athlon debate, since we can get 1.4 GHz Athlons. The question is whether to purchase a 1.4 GHz Pentium 4 at significantly higher cost, to say nothing of the added cost of a 1.7 GHz setup.

The difference is more dramatic between the P5-4 and P5-3, since you max out at about 1 GHz for the P5-3, so there I'd be inclined to believe you. The Athlon, however, is not yet out of steam in its current form. If it can best the P5-4 in 50% of the categories (including legacy apps, i.e. today's software), then the value of the P5-4 is limited, even if it can produce top-notch synthetic scores.

The point is that it is not ignorant to normalize, so long as you look at the peripheral factors. It's like taking the average but also the standard deviation: you do find useful information in such numbers.

-Michael

Re:Morons by randombit · 2001-06-29 07:52 · Score: 1

A better way to normalize would be bang/buck.

Really. I read some review of DVD/MPEG-4 encoding (on Tom's, I think), which basically decided that, all in all, the Duron 800 was the best CPU if you do that stuff a lot. I liked that: it's not the fastest, but it was 70 or 80% of the speed of the fastest while costing 1/N as much.

I mean, if all you care about is having tons and tons of Flops, go buy an Alpha or R10000 (or IA-64 (<g>)). Performance isn't everything, at least not for me.
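Here is a sketch of "bang per buck" as a ratio, using the Duron review's rough figures. If a chip delivers 70 to 80% of the fastest chip's speed at 1/N of its price, its performance per dollar relative to the fastest chip is the speed fraction divided by the price fraction. N is left as a parameter because the post does not give it. The N = 3 below is purely hypothetical.

```python
def relative_value(speed_fraction: float, price_fraction: float) -> float:
    """Performance per dollar relative to a reference chip.

    speed_fraction: this chip's speed / reference chip's speed
    price_fraction: this chip's price / reference chip's price
    A result above 1.0 means more performance per dollar than the reference.
    """
    if price_fraction <= 0:
        raise ValueError("price_fraction must be positive")
    return speed_fraction / price_fraction


# "70 or 80% of the speed ... while costing 1/N as much"; N = 3 is hypothetical.
for speed in (0.70, 0.80):
    print(f"{speed:.0%} of the speed at 1/3 the price -> "
          f"{relative_value(speed, 1 / 3):.1f}x the value")
```

With any N above about 1.4, a chip at 70% of the speed already wins on value. This ratio is the normalization the "bang/buck" replies are asking for, rather than dividing by clock speed.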
Re:Morons by frantzen · 2001-06-29 03:08 · Score: 2

Clock speed is only relevant to marketing droids and those stupid enough to believe them.

There are processors (UltraSparc III) where the core pipeline is not clocked (called wave pipelining). There are caches that are double-pumped; they do work on each edge of the clock instead of only latching on one edge.

And an even clearer fact: different processors do different amounts of work per edge of the clock. If you want a _really_ high clock rate, put only one gate between each latch. That clock rate would be obscene. But half of the work done would be latching the values (assuming you could distribute the clock over so large an area).

If you want to normalize anything, normalize over price. Unless you have stupid friends and compete over having the highest clock.

Oh ya. Don't bother talking about FLOPS or MIPS. You'll just end up sounding stupid (and you need all the help you can get). Any benchmark not targeted to YOUR specific application is next to worthless.

Heh, some processors don't even bother to dispatch NOPs. With a little hackery, they could ``execute'' as many NOPs per clock as the depth of their dependency issue window.
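Here is a first-order model of the "one gate between each latch" argument. Suppose the logic takes T_logic to evaluate and is split evenly over n stages, and each stage adds a fixed latch overhead t_latch. Then the cycle time is T_logic / n + t_latch. The clock rate keeps rising with n, but with diminishing returns, and the share of each cycle spent latching grows. The delays below are made-up units, not measurements of any real chip.

```python
def cycle_time(t_logic: float, stages: int, t_latch: float) -> float:
    """Cycle time when total logic delay is split evenly across `stages`."""
    return t_logic / stages + t_latch


def latch_fraction(t_logic: float, stages: int, t_latch: float) -> float:
    """Share of each clock cycle spent on latch overhead rather than logic."""
    return t_latch / cycle_time(t_logic, stages, t_latch)


# Hypothetical units: 20 gate delays of logic, a latch costs 1 gate delay.
T_LOGIC, T_LATCH = 20.0, 1.0
for n in (1, 5, 10, 20):
    t = cycle_time(T_LOGIC, n, T_LATCH)
    print(f"{n:2d} stages: clock {1 / t:.3f} per unit, "
          f"{latch_fraction(T_LOGIC, n, T_LATCH):.0%} of each cycle latching")
```

At 20 stages each stage holds one gate delay of logic. Half of every cycle then goes to the latch, which is the "half of the work done would be latching the values" case from the post.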
Re:Morons by jsse · 2001-06-29 01:32 · Score: 2

Hey man, be fair; don't just pick the graph that favours your conclusion.

How about this, this and this?

Don't believe everything you read.

Assuming you believe everything on Ace's Hardware, do you believe the graphs above? :D
 _
/. /    |\/| |\/| |\/| / Run, Bill!

p4 ddr chipsets by vipw · 2001-06-29 06:55 · Score: 1

The P4 DDR chipset (i845) doesn't perform anywhere near as well as their Rambus ones. Also, the DDR mode in that chip won't be working for several months. Word on the street is that VIA's upcoming P4 DDR chipset is a pretty good performer, but not much has been published on it, VIA doesn't have a clear license for the bus, and VIA chipsets are often buggy. So really, they won't be having DDR chipsets any time soon, and chances are those chipsets will be terrible performers. AMD has to watch out for a drop in RDRAM prices more than anything else.

Re:ehh? Thats dumb by Trepidity · 2001-06-29 16:13 · Score: 2

Europeans don't build smaller-engine cars to be more energy efficient; they build them because many EU countries tax engines by displacement. So they make smaller engines with higher compression ratios, and end up with about the same efficiency.

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10

Re:The answer is by RelliK · 2001-06-29 01:11 · Score: 2

and SMT can take full advantage of the smaller RAMBUS latencies

Huh? Rambus has much higher latencies than SDRAM. That is why a P3 with PC133 SDRAM outperforms the same P3 with Rambus on most benchmarks. Since you got this part wrong, I take it the rest of your post should be taken with a grain of salt as well.
___

--
___
If you think big enough, you'll never have to do it.

Intel better be careful... by HiredMan · 2001-06-28 23:38 · Score: 2

The only thing that separates the Itanium from the rest of the pack is its FP performance. If the P4 gets better FP performance, it'll expose the multi-year Merced project as the dog it really is.

The 800 MHz Itanium has the same SpecInt performance as an 800 MHz PIII... if the 1.7 GHz P4 got only 20% faster SpecFP performance, it would match the Itanium in SpecFP to go with its already 50% better SpecInt performance.

Yeah, I know the Itanium is only at 800 MHz, but Intel needs to keep cranking out P4s to fend off the Athlon; they can't afford NOT to release new chips even if the 2 GHz P4 shames their new "top-o-the-line" server chip.

Sure the Merced has a better box around it and huge amounts of onboard cache, but given the same surroundings the P4 would make their VERY expensive "server" chip look pretty bad...

=tkk

--
Bill Gates - Creationist?!?

Hmm. Maybe i'm missing something, but -- by washort · 2001-06-28 22:53 · Score: 4

Why wouldn't Intel be doing stuff like putting SSE-2 optimisation code into gcc so that all us hacker-types would have a _reason_ to pick the P4 over the Athlon? I know they have their own compiler but to the best of my knowledge it's not free (or at least it's not in Debian... ;-)
Just seems odd that they'd pass up the opportunity for something like that. *shrug*

Re:Hmm. Maybe i'm missing something, but -- by csbruce · 2001-06-29 02:08 · Score: 2

Intel may have good compilers, but they don't give 'em away

Well, they should, and they should open-source them as well. Intel is primarily in the business of selling processors, not compilers, so getting their P4 performance optimizations into as many third-party compilers as possible should be their top priority.

Better general compiler support for the P4 would be an effective way to compensate for its hardware inferiority to the Athlon.
Re:Hmm. Maybe i'm missing something, but -- by biohazard99 · 2001-06-28 23:18 · Score: 2

A little Karma-whoring, swiped from intel's site
Compatible with Microsoft* Visual C++* and Visual Studio*, the Intel® C++ Compiler is designed from the silicon up to let developers easily take advantage of the performance and features of the latest Intel® architecture, including the Pentium® 4 processor.
Intel is committed to customer support. See www.intel.com/software/products/prodsupport.htm for further information on product support.

Windows*NT*/98/2000 Full Product Electronic Delivery $399.00
Windows*NT*/98/2000 Full Product CD Delivery $499.00
Windows*NT*/98/2000 Upgrade Product Electronic Delivery $175.00
Windows*NT*/98/2000 Upgrade Product CD Delivery $275.00
Intel® Compilers for Linux* Field Test
Intel® Compilers for Linux, field test versions, are available for download only. No CDROM versions are available. Not all of the GNU C language extensions, including the GNU inline assembly format, are currently supported and, due to this, one cannot build the Linux kernel with the beta release of the Intel compilers and the initial product release. The C language implementation is compatible with the GNU C compiler, gcc, and one can link C language objects files built with gcc to build applications. However, the C++ implementation uses a different object model than the GNU C++ compiler, g++, and due to this, C++ applications cannot use C++ object files compiled by g++. For further details, see the FAQs on the support site. Before using the compiler, we recommend you read Optimizing Applications with the Intel® C++ and Fortran Compilers for Linux to learn about the appropriate optimization switches for your application. You should have received the invitation letter that explains how to get started using the Intel compilers for Linux. All support issues, compiler updates, FAQ's and support information will only be available when you register for an account on the Intel Premier Support site. Please register for a support account at http://support.intel.com/support/go/linux/compilers.htm. To begin the process of downloading...

--
Read my plan to save the Bengals
Re:Hmm. Maybe i'm missing something, but -- by Grishnakh · 2001-06-29 00:36 · Score: 1

Intel would never do any such thing, because they really don't care about Linux. I work here at Intel, and this is definitely not a pro-OSS environment at all. The concept of making their tools freely available in order to promote processor/hardware sales is completely foreign to this Microsoft-worshipping company and its employees. The same way Microsoft will never make their tools freely available in order to promote sales and market penetration of their OS and other software, Intel would not make their compilers freely available either. The difference that they're missing is that MS has enough market share and a large enough monopoly that they can get away with this, and make a lot of money on compiler sales. No one's going to buy the Intel compiler, certainly not OSS types, and not MS developers either (they already paid for VC++; why would they spend extra for another compiler? They just want to sell software; they don't care about making it run better on one particular processor which isn't even doing well in sales).
Re:Hmm. Maybe i'm missing something, but -- by Chakat · 2001-06-28 23:10 · Score: 5

Intel's working on a Linux compiler with all of the P4 goodness. Although it's in beta right now, you can bet your sweet butt you're going to pay for it once the program gets out of beta. Intel may have good compilers, but they don't give 'em away.

--
If god had intended you to be naked, you would have been born that way.

Re:This appears to be the typical load of slashdot by SEE · 2001-06-30 11:20 · Score: 2

Yes, odds are it will be an Intel, but AMD is looking like it's going to have 30% marketshare this year.

And the difference in processors doesn't change what you can do with the computer (while things like changing OS does). The better analogy here is Dell beating Compaq which beat IBM.

Even the suits listen when you say "This runs everything the Intel does, as well as the Intel does, for less" enough times.
Steven E. Ehrbar

Re:.NET to the rescue by PD · 2001-06-28 23:13 · Score: 1

Wow. This is completely unlike Java. Microsoft is really innovating here. Just think how fast interpreted code could run if you optimize the interpreter. I wonder why Sun hasn't thought of that? I'm going to send them an e-mail right now with my suggestion.

--
If tits were wings it'd be flying around.

Re:.NET to the rescue by PD · 2001-06-29 01:04 · Score: 1

Ah, for fuck's sake. My article wasn't a troll. It was either sarchasm, or if your sarchasm detector was broken, I suppose it could pass for a flame.

But a troll? Come on. (eyes roll)

--
If tits were wings it'd be flying around.

.NET to the rescue by samael · 2001-06-28 23:03 · Score: 5

It occurred to me a while back that .NET will affect this immensely.
Consider: .NET compilers compile to an intermediate language (IL) that isn't transformed into machine code until it is first run on the target machine.
This means that all you have to do to get the most out of your machine is make sure you have the .NET IL->machine code compiler for your specific CPU and all .NET code will be totally optimised for _your_ CPU.

Of course, this also means that you don't need to recompile to work on any CPU that has the CLR available on it, which makes transferring to IA64 (or any other architecture) a lot easier.
_____

--
My Journal
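Here is a toy Python model of the compile-on-first-run idea in this post. Each method is "compiled" the first time it is called, and the compiler picks a code path based on the CPU features the machine reports. The result is cached, so later calls skip compilation. Everything here is hypothetical: the `TinyJit` class, the feature name "sse2" and the variant table. The real CLR JIT is far more involved.

```python
from typing import Callable, Dict, FrozenSet


class TinyJit:
    """Caches one 'compiled' implementation per method, chosen for this CPU."""

    def __init__(self, cpu_features: FrozenSet[str]):
        self.cpu_features = cpu_features
        self.cache: Dict[str, Callable] = {}
        self.compilations = 0

    def _compile(self, variants: Dict[str, Callable]) -> Callable:
        # Variants are tried in insertion order, most specialized first;
        # "generic" is the fallback that runs on any CPU.
        for feature, impl in variants.items():
            if feature == "generic" or feature in self.cpu_features:
                return impl
        raise LookupError("no usable variant for this CPU")

    def call(self, name: str, variants: Dict[str, Callable], *args):
        if name not in self.cache:          # first call: compile and cache
            self.cache[name] = self._compile(variants)
            self.compilations += 1
        return self.cache[name](*args)      # later calls reuse cached code


# Hypothetical variants of one method; both compute the same dot product
# and tag the result with the path that was taken.
DOT = {
    "sse2": lambda a, b: ("sse2", sum(x * y for x, y in zip(a, b))),
    "generic": lambda a, b: ("generic", sum(x * y for x, y in zip(a, b))),
}
```

A `TinyJit(frozenset({"sse2"}))` picks the specialized path once and reuses it. A `TinyJit(frozenset())` falls back to the generic one, while the same "IL" (the `DOT` table) ships unchanged to both.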

Re:.NET to the rescue by JohnZed · 2001-06-28 23:18 · Score: 2

Actually, it's entirely unlike JVMs on the market today, because .NET does not include an interpreter. It always compiles the code natively before running. It's more like a TowerJ or JOVE Java environment.
--JRZ
Re:.NET to the rescue by mjprobst · 2001-06-29 02:08 · Score: 1

There's another possible conspiracy attached to this . . .
They can make _real_ sure that there are instructions on their Pentiums that behave slightly differently, and that the CLR doesn't generate valid or efficient code for AMD processors.
Re:.NET to the rescue by neuneu2K · 2001-06-28 23:39 · Score: 1
Well, HotSpot compiles only the most-used parts of the code; in my experience it works perfectly well for most apps...
In fact the real slowness of java came not from the application code but from two things:
- the loading of Swing (horrible... Swing was loaded by the normal classloader and was bytecode-validated at each load !)
- and the BAD implementation of AWT...
Of course, for small apps the loading time is all-important and any interpreter overhead is BAD... but for long-running apps (server side is a perfect case :-) the advantage of dynamic recompilation and "auto-profiling" is GOOD!

I do not know the details of the .NET CLR, but I hope (in reality I do not mind, because I doubt that the platform will be ported to any non-Microsoft OS now that the breakup is void) it has an interpreter; otherwise, dynamic class loading will be much too slow. Self-modifying code would be impossible too (and yes, I have written self-modifying code in Java!)
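Here is a toy sketch of the mixed-mode strategy described in this post. Every method starts out interpreted, an invocation counter tracks how often it runs, and only methods that cross a threshold get compiled. The threshold value and the "compile" step, which just records the method name, are hypothetical stand-ins, not HotSpot's actual heuristics.

```python
from collections import Counter


class MixedModeRuntime:
    """Interpret cold methods; compile a method once it becomes hot."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.calls = Counter()   # invocation count per method name
        self.compiled = set()    # names of methods that have been "compiled"

    def invoke(self, name: str, func, *args):
        self.calls[name] += 1
        if name not in self.compiled and self.calls[name] >= self.threshold:
            self.compiled.add(name)  # stand-in for generating native code
        mode = "compiled" if name in self.compiled else "interpreted"
        return mode, func(*args)
```

A method called only once or twice never leaves the interpreter, which mirrors why short-lived apps pay the interpreter overhead. A long-running server app quickly pushes its hot methods past the threshold and amortizes the compilation cost.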

so much for PC's by um...+Lucas · 2001-06-29 01:10 · Score: 1

Remember when software was labeled "requires IBM or 100% compatible PC"?

Just in the mainstream, how many variations are we now or soon facing?

* Pentium w/ MMX is the lowest common denominator...
* Intel's SSE instructions
* AMD's 3D-NOW!
* Aren't there separate instructions in the Athlon, like 3D-NOW2, or something?
* Now we're heading towards two different 64-bit x86 implementations (yes, IA-64 isn't actually x86 anymore, but since they're bolting an x86 processor onto the silicon as well, it may as well be counted as one)...

Either developers will continue as they've been doing, writing software for the lowest common denominator, which makes all of Intel's and AMD's attempts to add features to their processors useless efforts that ultimately just cost us more money, since they can't manufacture as many chips per wafer; or else we're going to start seeing "Windows/Pentium 4", "Windows/AMD", "Windows/64-bit AMD" and "Windows/Itanium" sections in CompUSA and such....

And before the obligatory comment arrives, I'll state that no, I really would not like to compile my own software, which would be possible if everything in the computing world was open source/GPLed/etc...

32-bit FP or 80-bit FP? High end guys need more by The+Optimizer · 2001-06-29 01:03 · Score: 2

One thing I don't see mentioned here is what degree of precision that SSE-2 has. I'm guessing that it only works on 32-bit floats.

The SSE instructions on the P-III operate on 32-bit floats, while the x87 FPU instructions work on 80-bit floats (you can load 32-bit, 64-bit and 80-bit floats into the FPU registers and they are all expanded to 80 bits). Intermediate FPU results are computed and stored as 80-bit values. For SSE I believe (I could be wrong) that everything is 32-bit, both internally and in the registers.

For scientific and engineering, 32-bits of floating point (7-8 digits of precision) just doesn't cut it. Most people I know doing that kind of work on a PC (well, both of them) use the FPU but not SSE for that reason. They have apps that take days to perform a single calculation - lots of time for accumulated precision errors to become a factor.

32-bit floats are currently enough for most 3D graphics work (at PC resolutions), and those games ^h^h^h^h^h apps are probably a bigger consideration in driving mainstream CPU development. Given that the SSE/SSE-2 instructions have multiple math units to perform ops in parallel, there must be big transistor savings in using less precision.

I would bet that the FPU floating point precision on those Sun, Irix, and Alpha boxes is higher than 32-bits.

-Mp
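Here is a quick way to see the accumulated-error point. Python floats are 64-bit doubles, so the sketch emulates 32-bit float arithmetic by rounding through `struct` after every add. It then adds 0.1 a hundred thousand times in each precision. This compares only 32-bit and 64-bit arithmetic; the x87's 80-bit extended format is not modeled.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(value: float, count: int, single: bool) -> float:
    """Add `value` to a running total `count` times in 32- or 64-bit floats."""
    total = 0.0
    step = to_f32(value) if single else value
    for _ in range(count):
        total += step
        if single:
            # Doing the add in double and rounding to single gives the same
            # result as a true single-precision add.
            total = to_f32(total)
    return total


N = 100_000
exact = N / 10  # what 0.1 added N times would be in exact arithmetic
err32 = abs(accumulate(0.1, N, single=True) - exact)
err64 = abs(accumulate(0.1, N, single=False) - exact)
print(f"32-bit error: {err32:.6g}")
print(f"64-bit error: {err64:.6g}")
```

The 32-bit total drifts visibly after only 100,000 additions, while the 64-bit error stays tiny. For a calculation that runs for days, that drift is the problem the post describes.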

Re:32-bit FP or 80-bit FP? High end guys need mor by The+Optimizer · 2001-06-29 02:56 · Score: 2

64 bits, cool. Hey, I said it was a guess. :-)

For 3d apps that's an interesting trade off: More precision at 2 data items or more throughput at 4 data items.

That still doesn't invalidate the point about precision for scientific and engineering applications, or the fact that it may be a factor in deciding which systems to run such apps on.

-Mp

Wrong Hardware by BRock97 · 2001-06-29 01:31 · Score: 2

Actually, yes, I have. At my current place of employment, we use four 650 MHz quad-Xeon servers with 2 GB of RAM apiece, each with an Adaptec RAID controller carrying 128 MB of memory. They grind to a halt and are barely usable, but, probably much like your situation, that is what we have to use. Another division has one Sun Enterprise server doing the equivalent work and it doesn't break a sweat. Sounds like you, along with some of what we do, are using the wrong hardware for the job. Why use x86 when there is much faster hardware out there for vector crunching?

Bryan R.

--

Bryan R.
The price of freedom is eternal vigilance, or $12.50 as seen on eBay.....

It's A Different Thrown Now by BRock97 · 2001-06-28 22:45 · Score: 5

Why bother? Every generation of processors that comes out has some special optimization that is required to run at peak performance. If you use one or the other, it gets you a marginal performance boost. Sure, the P4 can do magic if you turn on this compile flag and disable that other one. Who cares? Things are fast enough now that price should be king. Why spend $100-$200 more for a processor when all it gets you is a few more frames at 1600x1200 in Quake 3? Until the P4 comes down in price (and they are making big inroads here), the Athlon will be king.

Bryan R.

--

Bryan R.
The price of freedom is eternal vigilance, or $12.50 as seen on eBay.....

Re:It's A Different Thrown Now by p0six · 2001-06-28 23:10 · Score: 2

While I'll agree with you that price makes a big difference, don't forget that branding is important too. Intel, with the Pentium (tm), has one of the strongest brands out there, probably on par with big names like Coca-Cola. That is one of the reasons Intel continues to have a big market share even though Athlons have offered higher performance at lower cost.

So, in a sense, Marketing is King.
Re:It's A Different Thrown Now by Foxxxy · 2001-06-29 09:38 · Score: 1

I agree... my experience with the P4 during everyday work has been unimpressive. As far as I can see, my Athlon 1.2 destroys the P4 1.3 I have at work in normal day-to-day use, including MP3 encoding and digital video rendering. With the price of a P4, I don't understand why people insist on buying them.

Re:It's A Different Thrown Now by markh1967 · 2001-06-29 20:32 · Score: 1

Don't count on it being more than you need for very long. Moore's Law still stands true and we'll probably see 5GHz machines as entry level in under four years. I really hope they release a 4.77GHz system. That would be a milestone.

--
Input error. Replace user and press any key to continue.
Re:It's A Different Thrown Now by rseuhs · 2001-06-29 02:55 · Score: 1

That is one of the reasons that Intel continues to have a big market share even though Althons have been higher performance + lower cost.
The main reason is that AMD does not have enough fabs to produce for the whole market.
So in a way, the slowing market is a great chance for AMD...
Re:It's A Different Thrown Now by Derkec · 2001-06-29 04:36 · Score: 1

My personal bet: By the time SSE2 use becomes widespread enough for it to be important, AMD will have it on all their chips. AMD is bright enough to see when instruction sets have to be implemented to stay competitive.

Thats not the question... by EvilJohn · 2001-06-28 22:47 · Score: 3

The answer is yes, with SSE-2, it will beat the athlon into the ground. Check http://www.hardocp.com/reviews/cpus/intel/p417 out for more details.

The real question is the short lifespan of this P4. With Intel moving to DDR (thank god) but changing socket types, how viable is a P4 at this point?

Even gamers think about TCO.

// EvilJohn
// Java Geek

--

Less Talk, More Beer.

I think you are mistaken by cartman · 2001-06-29 05:51 · Score: 1

Take a second processor, with more pipelines available for instruction issue. Since it has more pipelines available, it is able to issue more instructions while waiting for the branch to be calculated.

He was referring to pipeline length, not width. In a 20 stage processor at the same clock rate, it takes longer to fill a pipeline and consequently the branch misprediction penalty is worse.

Suppose you have two processors, each at the same clock speed. One has a 5-stage pipeline, the second a 20-stage pipeline. Suppose that there is a branch every 6 instructions (which is typical). For every mispredicted branch, the first processor need only throw away 4 instructions, but the second 19. If most branches were mispredicted, it would kill the second processor.

Pipeline length and clock speed are closely related design parameters. Longer pipes allow faster clock rates (because less work is done per stage per cycle), but they increase the branch misprediction penalty. Generally there is a "happy compromise" for a processor between pipeline length and clock speed. Most recent chips have found that happy medium to be around 10 stages. The Pentium 4 is unusual in that it has 20 stages, so branch prediction becomes extremely important.

Long pipelines tend to benefit floating-point code more than integer code, because FP code is more loop-intensive and its branches are therefore more easily predicted. This is why the P4, with its extremely long pipeline, performs poorly on integer code compared to the PIII, but well on FP.
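Here is the post's reasoning as a simple cycles-per-instruction model. It uses the post's numbers: a branch every 6 instructions, and a flush that discards depth − 1 instructions. The misprediction rate is a free parameter, and the values used below are hypothetical. The model assumes an ideal one instruction per cycle and ignores cache misses, superscalar issue and partial flushes.

```python
def cpi(depth: int, branch_every: int = 6, mispredict_rate: float = 0.10) -> float:
    """Cycles per instruction for a scalar pipeline that flushes on mispredicts.

    Ideal throughput is one instruction per cycle; each mispredicted branch
    throws away (depth - 1) partially executed instructions.
    """
    branches_per_instr = 1 / branch_every
    flush_penalty = depth - 1
    return 1 + branches_per_instr * mispredict_rate * flush_penalty


# Same clock speed assumed for both designs, as in the post.
for depth in (5, 20):
    for rate in (0.02, 0.10, 1.0):
        print(f"{depth:2d} stages, {rate:.0%} mispredicted: "
              f"CPI = {cpi(depth, 6, rate):.3f}")
```

With every branch mispredicted, the 20-stage pipe runs at about 4.17 cycles per instruction versus about 1.67 for the 5-stage one. That is the "it would kill the second processor" case. With good prediction the gap nearly closes, which is why a deep pipe can pay off through its higher clock.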

Re:2 things by BilldaCat · 2001-06-28 23:32 · Score: 2

how is this flamebait? you cannot seriously claim that AMD has overtaken Intel in the average consumer's mind.

--
BilldaCat

Strange?? by GroundBounce · 2001-06-29 00:32 · Score: 1

Why does this matter so much if you're happily running Win2K?

Re:Strange?? by GroundBounce · 2001-06-29 04:59 · Score: 2

The Intel Linux compiler will be optimized for the P4, so there will be at least one compiler up to the job. It will cost money, but if you are really after top performance, you will probably not let a few hundred dollars stand in the way. It appears Intel is trying to make it compatible with gcc (and eventually g++), so ultimately (though not with the beta) you can link your high-performance modules with the vast array of existing libraries that have been compiled with gcc.

The interesting thing will be to see how well gcc becomes optimized for the Itanium processor, since Intel's long term plans are really to push this as the future workhorse of high performance computing. Since gcc must start over from scratch with this architecture anyway, maybe it will start out more optimized than gcc for x86, which has had to work with everything from the 386 to the P4.
Re:Strange?? by GroundBounce · 2001-06-29 01:31 · Score: 3

Of course GCC is available for Win2K; however, it is very seldom (if ever) used for serious commercial applications. Yes, it is used for porting UNIX applications, and these applications tend to run more slowly than their native Windows counterparts. I have nothing against Win2K, and I use Win2K as well as Linux and HP-UX. It's just that the performance of GCC on Win32 should be relatively irrelevant to someone who uses Win2K exclusively (and his sig implies that he uses Windows exclusively), except in the rare circumstance that they are porting a UNIX/Linux app or are using GCC because it's free, in which case they are probably not developing an ultra-high-performance application.

On the other hand, GCC *does* matter for Linux. It is true that most apps run just fine on Linux compiled with GCC. But newer x86 processors are clearly becoming more specialized, and there are applications where every drop of performance counts. I do large circuit simulations, and a 10% improvement could mean getting results hours sooner. For Linux to compete seriously in these areas, the apps will have to be compiled with a compiler whose results can compete with what's available under Win32.
Re:Strange?? by be-fan · 2001-06-30 04:20 · Score: 2

Actually, I multi-boot Win2K and Linux. I've been using Linux since Slack 3.5. That should teach you something about looking at the .sig (or the screen name!) rather than the post. As for why I care, I was just curious. I do lots of graphics-type applications, and a good compiler can really speed up matrix processing (which lends itself to pipelining quite well).

--
A deep unwavering belief is a sure sign you're missing something...
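As a rough illustration of the matrix case, here is a minimal sketch (not from the thread) of a row-major matrix-vector product written with SSE2 intrinsics from `<emmintrin.h>`, so each multiply and add handles two doubles at once. It assumes an x86 target with SSE2 (the default on x86-64; 32-bit builds need `-msse2`) and a compiler that provides the intrinsics header, as current GCC, Clang and Intel compilers do. The function name `matvec_sse2` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* y = A * x for a row-major n-by-n matrix of doubles.
   The inner loop processes two columns per iteration with packed
   SSE2 instructions; a scalar loop finishes an odd last column. */
void matvec_sse2(const double *A, const double *x, double *y, size_t n)
{
    assert(A != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        const double *row = A + i * n;
        __m128d acc = _mm_setzero_pd();          /* two partial sums */
        size_t j = 0;
        for (; j + 2 <= n; j += 2) {
            __m128d a = _mm_loadu_pd(row + j);   /* unaligned load of 2 doubles */
            __m128d v = _mm_loadu_pd(x + j);
            acc = _mm_add_pd(acc, _mm_mul_pd(a, v));
        }
        /* Horizontal sum of the two lanes. */
        double lanes[2];
        _mm_storeu_pd(lanes, acc);
        double sum = lanes[0] + lanes[1];
        for (; j < n; j++)                       /* scalar tail */
            sum += row[j] * x[j];
        y[i] = sum;
    }
}
```

For the 3x3 matrix `{1, ..., 9}` and `x = {1, 1, 1}` the result is `{6, 15, 24}`; the third column goes through the scalar tail.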
Re:Strange?? by Diomedes01 · 2001-06-29 00:50 · Score: 1

Because GCC is certainly available for Win2K; one of its strengths is availability on so many platforms! Sheesh... jumping down someone's throat because of their .sig is pretty lame. At any rate, to answer the original question, GCC's strength is definitely not speed and/or optimizations. I believe that the GCC team concentrates on having solid support for many different processors, at the expense of speed. I doubt that this will change in the foreseeable future, but honestly, anything I've ever needed to compile with GCC has run just fine. Given the current speed of desktop processors, the difference isn't even noticeable.

-------

--
"To hope's end I rode and to heart's breaking: Now for wrath, now for ruin and a red nightfall!"
Re:Strange?? by Diomedes01 · 2001-06-29 02:37 · Score: 1

I agree, I would never use GCC for a performance critical application. For every-day userland-type stuff, it's fine. For large-scale data processing it certainly isn't the way to go.

Regarding the whole sig thing, I can see where you're coming from, but just because he's using Win2k on a desktop doesn't mean he doesn't use GCC for development at work or on personal projects...
At any rate, there is definitely a need for a more optimized compiler under Linux. With Intel releasing their compiler for Linux, this is a small step in the right direction. Unfortunately, GCC will probably never reach the optimization level that the vendor compilers are at. I would love to see someone write a specialized x86-optimized Linux compiler; maybe use the parsing code from GCC, but redo the code generation. Maybe someone like IBM could get the ball rolling on this in order to show some real support for commercial applications on Linux.


Re:please.... by toofast · 2001-06-29 00:50 · Score: 2

Actually, you could spell it as "Athlon" rather than "Athalon", and you would be much more credible.

Re:excuse me but um... by JBv · 2001-06-28 23:57 · Score: 1

Not really.

Many people (myself included) use cheap PCs to do number crunching for scientific purposes.

Normally I use the low-end machines, like my home PC (Linux, Duron 900), to develop and test the code I will put to run on Alphas.

I haven't made any calculations, but I suppose that for poor labs with many students, the cost of an Alpha (for example) could finance more than two "lower-end" systems, which are also cheaper and easier to maintain and upgrade.

Still a war worth winning by AlpineR · 2001-06-29 02:20 · Score: 1

Sometimes speed is still king. I recently bought a computer for running floating-point-intensive simulations. A large part of the cost in my research isn't the expense of the hardware but the expense of my time, so I got the fastest system I could put together. I wanted dual processors and preferred a Dell machine, so I was already stuck with Intel CPUs. The only question was whether to go with P-IIIs or spend $1,000 more for dual P4s. All of my searching on the Web showed that P4s are no better than P-IIIs for floating-point calculations, so I went with the dual P-III system. Intel would now be $1,000 richer had I known that the P4 really could perform much faster.

By the way, I run Linux and compile with g++. Does anybody know if the GNU compiler does a good job of processor-specific optimizations?

There are more uses for computers than playing games and reading Slashdot. ;-)

AlpineR

Re:what about gcc? by SpinyNorman · 2001-06-28 23:43 · Score: 2

IMO gcc's optimization is generally weak. gcc doesn't have any MMX/SSE/SSE2 support, and even without considering vectorization it produces code that's around 20% slower than the Intel compiler's.
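To make "vectorization" concrete, the sketch below compares a scalar DAXPY loop (`y += a*x`) with a hand-vectorized SSE2 version that processes two doubles per packed instruction. This is roughly the transformation a vectorizing compiler would have to perform automatically. It assumes an x86 target with SSE2 (32-bit builds need `-msse2`); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Scalar reference: one double per iteration. */
void daxpy_scalar(size_t n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Vectorized: two doubles per packed multiply/add,
   plus a scalar tail when n is odd. */
void daxpy_sse2(size_t n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    __m128d va = _mm_set1_pd(a);                 /* broadcast a into both lanes */
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d vx = _mm_loadu_pd(x + i);
        __m128d vy = _mm_loadu_pd(y + i);
        _mm_storeu_pd(y + i, _mm_add_pd(vy, _mm_mul_pd(va, vx)));
    }
    for (; i < n; i++)                           /* leftover element */
        y[i] += a * x[i];
}
```

Both functions perform the same multiply and add per element, so for the same inputs they produce identical results; only the number of elements handled per instruction differs.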

gcc 3.0 apparently has an entirely new x86 back end, but from comments I've heard it produces code that's around 5% SLOWER than the old back end... It'd be nice to see some comprehensive benchmarks of gcc 2.95 vs 3.0 though.

There's a very interesting open-source SIMD compiler project (mainly focusing on MMX) at Purdue University:

http://shay.ecn.purdue.edu/~swar/Index.html

Re:This appears to be the typical load of slashdot by SpinyNorman · 2001-06-29 02:37 · Score: 2

Did you check what's been in all the high-street GHz+ computers for the last year? Maybe the P4 is making a showing now (at least it's made it to the TV shopping channels), but for at least a year you couldn't even find a high-end Intel PC retail, because they didn't have a GHz processor that worked (remember the Intel 1GHz, recalled after about 2 weeks).

AMD is also kicking Intel's ass in Europe, and is expected to continue gaining worldwide market share (from the current 20%+ to close to 30% by the end of the year).

Most consumers don't know enough to make a technical decision anyway - they're going to buy what's cheapest or what their college student geek son/daughter advises.

Re:P4 can't dethrone Athlon in Linux by srwalter · 2001-07-01 03:33 · Score: 1

You forget, however, that Intel's compiler does not support many GCC extensions, specifically the all-important inline asm extension. Without this, the compiler has no chance of compiling the kernel. Not to mention that only GCC is supported.

==================================================

--
Freedom is the freedom to say that 2 + 2 = 4
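For readers who haven't seen it, this is the kind of GCC extended inline-asm syntax referred to above: an assembler template followed by output operands, input operands, and a clobber list. A minimal sketch; `add_with_asm` is a hypothetical name, the template uses GCC's default AT&T syntax, and the x86 path is guarded so the file still builds on other compilers and targets.

```c
#include <assert.h>

/* ADDL operates on 32-bit operands, so this sketch assumes a 32-bit int. */
static_assert(sizeof(int) == 4, "addl expects 32-bit operands");

/* Returns a + b. On x86 with a GCC-compatible compiler the addition is
   done by an inline ADDL instruction using extended asm:
     %0 is bound to 'a' ("+r": in a register, read and written),
     %1 is bound to 'b' ("r": in a register, read only),
     "cc" declares that the condition flags are clobbered. */
int add_with_asm(int a, int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b; /* portable fallback */
#endif
}
```

The constraint strings are what let the compiler allocate registers around the asm statement, which is why a compiler has to understand this extension, not just pass the instruction text through.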

Re:Uh... Hemos? by oops · 2001-06-29 00:35 · Score: 1

(Yeah, yeah, I know you meant "iteration." But any computer geek who doesn't know how that term is spelled deserves some ribbing.)

Spelt. The word you want is spelt.

Re:Uh... Hemos? by oops · 2001-06-29 04:49 · Score: 1

http://www.dictionary.com

spelt is the past participle
spelled is the past tense.

or at least it was when I did my O-level.

Re:excuse me but um... by Buck2 · 2001-06-29 00:07 · Score: 1

We just bought 12 1.4 GHz Athlon machines with
1.5 GB RAM each for $10k for neural computations.

We could have gone with Sun, Irix, or Alpha if
we wanted one machine with 2-4 processors.

I looked into it. It wasn't going to happen.

--

As my father lik@(munch munch)... ....

Re:One big problem by chrysalis · 2001-06-28 23:06 · Score: 2

A stock OpenBSD installation is compiled for the 386. Did you recompile the kernel and the whole source tree with Pentium III optimizations?

-- Pure FTP server - Upgrade your FTP server to something simple and secure.


Re:32-bit FP or 80-bit FP? High end guys need mor by eric17 · 2001-06-29 02:07 · Score: 1

> One thing I don't see mentioned here is what degree of precision that SSE-2 has. I'm guessing that it only works on 32-bit floats.

You guessed wrong. SSE2 can operate on two 64-bit floats in parallel.
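The difference between using both 64-bit lanes of an XMM register and using only the low one shows up directly in the intrinsics: `_mm_add_pd` (ADDPD) adds both doubles, while `_mm_add_sd` (ADDSD) adds only the low lane and passes the high lane of the first operand through unchanged. A small sketch with hypothetical function names, assuming an x86 target with SSE2 (32-bit builds need `-msse2`):

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Each XMM register holds two 64-bit IEEE doubles. */
static_assert(sizeof(double) == 8, "SSE2 lanes are 64-bit doubles");

/* Packed add (ADDPD): out[0] = a[0] + b[0], out[1] = a[1] + b[1]. */
void add_packed(const double a[2], const double b[2], double out[2])
{
    __m128d r = _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    _mm_storeu_pd(out, r);
}

/* Scalar add (ADDSD): out[0] = a[0] + b[0], out[1] = a[1].
   Only the low half of the register does useful work. */
void add_scalar(const double a[2], const double b[2], double out[2])
{
    __m128d r = _mm_add_sd(_mm_loadu_pd(a), _mm_loadu_pd(b));
    _mm_storeu_pd(out, r);
}
```

With `a = {1.5, 2.5}` and `b = {10, 20}`, the packed version yields `{11.5, 22.5}` and the scalar one `{11.5, 2.5}`.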

Intel on the right path? by kgasso · 2001-06-28 22:58 · Score: 1

It seems Intel is on the right path to giving the Athlon a run for its money... I'm vaguely reminded of how quickly many companies/software developers/etc. picked up support for 3DNow! (likely due to the large number of customers and potential customers with AMD K6-2/K6-3 chips).

AMD had a fairly large number of developers promising 3dNow! support, and seemed to be doing the "right thing" by helping developers optimize their code.

It seems Intel has picked up on this, and has made it easy to optimize for SSE-2 with their own compiler plugin for VC. I'm just curious if this breaks AMD optimizations.

This is definitely a move in the right direction for Intel. I don't necessarily like it, though, because I'm an avid AMD fan. :D
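On the question of breaking AMD optimizations: one common way to avoid that is runtime dispatch on CPU feature flags rather than on the CPU vendor. The sketch below uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports("sse2")` (available only in much later compilers than the ones discussed here) to choose between an SSE2 path and a plain C fallback. The function names are hypothetical, and this is not a description of how Intel's VC plugin works internally.

```c
#include <assert.h>
#include <stddef.h>

/* Portable path: plain C, runs on any CPU. */
static double sum_generic(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <emmintrin.h> /* SSE2 intrinsics */

/* SSE2 path: compiled for SSE2 via the target attribute and only
   called when the running CPU reports SSE2. */
static __attribute__((target("sse2"))) double sum_sse2(const double *v, size_t n)
{
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(v + i));
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double s = lanes[0] + lanes[1];
    for (; i < n; i++)
        s += v[i];
    return s;
}
#endif

/* Choose an implementation from the CPU's feature bits, not its vendor. */
double sum_dispatch(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return sum_sse2(v, n);
#endif
    return sum_generic(v, n);
}
```

Any CPU that reports SSE2 takes the packed path regardless of vendor, and others fall back to the generic loop. The two paths add in a different order, so for non-integer data the results can differ in the last bits.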

Re:Problems? by be-fan · 2001-06-28 23:16 · Score: 2

As of yet, Intel's compiler is the only optimizing game in town. Even AMD uses Intel's compiler when giving Athlon benchmarks.