Pentium 4 Re-evaluated, Again (Again)
An unnamed correspondent writes: "It looks like Tom's Hardware Guide has been busy with the P4. This time a re-compiled version of the MPEG encoder (the same one they benchmarked with in the last article) shows the P4 doing really well. Also interesting is the performance boost that even the PIII and Athlon procs get from the Intel compiler. Take a look at the article here." Seems that as usual, benchmarks are what you make of them. The P4 apparently can perform much better than initial tests have shown. Tom Pabst makes some good (if fawning) points about the complexity and fairness of benchmarking in general, too.
Programs using SSE2 instructions will need those instructions available when they run, elsewise bad things happen. But what a gain! If ever you get the chance to talk to anyone from Intel, say that you'd like to see more of this.
Guess what? We already have. The 486 had instructions that were not available on the 386; programs that use them cannot be run on the 386.
In short, the definition of x86 you seem to be using doesn't exist and never has. You have never been able to use all the instructions of post-386 processors on 386es, any more than you could do so with the 286 or 8086. x86 compatibility has always run the other way, and it's still 100% the other way -- all the instructions for the 8086 are available on a P-IV.
There's no "we" in team, only "me"
Hi folks,
I think it's kind of ridiculous that most folks don't understand the concept of benchmarks. It's common knowledge among hackers that benchmarks test specific aspects of performance, and can be made to show better or worse performance depending on what the benchmark author wants to say. Unfortunately, many folks (maybe not you but many other folks like you do this) base purchasing decisions on benchmarks and spend hundreds of dollars more than necessary on hardware they don't really need.
Being a programmer myself, I know just how flippin powerful even the "outdated" CPUs are. Recently, I have worked on the Pentium III, the Celeron, and an old 486 at 66 MHz. Most of my recent works are prototypes built for ease of maintenance and clarity rather than performance, and if I may say so myself, they do perform extremely well, even on the 486. I'm sure there are areas in computing where a powerful workhorse CPU like the Pentium III or 4 is needed, but what most readers probably don't know is that there are literally thousands of mission-critical, real-time computer systems out there that run on 4- or 8-bit computers at speeds like 1 or 2 MHz, and they get the job done. Every user action is carried out instantaneously. The ridiculous part is that most folks out there don't understand that a newer CPU won't get them better overall performance. The user still needs to wait for the hard drive to churn, the network card to accept incoming packets, and a thousand other things; besides, it's really the software algorithms and implementation that cause the performance, or lack thereof. (These are the reasons I don't like Intel's claim that their newest CPU will give the user a better Internet experience.) The only place a faster CPU will get you performance is in tight code containing nothing but intense computations. Most folks will think of games when thinking of intense computations. In this case, I agree that it is critical to play Quake at 230 fps rather than 200. :)
I apologise for being so blunt in my comment but I need to run out the door so I'm in a hurry, and I'm kind of frustrated at the things that happen because of marketing and "benchmarks" that don't really mean anything (at least to myself). I hope I was able to successfully convey my point without insulting anyone. Hopefully, someone can comment on this and either help me out or prove me wrong... I'm open to others' suggestions
Kind regards,
Nathaniel G H
The GPL requires that you be willing to give the source under a GPL license to anyone who receives the binary. Tom would therefore be entitled to the source. Unless you receive the binary, you would not.
Tom would then be entitled to get out the source, since according to the GPL, Intel could not restrict him from doing so.
--
Life's a bitch but somebody's gotta do it.
[Glove Slap]
I demand satisfaction.
If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC
I hate to agree, because (a) I don't like Intel [I think they're out to screw the consumer as much as possible, instead of providing good value] and (b) I just hate to see yet another freaking instruction set, forcing everyone and their dog to upgrade to overhyped, overpowered machines when, for 90% of people, a Pentium 120 would be just fine (word processing, email, web browsing; not much good for games).
m l for those who want to read it. Or click [this link].
AAAAAAAAnyway, I quote [The Register]:
"Reader John Welter of North West Group, a Canadian Geomatics firm specialising in orthophotography - stretching accurate photographs of the Earth's surface over elevation models of the same area - volunteered us some interesting information on his company's experiences with an early P4 system.
When using the original code, a P4 system took a glacial 19 hours compared with just under 13 hours for a 933MHz PIII. But with code recompiled to use SSE2, the P4 galloped through the test in a shade over seven and a half hours.
Outperforming Alpha
-------------------
"A P4 at 1.5GHz is now faster when running optimised code than our Alpha production boxes by a sizable margin, where those same Alpha boxes outperformed all our P3 based systems.
"Intel did not take the x87 FPU performance as a prime design goal in the P4. They focused on the SSE/SSE2 unit much more and made sacrifices to the X87 FPU side of things to gain more SSE2 performance. Some may argue this was a bad trade-off but the improvements they have managed on the SSE2 are very impressive.
"Geomatics is extremely CPU intensive and pretty much 100 per cent bound by CPU performance. For this reason we obtained an early 1.5GHz P4 despite the inflated costs in an attempt to determine how much added performance it would give us in reducing our production times."
===
The article then goes on to describe the sweet 'puter setup they use, describes how SSE/SSE2 are an advantage in this particular case, and describes how AMD also plans to support SSE/SSE2 and more.
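For anyone who wants the ratios behind those run times, here is a minimal sketch in Python. It uses only the hours quoted above, and it rounds "just under 13" to 13.0 and "a shade over seven and a half" to 7.5, so the results are rough. The constant and function names are mine, not the article's.

```python
# Run times from the Register excerpt above, in hours.
# 13.0 and 7.5 are approximations of "just under 13" and
# "a shade over seven and a half", so the ratios are rough.
P4_ORIGINAL_HOURS = 19.0
P3_933_HOURS = 13.0
P4_SSE2_HOURS = 7.5


def speedup(baseline_hours, new_hours):
    """Return how many times faster the new run is than the baseline."""
    return baseline_hours / new_hours


if __name__ == "__main__":
    print(f"P4 SSE2 vs P4 original:  {speedup(P4_ORIGINAL_HOURS, P4_SSE2_HOURS):.2f}x")
    print(f"P4 SSE2 vs PIII 933:     {speedup(P3_933_HOURS, P4_SSE2_HOURS):.2f}x")
    print(f"PIII 933 vs P4 original: {speedup(P4_ORIGINAL_HOURS, P3_933_HOURS):.2f}x")
```

On those rounded figures the recompile is worth roughly 2.5x on the same P4, which is a far bigger swing than the gap between the two chips on the original code.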
http://www.theregister.co.uk/content/3/14982.ht
--
Don't like it? Respond with words, not karma.
Intel must have signed up for more advertising....
That's why they asked Tom not to release it.
If he distributed it, then they would be obligated to provide the source.
I think their goodwill is probably more important to Tom (and the community) in this case. If they default on that, then Tom might as well distribute the program.
But until then, I'd rather he keep reviewing with their help.
---
pb Reply or e-mail; don't vaguely moderate.
Is that compiler optimisation is probably going to count for more and more as Intel wring every last bit of performance out of x86. Linux distros like Mandrake, which currently comes in a version optimised for P5, could potentially have big performance benefits if you get the version compiled for your specific processor. Of course, the really good news is that Intel et al will have to take quite an active role in GCC development if they want to make their processor look great under free operating systems.
Unfortunately, what it could potentially mean though is that if Intel were to do some sort of special deal with a proprietary OS maker (MS for example) they could make that OS run far faster than any others, simply because it'd be compiled with a better optimised compiler.
-- Piracy is a victimless crime, like punching someone in the dark.
Blaming GW Bush for the Iraq war is like blaming Ronald McDonald for the poor quality of food.
First Tom decided one way. Then Tom decided another day. Now Tom is undecided. Where have I seen this before? Ahh yes, the US Presidential election. Is Tom from Florida? :-)
That's exactly my point; I don't recall it ever being so apparent. SSE2 obviously improves performance tremendously, and I hope people realize that if we drop "x86" altogether, we could have a nice little leap in performance :)
Dave
Barclay family motto:
Aut agere aut mori.
(Either action or death.)
First we have the presidential election all screwed up, now we can't even get a verdict on which processor to use! What is going on here?!
Well, I'll take your word on it. I've never had a rectal surgeon get his head stuck up my ass(or have his head in my ass, even without it getting stuck). Thank you for sharing your experience, such that I might know exactly how much to avoid it in the future. :)
Dave
Barclay family motto:
Aut agere aut mori.
(Either action or death.)
Of course they read Slashdot - but do they actually listen to what the people here have to say, seriously? Maybe, but not as seriously as someone making a point to find an Intel engineer and then saying, "x86 sucks. Drop it, and make a real consumer-grade, non-x86 CPU. And I don't mean the Itanium, either."
In the first case, it'll go into the statistics - 300,000 people for a new architecture, 460,000 against. In the second case, it'll be, "Well, I met this guy and he was really pissed off that we're still using x86. He seemed to know what he was talking about, and he understood the difficulties involved. However, he really thinks that the sacrifices would be worth it."
Now, which will hold more weight?
Dave
Barclay family motto:
Aut agere aut mori.
(Either action or death.)
I'm afraid that I have neither the time nor the bandwidth to explain. Go to Ace's Hardware, and read up on all the processor/architecture reviews. That's a good starting point.
:(
Fact is, Intel and AMD abandoned x86 to get real work done a long time ago. x86 is emulated on a modern processor, but at the hardware level. The core of the processor itself uses a different instruction set and format.
And like most things, x86 is just behind the times. Like all technology, tradeoffs had to be made. x86 was introduced way back when with the 8086, and got its 32-bit form with the 386. It was designed to solve a specific set of problems in a certain way. Today we have a different set of problems, that also need to be solved in a different way. It's not that x86 was never good - it was very good at the time, and as evidenced by its long use, it had quite a bit of life in it.
Unfortunately, the great lengths that all the major chip manufacturers go to in order to pander to the x86 instruction set really cut down on performance.
Dave
Barclay family motto:
Aut agere aut mori.
(Either action or death.)
The real problem is that the CPU makers don't give away a compiler for their CPUs, or work with free compilers to create good support for their CPUs. Why aren't these optimisations in gcc?
If regular users cannot get hold of binaries compiled with good compilers, or the good compilers to compile their own stuff, then their real-life usage of the CPU will look worse than the one in the review. The reviews shouldn't be done with special equipment, be that hardware or software, or with the aid of engineers who know one side only. They should be done with standard equipment so we, the normal users, would know what to expect.
Intel, AMD, and other cpu makers, that being x86 or not, give away the compilers, and see your hardware shine, or help GCC getting good support for your CPU, which we, the normal users can benefit from.
ion++
ps: there might be other free compilers than GCC
Ok. You benchmark the Transmeta, a neat processor. It is light years ahead of the "magical underclocker" technology from Intel (slowstep ;)). They are targeting lower power consumption in your laptop while in Word and other apps (which Win9x doesn't do because of not HLTing the processor). It's designed so that the only significant draw of power is the LCD (the HD spins down while idle, and the proc is self-reconfiguring for greater efficiency).
Naturally, since it's not targeting performance, it benchmarks poorly. Do they (the various Quake 3 monkeys) rerun the benchmarks? No.
The Pentium IV comes out. People plug 'em in, benchmark them. They also suck. They benchmark them again, showing the suck by a larger margin. Then they benchmark them again, showing it's actually not such a bad suckage after all.
Isn't that just a bit of Intel favouritism?
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
For most regular software, it did not mean squat. It only mattered for software that was specifically set up to take advantage of the hardware features.
There were certain Photoshop filters that were fantastic at certain settings, but choked when you used others. This is typical of new features.
I was always amused by comparing processors running at the same clock speed. Typically, when you do that, the gain in performance is about 15% to 25% (YMMV) before you add in the difference in clock speed. All too often the clock speed is a huge factor in the performance boost, not just the design changes.
I think I'll go have another beer....
"It is a greater offense to steal men's labor, than their clothes"
That's the problem with real world benchmarks. MPEG-4 encoding is associated (by some) with piracy. On the other hand, the publication of (original) digital video might well become a common pastime in the next few years. In the mean time, it does provide a sort of real world application that (some) can appreciate.
I'm not sure that SPEC2000 is an appropriate solution. Most people don't care about the performance of a "quantum-chromodynamics" simulation, and are not involved in computational fluid dynamics. The integer simulations are a little closer to home (word processing, chess playing, perl...) but unless your "real world" approximates the "real world" the benchmarks are trying to simulate, the results of such benchmarks are difficult to appreciate.
I suspect that to many people, a Quake/Unreal benchmark is much more valuable than SPEC2000 results.
All things Multimedia. HDTV will offer much higher resolution than DVD. At 1080i, the highest spec, I saw claims 6 months ago that the then highest-end x86 chips could not do software decoding. Also, look at mp3 ripping and digital camera usage. These are all things that Joe Consumer is interested in. Given the incredible advances in digital camera technology, a multimegapixel camera that stores hundreds of images may need significant CPU horsepower to convert to jpg or png, or just edit.
DVDs have been around for about 3 years now, and yet DVD decoding chips aren't standardized on motherboards. We can expect the same for HDTV. Software decoding is going to remain pretty popular, as DVD + big mhz/ghz sells in CompUSA whereas selling 400mhz + decoder card = educating consumers = good luck.
Does your opinion affect tides and harvests also?
I'm a loner Dottie, a Rebel.
It IS interesting to note that the recompiled version also helped the P3 and Athlon. It does seem to indicate the original was not compiled optimally.
If you divide the benchmarks by the clock rate, you get a more objective view of the processors, independent of clock rate. By that measure, the P4 is turning out about .00935 per cycle, while the Athlon is just behind it at .00928, and the P3 behind at .00803. This makes sense - the P4 isn't really much faster than the existing technology (Athlon), but it does allow much higher clock speeds, which was Intel's goal all along. Bully for them, they got what they aimed for.
The small problem of course is that you don't get any of that bang with existing apps, which will undoubtedly hurt P4 sales for a while. Strictly looking at bang-for-the-buck, the P4 is a poor choice at the current price points, but most new processors are. If AMD can get the Palomino (smaller, faster Athlon) out the door quickly, with SMP support, they could take a bite out of the server market that would have Intel rubbing their bottoms for some time.
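The clock-for-clock comparison in this comment can be sketched in a few lines of Python. The three per-clock scores are the ones quoted in the comment; the function names are my own, and the 15 fps at 1500 MHz input at the end is a made-up number just to show the normalization step, not a result from the article.

```python
# Per-clock scores as quoted in the comment (benchmark result / clock rate).
PER_CLOCK = {"P4": 0.00935, "Athlon": 0.00928, "P3": 0.00803}


def per_clock(score, clock_mhz):
    """Normalize a benchmark score by clock rate (score per MHz)."""
    return score / clock_mhz


def relative_advantage(a, b):
    """Percentage by which per-clock score a exceeds per-clock score b."""
    return (a / b - 1.0) * 100.0


if __name__ == "__main__":
    for other in ("Athlon", "P3"):
        pct = relative_advantage(PER_CLOCK["P4"], PER_CLOCK[other])
        print(f"P4 vs {other}, clock for clock: {pct:+.1f}%")
    # Hypothetical input, not from the article: 15 fps on a 1500 MHz part.
    print(per_clock(15.0, 1500.0))
```

On the quoted figures the P4 is under 1% ahead of the Athlon per clock and roughly 16% ahead of the P3, which fits the point that most of its lead comes from clock speed.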
_________________________________
Now!
If anybody tries to build render farms off P4s they will literally go out of business due to missing deadlines. The P4 pipeline is too deep, the whole thing is geared towards flashy performance on consumer codecs and it's not general-purpose enough to perform realworld EFX tasks to deadlines. The high end market will be avoiding this one- it's just too easy to push it out of its 'high performance zone' and make it bog down.
Dude, what the fuck good would a "Linux" FPGA do for anyone? Ohhh, I get DSP performance out of my kernel? Big fucking deal. It's the 3D rendering or Quake3 I want DSP performance out of. And more to the guts of your post. Most modern RISC processors out now have a large amount of specialization either in their instructions or processing units. MIPS binaries don't run on SPARCs now do they? RISC is a good chip architecture but it is no reason to thrash CISC merely because one CISC implementer is adding beaucoup instructions to their chips. Extending your instructions is a good way for people to get high level optimization out of a processor. If certain functions that are popular among an entire class of products are turned into chip level instructions, anyone writing code in that category gets to replace a large chunk of code with a single instruction. This is good for programmers as they have finite time to complete a project and writing/debugging a chunk of code takes a lot longer than using some hardware optimization.
I'm a loner Dottie, a Rebel.
I'm not trying to knock Intel per se. My main machine is a P3 (Dell laptop, runs like a dream). But you have to wonder if the cost warrants, in this case, the extra 3 fps in compression.
- I don't care if they globalize against free speech. All my best free thoughts are done in my head.
Intel, AMD, and other cpu makers, that being x86 or not, give away the compilers, and see your hardware shine, or help GCC getting good support for your CPU, which we, the normal users can benefit from.
Intel does offer their VTune compilers for sale, as they must in order to legally use them in the SPEC benchmarks where they perform so well. Unfortunately, there are widespread complaints and accusations that they are buggy and temperamental and fail to compile much code that works just fine with gcc, VS, etc. The charge that Intel gets its SPEC scores with compilers which are so optimized that they aren't robust enough for everyday use has tarnished Intel's very impressive SPEC scores among some. I haven't ever tried to use VTune so I can't comment as to whether this is FUD or not. It is worth noting that VTune is much much faster than anything else in SPEC, yet rarely used in practice, so there must be something wrong with it.
But Intel does also help other compiler makers incorporate optimizations. I know they specifically work with Cygnus to optimize gcc, and would assume they do the same with MS. AMD also works with compiler makers to get support for 3DNow. (For market reasons--i.e. they will always have smaller market share--AMD designed the Athlon to perform well on P3-optimized code, and thus there is not so much to be gained by including K7 optimizations over and above 3DNow. The P4, on the other hand, is very different from both of them and needs a recompile to perform well, as these numbers demonstrate.)
One of the stated goals of the P4 though is to really whomp ass in terms of clock speed, which is something I think it can be agreed they've done. They needed to put something out that could do two things: 1) make the home PC buying public drool and drivel over the word gigahertz and 2) do things people wanted to do quickly. If you have a processor that really flies at video rendering and sound processing, Dell and Gateway stick FireWire ports in it and classify it a multimedia editing PC to get eyes turned away from iMacs. Non-techies see the numerical difference between 1.2 and 1.5GHz and their eyebrows go up because they understand that the higher number MUST be better. You're very correct though in saying render farms of these things will sink anyone. Raytracing and picture editing use algorithms over and over again and are best kept in the cache until served with a cold soup. Coppermines and Willamettes have tiny caches that make it difficult to keep more complex (read larger) algorithms in the L2. When you buy an SGI Onyx system the processors have pretty heavy cache sizes which facilitate repetitious functions like this. If businesses want/need "high-performance" they ought to be buying Xeons if they want to stick with IA32 or maybe the UltraSPARC3. Quality hardware + quality software == justifiable cost.
I'm a loner Dottie, a Rebel.
The software they used was open source, dude.
So once again we receive the ultimate proof on what benchmarks count for... However independent or dependent the testers are, they can fall into crass errors if they risk their final word without weighing all factors. And even doing that does not save them from getting burned by some nightrunning hacker, a last-minute addition or a dumb tool.
Sincerely, I think that we have had enough of these benchmark judgements. Playing the game of "the judge" is what benchmark tests should get rid of. Frankly, only after a set of benchmarks has been run for some time and all levels tested/contacted/patched/retested should people pass judgement. Until then no benchmark can be taken as a verdict. So, every time someone tells about things like Linux suxx and everything else rulez, first check if the penguin horde shrinks, then read ZDNet for a month without missing a day, then check the mass media, benchmark sites, testers, then check if freshmeat's submissions lowered, then check what your friends/colleagues/neighbors say. If everyone says that Linux still suxx then you may take for granted the first benchmark. If not then the guys have gotten a check from M$. But until then don't forget to recompile the kernel so that it fits what you really have on your comp. That's the best test benchmark you may do for yourself...
Like you can recompile all your windows applications, not.
You don't need to.
For 90% of users out there who need the processing power at all, the only thing that matters is the graphics driver, because it's games that are sucking up the CPU time. Graphics driver upgrades are released fairly regularly.
For the rest, it's MPEG CODECs. I'm sure if your favourite CODEC's site posted an update that ran 50% faster, you'd download it; thus, I don't think upgrading it will be a problem.
The (relative) handful of people doing heavy-duty image processing or rendering will likewise be upgrading to the next version of their software package at some point, which will contain SSE2 code.
The OS itself doesn't need a recompile. Neither do your office applications. Where is the vast pile of software that needs to be recompiled?
URLs:
http://www.nvidia.com/Home.nsf/nvidianews2.html
http://www.tomshardware.com/cpu/00q4/001120/p4-20
Logically, then, we must all avoid attempting any tasks that the P4 doesn't like ;P
but, you people love big numbers..
The opinions in this post are fictitious. Any similarity to actual opinions, real or imagined, is purely coincidental.
is that we've got so many different CPUs and branches within types (how many x86 extensions are there now?) that compiler optimizations don't come close to keeping up. It's all well and good to have these fancy chips but if you don't have compilers to take advantage of the big registers and special opcodes, what good does it do? Linux is ported to lots of different architectures but doesn't necessarily take full advantage of them...take a look at linux distros for UltraSparc - they're still basically 32bit in userland. *sigh* Competition may be good for consumers but it's not so great for developers who have to support 10 million diverging platforms.
In Soviet Russia, hot grits put YOU down THEIR pants.
But how about 3d games? That's the mass market where performance really matters. Invent a better, cheaper, BSP-tree processor and you'll corner the CPU market.
What a wonderful example of "keep running the benchmark until it looks fast." Or, consider this snippet of dialog:
Readers: Tom, how fast is the new Pentium 4?
Tom: How fast do you want it to be?
Call me a cynic, but I doubt that even this obvious debacle will convince anyone of the pointlessness of benchmarks. (Five years of Apple saying the PPC is "twice as fast as the Pentium" sure hasn't.) Let's go back to using MIPS, at least everyone knew that stood for "Meaningless Indication of Processor Speed"
Any sufficiently advanced civilization is indistinguishable from Gods.
First of all, I'd like to congratulate the author of Flask (the MPEG4 encoder used in the benchmark). You can't buy publicity like this, and I bet your app just got a whole lot better ;)
I also think we should take note of something. *THIS* is the promise of moving to a new architecture, beyond x86. A program compiled to be backwards-compatible right down to the 386 will NOT be able to use SSE2 instructions, nor any other fancy bells and whistles (like 3Dnow! and plain ol' SSE). At least, as far as I can tell (I think pgcc can use more advanced instructions and still run on older CPUs).
In essence, Intel is moving away from x86, albeit slowly and painfully. SSE2 is obviously a good technology, but an incompatible one. Programs using SSE2 instructions will need those instructions available when they run, elsewise bad things happen. But what a gain!
If ever you get the chance to talk to anyone from Intel, say that you'd like to see more of this.
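The usual way around "bad things happen" is to check for the instructions at run time and fall back to a plain code path when they are missing. Here is a rough sketch of that idea in Python, assuming an x86 Linux box where /proc/cpuinfo has a "flags" line. The two encoder names are hypothetical placeholders, not anything from Flask or Intel.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from Linux /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found


def pick_encoder(flags):
    """Choose a code path by name; both names are hypothetical placeholders."""
    return "encode_sse2" if "sse2" in flags else "encode_x87"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = cpu_flags(f.read())
    except OSError:
        flags = set()  # not Linux, or unreadable: fall back to the safe path
    print(pick_encoder(flags))
```

A binary that ships both paths still runs on a 386-class target, it just doesn't get the gain there, which is the trade being argued about in this thread.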
Dave
Barclay family motto:
Aut agere aut mori.
(Either action or death.)
FlaskMPEG (http://go.to/flaskmpeg) is a project written by Alberto Vigata whose source code is available under the GPL http://www.citeweb.com/flaskmpeg/docs/gpl.html
(As an aside, Alberto has been extremely busy of late, and the project has gone a little stale, but it is by no means abandoned, and he has collaborated with several authors to forward the development of FlaskMPEG, though it is slow going)
Intel has taken this source code, produced a modified binary, and distributed that binary to a third party (Dr. Thomas Pabst). Now, the question is, where is the source code? They are obligated under the terms of the GPL to release it, and so far they haven't. Additionally, they hint that they don't want it distributed, by asking Dr. Pabst not to make the recompiled version of FlaskMPEG available. Is this a violation of the GPL? Probably. Will Intel get away with it? I'd like to see them not, but they probably will.
I'm surprised no one commented on this before, Slashdot goers are usually more on the up than this.
P.S. The DiVX codec is *not* SSE/SSE2 or 3DNow! optimized, though it does have MMX optimizations. How do I know this for sure? Because DiVX is just a copy of the Microsoft MPEG4v3 codec that has been modified with an assembler/debugger to allow the playback of MPEG4v3 streams inside an AVI, and to stamp streams it creates with the FourCC code of DIV3 instead of MPG4. It wouldn't have been needed at all if Microsoft hadn't artificially restricted the codec from creating or playing back AVI files, instead tying it to the ASF format, and therefore to a Windows only platform. Can we say 'Embrace and Extend (just enough to break compatibility)'? I knew you could...
...if anything, something as CPU intensive as MPEG compression will probably improve significantly better than linearly with CPU power)
...
Nothing can improve *more* than linearly with clock speed (which I assume is what you mean by "CPU power"). Linear increase is the upper bound. Often the increase is lower due to memory (and other) bottlenecks.
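To put numbers on that upper bound, here is a toy model in Python: only the CPU-bound part of the run time shrinks with clock speed, and the time spent waiting on memory or disk stays put. The workloads are hypothetical (100 seconds total at the base clock) and the function names are mine.

```python
def run_time(cpu_seconds, other_seconds, clock_ratio):
    """Run time after scaling only the CPU-bound part by clock_ratio."""
    return cpu_seconds / clock_ratio + other_seconds


def speedup(cpu_seconds, other_seconds, clock_ratio):
    """Overall speedup from a clock increase of clock_ratio."""
    before = cpu_seconds + other_seconds
    return before / run_time(cpu_seconds, other_seconds, clock_ratio)


if __name__ == "__main__":
    # Hypothetical workloads, 100 s total at the base clock, clock doubled.
    print(speedup(100.0, 0.0, 2.0))   # fully CPU-bound: the linear upper bound
    print(speedup(80.0, 20.0, 2.0))   # 20 s waiting on memory or disk
```

Doubling the clock gives exactly 2x only in the fully CPU-bound case; with 20 of the 100 seconds stuck on something else it drops to about 1.67x, and it can never go above 2x in this model.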
the Athlon gets 9.28 frames per GHz, whereas the P4 gets 9.35 frames per GHz.
but it shows the basic idea here -- that most of the P4's advantage is due to its speed in MHz, not the architecture
The P4 is faster (but not by much), normalized, as you said yourself, so why is the advantage due to its speed in Mhz? You defeated your own argument.
The original purpose behind the P4 is to be able to crank up the clock speed. Looks like they have reached their goal and even increased the IPC by just a little. So it looks like this will be a win for Intel. When the Athlon reaches its max clock speed, and the P4 continues to crank up its speed all the way up to 10GHz, you'll see.
AMD was notably slower in New Mexico as well, in fact it was really too close to call.
Meanwhile, AMD officials were quick to point out that the tests had already proven the "will of the program", which showed AMD ahead by a large margin. Intel was smug as they responded "We will wait for the final benchmark, which will show that we are the faster x86".
Transmeta was unavailable for comment.
Hammer of Truth