Pentium 4 Under Linux
A reader writes "I just ran across this article over at LinuxHardware.org that reviews the Pentium 4 under Linux. It gives a lot of insite as to why anyone would want to buy a Pentium 4 and has some great clips from Alan Cox and Jan Hubicka (from the GCC team). Very thorough job."
GCC is hardly the appropriate tool to compare processor performance.
You want to use the best-available compiler for each processor. One that has optimisations for the specific hardware. Often it's one supplied by the vendor.
Sorry, GCC doesn't qualify there, except for specific vendor-tweaked versions. GCC is a swiss army knife. Good enough in most cases but almost never the best.
Slashdot ought to consider local mirrors of the pages that they link to. They're already set up to handle the bandwidth; it would be courteous and not too difficult for them to do so. (Or haven't you ever heard of `wget -r`?)
I presume you're not talking about Sprint's Integrated Network System Interface Terminal Equipment, eh? I'd recommend "insight".
Alex Bischoff
HTML/CSS coder for hire
Actually, most applications of today are, if you are lucky, optimized for the Pentium Pro. Getting most software to support P4 will take about three to five years. Until then, you have to live with the fact that other processors perform better.
War is one of the most horrible things a human can be exposed to. And one of the worlds largest industries.
It appears the power of Linux Hardware is no match for the power of a good, old-fashioned Slashdotting. ;)
Chas - The one, the only.
THANK GOD!!!
Idiot savants, as I understand it, are mentally
retarded yet can perform some skill extremely
well. So, I supposed that it was spelling
in this case (essentially memorizing letter
sequences). Is this not a correct view of the
disorder?
And hey if my C-64 would have had net access
instead of a 300 baud modem I would never have
upgraded.
-Kevin
how the hell is that a wasteful design? It means that there is no support for the P4 arch. Get real.
It was a good troll until this part: You may wonder why I know this. Let's just say I have inside knowledge of Intel products. :-)
All of your "facts" are public information.
Actually, most applications of today are, if you are lucky, optimized for the Pentium Pro.
The same thing happens on Suns. People complain that they didn't see a big jump in speed from UltraSPARC-II based machines to US-III based... when the vast majority of programs are still compiled for simply SPARC32 i.e. not even Ultra 1 optimizations!
Vendors take forever to optimize their products on the newer architectures.
To stay a little more on topic:
I did some tests encoding the same wavs to mp3s on a P2, P4 and Athlon system, and the 1.4GHz Athlon was a good deal faster (about 16%). The Athlon had DDR RAM and the P4 had RDRAM (and 1 Gig vs the Athlon's 512 Meg). What's even sadder is the Athlon was reading the wav file via NFS and the P4 was using local disk... Ultra-160 SCSI disk (IDE on the Athlon). Of course, CPU is more important than I/O in such a situation.
- My favorite error message: xscreensaver, running on an old Sparc 5 w/ 8bit color: bsod: Couldn't allocate color Blue
Rate this up. Let the test results fall where they may, but if they're going to compare to a P4 with RDRAM they need to use DDR 2100 RAM on their 266Mhz FSB Athlon.
Until AMD starts sticking the extremely popular processor ID#, Intel may be the only chip "allowed" to run the "reliable" Micro$oft OS's.
We're all looking forward to that day, aren't we?
Hmmm: Intel == Windows.
AMD == Free O/S ?
As Ace's Hardware discovered, the best way to optimize is to use Intel's latest beta compiler. But you can't use this compiler to compile Linux, because Linux uses gcc-specific extensions to C that the Intel compiler does not support.
It is for this exact reason that libc (and other key libraries) and kernel modules on Solaris can have platform specific optimized versions.
eg: My program links against /usr/lib/libc.so.1
but at run time some of the functions actually get run out of: /usr/platform/`arch`/lib/libc_psr.so.1
The `arch` in this case isn't limited to sun4c,sun4m,sun4u,sun4u-us3
but can actually be a fully platform spec like SUNW,Ultra-Enterprise-10000
Yours was not the only one, I noticed SEVERAL posts that were modded down when they shouldn't have been. It looks to me that some moderator was simply on crack or something and marked a lot of posts as offtopic when they weren't, redundant when there was only one, and other unfairly modded posts. Thankfully there are a lot of moderators to counteract the idiot ones.
I think what must happen (probably a lot) is that someone will receive moderator access, not understand what it is and end up clicking the nifty boxes to see what will happen. Other times, I'm sure people get bored with moderating and simply burn the points off to no good purpose.
Fascism should more properly be called corporatism, since it is the merger of state and corporate power.
Except the PIII is a PII with a faster clock. The PIII should have been a lot faster on the same optimizations, since the core did not change at all. Guess what, it was faster. Are you sure you weren't thinking of the Pentium and the Pentium Pro / PII?
That's corporate, marketing English, used to confound any effort to draw real, litigable meaning from an advertisement. :)
mefus
--
um, er... eh -- *click*
In Open Society, GPL Software frees YOU!
There's a really long thread in the archives (some of it is still going on), but this message starts in the middle. The 16 byte stack alignment is on by default.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
Did you actually run benchmarks on a P4?
I didn't know "price/performance ratio" and "real-world application" were in the consumer's heads. hence the whole 1.7GHz P4 we're talking about ;)
11 was a racehorse
12 was 12
1111 Race
12112
That's actually a Dell problem. Check out how much crap they've loaded up by default. When I bought my Inspiron 8000 laptop from them (PIII-850), it took WinME a full 35+ seconds to load from start to finish. For kicks, I did a fresh install with WinME before I installed Mandrake, and it loaded from start to finish in under 10 seconds. Who leaves the default install from a manufacturer, anyways?
Interested in open source engine management for your Subaru?
What is wrong with the chipset? We can use the same one that Apple uses (i.e. the one made by Motorola. It is a long time since Apple made their own chipset).
Intel is not dead, nor will they ever be.
When it comes to large corporations buying large scale servers, they ONLY BUY INTEL. Intel has the corporate market monopolized and will continue to. AMD is JUST NOW breaking into the multiple CPU market, and it takes some time to optimize that.
That said, I am an AMD fan. I own a P4 1.7 and an AMD 1.3... I love both machines, they both are faster than I will ever need. There's no reason to say that AMD will kill Intel because the cold hard facts are that they not only won't, but they can't. Intel is in other markets, not just CPUs.
Regards, Ryan McAdams
I presume you meant to write any Australian who writes 'optimize' will appear challenged, since you said that organise was preferred.
The Quake 3 "source" excludes the renderer, Quake VM, and networking code -- the most "interesting" parts of the game. It's just enough for you to write a mod with, but Quake 3 engine itself is hardly "open source".
There's 10 types of people in this world, those who understand binary and those who don't.
Maybe now, but as the article indicates: "Keep an eye on what the kernel and GCC teams produce though. A couple of releases here and there could really turn the tides on AMD."
If the P4 is a great system for an avid gamer it will be THE system for desktop usage...games can and, I think often, dictate what hardware people buy for their home systems.
I'll agree that Intel's strategy of going after clock cycles above all else is tragic. Similarly, A/I/M is chasing after that integer performance with great ferocity; their claims that their systems are fastest (which they properly scratch down to only an integer benchmark in the fine-print disclaimer, of course) depend on it.
Now, if only I could buy a G4-based system from someone other than Apple, who presumes -- apparently with moderate accuracy -- that nifty translucent plastic will make the phrases "price/performance ratio" and "real-world application" disappear from consumers' heads... =)
Is there someone out there selling G4 motherboards with standard form factors and accessory support at a competitive price point? Otherwise, there's no basis for comparison.
I seem to remember similar statements and conversations when it came to the benefits/detriments of the PIII over the PII.
---------------
[Darth]Snowbeam
I am Lord Snowbeam. Heed my call!
So all that fancy, schmanchy spelling I learnt in college was just a waste? Dangit. I hate it when they change the rules just after I finished something!
Stop the brainwash
Yes, Intel does the very same thing. I should probably have made that clear. I was just refuting the claim that AMD chips do not do this.
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Wow. I guess I'd better buy lots of RDRAM then, since Intel says it's great. I guess I'd better stop buying Athlons, since Intel says Pentium 4 is better.
The emulators guy explains in detail why the Pentium 4 sucks, with examples, so we don't just have to take his word for it. Could you summarize those examples in one sentence for us too?
Did you know if the L1 cache on the Pentium 4 was increased, the latency also increases? Did you know that the higher latency would hurt performance more than the additional cache?
The Athlon has a much larger cache than the Pentium 4 and it out-performs the Pentium 4 at equivalent clock speeds... and I'm sure you don't want to waste your time explaining how this could be true.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
The Pentium 4 is huge, which makes it more expensive to produce. I'm sure Intel was trying to shrink the die size a bit when they pared down the trace cache to 8K, and thus keep costs more under control. That's not "no reason".
From the preliminary benchmarks, RDRAM /is/ better than SDRAM now that the front-side bus is fast enough for the extra bandwidth to matter.
For certain problems, RDRAM is better. In particular, for cranking through lots of data in a sequential order (e.g. encoding or decoding compressed audio or video!) RDRAM is faster. But for random access to data, DDR SDRAM will crush RDRAM due to much lower latency.
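To put rough shape on that bandwidth-versus-latency tradeoff, here is a minimal sketch in Python. The two "parts" and their latency figures are made-up placeholders chosen only to illustrate the argument, not measurements of any real RDRAM or DDR module, and `transfer_time_ns` and `faster_part` are hypothetical helper names.

```python
def transfer_time_ns(num_bytes, latency_ns, bandwidth_bytes_per_ns):
    """Time for one access: a fixed latency plus the time to stream the data."""
    return latency_ns + num_bytes / bandwidth_bytes_per_ns


# Hypothetical parts (illustrative numbers only, assumed for this sketch):
# "streamer" has more bandwidth but worse latency, "low_latency" the reverse.
STREAMER = {"latency_ns": 100.0, "bandwidth_bytes_per_ns": 3.2}
LOW_LATENCY = {"latency_ns": 60.0, "bandwidth_bytes_per_ns": 2.1}


def faster_part(num_bytes):
    """Return which hypothetical part finishes a single access of this size first."""
    streamer_time = transfer_time_ns(num_bytes, **STREAMER)
    low_latency_time = transfer_time_ns(num_bytes, **LOW_LATENCY)
    return "streamer" if streamer_time < low_latency_time else "low_latency"
```

With these assumed numbers, a small scattered read such as `faster_part(64)` favours the low-latency part, while one long sequential read such as `faster_part(1_000_000)` favours the high-bandwidth part. Real memory systems add caches, prefetching and bank effects that this model ignores.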
The emulators.com guy is just pissed off because the Pentium 4's core doesn't work as well with emulators as the P6 core did. It's more for multimedia, not for heavy logic programs like emulators are.
This is just another way of saying that the Pentium 4 is broken except for multimedia, which is pretty much what I have been saying all along. The Athlon has all-around good performance, and if you look at price/performance ratios, the Athlon totally wins.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
I disagree completely. SSE2 is not the solution to all problems, and besides one of his big points was that the Pentium 4 loses on code that ran fast on earlier chips. Code that runs fast on a Pentium Pro runs even faster on a Pentium II, for example, but with the Pentium 4 that is no longer true. But the Athlon runs existing code very quickly. It's just not good enough to say that because SSE2 can run fast, and there exists a compiler that takes advantage of SSE2, that the Pentium 4 isn't broken.
And he didn't so much blast a "lack of execution units" as the lack of ability to keep them all working. The Pentium 4 can only feed RISC micro-ops to 3 execution units in one clock cycle. Also bad, the Pentium 4 can only decode a single x86 instruction per clock, so instructions that aren't already in the trace cache are unduly expensive.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
the chips are so fast these days that few people will really notice any difference between a good AMD system and a good Intel system.
You and I seem to agree on what the situation is. The difference is that I hold the Pentium 4 in contempt for being broken, and you seem to think it is a good-enough design. I don't think either of us will convince the other.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
The Pentium 4 has several glaring faults that cripple it.
the level 1 cache is way too small
it can only pass the decoded micro-ops to 3 of its internal execution units per clock, so it can only execute 3 micro-ops per clock (compare to the Athlon, with up to 9 micro-ops executed per clock)
instructions that execute very quickly on other Pentium chips now execute slowly (in particular, anything involving bit-shifting)
These faults and more are discussed here.
Unlike the Pentium 4, the Athlon executes existing x86 code very quickly. You don't need fancy optimization tricks to get code to run fast on an Athlon; it has no major faults to work around.
A Pentium 4 system, with its expensive high-speed RDRAM, will be very fast for certain uses. And it has the lead in raw clock speed. If Intel can crank the clock speed way up, say to double what AMD can do, it won't matter that the Pentium 4 is broken; it will still be the fastest chip you can get. I predict this will not happen; AMD will continue to make ever-faster Athlon chips, which will remain competitive with anything Intel can make. (And of course if you look at the performance-over-price ratio, the AMD chips totally crush the Intel chips.)
Of course, it must be said that the chips are so fast these days that few people will really notice any difference between a good AMD system and a good Intel system. The AMD may out-benchmark the P4, but if both of them can run Quake 3 nice and fast, few people will actually care about the differences.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
This is so wrong. The AMD core breaks up an x86 instruction into RISC-like "micro-ops" or ROPs, and then various RISC-like execution units go to work executing the ROPs. Up to 9 ROPs can be executed at the same time! This is why the Athlon so thoroughly stomps all over the Intel chips at equivalent clock rates--the AMD chips can get more done per clock. This is especially true for floating point, where the Athlon can execute 3 floating point instructions at once.
Full details here in the AnandTech article. I linked to page 8, the one that has the discussion of how instructions get executed.
This is the reason why Pentiums cost more than AMD's
Total nonsense. Intel chips cost more because Intel charges more. The Pentium 4 is expensive because its die size is freaking huge.
Let's just say I have inside knowledge of Intel products. :-)
You don't seem to know very much about AMD products.
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
The P4 has all the 3D optimizations, just like the old P3s. The only thing is, most of the programs (not all, but most) that depend on those optimizations and don't use Athlon optimizations were originally designed as Wintel programs, like Quake 3. Those programs are also available as binary only, not source. While the P4 is apparently a great system for an avid gamer, the AMD line will probably remain cheaper and more useful to *nix developers like myself.
I am !amused.
Does anyone else find that funny?
Anyway, I didn't even know there was a 1.2GHz Pentium 4.
Intel is Nvidia, and AMD is ATI. ATI has very promising upcoming chips and some alternative solutions to fixing the problem, just like AMD. Intel and Nvidia are both the dominant makers, but ATi/AMD are gaining on them.
Anyway, AMD is adopting SSE/SSE2 now too, so why WOULDN'T you optimize for it? You're not just optimizing for the Pentium 4, you're optimizing for all future Intel 32-bit processors, and probably upcoming AMD 64-bit processors too.
Again, with lack of execution units he's focusing primarily on the weak FPU, and ignores the very fast SSE stuff. With the release of ICL5 which is smart enough to parallelize loops for SSE2 by itself, there's no excuse for that.
The fourth point was the small instruction cache. Intel doesn't use a normal instruction cache on the P4, it uses what it calls a trace cache. P2, P3, P4, Athlon, they all decode instructions into smaller micro-ops, as you know. Unlike the other processors, the P4 doesn't cache x86 instructions at all. It caches the decoded micro-ops in the trace cache instead, saving the job (and several pipeline stages) of decoding instructions. The theory is that because the P4 works, for the most part, on the level of micro-ops instead of normal instructions as earlier processors do, it doesn't need as much cache.
25MB/sec. Is that an IDE disk? No shit. 30-35MB/sec is all a top-notch IDE/ATA100 disk will do. At the beginning of the disk, that is. You'd better use SCSI RAID for your movies.
By the time there is software out there written to exploit the P4 architecture, the P5 will already be on the market. Do you get software today written especially for the P3?
Possibly because it's fairly easy to figure out. Hence, having a post about something that everyone already knows is redundant. Likewise, if there was an article linked to http://www.cnncom, then posting "it should be http://www.cnn.com" would be redundant.
The only "intuitive" interface is the nipple. After that, it's all learned.
"The question of whether a computer can think is no more interesting than that of whether a submarine can swim" -EWD
Well, the Slash source is available, nobody's stopping you...
The only "intuitive" interface is the nipple. After that, it's all learned.
"The question of whether a computer can think is no more interesting than that of whether a submarine can swim" -EWD
1. the long pipeline means if you stall or miss a branch prediction you lose a lot more cycles
2. the L2/L1/trace caches are too small and programs will wind up going to main memory
3. RDRAM is great for streaming sequential bits, but it has high latency for random access. The P4 needs a much larger L2 cache to sit in front of the RDRAM to reduce the random access to main memory.
So that 3.2GB/s figure is not the whole story.
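Point 1 can be made concrete with the usual back-of-the-envelope model, where every mispredicted branch costs a pipeline refill. This is only a sketch: the branch fraction, misprediction rate and penalty values used with it below are assumptions picked for illustration, not figures for the P4 or the Athlon, and the function names are hypothetical.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once misprediction stalls are included.

    base_cpi        -- cycles per instruction with perfect branch prediction
    branch_fraction -- fraction of instructions that are branches
    mispredict_rate -- fraction of branches that are mispredicted
    penalty_cycles  -- cycles lost per misprediction (grows with pipeline depth)
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def instructions_per_second(clock_hz, cpi):
    """Throughput implied by a clock rate and an average CPI."""
    return clock_hz / cpi
```

For example, with an assumed 20% branches and 10% of them mispredicted, `effective_cpi(1.0, 0.2, 0.1, 10)` is about 1.2 while `effective_cpi(1.0, 0.2, 0.1, 20)` is about 1.4, so doubling the penalty eats a good part of whatever the deeper pipeline gains in clock speed. Cache misses (points 2 and 3) would add a further stall term of the same form.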
The P4 Xeons have the potential to be great chips if they get some more cache on them. They're going to get a die shrink and more L3 which will help greatly. They could also use larger L2/L1/trace caches to reduce cache thrashing during context switches. The vanilla P4 will probably always have too little cache and will suck hard, though.
It looks like Intel made a decision to go after the high end of the market in a few years time and in the mean time to produce crippled chips that just have really high MHz ratings. And I'd guess that they're going to be fucking the consumer market over pretty hard for awhile to come.
What I'd really like to see: the interleaved DDR SDRAM from the nForce chipset in a multiprocessor server chipset like the 760MP. Ideally something like 4x DDR interleaving with a quad CPU chipset.... *droool*
Well the site's slashdotted anyway.
"A good conspiracy is an unprovable one." -Conspiracy Theory
i don't think the G4s are 64-bit, are they?
Why not. If you are shipping your software on CD or even more so with DVD you could put half a dozen optimised versions on there.
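As a rough sketch of how an installer could choose among several builds on the disc, here is one way to do it in Python. The build names and the `BUILDS` table are invented for illustration, and `read_cpu_flags` assumes an x86 Linux system where /proc/cpuinfo has a "flags" line; other platforms would need a different probe.

```python
# Hypothetical build table: (build name, CPU feature flags it requires),
# ordered from most specialised to most generic.
BUILDS = [
    ("app-sse2", {"sse", "sse2"}),
    ("app-sse", {"sse"}),
    ("app-generic", set()),
]


def pick_build(cpu_flags):
    """Return the first (most specialised) build whose requirements are all met."""
    flags = set(cpu_flags)
    for name, required in BUILDS:
        if required <= flags:  # subset test: every required flag is present
            return name
    raise RuntimeError("no compatible build found")


def read_cpu_flags(path="/proc/cpuinfo"):
    """Parse the feature flags of the first CPU listed in /proc/cpuinfo."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

An installer would then call something like `pick_build(read_cpu_flags())` and copy the matching binary. The generic build at the end of the table means older chips still get something that runs.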
_O_
_O_
.|< The named which can be named is not the true named
for even more insite, try learning to spell.
Treatment, not tyranny. End the drug war and free our American POWs.
See my user info for links.
Now THIS post is redundant.
chris@xanadu:~$ whatis /.
/.: nothing appropriate.
I assume http://www.linuxhardware.org%3C/a means http://www.linuxhardware.org ;)
chris@xanadu:~$ whatis /.
/.: nothing appropriate.
Did anyone ever consider that there is no good reason to recompile for P4? THERE ARE OTHER CHIPS IN USE!! Should every vendor make a special version for every modern architecture? Most people I know are still running PIIs and PIIIs.... ATM I'm on a 200MHz Pentium one because my Athlon's in for warranty. Maybe games and renderers could be released in multiple binaries.... but generally it's not worth it. Another good thing about open source though is that my entire system is custom compiled :-D
It seems to me that the P4/Athlon debate has brought out a lot of bashing of the P4, as it benchmarks slower than comparable or even slower Athlon CPU's.
These same people, however, don't seem to be bashing the GeForce 3, which in many cases benchmarks slower than some GeForce 2 ultra cards. Sure, it's OK for a video card to change its architecture but not the CPU????
People seem to understand that eventually the GF3 will be the card to get IF games are written to that architecture. The same could be said of the P4 IF APPS are written to the new architecture.
Praying for the end of your wide-awake nightmare.
You hardly need to be a Savant to know how to spell. And besides, aren't Slashdot editors supposed to actually edit the submissions?
Well, given that a 64-bit processor can do 64-bit calculations in the same time as a 32-bit processor of the same clock speed can do 32-bit calculations, I'd say it's hardly something worth 'getting over'.
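To see why that matters, here is a small Python sketch of what a 32-bit machine has to do for a single 64-bit addition: add the low halves, then add the high halves plus the carry. The function name is made up for illustration, and Python integers stand in for registers by masking to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def add64_with_32bit_ops(a, b):
    """Add two 64-bit values using only 32-bit wide additions (wraps at 2**64)."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo = a_lo + b_lo                    # first 32-bit add
    carry = lo >> 32                    # carry out of the low half (0 or 1)
    lo &= MASK32

    hi = (a_hi + b_hi + carry) & MASK32  # second add, with carry in
    return (hi << 32) | lo
```

A 64-bit processor does the same job with one add instruction, which is where the advantage on 64-bit arithmetic comes from. Code that only ever touches 32-bit values sees no such gain.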
While you're accurately describing the situation today, it need not stay this way. There are some very interesting projects, out in academia, which might address this very important issue.
Take a look at "slim binaries" and "Dynamic Code Reoptimizers" here for a starting point.
The interesting aspect to this, from a social and economic perspective, is that it is projects like this which could reduce the benefit of any existing monopolistic position on the desktop. Given this, I'm somewhat saddened that these ideas haven't been picked up by companies like SUN or communities like Linux. Perhaps this really isn't ready for 'prime time', but cash and interest from SUN could go a long way to aiding this work.
Intel would also gain from this. As you've pointed out, software tends to be optimized towards the least common denominator of hardware. That eliminates much of the advantage of newer architectures. Techniques such as these would increase the incentive for hardware upgrades, as existing softwares' performance would be immediately improved.
What, me worry?
Somebody needs to work on an ispell module for slashcode; in theory it shouldn't be that difficult. Put computers to work for you. Everybody would be happier, and would look smarter to boot!
Under Linux, I would not buy a P4. It is just too damn expensive. If you want performance and are running something other than Win 9x, go with the dual Athlon. It kicks butt and costs less than a P4. The only thing faster right now is the dual 1.7 P4 Xeons, but you could buy at least a couple of dual Athlons and cluster them for the price of a dual P4 Xeon.
This comment should be "Score 1; Duh!"...
Karma whorin' since 1999
Of course, those claims do sell computers.
Donate background CPU time to fight cancer.
If I recall, the article said that the G4 was about 25% faster per megahertz than an equivalent P3.
Sadly, most Apple benchmarks only test with Adobe Photoshop, but the G4s are better processors: they have a lot fewer transistors and use less power. I wish I could remember the URL to show you. The G4 has multimedia extensions built into the chip that Photoshop uses; they are far superior to mmx2 in the P3s, which is why it runs Photoshop really well. Anyway, Apple pressured Motorola to make faster G4s to combat the high-speed P3 problem. The newer 733MHz G4s are out and should be close to the same speed as a 1GHz P3 or 1.4GHz P4 for ordinary Unix/app use. I am sure for a Photoshop user the results would be even better. Apple should have seen this coming. The G4 and G3 PowerPC processors are truly RISC, unlike the P3 and P4, which are a CISC/RISC combo.
If I had money to burn I would love an Apple PowerBook where I can save battery power due to the fact that the PowerPC has fewer transistors and runs equivalently on fewer megahertz. Running Linux on it of course.
http://saveie6.com/
This is not quite true - I guess certain things are slow on any architecture (bad algorithms), and the compiler/interpreter should be the one to decide what works great on the platform in question. Nowadays people just don't have time to optimize, and it's a bad idea anyway - look at The Art of Unix Programming.
The number of threads is one thing to consider, though... and anyone knows that more processors => better multithreading.
I think programmers (and geeks in general) know stuff about processors because it's interesting and fun, not because they really need to.
Save your wrists today - switch to Dvorak
People only look at the clock speed when picking out a machine. Here's a did you know: The 500 MHz G4 processor by Apple performs roughly the same in benchmark tests as the 1 GHz Pentium III.
Which benchmarks would those be?
"And like that
By the way, do you think that people buy ECC memory only for servers?
Is this (optimise) a new spelling? (It's spelled this way many times in the article.)
Got friends?
Take a quick look on pricewatch. Does anyone do their research and form their own opinions anymore or just automagically adopt the high-scored ones on slashdot? Intel's slashed prices to unprecedented (for them anyway) levels... obviously in response to AMD, but nonetheless, the _PRICE_ motivation simply isn't as great as it used to be.
Looks like the P4 goes pretty much as fast as anything else unless they turn on the chip specific optimisations, but I don't think that will matter at all, since the average PC purchaser will look at the 2GHz(ish) ratings and go "Ohmygod - must be weelly fast!" I'm surprised they didn't have a 40 stage pipeline and really get people excited.
I wonder if they'll consider on-board 802.11b when they hit 2.5Ghz?
Reliable, Great Value Hosting: $7.95/mo 2.4G/120G
If any gcc hackers out there are reading, just let me know where to start poking and I'll try and implement a solution.
Ryan T. Sammartino
"Ancora imparo"
Duh, where did you get the idea that everyone who reads /. is a programmer? Second point, any really good programmer will know quite a bit about chip architectures and what works best in a given situation. A programmer who doesn't understand the architectures he works with won't produce code that gets the best performance. A real nerd, geek, or hacker, depending on your bent and preference, will know quite a bit about software and the hardware it runs on. You can be competent and not know about both, but you will never be really good without knowing a lot about both.
"If there is nothing you are willing to die for, then you are not really alive." Myself
If you have enough dough to waste on a P4, get a Mac instead...
Macs are cheaper than P4 systems?
G4 - from Apple's Online store: $2,199.00
P4 System - from Gateway.com: $1,889.00
And since P4 chips are only around $320, building a system yourself will be even cheaper. Is it even possible to assemble your own G4 system, or is Apple afraid that people might house G4s in non-translucent cases?
I don't know whether I am just blowing hot air, because I am not sure where the bottlenecks were in the test, but it seems to me that if you are going to use expensive RDRAM for the Intel, you should be using inexpensive DDR RAM for the AMD. Also, I wonder why they chose 256MB of RAM, instead of, say, 512MB? I could definitely understand why you wouldn't want to spend too much on the RDRAM, but what about for the AMD? On a cost basis, 512MB of DDR RAM is a lot more affordable. Lastly, why aren't any of these guys using the latest 1.4GHz AMD chips? Is there some type of problem with these chips?
I urge you to seek professional help. It doesn't really matter if someone is a communist, and there's a big difference between what communists think a government should be like and what communist governments turned out to be.
Oh, and they say the P4 is only useful for high-intensive graphics and gaming. Even that is wrong. If you have enough dough to waste on a P4, get a Mac instead for graphics and an N64 and PS for games.
It is "insight". Sometimes people use spellings like "insite", "tonite", "drive-thru", etc. to sell something or on aim because we are lazy and don't want to type more than necessary.