Double-Whammy Look At The Pentium 4
SystemLogicNet writes: "We at SystemLogic.net have just taken a technical look at the Pentium 4 architecture. In the article we go over all the basics that all the other sites cover like the double pumped ALUs, iSSE2, the longer pipeline, etc, but in addition we have some discussion about how different program structurings have an impact upon the design, and performance of the Pentium 4. One of the major areas where this comes into play is how complex data structures interact with the underlying philosophy that the Pentium 4 is built upon -- extreme bandwidth. This Pentium 4 technical background can be read over here. At the same time, we've done a rigorous analysis, including benchmark description and discussion regarding the Pentium 4's performance, and this can be read over this way."
Umm... This isn't a very good analogy. Imagine instead:
STOP . AMERICA . NOW
Personally, I don't agree with the Brute Force methodology by Intel; I prefer simpler, cleaner and more elegant solutions. It is difficult to deny, however, that the brute force method has worked so far. Yes, yes, I know that the "x86 suxx0rs" crowd is now going to come out of the woodwork. Let me just say this: It may not be the best architecture, and it may be kludged for backwards compatibility, but... it works, and it's cheap. With any luck, the 64-bit processors will be able to buck the trend of backwards compatibility (has anyone heard anything about this with regards to Itanium and/or AMD's 64-bit chip?).
"To hope's end I rode and to heart's breaking: Now for wrath, now for ruin and a red nightfall!"
Tom's Hardware Guide or AnandTech
Sorry, but comparing a 1.1gig/200Mhz FSB Athlon to a 1.7gig P4 is laughable at best. What hardware review site uses a processor that's over a year old (Athlon 1.1gig/200FSB) in a comparison to one of the latest processors from the competition?
also, note that the 1.7 ghz p4 has a 600 mhz advantage over the 1.1 ghz athlon and usually the performance difference was only 10-40%. the p4 has over 50% more processor mhz than the athlon. what an unfair comparison, especially when the 1.33 ghz athlon is out and available for purchase. processor mhz for processor mhz, the athlon beat the p4.
The point of the extremely long (20-stage) pipeline of the Pentium 4 is the ability to reach extremely high clock speeds - much higher than the Athlon could ever reach. Of course, MHz-for-MHz, the Athlon is going to beat the Pentium 4 performance-wise, but that comparison wouldn't tell us anything except the obvious differences in the two chips' design philosophies.
Light a fire for a man and he'll be warm for a day. Light a man on fire and he'll be warm for the rest of his life.
Using the new process of W.attage H.alting R.esistance E.ngineering, Intel can reduce pent-up system tension at an even lower cost.
Also, the WHORE system is fully compatible with the C.omposite R.ecursive A.lgorithm C.reation K.it used for extreme overclocking.
"The CRACK/WHORE combination should be a killer setup for many of our users, and we have already had several U.S. senators make inquiries" says John Thompson, head of engineering at Intel. "We even allow for massive clustering with the P.arallel I.nsulating M.ultipartite P.olymer, or PIMP management process.
Thompson also spoke of project BITCHSLAP for correcting wayward systems, but could not elaborate on it...
Did you just cut and paste a press release onto the front page of Slashdot?
I sure hope Slashdot isn't selling Front Page space to any little company that pays...
the graphs are not done fairly. the axes almost never ran from 0 to a value slightly higher than the better of the two results; they were always drawn so that the intel bar was much longer (and therefore appeared to do much better) than the athlon bar, when the actual results were that the two processors were pretty close.
also, note that the 1.7 ghz p4 has a 600 mhz advantage over the 1.1 ghz athlon and usually the performance difference was only 10-40%. the p4 has over 50% more processor mhz than the athlon. what an unfair comparison, especially when the 1.33 ghz athlon is out and available for purchase. processor mhz for processor mhz, the athlon beat the p4.
The 1.4 ghz athlon has been out for a couple months now... the 1.1 ghz athlon has been out for at least 10 months.
Here is a june 6 pcworld review where an amd 1.4-GHz system is "the fastest system yet tested by PCWorld.com" beating out 5 systems based on the 1.7 ghz p4.
Here is a tech report review of an amd 1.33 vs intel 1.7 where they conclude: "Intel's new entry, the 1.7GHz Pentium 4, performs about like a 1.2GHz Athlon in most situations."
You can't get dual processing power from a Pentium 4 like you can with an Athlon.
We have the best government that money can buy.
I was shocked by this review site. Most all the graphs are misleading. Most magnify the area of difference between the two processors to make the margin look larger. For example, in the benchmark "Content Creation Winstone" (http://www.systemlogic.net/reviews/hardware/processors/intel/p41700/i/c7.gif), the difference is only 3.6 points, yet the scale is nearly 1/3. That's nearly 3x magnification.
Some only differ by a few percent, the lowest about -4.5% of P4 score, yet the distance represented on the graph would suggest nearly a 60% difference or more.
This review site needs to get a clue about statistics and start using proper graphing according to real differences, not magnified margins.
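To put a number on the magnification complaint, here is a rough Python sketch of how a bar chart whose axis does not start at 0 inflates a small gap. The scores and the axis start are made-up illustrative values, not figures taken from the review, and `apparent_gap` is just a name picked for this example.

```python
def apparent_gap(score_a, score_b, axis_start=0.0):
    """Fraction by which the shorter bar looks smaller than the longer one
    when both bars are drawn starting from axis_start instead of 0."""
    hi, lo = max(score_a, score_b), min(score_a, score_b)
    if axis_start >= hi:
        raise ValueError("axis_start must be below the higher score")
    return (hi - lo) / (hi - axis_start)


# Hypothetical scores: one chip scores 100, the other 95.5 (4.5% lower).
real_gap = apparent_gap(100.0, 95.5)                   # axis starts at 0
drawn_gap = apparent_gap(100.0, 95.5, axis_start=92.5) # truncated axis

print(f"real gap:  {real_gap:.1%}")
print(f"drawn gap: {drawn_gap:.1%}")
```

With these assumed numbers, a 4.5% difference is drawn as a bar that is 60% shorter, which is the kind of distortion described above.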
"I'll just chip in a bit for RedHat: I actually have that installed on my university machine." - Linus, '95
The Pentium 4 is, by Intel, considered to have "Hyper pipelined technology."
I can see the ads now..... "The Pentium 4 - Because our Pipeline is bigger than theirs!"
S.t.e.v.e.
Kjella
Live today, because you never know what tomorrow brings
Actually, it's not exactly "hard" stuff from an implementation point of view. Cycle times are short so you want a predict equation that you can do quickly and in one cycle. In fact, you can get pretty good results with a simple 4 state strongly not taken (00) - weakly not taken (01) - weakly taken (10) - strongly taken (11) saturating counter that updates when a branch is confirmed to be taken or not taken. If your BHT (branch history table) is sufficiently large, you can get decent results. Sprinkle in some voodoo magic by adding a GHR (global history register) which hashes the opcode address based on the state of the last n taken branches and you can get a couple of extra percentage points. I've seen upwards of 95%-97% prediction rates with such implementations but that's in a RISC environment which also provides fairly accurate branch hints in the opcode itself (much like the Itanium does). (The compiler knows what the code should do and what the semantics of a branch are: an "if", "for", "switch" construct, etc.)
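For anyone who wants to play with it, the scheme above can be sketched in a few lines of Python. This is a toy model, not how Intel or anyone else actually builds it: the table sizes are arbitrary, the class and function names are made up for the example, and XORing the branch address with the history register (the usual "gshare" trick) is just one assumed way to do the hashing.

```python
class GsharePredictor:
    """BHT of 2-bit saturating counters, indexed by branch address XOR
    a global history register (GHR). Counter states: 0 = strongly not
    taken, 1 = weakly not taken, 2 = weakly taken, 3 = strongly taken."""

    def __init__(self, index_bits=10, history_bits=8):
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.table = [1] * (1 << index_bits)  # start at weakly not taken
        self.ghr = 0                          # outcomes of the last n branches

    def _index(self, pc):
        # Assumed hash: XOR the address with the history, keep the low bits.
        return (pc ^ self.ghr) & self.index_mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Called once the branch outcome is confirmed.
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.history_mask


def accuracy(predictor, trace):
    """Fraction of (pc, taken) pairs in the trace predicted correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


# Hypothetical trace: a single branch at 0x40 that is always taken.
print(accuracy(GsharePredictor(), [(0x40, True)] * 100))
```

Setting `history_bits=0` turns it into the plain counter-per-address predictor, which makes it easy to compare the two on the same trace.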
Where things probably get weird for Intel is that their BHT probably suffers a bit of address aliasing/underutilization due to the fact that x86 opcodes are variable length. With RISC architectures (fixed length opcodes), you can chop off the last couple of address bits since the 0,1,2,3 cases don't matter == less address aliasing over a greater range of addresses.
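A quick Python sketch of the aliasing point above, under some loudly stated assumptions: a 1024-entry table, and branches sitting at made-up addresses 16 bytes apart. Real x86 code is not laid out this neatly; the layout is chosen only to make the effect easy to see, and the function names are invented for the example.

```python
def bht_index(pc, index_bits, drop_low_bits):
    """Table index for a branch at address pc, after discarding the low
    address bits that carry no information for aligned opcodes."""
    return (pc >> drop_low_bits) & ((1 << index_bits) - 1)


def distinct_slots(branch_pcs, index_bits, drop_low_bits):
    """How many different table entries a set of branches lands in."""
    return len({bht_index(pc, index_bits, drop_low_bits) for pc in branch_pcs})


# Hypothetical layout: 1024 branches, one every 16 bytes across 16 KiB.
pcs = range(0, 16 * 1024, 16)

# Variable-length opcodes: every byte address may start an instruction,
# so no low bits can be dropped.
print(distinct_slots(pcs, index_bits=10, drop_low_bits=0))

# Fixed 4-byte opcodes: the low two bits are always 0 and can be dropped.
print(distinct_slots(pcs, index_bits=10, drop_low_bits=2))
```

With this layout the byte-indexed table spreads the 1024 branches over 64 entries, while the word-indexed one uses 256, so the same table covers four times the address range before branches start colliding.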
Mispredict bypass buffers are another nicety that help back out of branch mispredicts because you don't have to go running back to the I$ and wait two cycles. In fact, while you're going down the codestream for the "predicted taken" path, you can also load up the "not predicted taken" path into a line buffer from an alternate cache such as a BTB (branch target buffer: if the data is available, a TLB entry exists, etc) and bypass the 2 cycle hit on the I$ on the mispredict. Two cycles are two cycles...
Engineers have a very big bag of tricks to work from..but they do have to know when to cut the apron strings and say "out with the old, in with the new." I think the key to major ramp-ups in speed for the x86 architecture is going to be when Intel proclaims "The Great Simplification" (a la "A Canticle for Leibowitz") and deprecates a whole slew of ancient modes (e.g., 286 type stuff) such that they must be emulated through an OS trap. By that time, DOS based OSs like W9x will be about as common as Win311 is now so it won't even matter. About the only people who I can see complaining then are VMWare, Netraverse, Plex86, and the WineHQ Team.
If he was writing in C and using asm for the most performance-intensive functions, as is now standard practice for the non-lazy (who know their target platform and optimize for it), it would not be such a chore.
;-)
Damn... that's the first time I've seen someone who programs in C/C++ tell someone who programs in ASM that he's lazy. What balls, man! Way to go!
"And like that