Double-Whammy Look At The Pentium 4
SystemLogicNet writes: "We at SystemLogic.net have just taken a technical look at the Pentium 4 architecture. In the article we go over all the basics that all the other sites cover, like the double-pumped ALUs, iSSE2, the longer pipeline, etc., but in addition we have some discussion about how different ways of structuring a program interact with the design and affect the performance of the Pentium 4. One of the major areas where this comes into play is how complex data structures interact with the underlying philosophy that the Pentium 4 is built upon -- extreme bandwidth. This Pentium 4 technical background can be read over here. At the same time, we've done a rigorous analysis, including benchmark description and discussion regarding the Pentium 4's performance, and this can be read over this way."
There is a good page about why the Pentium 4 sucks. It's written by an assembly-level programmer, so he knows quite a bit about processors.
While the article makes some good points, it is obviously biased against Intel. Some of the statements are not accurate. The article should be taken with a grain of salt.
Mod the parent down to -1.
I will second that! What an incredibly stupid analogy! In one case, you have something that results in an almost imperceptible slowdown while, in the "analogy", you have a computer completely failing to work one out of ten times.
I agree that the P4 is not the best CPU at this time. However, Intel has designed this new architecture looking out 10 years or so. Many of these choices are dictated by the laws of physics, and all other processors will be heading in this direction over time.
The fundamental problem is that the propagation speed of a signal on a chip is essentially fixed (that's why even the minor improvement from a special trick like copper wiring was a big deal). As you speed up the transistors, the signal propagation delay becomes more of a bottleneck.
To avoid this, you have to break the logic steps into smaller pieces that live in a smaller portion of the chip. The standard way to do this in synchronous logic is to pipeline the work into more stages. The total signal propagation delay to do one instruction remains about the same, but at least you can pipeline a lot of instructions to try to get more work done.
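To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. The delay figures and the per-stage latch overhead are made-up illustrative values, not measurements of any real chip, and the function name pipeline_model is just something I picked for the example. It only models the ideal case with no stalls or branch mispredictions.

```python
def pipeline_model(logic_delay_ns, stages, latch_overhead_ns=0.05):
    """Toy model of an ideal pipeline (all numbers are hypothetical).

    logic_delay_ns    -- total propagation delay of the logic for one instruction
    stages            -- number of pipeline stages the logic is split into
    latch_overhead_ns -- assumed extra delay added by each stage's latch
    """
    # Each stage handles an equal slice of the logic, plus the latch overhead.
    cycle_ns = logic_delay_ns / stages + latch_overhead_ns
    # One instruction still has to pass through every stage.
    latency_ns = cycle_ns * stages
    # With the pipeline full and no stalls, one instruction finishes per cycle.
    throughput_per_ns = 1.0 / cycle_ns
    return cycle_ns, latency_ns, throughput_per_ns


if __name__ == "__main__":
    for stages in (10, 20):
        cycle, latency, throughput = pipeline_model(10.0, stages)
        print(f"{stages} stages: cycle {cycle:.2f} ns, "
              f"latency {latency:.2f} ns, "
              f"peak {throughput:.2f} instr/ns")
```

In this toy model, doubling the stage count nearly halves the cycle time while the latency per instruction stays about the same (slightly worse, because of the extra latches). The catch is that the peak rate assumes the pipeline never has to be flushed.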
This processor is not very competitive today, but in 5 years there won't be any other way to make forward progress. By that time, Intel will have worked out the kinks (problems with branch prediction, memory interface snafus, etc.), and this core will probably be as wildly successful as the Pentium Pro/PII/PIII/Celeron core was.
BTW, remember how sucky the Pentium Pro was when it came out? It was a piece of crap on 16-bit code and it would generate huge pipeline bubbles for no good reason. Over time, they fixed these problems and made countless $billions in the process. Watch for a repeat with this new architecture.