Understanding Pipelining and Superscalar Execution
Zebulon Prime writes "Hannibal over at Ars has just posted a new article on processor technology. The article uses loads of analogies and diagrams to explain the basics behind pipelining and superscalar execution, and it's actually kind of funny (for a tech article). It's billed as a basic introduction to the concepts, but as a CS student and programmer I found it really helpful. I think this article is a sequel to a previous one that was linked here a while ago."
One thing that his excellent analogy leaves out is the concept of branch prediction.
For those of you who didn't major in CS...
Imagine that we finish the first stage of building our SUV (building the engine) and move on to stage 2 (putting the engine in the chassis). While we are doing that, we are building another engine for SUV #2. But what if the next customer doesn't want an SUV and wants a compact car instead? We have to throw away our engine for SUV #2 and start over. We wasted an entire stage!
This analogy doesn't work so well, it seems, so we'll stick with computers. Say you have 5 instructions in your pipeline and one of them is a conditional branch (think: if the user hit ENTER, print a message to the screen; if they hit escape, BSOD).
If the conditional branch is high up in the pipeline, then every instruction that entered the pipeline behind it could be wasted. Obviously, if the processor could predict which path the branch would follow, it would waste fewer instructions.
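To put rough numbers on the wasted work, here is a back-of-the-envelope sketch in Python. It uses the usual textbook estimate (average cycles per instruction = base CPI + branch fraction x misprediction rate x flush penalty). The function name `effective_cpi` and every number in it are made up for illustration; none of them describe a real chip.

```python
def effective_cpi(branch_fraction, mispredict_rate, flush_penalty, base_cpi=1.0):
    """Average cycles per instruction once misprediction flushes are counted.

    branch_fraction: share of instructions that are conditional branches
    mispredict_rate: share of those branches that are guessed wrong
    flush_penalty:   cycles thrown away each time the guess is wrong
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 4 wasted cycles per wrong guess.
    for rate in (0.5, 0.1, 0.02):
        cpi = effective_cpi(0.2, rate, 4)
        print(f"mispredict rate {rate:.0%}: CPI = {cpi:.2f}")
```

The thing to notice is that the penalty term scales with how many instructions get thrown away on a wrong guess, so the better the guess, the closer you stay to the ideal one instruction per cycle.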
Branch prediction algorithms are extremely interesting. The early ones were very simple:
Prediction: Never take the branch
OR
Prediction: Always take the branch
People soon realized that most branches were in loops, so they came up with a new algorithm:
Prediction: If the last time we were here we took the branch, take it again, otherwise don't take it. Basically, repeat what we did the last time we ran this instruction.
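For anyone who wants to play with these three schemes, here is a toy simulation in Python. It is only a sketch: real predictors are hardware tables indexed by branch address, while this tracks one branch at a time in a plain list. The function names and both traces are invented for illustration.

```python
def always_taken(history):
    # Static scheme: guess "taken" no matter what.
    return True


def never_taken(history):
    # Static scheme: guess "not taken" no matter what.
    return False


def last_outcome(history):
    # Repeat whatever this branch did last time.
    # Assumption: guess "not taken" the first time we see the branch.
    return history[-1] if history else False


def accuracy(predictor, outcomes):
    """Fraction of outcomes (True = taken) that the predictor guesses right."""
    history = []
    correct = 0
    for actual in outcomes:
        if predictor(history) == actual:
            correct += 1
        history.append(actual)
    return correct / len(outcomes)


# Hypothetical traces for a single branch:
# a loop branch taken 9 times and then falling through, with the loop run 10 times,
loop_trace = ([True] * 9 + [False]) * 10
# and an error check that is skipped 9 times out of 10.
rare_trace = ([False] * 9 + [True]) * 10

if __name__ == "__main__":
    for name, trace in (("loop", loop_trace), ("rare", rare_trace)):
        for predictor in (always_taken, never_taken, last_outcome):
            print(f"{name:4} {predictor.__name__:12} {accuracy(predictor, trace):.2f}")
```

On these made-up traces each static scheme does well on one branch and badly on the other, while the repeat-last-time scheme does reasonably on both. It still misses whenever the branch changes direction, such as when a loop exits and when it starts again.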
IIRC there are lots of branch prediction algorithms, some of which are eerily accurate (above 90%). Unfortunately, branch prediction needs on-chip storage for its history, which takes away from the space available for the cache your programs need.
Thank you Mario! But our princess is in another castle!
to a certain point, but the P4 is a bit excessive
Actually, there is a lot of research about pipeline depths, and here is a paper that calculates the optimal pipeline depth for x86 to be around 50 stages. In fact, they theorize that you could see up to a 90% increase in performance in the P4 by making the pipeline even deeper. So not everybody thinks that the P4 pipeline is "a bit excessive."
think rambus (bad idea... memory width is a good thing!)
I'm a little confused here. Until the past few months, Rambus still offered superior memory bandwidth. It wasn't until DDR333 and higher that SDRAM started to catch up. Rambus didn't lose in the market because of performance.
Itanium (scrap an entire architecture for one that allows you to disable instructions, so that it is guaranteed that part of the processor won't be used at that point)
That is a pretty strange complaint about Itanium. In fact, I think that it is weird that you even think that is a problem.
"The defense of freedom requires the advance of freedom" - George W Bush