Slashdot Mirror


Understanding Pipelining and Superscalar Execution

Zebulon Prime writes "Hannibal over at Ars has just posted a new article on processor technology. The article uses loads of analogies and diagrams to explain the basics behind pipelining and superscalar execution, and it's actually kind of funny (for a tech article). It's billed as a basic introduction to the concepts, but as a CS student and programmer I found it really helpful. I think this article is a sequel to a previous one that was linked here a while ago."

6 of 87 comments (clear)

  1. P&H - Pipelining by minesweeper · · Score: 5, Interesting
    from the read-hennessy-and-patterson-first dept.

    I just finished a CS course co-taught by Professor Patterson, and our primary text this semester was Patterson and Hennessy's Computer Organization and Design.

    When we discussed pipelining this semester, the analogy used was the four stages of doing laundry: washing, drying, folding, and stashing. Here are the lecture notes (both PDF). The notes spend a good deal of time going over the hazards of pipelines and how to avoid them.

    1. Re:P&H - Pipelining by Jucius+Maximus · · Score: 5, Interesting
      "I just finished a CS course [berkeley.edu] co-taught by Professor Patterson, and our primary text this semester was Patterson and Hennessy's Computer Organization and Design [berkeley.edu]. When we discussed pipelining this semester, the analogy used was the four stages of doing laundry: washing, drying, folding, and stashing. Here are the lecture [berkeley.edu] notes [berkeley.edu] (both PDF). The notes spend a good deal of time going over the hazards of pipelines and how to avoid them."

      I just finished an engineering course with the exact same book, and we did that exact same pipelining example. I must say that the books is really very good at explaining the workings of the CPU.

      The best teacher, though, is design. As part of the course, we all formed groups and actually designed and implemeted basic non-pipelined CPUs in VHDL and, if they fit, we implemented them on FPGA (field programmable gate array) boards. And by 'simple' CPUs, I mean that the CPUs have like 4K RAM, maybe 8 registers, ran at ~3 cycles per instruction, and only have about 12 instructions total. But it was REALLY informative because it forced us to learn the exact purpose of everything in the CPU.

  2. And you're a CS student?!!! by sparkz · · Score: 5, Interesting
    I knew this before I did a BSc in CS; what I learned later, was the real stuff - how memory speed affects pipelining; memory has not got significantly faster in the past decade, where CPUs have gone from 30MHz to 3GHz. Therefore even more pipelines are required now, as we sit around, cycle upon cycle, waiting for memory to feed us some data.

    In fact, Alan Cox gave a talk on this recently: UMeet2002.

    --
    Author, Shell Scripting : Expert Re
  3. Re:A request for a future Ars Technica story by Ross+Finlayson · · Score: 3, Interesting

    Oops, that should, of course be: "Understanding why black text on a white background is easier to read than *white* text on a *black* background."

    That'll teach me for trying to make a snide comment :-)

  4. Enough with the bad analogies! by Wesley+Felter · · Score: 3, Interesting

    Am I the only one who finds this stuff easier to understand when the author just explains what actually happens instead of using analogies? I thought the Hennessey & Patterson version of this was better, but then it wasn't free...

  5. Re:Branch Prediction in PowerPC 970 by CyberDave · · Score: 3, Interesting

    The forthcoming IBM PowerPC 970 CPU is supposed to have a very sophisticated branch prediction unit. (I'm not sure how it compares to that of the POWER4, from which the PPC 970 was derived, or how it compares to other CPUs, though.)

    (Disclaimer: recalling all of this from memory based on the paper I wrote a few weeks ago on the PPC 970. Forgive me if I over-simply or mis-state something.)

    The PPC 970 hast three branch history tables (BHTs). Each one has 16k (2^14) entries of one bit each. One BHT follows the more or less traditional method of tracking whether or not the branch prediction from a previous execution of the instrution was successful. One BHT has its entries associated with an 11-bit vector which tracks the last 11 instructions executed by the CPU (and using this to determine if the branch prediction was successful. The third and final BHT is used to determine which BHT has been more successful for the corresonding instructions. For each individual branch instruction, the third BHT is used to determine which method has had better success in the past and then that BHT is used as the branch prediction method for this execution of the instruction.

    CyberDave