More on the PowerPC 970
functor writes "Ars Technica's Jon Stokes has a treatise up covering the microarchitecture of the high-performance 64-bit PowerPC 970 microprocessor, due to be released by the end of the year, that goes over in detail how this chip is put together, and how we can expect it to perform. This is the follow-up to Stokes' article detailing the PPC 970's design philosophy. 'It appears to hold quite a bit of promise in bolstering Apple's currently almost obsolescent product line, and it appears to have been designed explicitly to fulfil Apple's requirements. To say the least, the second half of this year looks to be pretty interesting as Apple's product line promises to become competitive performance-wise with IA-32 and x86-64-based PCs again.'"
This is still a PPC chip. No changes to programs are necessary for them to run on it. The only change needed is if a software vendor decides to run in 64-bit mode, which many won't have to do. Performance of the new chip does not depend on whether the program runs at 32 or 64 bits. This is not a migration like the move from the 680x0 line of processors to the PPC, which was an overall change in architecture.
They can't make the FSB DDR or QDR without appropriate support from the processor, and that's exactly what they haven't been getting from Moto.
As Nethack would say, "Ugh! This meat is tainted!"
The 970 is fundamentally a 64-bit processor, and its performance must be evaluated in that context. The fact that the 970 will pull off amazing speed in the 32-bit arena only shows how well-designed this processor is.
Keep in mind, the Hammer is only shipping at 1.4, 1.6, and 1.8 GHz - the same speeds the 970 is targeted at. And the 970 has the advantage of an ISA that was designed from the beginning to do 32- and 64-bit addressing, versus one that's a 64-bit extension of a 32-bit extension of a 16-bit micro with full compatibility to an 8-bit redesign of a 4-bit processor.
Interestingly, if you look at the pipeline design of the PowerPC, it is much closer to Intel's than to AMD's. The PowerPC pipeline has sixteen stages, the Pentium 4 twenty, and the Athlon ten.
Presumably the P4 can reach higher clock speeds than the Athlon because there is less work to do at each pipeline stage. On the other hand a longer pipeline increases the probability of a stall, so the work done per clock cycle goes down.
I'd speculate that the PowerPC ought, therefore, to be able to achieve clock rates approaching but not equalling the P4, since they are both comparatively "over-pipelined". At the same time, the PowerPC ought to deliver slightly more throughput per clock cycle because the pipeline is slightly shorter.
Meanwhile, the Athlon will be running at a significantly lower clock rate, but delivering comparable throughput.
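To put rough numbers on the pipeline-depth argument above, here is a toy model. Only the stage counts (10, 16, 20) come from this thread. Everything else is an assumption made up for illustration: the branch fraction, the misprediction rate, an identical base CPI of 1.0 for all three chips, and the idea that a mispredicted branch costs about one full pipeline's worth of cycles. It is a sketch of the trade-off, not a prediction of how any real chip performs.

```python
# Toy model: deeper pipelines pay more cycles per mispredicted branch.
# Stage counts are the ones quoted in the thread; all other numbers
# are illustrative assumptions, not measurements.

STAGES = {"Athlon": 10, "PowerPC 970": 16, "Pentium 4": 20}

BRANCH_FRACTION = 0.20   # assumed share of instructions that are branches
MISPREDICT_RATE = 0.05   # assumed share of branches that are mispredicted
BASE_CPI = 1.0           # assumed cycles per instruction with no stalls


def effective_ipc(stages):
    """Instructions per clock once flush penalties are included.

    Assumes each mispredicted branch costs roughly `stages` cycles.
    """
    cpi = BASE_CPI + BRANCH_FRACTION * MISPREDICT_RATE * stages
    return 1.0 / cpi


def break_even_clock_ratio(deep, shallow):
    """How much faster the deeper design must clock to match the shallower one."""
    return effective_ipc(STAGES[shallow]) / effective_ipc(STAGES[deep])


if __name__ == "__main__":
    for name, stages in STAGES.items():
        print(f"{name:12s} {stages:2d} stages  IPC ~ {effective_ipc(stages):.3f}")
    ratio = break_even_clock_ratio("Pentium 4", "Athlon")
    print(f"Pentium 4 needs ~{ratio:.2f}x the Athlon's clock to break even")
```

Under these assumptions the per-clock ordering is Athlon, then PowerPC 970, then Pentium 4, which is the same ordering argued above. How large the gaps are depends entirely on the assumed branch statistics, and real chips differ in far more than pipeline length.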
No, but IA-32 motherboard manufacturers go a good number of steps further. ;)
I recommend that you investigate Intel's Placer (E7505) chipset and motherboards based on it (several of Supermicro's offerings, as well as offerings from Tyan and other manufacturers, e.g. the Iwill DPL533 and DP533).
These motherboards support 133 MHz QDR system buses (coming to 533 million transfers a second), matched (quite well) with two channels of PC2100 DDR SDRAM (resulting in 4.267 GB/s of memory bandwidth that is actually utilizable by the processors, since the memory bandwidth matches the system bus bandwidth, unlike Apple's offering, which is bottlenecked by the system bus at just 1.333 GB/s, whether you have one processor or two).
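For anyone who wants to check the arithmetic, the figures above fall out of clock rate times transfers per clock times bus width. The sketch below assumes 64-bit (8-byte) data paths throughout, and it assumes Apple's bus is a 166.67 MHz single-data-rate bus, which is what the 1.333 GB/s figure implies. Neither assumption is stated explicitly in the post.

```python
# Peak bandwidth = clock * transfers per clock * bus width.
# Bus widths (8 bytes) and Apple's 166.67 MHz SDR clock are assumptions
# chosen to match the GB/s figures quoted in the post.

def peak_bandwidth_gbs(clock_mhz, transfers_per_clock, width_bytes=8):
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return clock_mhz * 1e6 * transfers_per_clock * width_bytes / 1e9


# Xeon system bus: 133.33 MHz, quad-pumped (QDR)
xeon_fsb = peak_bandwidth_gbs(400 / 3, 4)

# Two channels of PC2100: 133.33 MHz, double data rate, per channel
dual_pc2100 = 2 * peak_bandwidth_gbs(400 / 3, 2)

# Apple's system bus: assumed 166.67 MHz, single data rate
apple_fsb = peak_bandwidth_gbs(500 / 3, 1)

if __name__ == "__main__":
    print(f"Xeon QDR system bus : {xeon_fsb:.3f} GB/s")
    print(f"Dual-channel PC2100 : {dual_pc2100:.3f} GB/s")
    print(f"Apple system bus    : {apple_fsb:.3f} GB/s")
```

The first two values come out equal, which is the point about the memory bandwidth matching the system bus. These are peak numbers only; sustained bandwidth will be lower on any of these buses.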
(And I'm certain that 200 MHz QDR Xeon chipsets are not far off in the future, since Intel in general appears to be headed in that direction.)
The Pentium 4 is, in fact, designed to scale to high clock speeds exactly so that it can tolerate lots of pipeline bubbles in flight without ending up stalling for too long.
A lot of these tricks (high decode bandwidth, multiple instruction queues [really buffers meant for reordering the instruction stream], branch prediction, etc.) are meant to reduce hazards such as pipeline bubbles as far as possible, and the PPC 970 does these hazard-reducing operations rather well, too.
And, yes, we're now in the post-RISC world where instruction complexity (particularly in the realm of SIMD and streaming/explicit cache manipulation instructions) is growing because simple instructions clearly aren't enough to allow for great throughput increments.
(Read some of Stokes' older articles in the Ars Technopaedia; I'm sure you'll find them interesting.)