It's great and all that the time to double capacity of mass storage devices is less than the time to double 'capacity' (usually measured in transistors) of modern microprocessors, but it's fallacious to suggest that mass storage is doing 'better' overall. In fact, you can't really say which one's 'better' since they're so different in nature.
Moore's law is largely due to manufacturing improvements in which the feature size of transistors keeps shrinking, so that you can get (approximately) twice as many transistors in the same amount of space. (Yes, yes, I know, die sizes keep growing, but not nearly at the pace at which transistors shrink.) The tricky part here is that this shrinking has generally been coupled with ramping up clock frequency. Increasing the capacity of a disk has no such benefit, because mechanical parts (disk heads, spinning platters) are the overwhelming determining factors for performance. Hence, the gap between processor performance and disk performance keeps widening - we can only make a disk spin & heads move so fast.
It's an interesting comparative trend to notice (processor performance growth vs. disk capacity growth, and the effect on the overall system), but you can't really compare the way disks have improved with the way microprocessors have.
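To put rough numbers on the mechanical limit, here's a quick sketch. The only hard fact in it is that the average rotational delay is half a platter revolution; the RPM and clock values are illustrative assumptions of mine, not figures from any particular drive or CPU. Notice that capacity doesn't appear anywhere in the calculation, while the number of CPU cycles burned per wait grows linearly with clock speed.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector: half of one platter revolution."""
    seconds_per_rev = 60.0 / rpm
    return 0.5 * seconds_per_rev * 1000.0


def cycles_spent_waiting(rpm: float, clock_ghz: float) -> float:
    """CPU cycles that elapse during one average rotational delay."""
    latency_s = avg_rotational_latency_ms(rpm) / 1000.0
    return latency_s * clock_ghz * 1e9


if __name__ == "__main__":
    # Illustrative values only (assumptions, not measurements).
    for rpm in (5400, 7200, 15000):
        for ghz in (1.0, 2.0, 3.0):
            print(f"{rpm:>6} RPM, {ghz:.1f} GHz: "
                  f"{avg_rotational_latency_ms(rpm):.2f} ms "
                  f"= {cycles_spent_waiting(rpm, ghz):,.0f} cycles")
```

Seek time (moving the heads) comes on top of this and is just as mechanical, so this sketch understates the wait if anything.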
As I'm sure has been pointed out, you'd still need a decode stage in all but the most useless RISC architecture - you've got to convert those opcodes to control lines (regwrite, memwrite/read) and register fields. This doesn't come "for free", even with simple RISC architectures like MIPS. Ergo, you need a decode stage. More complexity simply means more stages dedicated to decode (a la x86, POWER series).
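For anyone who wants to see what "decode" means even in the simple case, here's a minimal software sketch for three MIPS32 instruction classes. The field positions and the lw/sw opcodes are the standard MIPS32 encodings; the `Decoded` record and `decode` function are hypothetical names of my own, and real hardware does this with combinational logic rather than an if-chain. It also cheats by treating every opcode-0 instruction as a register-writing ALU op, which isn't true for things like jr.

```python
from dataclasses import dataclass

OP_RTYPE = 0x00  # add, sub, and, or, ... (selected by the funct field)
OP_LW = 0x23     # load word
OP_SW = 0x2B     # store word


@dataclass(frozen=True)
class Decoded:
    opcode: int
    rs: int          # first source register
    rt: int          # second source (or destination for lw)
    rd: int          # destination for R-type
    imm: int         # raw 16-bit immediate (not sign-extended here)
    reg_write: bool  # control line: write the register file
    mem_read: bool   # control line: read data memory
    mem_write: bool  # control line: write data memory


def decode(word: int) -> Decoded:
    """Split a 32-bit instruction word into fields and control lines."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    imm = word & 0xFFFF

    # Simplification (assumption): only three instruction classes handled.
    if opcode == OP_RTYPE:
        ctrl = (True, False, False)
    elif opcode == OP_LW:
        ctrl = (True, True, False)
    elif opcode == OP_SW:
        ctrl = (False, False, True)
    else:
        raise ValueError(f"opcode {opcode:#04x} not handled in this sketch")
    return Decoded(opcode, rs, rt, rd, imm, *ctrl)


if __name__ == "__main__":
    print(decode(0x8FA80004))  # lw  $t0, 4($sp)
    print(decode(0x012A4020))  # add $t0, $t1, $t2
```

Even this toy version has to do bit extraction plus an opcode-dependent lookup before anything downstream can run, which is the whole point: the work exists whether or not you give it its own pipeline stage.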
Another point: POWER is not the same thing as the PowerPC ISA. POWER (which is CISC) supports some wacky instruction types such as vector load/store - these get cracked into IOPs. The vast majority of PPC instructions do not need to be cracked, though cracking can be useful depending on the microarchitecture that implements the ISA (for instance, fused multiply-add, which is common in multimedia apps, may get cracked into 2 separate instructions).
Did you read the report for power4 I just posted? Obviously not, because if you had, you'd have seen the "CPU(s) enabled: 1" notice. Yes, I know that each physical CPU is a chip multiprocessor (CMP - an SMP would describe the overall system). However, one core can be disabled - this is how IBM tests power4 for SPEC (see the explanation below of how SPEC works & how it's tested).
Furthermore, you clearly haven't done your homework on SPEC. SPEC applications are run one-at-a-time, and are specifically uniprocessor applications. They are not threaded. (They *can* be manually threaded to compare performance, but SPEC forbids this for purposes of generating official reports.) SPEC measures single-processor performance. This is why IBM uses only one processor core when testing - cache synchronization overhead and interprocessor interrupts actually hurt (application, not overall system) performance on single-threaded code. SPEC measures application performance.
The above has nothing to do with mac & everything to do with simply reading & understanding benchmarks & computer architecture. H&P is a good starter book for those of you out there who are interested.
The 2 ALUs w/ group dispatch architecture is lifted directly from the POWER4 architecture, which has fairly good integer performance. If you had read the power4 system microarchitecture specification, you would know that. The guys at IBM aren't asleep at the switch after all, perhaps?
Comparing what the power4 gets on specint 2000 @ 1.45GHz (score of 935) versus Intel's P4 @ 3.06GHz (score of 1091), the power4 holds its own in integer performance, and in fact so should the 970. (AMD's athlon 3000+ comes in at 995, btw.) To claim that "clock-for-clock integer performance will be worse" is utterly bogus. (What are you talking about with respect to a lack of FSB performance? The 970's 900MHz blows away intel...)
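If you want the clock-for-clock arithmetic spelled out, here's a quick sketch using only the two score/clock pairs quoted above. Dividing a SPEC score by clock speed is a crude normalization of my own (SPEC publishes no such metric), and the athlon is left out because a 3000+ model rating isn't a clock speed.

```python
def score_per_ghz(score: float, clock_ghz: float) -> float:
    """Crude per-clock figure: benchmark score divided by clock in GHz."""
    return score / clock_ghz


# Scores and clocks as quoted in the post above.
power4 = score_per_ghz(935, 1.45)
p4 = score_per_ghz(1091, 3.06)

if __name__ == "__main__":
    print(f"power4: {power4:.0f} per GHz")
    print(f"P4:     {p4:.0f} per GHz")
    print(f"ratio:  {power4 / p4:.2f}x in power4's favor")
```

By this rough measure the power4 does roughly 1.8x the integer work per clock, which is the opposite of "worse clock-for-clock".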
Give the microarchitectures (and performance numbers) a glance next time before going on a dilettantish ramble about architectures.
Transmeta hasn't succeeded for various reasons, all of which are arguable, but I don't think performance really was one. Most notably, their wild claims & marketing 'buzz' (lies) about power consumption didn't pan out. However, the point that x86-to-VLIW emulation in hardware actually works reasonably well (performance-wise) is usually lost on the masses, who just see a failed dot-bomb. This was an incredible feat. While they didn't get 700MHz p3 performance out of a 1GHz crusoe as they claimed, they got fairly close - and with a far simpler (and far different) processor than the p3. My point is, the technology for hardware JIT compiling & optimization to a VLIW from x86 is in place - intel should have at least looked at using it before now.
I wonder if the emulation technique they'll be using will be similar to Transmeta's 'code-morphing'. I always wondered why intel didn't license that idea & use it on their Itanium. 'Code-morphing' achieved middle-of-the-road x86 performance on a VLIW (sound like a familiar goal?), but it was still far better than what Itanium gets with its current x86 support.