The 9-stage length applies to load/store instructions. Since it is usual to report the minimal pipeline length (i.e., for single-cycle integer instructions), 7 stages is a more accurate figure for the Alpha.
Dell doesn't have any dual-core processors in the $400-$500 range. Heck, the dual-core 2.0 GHz processor in the iMac retails for over $400 by itself. Anand's benchmarks put it on par with an Athlon X2 3800+ or a P4 3.0-3.2 GHz.
Which Alpha had a 9-stage pipeline? The '064-'264 were all 7-stage designs. The 604 had 6 pipeline stages, not 4. The Pentium Pro had 10 pipeline stages, not 14 (though some sources erroneously report 14). US I and US II had 9-stage pipelines.
All processors lengthen the pipeline in order to reach higher frequencies. It's a basic engineering tradeoff. The difference between the P4 and the Alpha was that the P4 massively traded off IPC for clock speed, while the Alpha didn't. The Alpha's 7-stage pipeline (which was substantially longer than only the simplest RISC chips) was offset by branch prediction that was sophisticated for its time, so its IPC remained quite high. The P4's long pipeline wasn't offset by correspondingly good branch prediction. Thus, the P4's IPC is only about two-thirds of a 21264's.
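To make the pipeline-depth tradeoff concrete, here is a rough back-of-the-envelope sketch in Python. It uses the textbook approximation that each mispredicted branch costs roughly one pipeline flush. The function name and every input value (base IPC, branch fraction, misprediction rates) are made-up illustrations, not measured figures for the Alpha or the P4; only the 7- and 20-stage depths come from this thread.

```python
def effective_ipc(base_ipc, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Estimate sustained IPC once branch-misprediction stalls are included.

    Simplified model (an assumption, not a simulator):
        CPI = 1/base_ipc + branch_fraction * mispredict_rate * flush_penalty
    where the flush penalty is taken to be about the pipeline depth.
    """
    base_cpi = 1.0 / base_ipc
    stall_cpi = branch_fraction * mispredict_rate * flush_penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)


# Hypothetical workload: 20% branches, 5% of them mispredicted, base IPC of 2.
short_pipe = effective_ipc(2.0, 0.20, 0.05, flush_penalty_cycles=7)
long_pipe = effective_ipc(2.0, 0.20, 0.05, flush_penalty_cycles=20)

# The long pipeline only catches up if its predictor is proportionally better
# (here: 7/20 of the misprediction rate).
long_pipe_better_predictor = effective_ipc(2.0, 0.20, 0.05 * 7 / 20,
                                           flush_penalty_cycles=20)

print(f"7-stage flush penalty:  IPC ~ {short_pipe:.2f}")
print(f"20-stage flush penalty: IPC ~ {long_pipe:.2f}")
print(f"20-stage, better predictor: IPC ~ {long_pipe_better_predictor:.2f}")
```

With the same predictor, the deeper pipeline loses IPC in this model; it breaks even only when the misprediction rate drops in proportion to the extra depth, which is the point about the P4 not having correspondingly good branch prediction.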
The Opteron isn't the "evolution" of the Alpha, but it takes many basic design cues from the Alpha. An Alpha and an Opteron are much more similar to each other than either is to a P4. The Opteron's design is *very* similar to a 21264's, especially in the back-end. Consider:
1) Both CPUs cluster their integer units into ALU/AGU pairs. The 21264 has two such pairs, the Opteron has three such pairs.
2) Both CPUs use 64KB/64KB L1 i/d caches with 2-way set associativity.
3) Both CPUs have a similar number of in-flight instructions (72 versus 80).
4) Both CPUs move some decode work to the i-cache by using pre-decode bits in the cache. Both use the same number (3) of decode bits.
5) Both CPUs have asymmetric dual FPUs, one for FADD and one for FMUL. The 21264 puts complex float instructions (sin, cos, sqrt) in the FADD unit; the Opteron puts them in the FMUL unit.
6) Both CPUs handle register renaming in the FPU pipeline almost exactly the same way. The Athlon's FPU pipeline is basically the 21264's with one extra unit (the FPMISC unit), some extra registers (for SSE), and some extra pipeline stages (since it runs at a much higher clock speed).
7) Both CPUs use a memory crossbar.
The Opteron and 21264 are clearly different architectures --- the decoders are (necessarily) completely different, the two CPUs handle register renaming in the integer units completely differently, and the load-store units and TLBs in the two CPUs are different. However, it's hard to miss the similarities in the high-level organization of the two chips. To a first approximation, calling the Opteron the offspring of a K6 and a 21264 would not be a stretch.
All the Alphas had a 7-stage pipeline. This was long compared to, say, a PPC 601 or a 603, but the Alpha was also a much more sophisticated CPU than those chips. Compared to chips like the UltraSPARC I and II (in-order designs), which had 9-stage pipelines, and the PPC 604e, which had a 6-stage pipeline, the 7-stage length of the 21164 and 21264 was quite reasonable. This is especially true for the '264, whose 7-stage pipeline is quite short for a massively out-of-order 4-issue processor.
AMD's Opterons aren't Alphas and it's a good thing. Alphas sucked and the P4 looks much more like an Alpha than the Opterons do.
Um, what? You're right that the Opterons aren't Alphas, but wrong in saying they are closer to the P4. The Opteron's FPU pipeline looks incredibly like the 21264's, right down to the dual asymmetric pipes. The load/store setup and L1 cache setup of the two architectures are very similar, down to the 64KB/64KB L1 cache sizes. The 21264's integer pipeline is much shorter than the Opteron's (7 versus 12 stages), and much, much shorter than the P4's (20+). The 21264, in its more modern iterations (e.g., the 21264B and 21264C), is a higher-IPC design than the Opteron, in both integer and FP. What is your basis for the claim that the Alpha is closer to the P4?
haha, Alpha had the grimmest, most threadbare instruction set imaginable. Its strength was its ferocious clock rates that were enabled by abnormally deep pipelines and instructions that did relatively little (no integer divide!).
The 21264's pipelines were actually quite shallow for an out-of-order processor. At 7 stages, it was closer to the Pentium's 5 than the Pentium Pro's 10. It was quite a bit shorter than the pipelines of any modern, comparable processor. The PPC970 has a 16-stage pipeline, the UltraSPARC III has a 14-stage pipeline, the SPARC64V has an 11-stage pipeline, the Opteron has a 12-stage pipeline, etc.
relatively poor IPC, very deep pipelines, very high clockrates, huge caches to cover its design weaknesses, and excessive power consumption.
I'll give you the power consumption, but take exception to everything else. The 21264's IPC was good, even by modern standards. The original 21264 got 626 SPECint_base per GHz and 844 SPECfp_base per GHz. The 21364 got 766 and 1299 for the same benchmarks. These figures put the original '264 substantially ahead in IPC versus its competition, and make the '364 very competitive with modern processors like the Opteron and Power5. As for cache --- large caches aren't a design flaw; they are necessary to sustain a chip that is substantially faster than memory. The P4 actually has fairly small caches, if you consider the fact that its L1 caches are tiny.
One more thing about the $900 price. It doesn't count the cost of OS X. How much does the development of OS X and the iApps add to the cost of the system? It's hard to say, but consider that it's rumored that Dell pays $30 for each Windows license. The total cost to develop OS X and the iApps is probably not less than the cost to develop Windows, unless Apple's developers are much more talented than Microsoft's. However, Apple ships far fewer copies of OS X than Microsoft does of Windows. Given Apple's market share, they probably ship 1/20 as many copies of OS X as Microsoft does copies of Windows. Even if we assume OS X costs only half as much as Windows to develop, and that of the $30 Microsoft charges Dell, the cost to them is only $20 per copy, that still means OS X adds $200 to each machine Apple sells (half the development cost, spread over 1/20 as many copies, is 10 times the per-copy cost).
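The amortization argument above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up, and every input is one of the rumored or assumed figures from the comment ($20 per-copy cost for Windows, half the development cost, 1/20 the volume), not real accounting data.

```python
def per_copy_cost(reference_per_copy_cost, relative_dev_cost, reference_volume_ratio):
    """Per-copy development cost of an OS, relative to a reference OS.

    reference_per_copy_cost: assumed per-copy cost of the reference OS (Windows)
    relative_dev_cost:       total development cost as a fraction of the reference's
    reference_volume_ratio:  how many copies the reference ships per copy of this OS
    """
    return reference_per_copy_cost * relative_dev_cost * reference_volume_ratio


# Figures assumed in the comment: $20 per Windows copy, OS X costs half as much
# to develop, and Windows ships 20 copies for every copy of OS X.
osx_cost_per_machine = per_copy_cost(20.0, 0.5, 20)
print(f"Estimated OS X cost per machine: ${osx_cost_per_machine:.0f}")
```

Changing any one of the three guesses scales the result linearly, so the $200 figure is only as good as the rumored license price and the 1/20 volume estimate.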
To be fair, that research firm failed to account for quite a few things in each machine's cost. They didn't count the keyboard, mouse, packaging, etc. In reality, the iMac 20" is very competitively priced in its market. Try pricing out a Dell Dimension XPS 200 (the closest thing to a small-form-factor Dell desktop I could find), with a 3GHz Pentium-D, X600SE, 250GB HDD, 20" LCD, dual-layer DVD burner, and remote (basically, close to the iMac's specs). The machine costs $1658, or about $42 less than the iMac, and has a much slower graphics card (an X600SE versus the iMac's X1600) to boot.
1) SPEC isn't a "PC fanboy" benchmark. It is the industry standard benchmark for comparing processor performance. That's why every press release about a new CPU is accompanied by "the processor has an estimated SPECint/etc of blah". The reason the G4/G5 do so poorly in SPEC is that they really aren't very good general-purpose processors. SPEC is generic C code, not optimized for any specific platform. It's also compiler-dependent, and compilers for PPC aren't as good as compilers for x86 (especially GCC). The G5, in particular, can be quite fast on code painstakingly optimized for it, but the simple truth is that most code is not so optimized.
2) Application benchmarks like this one are completely useless for comparing processors. They don't run the same code on each machine. How many hours went into tuning QuickTime for AltiVec on the G5? How many went into tuning QuickTime for SSE on the Core Duo? Do the benchmarks even use SSE on the Core Duo? Are they multithreaded?
3) As for x86 versus the G5 --- my 2.2 GHz Athlon64 dual-core is significantly faster (20%) than my 2.3 GHz G5 dual-core. For some important software I use (GCC), it's 50% faster. Each Athlon64 core is nearly identical to the Opteron core that came out in 2003. Each G5 core is actually an improvement over the G5 that came out in 2003, having twice the cache. It's safe to say x86 has been schooling the G5 since just months after it was released.
Let's try this again. A 30W Yonah laptop chip results in a system that is 10-20% faster overall than a 60W G5 desktop chip. The CPU itself is twice as fast, and the resulting speedup is entirely consistent with software that isn't multithreaded (thus using only one core), has been heavily optimized for PowerPC over the years, and is relatively immature on x86. I say the overall result is quite embarrassing for the PowerPC line.
What's more embarrassing? This post to xcode-users, which says that Xcode 2.2 on the new iMac is just a hair slower than the Quad. This is pretty entertaining too --- apparently the 1.83 GHz Yonah (which appears in the MBP) is faster at compiling than the dual 2.7 GHz G5.
I've got a dual G5, and I use Matlab a lot, and I prefer to use it on my dual-core amd64. Matlab code often contains a very large integer component (since Matlab itself is an interpreted language), and for most tasks an Opteron is faster. Just try the built-in "bench" utility on a G5 versus an Opteron. The G5's FPU performance also suffers heavily if the data is in complex data structures in memory. The PowerMac's memory controller has rather high latency, which means accessing big structured data sets is slow on the G5. The G5 is just fine if your code is low-level and your data is simple arrays (e.g., signal processing), but there is a lot of scientific and especially engineering code that doesn't fit that mold.
I strain to see how you get "the G5 is still quite the chip" from those benchmarks. Overall, the benchmarks suck. The xbench results are inconsistent (look how low the UI result is for the Intel Mac, yet read the review which says the Intel Mac is snappier than the G5 iMac). The Rosetta results are irrelevant, since they test a 512MB iMac against a 4.5 GB PowerMac. 512MB is already marginal for OS X, and Rosetta (being a JIT) is going to eat up a lot of RAM on top of that. So what's left for Photoshop? The results are completely meaningless.
The only two decent results are the QuickTime and iTunes results. The Intel Mac makes quite a strong showing, even considering the fact that QuickTime is a piece of software that has been highly optimized for PPC/AltiVec over the years.
The only thing you'll miss with an X800-class card is some of the more advanced features like parallax mapping. All the particle stuff and deformation stuff is in there in PS2.0.
You don't need to spend $500 for a card to handle today's games. I just got an X800 GTO for the princely sum of $185 (and that was the more expensive fanless model). It plays F.E.A.R just fine. These days, the best deal is to buy the previous generation's high-end cards once every year or two. Buying a $150 card every year will get you a much better average gaming experience than buying a $450 card and replacing it after three years.
I don't know if you remember correctly. Back in the day, the Voodoo 2 came out at the astounding price of $200. Same for the Voodoo 1. I remember Canopus released a really expensive Voodoo 2 bundle for $250. Back then, you could spring for an SLI Voodoo 2 rig (with an astounding 16MB of net graphics memory) for $400. These days, $400 won't even buy you a single 7800GTX.
Most of the new advances these days have to do with shader support, and glslang (GL shading language) is quite a bit more general than pixel shaders 3.0.
The slashdot cynicism about improvements in graphics is getting old. Have you played F.E.A.R? I'm not much of a gamer (indeed, I'm not a gamer at all --- aside from the occasional Halo match, the last game I played before F.E.A.R was Xenosaga a couple of years ago) but it really is an example of new technology (graphics, physics, AI) being used to create a better gaming experience. The F.E.A.R graphics and animation engine really does a good job of reproducing the weight of people and objects, making them realistic and making the story that much more engrossing. The flying shell casings and special effects really do add to the gun battle scenes, and shadows and lighting are used to good effect to create a creepy mood.
They use gigabits because, like RAM modules, a flash module contains multiple memory chips. Your 256MB RAM stick is usually made using eight 256-megabit chips. Similarly, the 16-gigabit flash chip will be used in devices featuring a lot more than 2GB of total storage.
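The bit-to-byte bookkeeping above is easy to verify. A minimal Python sketch follows; the function names are made up for illustration, and the chip counts and densities are the ones quoted in the comment.

```python
BITS_PER_BYTE = 8


def module_capacity_megabytes(chip_count, chip_megabits):
    """Total capacity in megabytes of a module built from identical chips."""
    return chip_count * chip_megabits / BITS_PER_BYTE


def chip_capacity_gigabytes(chip_gigabits):
    """Capacity in gigabytes of a single chip rated in gigabits."""
    return chip_gigabits / BITS_PER_BYTE


ram_stick_mb = module_capacity_megabytes(chip_count=8, chip_megabits=256)
flash_chip_gb = chip_capacity_gigabytes(16)

print(f"8 x 256 Mbit chips = {ram_stick_mb:.0f} MB")
print(f"one 16 Gbit chip   = {flash_chip_gb:.0f} GB")
```

A device that stacks several 16-gigabit chips therefore ends up with a multiple of 2GB, which is why the per-chip figure is quoted in bits.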
The reason people are excited about Apple switching to Intel processors is that people who use Apple machines for OS X no longer have to compromise on hardware performance. There is also the potential of running Windows at full speed in VMware, greatly reducing the software-related hassles of switching.
As for gamers --- who cares about gaming? That's not Apple's market, and it doesn't make a lot of sense for them to pursue it.
Um, you're not going to find dual-core 1.8 GHz processors in a $500 Dell. Just go on Dell.com and try building a machine comparable to the 20" iMac. I ended up with an XPS 200 with a 3.2 GHz P4 dual-core, 128MB X600SE graphics card, 512MB of DDR2-533, and a 20" 2005FPW monitor, for $1789. For $1700, the 20" iMac has a faster CPU (2.0 GHz Intel Core), a faster graphics card (128MB X1600), and faster RAM (512MB of DDR2-667).
The Alpha is only 14 years old at this point. The first was released in 1992.
That's true, but that's not really a chip design triumph so much as it is a process engineering and circuit layout triumph.
PC RPG's suck goat nuts. If I wanted a game so damn non-linear as to have no particular storyline, I'd go outside and wander around...
RAM modules are measured in bits you nimrod.