Well, that really depends on your perspective. It's great for people interested in doing advanced research on compiler technology that runs our existing crappy C programs at the same speed on an architecture that makes life harder for the compiler. It sucks for people who are interested in compiler research that makes higher-level languages more competitive with low-level ones.
I don't see the point of writing a super-compiler that can schedule C code at compile time, when processors can do that just fine at runtime. I think it's far more interesting to focus on writing super-compilers that can make high-level languages perform better.
The G6 would've been a POWER5 derivative. The POWER5 is a massively out-of-order RISC. Itanium is an in-order VLIW. They share nothing in common. The IP would've been useless.
I wouldn't call the difference *tremendous*. On integer code, the Athlon 64 (running 64-bit code) is maybe 30% faster. Running 32-bit code, the difference is probably more like 20%, per clock. On floating-point code, the difference is maybe 10-15%, depending on how well-optimized the code is. Significant, yes, tremendous, no.
The new Macs are twice as fast as the old Macs because they have a dual-core processor. Yes, the current line of x86 CPUs is faster than the G5, but it's not an enormous difference. I've got a dual-core Athlon X2 and a dual-core G5 (2.2 GHz and 2.3 GHz, respectively), and the X2 is maybe 20% faster (less for floating-point, a bit more for integer).
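For what it's worth, here is a quick sketch of how the per-clock figure falls out of those numbers. The function name is just something I made up for illustration, and the 20% is the rough estimate from above, not a measured benchmark result.

```python
def per_clock_ratio(speedup, clock_a_ghz, clock_b_ghz):
    """Per-clock performance of chip A relative to chip B.

    speedup is the observed A/B performance ratio at stock clocks
    (1.20 means A is 20% faster overall).
    """
    # Scale the overall ratio by the clock ratio to remove the
    # advantage (or handicap) that comes purely from clock speed.
    return speedup * (clock_b_ghz / clock_a_ghz)


# Assumed inputs: X2 at 2.2 GHz is ~20% faster than a G5 at 2.3 GHz.
x2_vs_g5 = per_clock_ratio(1.20, clock_a_ghz=2.2, clock_b_ghz=2.3)
print(f"X2 per-clock advantage over G5: {(x2_vs_g5 - 1) * 100:.1f}%")
```

Since the X2 is clocked slightly lower, its per-clock advantage comes out a bit larger than the raw 20%.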
People whose apps are CPU-bound? E.g., the SPEC results for the Core Duo are a pretty good indicator of GCC performance. The Core Duo is 3x as fast as the iMac G5 in SPECint, and according to the Xcode mailing list, the Core Duo iMac is just a hair slower than a quad. There are lots of apps that are CPU-bound: 3D rendering, many scientific codes, etc. Things like SPEC are a good indicator for the performance of such apps.
Galactica isn't the same sort of show as Firefly. It's not supposed to be lighthearted, humorous, etc. I liked Firefly, too, but it was clearly a different type of show, even though both are Sci Fi (which is a stupid label, because you can have lots of different types of shows that just happen to be set in the future). I can't think of a lot of other sci fi to compare Galactica to, perhaps Deep Space 9, but comparing it to a drama like "The West Wing" would probably be a bit more accurate.
Regarding the jingoism: it's supposed to be like that. Remember the US after 9/11? Think about that times a thousand. What else do you think the society is going to be like after such an event? Regarding the characters: you're right they sometimes seem to be "characters", but I think that is one of the purposes of the show. Firefly was a show about a bunch of unique individuals. Galactica is a show about people and society. The characters themselves are less important than what they represent. Consider Adama. He has a personality, sure, but he often seems like an archetypal father figure. Well, that's because an archetypal father figure is precisely his role in the story.
That's the point of the article. The Macworld article never considered processor usage. They said the new Intel Mac is "10-20% faster" without considering whether their benchmarks used the full capacity of the processor. They claimed that Jobs' statement that the new Mac was "2x faster" was wrong because they got smaller speedups. What this article showed is that if you used Macworld's methodology (showing benchmark results without showing processor usage) you could argue that the quad-core G5 is only 14% faster than the Intel iMac running Quicktime. They're not saying that such a conclusion is correct; they're using it as an example to show what conclusions you can arrive at if you use Macworld's logic.
The basic problem was that Macworld's benchmarks were not CPU benchmarks and didn't make full use of the second core in the Intel Mac. The '2x' number Apple quoted was for the CPU --- even SJ mentioned that it doesn't mean apps will be 2x faster since the disks and everything else are the same. This article shows that in cases where the benchmark is CPU-bound, the new Intel Mac can be almost twice as fast.
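To put rough numbers on why a "2x" CPU can show up as a much smaller speedup in an app benchmark, here is a back-of-the-envelope sketch using Amdahl's law. The function name and the CPU-bound fractions are my own illustrative assumptions, not anything measured by Macworld or the article.

```python
def overall_speedup(cpu_fraction, cpu_speedup):
    """Amdahl's law: whole-benchmark speedup when only part of the
    run time (cpu_fraction, between 0 and 1) benefits from the
    faster CPU, and the rest (disk, GPU, waiting) is unchanged."""
    unchanged = 1.0 - cpu_fraction
    accelerated = cpu_fraction / cpu_speedup
    return 1.0 / (unchanged + accelerated)


# Hypothetical benchmarks that spend different shares of their
# time CPU-bound, run on a CPU that is 2x faster.
for fraction in (0.2, 0.5, 0.9, 1.0):
    gain = (overall_speedup(fraction, 2.0) - 1.0) * 100
    print(f"{fraction:.0%} CPU-bound -> {gain:.0f}% faster overall")
```

Only the fully CPU-bound case gets the whole 2x; a benchmark that spends most of its time on disk or on a single thread sees a fraction of it, which is why a result reported without processor usage tells you little about the CPU itself.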
Wash died when his job was done. It's a recurring theme in the movie. Book dies after he does his job (shelters them for the time being), the scientist in the tape from Miranda dies after she does her job (creating the tape with the evidence of the Alliance's doings) and Mr. Universe dies when his job is done (he leaves the tape for Mal). Wash dies when he gets the crew to the place where they can air the tape. For me at least, all of these things make sense overall.
As for Zoe, she doesn't freak out because it doesn't fit her character to freak out in the middle of a mission. She expresses her distress by nearly giving up on her own life when she runs out from behind the cover in front of the door in the fight against the Reavers.
The whole point of Galactica is that it doesn't try to make the future look, well, futuristic. A lot of Sci Fi series go to great lengths to create this complicated, detailed future world, and then have nothing left over to tell an interesting story. In Galactica, the "sci fi" aspect is merely a vehicle for the story. Aspects of the world that aren't pertinent to the story are kept from distracting from the main point. People wear ties because the clothing of the future really isn't important, and doesn't deliver any meaning to the story. Their leader is the "president", because that's what we can relate to. She could have been called "chancellor" or "director" or whatever, and it wouldn't make any difference, or add anything to the story. "President" shows that she's the leader of a formerly democratic society, and that's all that needs to be said.
The real reason to watch Galactica is for how it deals with people and society. There are a lot of interesting themes going on in the show. Some are fairly conventional (e.g., tensions between military and civilian leadership), but are considered in an unconventional context. Others are less conventional (e.g., Cylons hate humanity, yet at some level want to become more human). It's a very interesting drama that just happens to be in space.
Adding 64-bit is not a mere port. 100+ registers to support renaming, major ISA changes (doubling GP registers and adding 4-level page tables) - the gain from implementing those is astounding. Easily more than a 4-issue will show.
The reason I consider 4-issue (+ micro-op fusion) a bigger change than 64-bit is that the former is a much more fundamental change in the internal CPU design. The issue width is a fundamental property of a particular core design. Increasing the issue width increases complexity much faster than linearly, and causes rippling changes throughout the entire CPU. The same goes for micro-op fusion. Once the basic unit of instruction scheduling inside the CPU goes from being an instruction (as in the P4) to a batch of instructions (as in Merom), the nature of the core is changed.
Opteron is mostly a wider Athlon with an IMC. The Opteron has the same execution units as the Athlon, in the same layout; they are just wider. It handles register renaming using the same mechanism as the Athlon; the register file is just larger. It handles branch prediction the same way as the Athlon; the history buffer is just larger. It handles instruction scheduling and queuing the same way as the Athlon; the queue is just deeper. Merom is a completely different core from the P-M. Changing the issue width changes how you handle dispatching instructions, which changes your scheduling mechanism, which changes the layout of your execution units, etc.
1) Is good for getting revenge, but talented CPU designers are few and far between, and when working on tasks of this magnitude, mistakes will inevitably happen.
2) Increasing the headcount doesn't necessarily make things better. Oftentimes, it just slows things down.
3) Better software is an easy thing to demand, a harder thing to produce.
4) This is the ridiculous one. If they could improve the hardware/software domain intersection, they would. But it's really hard. Really really hard. As an engineer, I can sympathize with their plight. Consider how airplanes are designed. In practice, an airplane is a highly coupled system, lying at the intersection of the disciplines of thermodynamics, structural analysis, aerodynamics, stability and control, etc. Actually designing an airplane as a fully coupled system is impossible, at least with our current level of science and capacity for thought. So the system gets decoupled, and where the various parts meet, there are inevitably imperfect joints. These manifest themselves as everything from quirky handling in some cases to lower engine efficiency than would otherwise be possible. Things don't always line up perfectly, some components are stronger (heavier/more expensive) than they need to be, some components aren't strong enough and thus need increased maintenance, etc. Approximations get made, things get simplified, and the final product has some bugs, simply because doing it any other way would be just plain impossible.
It's not just Intel. All commercial processors have errata. The Opteron family has 87 unique ones. The PPC970 and the MPC7447A (the G5 and the G4, from which Apple transitioned) have 24 and 26, respectively. CPU errata, in contrast to software errata, are usually well known, and most can be worked around in software.
It's interesting to note that the various forms of the Opteron have, between them, 87 unique errata. That doesn't stop them, however, from being used in some very mission-critical systems.
All CPUs have issues. My G5 has at least 24 outstanding errata, and that's assuming IBM's move from 970FX (single-core) to 970MP (dual-core) was accomplished without adding a single erratum. The machine has yet to crash in months of usage.
I didn't say it was. My point was that Itanium, like the majority of RISC architectures, has had limited success because it doesn't scale well down to smaller machines. Itanium is quite dependent on its huge on-die caches to offer competitive performance. Without those, it's a very mediocre performer relative to consumer x86 chips. The same is true for Power5 or SPARC64 --- these chips offer very good performance partially because they can afford to have 400mm^2 dies with very expensive system interconnects. Strip those things away, by making the caches smaller and the interconnects simpler (e.g., as IBM did with the PowerPC 970), and the chip that's left isn't that great relative to x86.
The Opteron shows up as a K8, but it's just a very straightforward 64-bit translation of the K7 core, plus an integrated memory controller. The P4 64-bit and 32-bit are almost identical internally as well, which is why I categorize both as 7th generation cores. There are no 64-bit P-Ms.
Conroe/Merom are in a way more advanced than a modern Opteron. As I said, the Opteron isn't a new core; it's a 64-bit K7 with an IMC. It's a very advanced chip, but it's not "modern", in the sense that the basic design is quite old. Merom is a completely new design, different from either the P4 or the P6, although it draws elements from both.
My organization of the generational effects is below. The differences in the definition of the 7th and 8th generation make the categorization easier to correlate with the historical progression of RISC architectures.
3rd-6th: Same.
7th: uOps-based architecture doesn't really fit, since both the PPro and the K6 did translation to uOps (RISC86 in K6 lingo). A better distinguishing factor would be that both are massively out-of-order chips, having many more instructions in flight than the P6 core. This classification is useful because it allows these x86 chips to be grouped with other massively OOO RISC chips, like the 21264.
8th: Merom falls here, because it's the first 4-issue x86 chip, and the first one to heavily use instruction grouping as a means of increasing the real issue rate. The K8 uses groups to a limited extent, in that it combines LOAD + OP uOps into a single "double" instruction, but Merom will use the technique much more extensively.
Um, why would Intel think we'd be using Itanium by now? If they'd release a well-performing Itanium chip that was actually priced competitively with x86, I'm sure there'd be a lot of takers. The basic problem with Itanium (and most RISCs, btw) is that you cannot make one at a workstation price point that competes with x86 in terms of performance. Without being able to offer that, how do they expect people to switch?
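As a rough illustration of why grouping raises the effective issue rate, here is a toy model. It is purely a sketch under my own assumptions: the uOp names, the adjacent-pair fusion rule, and the "one group per issue slot" accounting are made up for illustration and ignore dependencies, execution ports, and everything else a real K8 or Merom scheduler has to deal with.

```python
import math


def fuse_load_op(uops):
    """Merge each adjacent LOAD followed by OP into one 'LOAD+OP' group."""
    fused = []
    i = 0
    while i < len(uops):
        if uops[i] == "LOAD" and i + 1 < len(uops) and uops[i + 1] == "OP":
            fused.append("LOAD+OP")  # the pair now occupies a single slot
            i += 2
        else:
            fused.append(uops[i])
            i += 1
    return fused


def issue_cycles(slots, width):
    """Cycles needed to issue every slot at `width` slots per cycle."""
    return math.ceil(len(slots) / width)


# Hypothetical uOp stream; a 3-wide core is assumed for the comparison.
stream = ["LOAD", "OP", "LOAD", "OP", "STORE", "OP", "LOAD", "OP"]
fused = fuse_load_op(stream)
print(len(stream), "uOps ->", len(fused), "issue slots")
print("cycles unfused:", issue_cycles(stream, 3))
print("cycles fused:  ", issue_cycles(fused, 3))
```

The issue width never changes in this model; the same number of slots per cycle simply carries more work when pairs travel as one group, which is the sense in which grouping increases the "real" issue rate.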
iMacs are immediately available. Your local Apple store may just suck. Order from Amazon --- they've got a $150 discount on 20" iMacs, and they're showing 24 hour shipping dates.
Actually, you're right that the 601 and 603e, as well as the G3, have a 4-stage pipeline. The 604e has a 6-stage pipeline to support its larger instruction window.
Quibble: P3 and P-M are 6th generation (PPro-based chips). Athlon/Opteron are 7th generation (K7-based chips), as is the P4 (32-bit or 64-bit). Conroe/Merom will be Intel's first true 8th generation chip. AMD doesn't have an 8th generation chip yet, though a new core is planned for '07.
Yonah is already almost clock-for-clock equal in performance with the X2, and it doesn't even have an on-die memory controller. Conroe and Merom will almost assuredly turn the tables on AMD for the immediate future in performance.
The Opteron is still a very fast chip in floating-point, when measured using the workstation programs in SPECfp instead of consumer programs like AnandTech used. The Opteron's per-clock FP performance is about 40% higher than a Pentium-M's. Yonah is probably better, and Conroe may be better still, but it's far from assured.
AMD reports peak values, Intel reports average values. Take an Athlon64 4400+ rated at 110W, and put it next to a P4 rated at 130W. Compare the total system power draw. There is no *way* the P4 uses only 20 more watts. 130W-rated P4s have been tested as drawing 170W+ during peak loads.
I'd have to contest that point. No P4 comes close to a fast Opteron for doing "real work" (scientific calculations, etc). In 64-bit mode, programs like Matlab absolutely scream on an Opteron.
OS X's multithreading was improved a lot in Tiger, with finer-grained locking.
Yes, but who really cares about CPU benchmarks?
The basic problem is that the main pieces of code in Darwin (Mach and 4.4BSD) are no longer maintained independently.