G4 vs. Athlon Review
heatseeka writes "There is a great article at Ars Technica comparing the Motorola G4 and the AMD Athlon. They discuss every detail of the design of the CPUs, and give credit where credit is due." Hannibal does a great job dissecting the different chips, as well as explaining the background behind each chip.
IBM and Motorola are working on putting multiple PowerPC cores onto a single die. IBM has already done this with its Power CPUs (a sibling to the PowerPC). This is feasible with the PowerPC, since its power consumption is so very low. A G3 at 400 MHz (on a 0.22 micron process), for example, uses 8 W max, 5 W on average. A single PIII or Athlon uses at least 4-5 times that much on average. This is due in large part to the complex instruction set that must be decoded and executed. With IBM's and Moto's superior copper interconnect and SOI technologies, the power consumption and core size can be reduced further, allowing even more cores on each die. Modern multitasking, multiprocessing OSes with well-written multithreaded apps will scream on these multiple-core CPUs.
AMD doesn't have the luxury of throwing away whatever it wants to (the x86 ISA, for one). They are competing with Intel for the PC market. The G4 is a lock to appear in new Macs.
This article is very informative with respect to the architectures, but a useful follow-up would be to look at the performance in practice. Both of these processors can be used to run Linux, and it would be rather interesting to see how a pair of workstations fared in a side-by-side test.
While such a test would be interesting, I expect that the results would, in practice, be as much a test of compiler maturity as a test of the speed of the underlying system. Despite the best efforts of the processor designers (out-of-order execution and all), these sorts of processors tend to be very sensitive to the compiler technology. Furthermore, many of the multimedia and vector processing performance enhancements (SIMD vs. AltiVec) really need to be accessed from assembler at the moment.
Still, rather interesting stuff.
If intelligent life is too complex to evolve on its own, who designed God?
Let me preface all this with: I know very little about how these things work exactly, so bear with me.
It would seem from the article that although the K7 & the 7400 are pretty comparable at the moment, the 7400 would have much more room to grow as well as being a much more efficient chip.
To me, the fact that the K7 has to decode all this x86 legacy stuff would suggest that the K7 is like a brute little monster truck that rampages over its flaws by packing a lot of punch, in this case by piling on more and more transistors.
The 7400 seems to produce a sleeker more elegant solution to the whole thing. It's more the sleek sports car with speed, and elegant, efficient power VS the brute force of the K7. So I guess in that regard, the 7400 wins out on efficiency and future sustained growth potential...
Must be said though that a mate of mine owns an Athlon and it rocks the house down, so even if it's like a brute little monster truck instead of the sleek sports car of the G4, it still packs a pretty hefty punch. I guess they both kinda rock the house down...
The Athlon is an amazing chip, even more so given the need to maintain backwards compatibility with real-mode X86 code and the hack that is MMX. The only performance improvements I really expect to see going forward in X86 architecture are going to be due to process improvement rather than architectural development. MMX and 3DNOW are kludges on the architecture. In light of that, Athlon stands out even more.
The G4, though, has the advantage of being a lighter-weight chip (fewer transistors needed, fewer instructions, less microcode). As for speed, RISC versus CISC aside, the Motorola/IBM designs have not shown the ability to drive the high clock speeds that Intel and AMD are playing with. Until about a year ago, the two were neck-and-neck, but the X86 chips are now up around 800 MHz while the G4 is just passing 500 now. But given the efficiency of not having to deal with all the microcoded X86 instructions the G4 minimizes the difference in a well-implemented OS.
Another thing to keep in mind (mentioned in the article) is that the G4 is not strictly designed for desktop computers. PowerPC chips are very popular in the embedded market, where they go into single-board computers, automobiles, and all sorts of dedicated hardware. Sales to Apple alone wouldn't keep a chip family alive. Interestingly, Intel sells a lot of older 386 processors to the embedded market too - the too-cool BlackBerry two-way pagers, among other devices, use a 386 processor.
The best thing that PowerPC has going for it IMHO is that Motorola didn't build backwards compatibility with the M68K series processors. They made an architectural clean break - and the few companies that needed compatibility did it through emulation (parts of the MacOS are still in 68K code today). The ample shortcomings of the MacOS tend to cover up what is a first-rate processor family.
My suspicion as to the 'real' reason Intel has been funding Linux ventures is this: they know that Windows is hopelessly tied to X86, and they are hoping to eventually leave that baggage behind in the IA-64 architecture. Ultimately, X86 will be a drag on clock speeds.
Sorry to have rambled about here some, but I'm still a bit sleep-deprived from the weekend.
-- Josh Turiel
"2. Do not eat iPod Shuffle."
Not to boost your ego any more, but get a grip buddy. Have you seen any other articles that talk about this kind of material? It may be too simple for you, especially since I'm sure that you "know it all" already, but I think this article is a great example of what I would like to see more of on the 'net: some discussion rooted in real technology instead of PR claims and advocate BS. I don't know it all, I'll readily admit, and such an article is very useful for me to see the hows and whys and wherefores. But for you to say that there is "no real information here" is a joke. You're either an elitist prick, or someone who just wants to kick dirt at an informative article.
While it is one of the first truly unbiased and highly technical articles on the K7 and G4 chips (instead of rumors and performance "benchmark" drivel), it does not say that much about the chips in the end. It should have concluded with a stronger statement about the efficiency of the G4 chip.
The US Govt is often criticized for implementing obscenely expensive solutions to problems when simple ones would have done the job better. This can be applied to the K7 vs G4 question, for it is always better to have efficiency when the performance is the same.
Rumor has it that Intel is running the new Itanium chips currently with 30 watts of power consumption, over twice that of the G4. If I upgraded my motherboard to the Itanium, I would have to get a new case or power supply because of the incredible inefficiency of the chip. That is not novel engineering, but sluggish engineering, something which is not prized in this day and age.
"In individuals, insanity is rare, but in groups, parties, nations, and epochs it is the rule." -Nietzsche
True, but the gap will lessen (or disappear) in the near future. The G4 has been limited (in clock speed) by its exceptionally short 4-stage pipeline. Motorola has demonstrated a version of the G4 with a longer 7-stage pipeline that hits much higher clock speeds (~700 MHz range at the demo - higher in production). Each stage is simpler and faster, resulting in the higher clock speed. The K7 already has a very deep pipeline, which is a large factor in its high clock speed.
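To put rough numbers on the "each stage is simpler and faster" point, here is a minimal sketch. It assumes a very crude model in which the total logic delay is split evenly across the stages and every stage pays a fixed latch overhead. The delay values are hypothetical, picked only to show the trend, and are not measured G4 figures.

```python
def clock_mhz(total_logic_ns, latch_overhead_ns, stages):
    """Rough clock estimate: each stage gets 1/stages of the logic plus a fixed latch cost."""
    cycle_ns = total_logic_ns / stages + latch_overhead_ns
    return 1000.0 / cycle_ns  # a 1 ns cycle is 1000 MHz


# Hypothetical delays, chosen for illustration only (not real G4 data).
TOTAL_LOGIC_NS = 7.2
LATCH_OVERHEAD_NS = 0.2

short_pipe = clock_mhz(TOTAL_LOGIC_NS, LATCH_OVERHEAD_NS, stages=4)
long_pipe = clock_mhz(TOTAL_LOGIC_NS, LATCH_OVERHEAD_NS, stages=7)
print(f"4-stage: {short_pipe:.0f} MHz, 7-stage: {long_pipe:.0f} MHz")
```

The model ignores the cost side of a deeper pipeline: a mispredicted branch throws away more in-flight work when there are more stages to refill.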
is here
:)
It shows power consumption of the major chips in use. Note where the PPC chips are!
Enjoy.
Pope
It doesn't mean much now, it's built for the future.
As for speed, RISC versus CISC aside, the Motorola/IBM designs have not shown the ability to drive the high clock speeds that Intel and AMD are playing with.
The PowerPC doesn't need the high clock speeds of the Intel/AMD chips. On average, it does about twice as much per clock cycle as the X86 chips do.
Comparing clock speeds without consideration of clock efficiency is like comparing the version numbers of the various Linux distributions.
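A quick sketch of the arithmetic behind that argument: useful work per second is clock rate times work done per cycle. The 2x per-cycle figure below is simply the claim made in this post, taken as an assumption rather than a measurement, and the clock rates are the rough 500 MHz and 800 MHz numbers mentioned elsewhere in the thread.

```python
def work_per_second(clock_mhz, work_per_cycle):
    """Relative throughput: millions of cycles per second times work done per cycle."""
    return clock_mhz * work_per_cycle


# Assumed inputs: the 2.0 ratio is this post's claim, not a benchmark result.
g4 = work_per_second(500, 2.0)
x86 = work_per_second(800, 1.0)

print(f"G4 relative throughput:  {g4:.0f}")
print(f"x86 relative throughput: {x86:.0f}")
print(f"ratio G4/x86: {g4 / x86:.2f}")
```

If the per-cycle advantage were smaller than 1.6x (800 / 500), the conclusion would flip, which is why the per-cycle number matters as much as the MHz rating.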
Motorola/IBM designs have not shown the ability to drive the high clock speeds
I think this is probably not so much an issue of the microprocessor design as of the fabrication capabilities.
Also, the clock speed is not a valid comparison between two different chips, especially two vastly different chips; what you get per MHz is not the same thing; otherwise you would be able to buy an 800MHz Z80. It's like the truck company that built a two-stroke pickup: people got freaked by the fact it redlined at 4000 RPM.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
One important part of the PowerPC architecture that this article fails to mention is the multiple condition code registers of the PPC. (These date back to the older Power architecture, BTW.)
Unlike all the other similar features mentioned in the article, these cannot be retrofitted into the K7, because it is limited to the x86 instruction set, which does not have this concept.
Basically, any instruction which needs to check the result of an operation (such as a compare, or overflow from an arithmetic operation) has to use condition codes. But in a pipelined processor, the result of the operation usually has to wait until the instruction has finished going through the pipeline. Rather than wait this long to decide what to prefetch, branch prediction tries to guess whether or not the branch will be taken. The predictions are usually right, but not always. What if there is more than one such comparison close together, particularly if the result is not being used directly for a branch, but for a boolean expression?
What the PPC does is have multiple condition code registers (eight of them, CR0 through CR7, implemented as fields of a single condition register). When an operation such as a compare is done, you select a condition code register to receive that result. In the same way that code can be optimized for RISC by interleaving multiple threads of operations such that the result of an operation isn't used until three or four instructions later, the condition code register usage can also be interleaved.
With out-of-order execution (OOO), the CPU automatically rearranges instructions to achieve this interleaved usage of registers. And thus, the PPC will gain this advantage with condition code register usage as well.
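A toy model may make the idea concrete. This is only a sketch, not how the hardware is built: it assumes eight condition fields (CR0 through CR7, as in the PowerPC architecture), keeps just the less-than, greater-than and equal bits, and leaves out the summary-overflow bit. The class and method names are made up for illustration.

```python
class ConditionRegister:
    """Toy model: several independent condition fields instead of one flags register."""

    def __init__(self, n_fields=8):
        self.fields = [None] * n_fields

    def compare(self, field, a, b):
        # The instruction names the field that receives the result,
        # so a later compare does not clobber an earlier one.
        self.fields[field] = {"lt": a < b, "gt": a > b, "eq": a == b}

    def bit(self, field, name):
        return self.fields[field][name]


cr = ConditionRegister()

# Evaluate (a < b) and (c == d): both compares are issued back to back
# into different fields, and the results are combined afterwards.
a, b, c, d = 1, 2, 5, 5
cr.compare(0, a, b)  # result lands in field 0
cr.compare(1, c, d)  # result lands in field 1, field 0 is untouched
both = cr.bit(0, "lt") and cr.bit(1, "eq")
print(both)
```

With a single flags register, the second compare would overwrite the first, so the first result has to be consumed (usually by a branch) before the second compare can run. PowerPC also has condition-register logical instructions such as crand that combine bits from different fields, which is what the last step above stands in for.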
Moto didn't have a choice in whether to implement 68K opcodes in PPC: IBM owns PowerPC, and Motorola is just licensing it. Trying to add an opcode to the PPC is a real nightmare because you have to get IBM's approval. Moto went to PPC because Apple demanded it, not because it was better than the Moto RISC, the MC88110. Actually, clock for clock the '110 was faster than the 601, and it had a better bus design than the Power architecture, so they put the MC88110 bus on a Power core, and that became the PPC601. The '110 also had a graphics execution unit as well as FP and integer units. Of course, its internal design and transistor technology limited it to about 65MHz and it was a few years late, so Apple wanted an architecture with some industry backing: IBM. NeXT was well under way in designing a dual '110 machine; I wonder what ever happened to it. One CPU did the color Display PostScript, and the other ran the NextStep OS. I'm sure Jobs was having a bit of deja vu when Moto couldn't deliver the 500MHz G4s on time or in quantity.
Starman97@Gmail.com (bring it on spammers)
Also, the G4+ will have 2 AltiVec units, 2 FPUs and 4 integer units, each with 32 dedicated registers. Plus it will use 256-bit data paths, integrate up to 1MB of level 2 cache onto the die, and support up to 4MB of level 3 cache. That's why the PPC will eventually pull away: they have the space to do more.
They include the L2 cache on some chips, but not others, and don't bother to mention the size of the cache. Just look at the three different 200MHz PPros that each consume varying amounts of power since they have 256KB, 512KB, and 1MB caches.
This would be a good graph if your main concern is raw power consumption of a normal processor purchase (I'm sure that you could get cacheless Athlons if you buy enough).
-- "Well, Hello, Mr. Fancy-pants. I've got news for you pal, you ain't in control but two things right now, Jack and s
Yes, and back to the comment "comparing these two processors just isn't as exciting...because they don't run the same software" - no, but there are metrics you can measure, i.e. float and int performance, and someone has already done the work for you in the SPECint95 and SPECfp95 benchmarks.
Now, I am too tired from the weekend (partying Friday night and working early Saturday morning - y2k testing) to chase the links and give you the exact comparisons between the fastest of each chip or MHz comparisons, but someone has already done the testing and the results can be easily found with a web search.
--cheers & happy new year!
Dan
Adults are obsolete children. - Dr. Seuss
Yep, along with a whole slew of other different factors that make the truly geeky give a damn about any of this crap. :)
Power consumption is *IMPORTANT* if you're making laptops or embedded systems, correct?
Pope
It doesn't mean much now, it's built for the future.
What would *really* be a nice boost is if it were possible to access the RISC component of the Athlon (ie. bypass the x86 decoder).
:-)
Basically, that would allow one to run legacy apps by allowing the Athlon to operate as a smokin' fast x86... and run new apps by allowing the Athlon to operate as a smokin' RISC machine.
Why does everyone assume that x86 is somehow bad? Has the Apple "RISC is god" propaganda gotten to you? This issue has been discussed on Slashdot before. x86 is fine. Don't fix what isn't broken.
Besides, the Athlon goes 700 MHz. If that's not fast enough, I suggest buying a Cray.
---
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
I haven't seen any comments on the fact that the G4 can easily be modified to work in mobile devices while the Athlon runs rather......warm.
I can't do better. Given the range of talent represented here, I wouldn't be surprised to find someone who could, but that's irrelevant.
The point is that there is someone out there who can do better. Not necessarily the person posting.
Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!
If x86 is fine, why is it that Motorola and IBM can pretty much keep par with Intel and AMD despite vastly smaller R&D budgets? Read the article, it talks about all the hoops the K7 has to jump through to support the aging x86 instruction set. This set of instructions was never designed for a high-performance processor, and lends itself extremely poorly to such things as pipelining. The x86 instruction set does work, but there are better things out there. Its only virtue is that it's what everybody else uses. Some virtue.
Imagine if AMD could take all that effort it spent on making that aging x86 instruction set work with their spiffy new processor and put it into making the processor fast instead? Rather than a 700MHz x86 processor, you'd probably have a 1GHz or higher RISC processor that would make the current K7 and G4 look like snails.
Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!
Regarding some other comments:
If x86 is fine, why is it that Motorola and IBM can pretty much keep par with Intel and AMD despite vastly smaller R&D budgets?
IIRC, some people were talking about why the G4 is 450MHz (stable), whereas you can buy 1GHz (stable, Kryotech) K7s. This is more than 2x the MHz rating. You can also buy 700MHz K7s without Kryotech stuff, which is about 1.5x the MHz rating.
On par? No....
The designs of the G4 and the K7 chip are completely different. The K7 is like the Saturn V, whereas the G4 is like the Concorde. They use different fuels and different speeds, but are both fast enough to get me from point A to point B faster than I can appreciate. So the comment that "Rather than a 700MHz x86 processor, you'd probably have a 1GHz or higher RISC processor that would make the current K7 and G4 look like snails." makes no sense, as to me the G4 (at its paltry 450MHz) looks really damned fast. I can't even conceive of how fast 1GHz would be. (Besides, how do you know that removing the x86 translator unit would speed anything up? Where's your EE PhD?)
Here are my points:
1) x86 works fine. I have plenty of working knowledge of how to program in asm for this instruction set, and have plenty of proven working software for it (think Linux). The "flaws" you talk about are the same ones the RISC community rolled out when Intel had 200MHz PPros out for more than 6 months before releasing a new CPU design. ISA zealotry annoys me, and doesn't help your possibly legitimate case at all.
2) The x86 is currently a lot faster, even if it's already too fast for me to notice (except in RC5 rate). So why strive to go even faster, faster, faster, when things are already fast and getting faster (Moore's law)?
3) An all-new RISC design (like the implementation in the K7, K6-*, PPro, PII, PIII, Celeron cores) would not have any software support coming out the door. The reason they have these micro implementations is to allow them to add a layer of complexity to the chips, and make them perform. They change the internals every generation, using different micro RISC cores. Once they sat down and used one for their flagship chip, they'd be stuck with it and lose the flexibility that the cores give in the first place. x86 is a nice general instruction set with instructions for whatever you need, which allows them to emulate it in any way they want (think Transmeta).
4) Have you noticed how the Sparc32, Sparc64, m68k, and a few other branches of the Linux kernel are not really well supported? It would take time even for Linux to come to bear on this new architecture. I'd rather have 1GHz Linux now, rather than a "possible" 1GHz Linux on some new architecture.
5) AMD, Intel, et al. have an investment of years in the x86 chip business. It's what makes them their tons and tons of money. Why would they throw away the backwards compatibility that gives them oodles of dollars, just to become another bit player in the RISC business (which isn't worth nearly as much)?
Anyways, I'm ranting a bit because you are acting just a bit like a Zealot. I praise you for being able to look ahead, but you seem to have a bit of a problem looking to now. AMD wants x86 dollars, and they are getting them.
---
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
I hold no claim that removing the x86 unit from the K7 would make it faster. However, the money that was spent on that unit could have then been channeled into something else, say making the chip faster. I have no problem with AMD making x86 processors, they can do whatever the heck they want. I do have a problem with people who think that, because of the gains that Intel and AMD have been able to wrest from the instruction set, x86 has no problems.
:)
Here's an example: in IA32, an instruction can start anywhere. Addresses divisible by 4, addresses divisible by 2, odd addresses. They can be of many varying lengths. This causes massive problems for pipelining and instruction decoding. If you're trying to decode three instructions at once, how do you know where instruction #2 is until you've finished with instruction #1? After all, they can be many different lengths. Now, clearly, this is not an insurmountable problem, as Intel and AMD have both pulled it off quite well. However, it does come at added expense. That money could be either translated into lower-cost chips, or more features for more speed, were it not for the necessities imposed by the aging instruction set. The PowerPC ISA (I talk of the PPC because it's what I know best, I believe others are like this as well) has instructions that start on addresses divisible by four. They are four bytes long. Period. Thus, it's very easy to see where instructions #1, #2, and #3 are, because each one is the same length. Barring a branch, it's simple to start decoding multiple instructions at once.
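The boundary problem can be sketched in a few lines. This is an illustration under a made-up encoding, not real IA32 or PowerPC decoding: in the variable-length case the first byte of each instruction is assumed to hold that instruction's total length, which is far simpler than what x86 actually does. The function names are hypothetical.

```python
def fixed_width_starts(code, width=4):
    """Every start address is known up front: 0, 4, 8, ... No decoding needed."""
    return list(range(0, len(code), width))


def variable_length_starts(code):
    """Each start depends on the length of the instruction before it."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        length = code[pc]  # toy encoding: first byte is the total instruction length
        if length == 0:
            raise ValueError(f"zero-length instruction at offset {pc}")
        pc += length  # instruction N+1 cannot be located before this step
    return starts


fixed = fixed_width_starts(bytes(12))
variable = variable_length_starts(bytes([1, 3, 0xAA, 0xBB, 2, 0xCC]))
print(fixed)
print(variable)
```

The fixed-width list can be computed for any instruction independently, so several decoders can start at once. The variable-length walk is a serial chain, which is the extra work a wide x86 decoder has to hide with additional hardware.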
Now, you sound like you don't seem to care too much about speed. I don't blame you. I'm typing this on a 300MHz computer that I got, new, not more than two months ago. It's plenty speedy for me. Cost, however, is another thing. Rather than putting those saved dollars toward more features for more speed, they could simply pass those along to the consumer. Also, fewer transistors around to support cruddy legacy "features" means less power consumption which means a smaller electricity bill. Particularly significant if your computer is on all the time. Maybe it's not that big a deal for you, but it's something to consider.
I agree that IA32 works fairly well. Cars with carburetors worked fairly well too, but now almost everything has fuel injection. Propellers on airplanes work pretty nicely in a lot of cases as well, but whenever a big job needs doing, jets have replaced the propeller. Even in a lot of smaller prop-driven airplanes, a turbine drives the prop instead of a piston engine. I could name more, but I think the point is made. IA32 works, but that's no reason to not wish for something better. I realize it's unrealistic, but I hold out hope that these companies that make so much money off of this market will decide to use their massive resources to make something truly new and good, rather than just sucking up profits from more of the same.
Btw, about the Concorde, it still takes three hours in an unreasonably small cabin to cross the Atlantic. Or so I hear.
Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!