G4 vs. Athlon Review


Posted by Hemos on Monday January 3, 2000 @04:30AM from the covering-the-chips dept.

heatseeka writes "There is a great article at Ars Technica comparing the Motorola G4 and the AMD Athlon. They discuss every detail of the design of the CPUs, and give credit where credit is due." Hannibal does a great job dissecting the different chips, as well as explaining the background behind each chip.

9 of 176 comments

Performance in practice by nickovs · 2000-01-02 23:58 · Score: 3

This article is very informative with respect to the architectures, but a useful follow-up would be to look at the performance in practice. Both of these processors can be used to run Linux, and it would be rather interesting to see how a pair of workstations fared in a side-by-side test.

While such a test would be interesting, I expect that the results would, in practice, be as much a test of compiler maturity as of the speed of the underlying system. Despite the best efforts of the processor designers (out-of-order execution and all), these sorts of processors tend to be very sensitive to compiler technology. Furthermore, many of the multimedia and vector-processing performance enhancements (SIMD vs. AltiVec) really need to be accessed from assembler at the moment.
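To make the SIMD point concrete, here is a minimal Python sketch. It is only a model: real code would use AltiVec or 3DNow! instructions (via assembler or compiler support), and the function names are made up for illustration. It treats an AltiVec-style 128-bit register as four single-precision lanes and counts simulated instructions for a plain scalar loop versus a 4-wide loop, including the scalar "tail" that hand-written assembler or a vectorizing compiler has to deal with.

```python
LANES = 4  # an AltiVec-style 128-bit register holds four 32-bit floats

def scalar_add(a, b):
    """Element-wise add, one element per simulated instruction."""
    out, ops = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        ops += 1
    return out, ops

def vector_add(a, b):
    """Element-wise add, LANES elements per simulated vector instruction,
    plus a scalar tail for elements that don't fill a whole register."""
    n = len(a)
    main = n - n % LANES  # largest multiple of LANES that fits
    out, ops = [], 0
    for i in range(0, main, LANES):
        # One simulated vector op: all four lanes computed together.
        out.extend(x + y for x, y in zip(a[i:i + LANES], b[i:i + LANES]))
        ops += 1
    for i in range(main, n):
        out.append(a[i] + b[i])  # scalar cleanup for the leftovers
        ops += 1
    return out, ops

a = [float(i) for i in range(10)]
b = [0.5] * 10
print(scalar_add(a, b)[1], vector_add(a, b)[1])  # 10 4
```

The results are identical; only the instruction count changes, which is exactly the part that depends on whether the compiler (or the programmer) actually uses the vector unit.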

Still, rather interesting stuff.

--
If intelligent life is too complex to evolve on its own, who designed God?
Re:AMD Disadvantage by asparagus · 2000-01-03 00:16 · Score: 3

Back when Apple started using PPCs, they threw the entire 68k instruction set out the window. They provided a 68k emulator for the PPC, and then let the raw speed of the PPC platform gradually replace the older programs, which were quickly rewritten for the new processor. Now, about six years down the road, the PPC is slim and trim. It's a pity Intel/AMD/Whatever doesn't have the balls to kill the x86 instruction set. (And don't get me started on Merced.)
Both are awesome chips-the difference is degrees by jht · 2000-01-03 00:18 · Score: 3

The Athlon is an amazing chip, even more so given the need to maintain backward compatibility with real-mode X86 code and the hack that is MMX. The only performance improvements I really expect to see going forward in the X86 architecture are going to come from process improvements rather than architectural development. MMX and 3DNow! are kludges on the architecture. In light of that, the Athlon stands out even more.

The G4, though, has the advantage of being a lighter-weight chip (fewer transistors needed, fewer instructions, less microcode). As for speed, RISC versus CISC aside, the Motorola/IBM designs have not shown the ability to drive the high clock speeds that Intel and AMD are playing with. Until about a year ago, the two were neck-and-neck, but the X86 chips are now up around 800 MHz while the G4 is just passing 500 now. But given the efficiency of not having to deal with all the microcoded X86 instructions, the G4 minimizes the difference in a well-implemented OS.
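The reason clock speed alone doesn't settle it is the textbook "iron law" of performance: run time = instructions × cycles per instruction (CPI) / clock rate. The sketch below plugs in the 800 MHz and 500 MHz figures from this comment, but the workload size and CPI values are purely hypothetical, and it assumes (unrealistically) that both chips execute the same number of instructions for the task.

```python
def run_time_seconds(instructions, cpi, clock_hz):
    """'Iron law' of performance: time = instructions x CPI / clock rate."""
    return instructions * cpi / clock_hz

# Hypothetical workload and CPI values, chosen only to illustrate the formula.
N = 1_000_000_000
t_x86 = run_time_seconds(N, cpi=1.0, clock_hz=800e6)
t_g4 = run_time_seconds(N, cpi=0.75, clock_hz=500e6)
print(f"{t_x86:.2f} {t_g4:.2f}")  # 1.25 1.50
```

Under these made-up numbers, a 1.6x clock advantage shrinks to a 1.2x difference in run time; a lower CPI is how a slower-clocked chip "minimizes the difference".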

Another thing to keep in mind (mentioned in the article) is that the G4 is not strictly designed for desktop computers. PowerPC chips are very popular in the embedded market, where they go into single-board computers, automobiles, and all sorts of dedicated hardware. Sales to Apple alone wouldn't keep a chip family alive. Interestingly, Intel sells a lot of older 386 processors to the embedded market too; the too-cool BlackBerry two-way pagers, among other devices, use a 386 processor.

The best thing that PowerPC has going for it, IMHO, is that Motorola didn't build in backward compatibility with the M68K series processors. They made a clean architectural break, and the few companies that needed compatibility got it through emulation (parts of the MacOS are still in 68K code today). The ample shortcomings of the MacOS tend to hide what is a first-rate processor family.

My suspicion as to the 'real' reason Intel has been funding Linux ventures is this: they know that Windows is hopelessly tied to X86, and they are hoping to eventually leave that baggage behind in the IA-64 architecture. Ultimately, X86 will be a drag on clock speeds.

Sorry to have rambled about here some, but I'm still a bit sleep-deprived from the weekend.

--
-- Josh Turiel
"2. Do not eat iPod Shuffle."
Wrong... by MacBoy · 2000-01-03 00:31 · Score: 4

To quote from the article:
Since the K7's FPU handles vector operations, it's not always totally free to do fp ops like the G4's FPU is. But considering that vector and regular fp calculations aren't normally mixed, the K7's fp performance should exceed that of the 7400 under most circumstances...
The G4's vector unit (AltiVec) is far more complex than the K7's. It can do floating-point operations: four SP (single precision) or two DP (double precision), in fact. In combination with the G4's FPU (which can do one SP or DP FP op), the G4 can do as many as five SP FP ops or three DP FP ops per cycle. Any application that does FP ops and is compiled with an AltiVec-enabled compiler (such as CodeWarrior or Motorola's) can take advantage of this superior capability. AltiVec's 32 128-bit vector registers and its 155 vector instructions make it a formidable number-cruncher.
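The per-cycle figure turns into a peak throughput by multiplying by the clock rate. This worked example covers only the single-precision case as counted in this comment (four AltiVec lanes plus one FPU op), and uses the roughly 500 MHz G4 clock mentioned elsewhere in the thread; it is a theoretical ceiling, not a measured result.

```python
def peak_flops(ops_per_cycle, clock_hz):
    """Theoretical peak: every FP unit busy on every cycle."""
    return ops_per_cycle * clock_hz

ALTIVEC_SP_LANES = 4  # as counted in the comment: four SP ops per vector instruction
FPU_SP_OPS = 1        # one scalar FP op per cycle from the FPU
sp_per_cycle = ALTIVEC_SP_LANES + FPU_SP_OPS
print(sp_per_cycle, peak_flops(sp_per_cycle, 500_000_000) / 1e9)  # 5 2.5
```

That is about 2.5 billion SP ops per second at best; real code reaches it only if the compiler keeps both units fed.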
Article doesn't discern much by Paolo · 2000-01-03 00:32 · Score: 3

While it is one of the first truly unbiased and highly technical articles on the K7 and G4 chips (instead of rumors and performance "benchmark" drivel), it does not say that much about the chips in the end. It should have concluded with a stronger statement about the efficiency of the G4 chip.

The US government is often criticized for implementing obscenely expensive solutions to problems when simple ones would have done the job better. The same applies to the K7 vs. G4 question, for it is always better to have efficiency when the performance is the same.

Rumor has it that Intel is currently running the new Itanium chips at 30 watts of power consumption, over twice that of the G4. If I upgraded my motherboard to the Itanium(TM), I would have to get a new case or power supply because of the incredible inefficiency of the chip. That is not novel engineering but sluggish engineering, something which is not prized in this day and age.

--
"In individuals, insanity is rare, but in groups, parties, nations, and epochs it is the rule." -Nietzsche
Re: MHz differences will fade soon enough... by MacBoy · 2000-01-03 00:42 · Score: 3

...As for speed, RISC versus CISC aside, the Motorola/IBM designs have not shown the ability to drive the high clock speeds that Intel and AMD are playing with. Until about a year ago, the two were neck-and-neck, but the X86 chips are now up around 800 MHz while the G4 is just passing 500 now...

True, but the gap will narrow (or disappear) in the near future. The G4 has been limited in clock speed by its exceptionally short 4-stage pipeline. Motorola has demonstrated a version of the G4 with a longer 7-stage pipeline that hits much higher clock speeds (in the ~700 MHz range at the demo, and expected to go higher in production). Each stage is simpler and faster, resulting in the higher clock speed. The K7 already has a very deep pipeline, which is a large factor in its high clock speed.
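A simple first-order model shows why more, simpler stages raise the clock: the cycle time is roughly the total logic delay divided by the number of stages, plus a fixed latch overhead per stage. The delay values below are hypothetical, picked only so the 4-stage case lands near the thread's 500 MHz figure; they are not Motorola's numbers.

```python
def clock_mhz(logic_delay_ns, stages, latch_overhead_ns):
    """Cycle time = logic delay split evenly across stages + per-stage latch overhead."""
    cycle_ns = logic_delay_ns / stages + latch_overhead_ns
    return 1000.0 / cycle_ns

# Hypothetical delays, for illustration only.
for stages in (4, 7):
    print(stages, round(clock_mhz(6.0, stages, 0.5)))
# 4 500
# 7 737
```

The same model hints at the cost: the latch overhead never shrinks, so returns diminish as stages are added, and a deeper pipeline also throws away more work on each mispredicted branch.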
My Favourite bar graph by Pope · 2000-01-03 00:45 · Score: 4

is here

It shows power consumption of the major chips in use. Note where the PPC chips are! :)
Enjoy.

Pope

--
It doesn't mean much now, it's built for the future.
Condition code registers and branch prediction by Megane · 2000-01-03 01:23 · Score: 3

One important part of the PowerPC architecture which this article fails to mention is the PPC's multiple condition code registers. (These date back to the older POWER architecture, BTW.)

Unlike all the other similar features mentioned in the article, these cannot be retrofitted into the K7, because it is limited to the x86 instruction set, which does not have this concept.

Basically, any instruction which needs to check the result of an operation (such as a compare, or overflow from an arithmetic operation) has to use condition codes. But in a pipelined processor, the result of the operation usually has to wait until the instruction has finished going through the pipeline. Rather than wait this long to decide what to prefetch, branch prediction tries to guess whether or not the branch will be taken. The predictions are usually right, but not always. What if there is more than one such comparison close together, particularly if the result is not being used directly for a branch, but for a boolean expression?
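For readers who haven't seen one, here is a minimal sketch of a classic textbook branch predictor, the 2-bit saturating counter. It is offered only to illustrate "guessing" a branch; it is not a claim about the specific predictors in the K7 or the G4, and the class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Textbook 2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    def __init__(self):
        self.counter = 2  # start at "weakly taken" (an arbitrary choice)

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

def accuracy(outcomes):
    """Fraction of branch outcomes the predictor guessed correctly."""
    p = TwoBitPredictor()
    hits = 0
    for taken in outcomes:
        hits += p.predict() == taken
        p.update(taken)
    return hits / len(outcomes)

# A loop branch taken 9 times, then falling through once, run three times.
print(accuracy(([True] * 9 + [False]) * 3))  # 0.9
```

The two-bit hysteresis means one loop exit doesn't flip the prediction, so only the exit itself is mispredicted: usually right, but not always.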

What the PPC does is have multiple condition code registers (eight 4-bit fields, CR0 through CR7, in one condition register). When an operation such as a compare is done, you select a condition register field to receive the result. In the same way that code can be optimized for RISC by interleaving multiple independent chains of operations, so that the result of an operation isn't used until three or four instructions later, condition register usage can also be interleaved.

With out-of-order (OOO) execution, the CPU automatically rearranges instructions to achieve this interleaved use of registers. Thus, the PPC will gain this advantage with condition code register usage as well.
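A toy Python model of the idea: eight independent condition fields, each written only by the compare that targets it, so two compares close together don't clobber each other (unlike a single shared flags register). The method names borrow the PowerPC mnemonics `cmpw` and `crand` for readability, but this is a simplified, hypothetical model: the SO bit is omitted and the final combination uses Python's `and` rather than a real CR-bit operation.

```python
LT, GT, EQ = 0b1000, 0b0100, 0b0010  # bit masks within one 4-bit field (SO omitted)

class ConditionRegister:
    """Toy model of the PowerPC CR: eight independent 4-bit fields, cr0..cr7."""
    def __init__(self):
        self.fields = [0] * 8

    def cmpw(self, crf, a, b):
        """Signed compare; writes only field `crf`, leaving the other seven intact."""
        if a < b:
            self.fields[crf] = LT
        elif a > b:
            self.fields[crf] = GT
        else:
            self.fields[crf] = EQ

    def test(self, crf, mask):
        """Read one condition bit from one field."""
        return bool(self.fields[crf] & mask)

cr = ConditionRegister()
x, y, p, q = 3, 7, 5, 5
cr.cmpw(0, x, y)  # first compare -> cr0
cr.cmpw(1, p, q)  # second compare -> cr1; cr0 is untouched
both = cr.test(0, LT) and cr.test(1, EQ)  # in the spirit of crand on two CR bits
print(both)  # True
```

With a single flags register, the second compare would overwrite the first result, forcing the code to branch or save the flags in between.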

Re:Even Faster by HeghmoH · 2000-01-04 06:38 · Score: 3

I make no claim that removing the x86 unit from the K7 would make it faster. However, the money that was spent on that unit could then have been channeled into something else, such as making the chip faster. I have no problem with AMD making x86 processors; they can do whatever the heck they want. I do have a problem with people who think that, because of the gains that Intel and AMD have been able to wrest from the instruction set, x86 has no problems.

Here's an example: in IA32, an instruction can start anywhere: at addresses divisible by 4, addresses divisible by 2, or odd addresses. Instructions also come in many different lengths. This causes massive problems for pipelining and instruction decoding. If you're trying to decode three instructions at once, how do you know where instruction #2 starts until you've finished with instruction #1? Clearly, this is not an insurmountable problem, as Intel and AMD have both pulled it off quite well. However, it does add expense. That money could be translated into either lower-cost chips or more features for more speed, were it not for the requirements imposed by the aging instruction set.

The PowerPC ISA (I talk of the PPC because it's what I know best; I believe others are like this as well) has instructions that start on addresses divisible by four. They are four bytes long. Period. Thus, it's very easy to see where instructions #1, #2, and #3 are, because each one is the same length. Barring a branch, it's simple to start decoding multiple instructions at once.
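A small Python sketch of the decoding difference. With fixed 4-byte instructions, every start address is known up front; with variable-length instructions, each start depends on first working out the length of the one before it. The byte encoding here is a hypothetical toy (the first byte of each instruction is its length); real x86 length decoding depends on prefixes, opcode, ModRM, SIB, displacement and immediate bytes.

```python
def fixed_width_starts(pc, count, width=4):
    """Fixed-width instructions: every start address follows directly from pc."""
    return [pc + width * i for i in range(count)]

def variable_width_starts(code, pc, count, length_of):
    """Variable-length instructions: each start depends on the previous length."""
    starts = []
    for _ in range(count):
        starts.append(pc)
        pc += length_of(code, pc)  # serial dependency: decode this before moving on
    return starts

def toy_length(code, pc):
    """Hypothetical toy encoding: the first byte of each instruction is its length."""
    return code[pc]

TOY_CODE = bytes([2, 0, 5, 0, 0, 0, 0, 1, 3, 0, 0])

print(fixed_width_starts(0x100, 3))                        # [256, 260, 264]
print(variable_width_starts(TOY_CODE, 0, 4, toy_length))  # [0, 2, 7, 8]
```

The first function has no loop-carried dependency, so hardware can find all three starts in parallel; the second is inherently sequential, which is why x86 front ends spend extra logic working out instruction boundaries.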

Now, you don't seem to care too much about speed. I don't blame you. I'm typing this on a 300 MHz computer that I got, new, not more than two months ago. It's plenty speedy for me. Cost, however, is another thing. Rather than putting those saved dollars toward more features for more speed, they could simply pass the savings along to the consumer. Also, fewer transistors supporting cruddy legacy "features" means less power consumption, which means a smaller electricity bill. That's particularly significant if your computer is on all the time. Maybe it's not that big a deal for you, but it's something to consider.
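The "on all the time" arithmetic is simple: watts × hours per year / 1000 gives kilowatt-hours. The 15 W difference below is hypothetical, roughly the scale suggested by the rumored 30 W Itanium being "over twice" the G4 earlier in the thread; multiply the result by your local price per kWh to get the bill.

```python
HOURS_PER_YEAR = 24 * 365

def annual_kwh(watts, hours_per_year=HOURS_PER_YEAR):
    """Energy used in a year by a constant load, in kilowatt-hours."""
    return watts * hours_per_year / 1000

# Hypothetical 15 W saving on an always-on machine.
print(annual_kwh(15))  # 131.4
```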

I agree that IA32 works fairly well. Cars with carburetors worked fairly well too, but now almost everything has fuel injection. Propellers on airplanes work pretty nicely in a lot of cases as well, but whenever a big job needs doing, jets have replaced the propeller. Even in a lot of smaller prop-driven airplanes, a turbine drives the prop instead of a piston engine. I could name more, but I think the point is made. IA32 works, but that's no reason not to wish for something better. I realize it's unrealistic, but I hold out hope that the companies that make so much money off this market will decide to use their massive resources to make something truly new and good, rather than just sucking up profits from more of the same.

Btw, about the Concorde: it still takes three hours in an unreasonably small cabin to cross the Atlantic. Or so I hear. :)

--
Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!