Design Philosophy of the IBM PowerPC 970
D.J. Hodge writes "Ars Technica has a very detailed article on the PowerPC 970 up that places the CPU in relation to other desktop CPU offerings, including the G4 and the P4. I think this gets at what IBM is doing: 'If the P4 takes a narrow and deep approach to performance and the G4e takes a wide and shallow approach, the 970's approach could be characterized as wide and deep. In other words, the 970 wants to have it both ways: an extremely wide execution core and a 16-stage (integer) pipeline that, while not as deep as the P4's, is nonetheless built for speed.'"
As evidenced by this review
tcd004
2h = second half
03 = 2003
In the case of Blizzard that means Fall 2005.
-- Knowing too much can get you killed, but knowing who knows too much can make you rich.
What's the difference between the Power4 and the PowerPC 970? As a Mac guy, I've been following all of the rumors and announcements with interest but I keep seeing the PPC970 referred to as a scaled-back version of the Power4.
Why wouldn't Apple go with the Power4 over the PPC970? And I already know that nothing official has been announced by Apple and that this is all probably going to be a lot of sturm und drang signifying nothing, but that's what keeps us Mac guys going I guess.
Second half 2003. Which almost always slips, so the real meaning is assumed to be Q403 (4th quarter 2003) or even Q104 (first quarter 2004).
Some of you may have read "an extremely wide execution core and a 16-stage (integer) pipeline" in this article's write-up and been extremely confused. I took a few computer architecture courses back in my undergrad days, so I can refresh some of your memories as well as teach basic processor design to those of you who never got to attend a 4-year college and study computer chips in-depth.
Basically, all modern processors are pipelined. This means that they work on several instructions at the same time, each at a different stage. Doing a load of wash, waiting for it to finish, putting it into the dryer, waiting for that to finish, and then folding would take 30 minutes * 3 steps * 3 loads = 4.5 hours. One could instead PIPELINE such a process, removing the strict sequencing: do the first load, then while that's drying put the second load into the washer, and so on ... this takes a much shorter amount of time.
This is all a processor really does. It does a FETCH, an INSTRUCTION DECODE, then an EXECUTION, then perhaps a MEMORY READ/WRITE, and then a WRITE BACK, perhaps. So this 16-stage pipeline can have 16 different instructions executed all at the same time, but just in different points of its execution. The example in CAPS above is a 5-stage pipeline that's similar to those in MIPS processors.
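For anyone who wants to check the laundry arithmetic, here is a rough Python sketch. It assumes every stage takes the same time (30 minutes here) and that nothing ever stalls, which a real processor can't promise. The function names are made up for illustration and don't come from the article.

```python
def sequential_minutes(loads, stages, stage_minutes):
    """Each load finishes every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages, stage_minutes):
    """Ideal pipeline: a new load enters as soon as the first stage is free.

    The first load needs `stages` steps; every later load finishes
    one step after the load in front of it.
    """
    if loads == 0:
        return 0
    return (stages + loads - 1) * stage_minutes


# Three loads, three steps (wash, dry, fold), 30 minutes per step.
print(sequential_minutes(3, 3, 30) / 60, "hours one load at a time")
print(pipelined_minutes(3, 3, 30) / 60, "hours with overlapping loads")
```

With these numbers the one-at-a-time figure is the 4.5 hours from the example and the overlapped figure is 2.5 hours. The gap widens as the number of loads (instructions) grows relative to the number of stages.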
Hope this was helpful!
Department of Physics and Atmospheric Science, Dalhousie University, Halifax, N.S., Canada, B3H 3J5
I recall IBM's PPC boards going for over a grand, which is (to me) far too much. Especially when it was a 'G3' chip.
Even if the new chip is faster, will I be able to buy 2 Pentium 4s (5s?) for the price of it?
In the case of Duke Nukem' Forever, that means just after our sun expands to a red giant and swallows the earth and moon.
Kudos to the Ars team yet again for going deeper into CPU designs than 99% of the populace need to go :)
I think that since this is a 64 bit chip, why not compare it with other 64 bit consumer desktop chips (ie, AMD Clawhammer)? A lot of Intel's questionable moves (12K micro-ops instruction cache?) for the P4 were obviously not copied by AMD, and x86-64 seems to be the 64 bit desktop chip of the future.
Make sure everyone's vote counts: Verified Voting
The PowerPC 970 has other potential customers as well, though, not the least of which is IBM itself who, with its large investments in Linux, would love to see a high-performance, 970-based 4-way or 8-way SMP Linux desktop workstation halt the steady flow of former 64-bit *NIX workstation users who began switching to Wintel hardware in the late 90's.
Before all my fellow Mac users start A) thinking about going to Linux B) drooling C) wondering about Darwin or D) some combination of the above, let me remind you that Darwin scales very well. You can now return to your previous state of awe.
PS - How much you want to bet good ol Steve is already having wet dreams about doing the traditional Photoshop test at a Macworld with 4-way SMP?
This chip could be the start of something big in the Linux space as well. Think about it: we are at a point where a few companies other than Intel are poised to take center stage in the next-gen workstation, most notably AMD, Apple, and now IBM itself.
While Linux has run on PPC chips for a long time, it is difficult to come upon a G4 chip without paying the "Apple Tax" for the hardware. If IBM steps up to the plate with this chip, which can then run OS X, Mach, Linux, *BSD, (insert other OS'es here), and can be purchased directly or in a package from IBM, we may see a good set of Windows challengers for the desktop and server room. Obviously OS X will still only run on Apple derivatives.
These chips will be big, I guarantee it, and not just for Apple. It will be interesting to see if Microsoft ports Win XP to these chips.
Why would they want to?
Intel and AMD have the x86 market pretty well locked down.
More importantly, why would ANYBODY want to implement the x86 ISA (Instruction Set Architecture, or something like that)? It's the most horrid instruction set in use today.
Some instruction sets can't really be mapped to others easily, and optimizing for good performance with PPC would probably not have good x86 performance anyways.
In Pentiums and Athlons, the instruction set isn't really emulated. It's translated to a smaller instruction set (uops, iops, pick whatever term you like and run with it). However, these smaller sets are still made pretty much specifically to cover the overlying ISA (x86 in this case).
I think Apple will stick with a company that it knows, IBM, since they have been working together for years. It doesn't seem that Apple will just jump ship to the x86 platform. That would also mean redoing the Mac OS X code and optimization (not like they won't have to do some anyway, but they would have to do more). It is highly unlikely that Apple will go with a heat producing, energy wasting x86 Intel chip.
One of the early planned PPC chips had that idea in mind, by pretty much adding an x86 processor and logic to figure out what instruction set it was getting and where to send it. The idea was that then there would be no barrier to using PPC code - it could run x86 code to replace existing systems and run PPC as well. Thus it would be a transitioning thing.
The x86 world seemed to move faster than the design for this and it fell away. It made more sense to concentrate on PPC stuff rather than try to do PPC and changing x86 stuff. Also, if it ran x86, why should anyone bother to write for PPC?
The difference is Pentiums and Athlons are intended to be x86 family upgrades, while the PPC is not. The PPC 970 is meant as an upgrade to earlier PPCs. One could as well ask why AMD doesn't make an Athlon that can run PPC code.
I don't subscribe to RMS's GNUtopian vision.
Apple will almost certainly go with the 970 and not switch to Intel for the following reasons: 1. It is difficult to emulate PowerPC with Intel (although the reverse isn't *that* difficult). Apple would need a PowerPC emulator so that all that software (including OSX software) isn't lost. 2. Apple wants to differentiate itself somewhat from the PC. 3. IBM appears to be moving up after the several years of problems with Motorola. The downside is that by the time a 970 board is out it will definitely be in the middle of the pack relative to the PC world. That means that Apple will still have computers that are more expensive than the PC world and that aren't as fast. Of course I think OSX is sufficiently better than either Linux or WinXP for a workstation that I'll stick with it. But Apple had best hope that IBM gets large yields on time and perhaps with better speeds than expected.
finally they'll see that clock speed does make a difference
Clock speed is something Intel uses to bolster their performance claims and give people an excuse to upgrade to the newest model. Clock speed tells very little about the performance of a computer. Look at AMD's Athlon. Many reviews like the ones on Tom's Hardware show that running Windows on a "slower" Athlon yields better performance than a comparably clocked P4. If you meant that finally, if Apple runs on x86, there will be a better benchmark between Windows and MacOS, you would be more accurate. Until that happens you are comparing two different fruits.
0xfeedface
It is highly unlike[ly] that Apple will go with a heat producing, energy wasting x86 Intel chip.
...because PPC chips completely disobey the laws of physics, producing neither heat, nor "waste energy" (perhaps through the production of heat?). Yes, it is PPC, miracle of modern technology, standing up for the common man against the perils of Thermodynamics!
The other responses to your question have pretty much hit it dead-on. I just wanted to comment that the PowerPC has always been the little brother of the Power architecture used originally in the RS6000 ... and now in almost everything IBM makes - AS400, E9000, etc.
The first generations (601, 603/604 and the ?aborted? 620) of the PowerPC line were scaled-back versions of the Power and Power2 architectures respectively [the original Power architecture was mounted on a 3x5 daughter card with 4-5 separate chips [I'll have to go looking for my tech papers] making up the core ... because of this the migration of everything into one die for the PowerPC was amazing].
;)
Additionally, IBM has tended to work out new capabilities -- such as the move to 64-bit and dual cores -- on the larger scale Power architecture, before attempting to stuff them into the smaller PowerPC package [besides, IBM has to keep something to distinguish its pricier iron from the OEMs].
Natty
Maybe the rain Isn't really to blame. So I'll remove the cause, But not the symptom!
Go look on their site, they have specs on the first of their G5 chips:
http://e-www.motorola.com/webapp/sps/site/prod_summary.jsp?code=MPC8540&nodeId=01M98655
*Sigh* The size of a CPU's address/data bus does not reflect a processor's "bitness". 64-bit means that it has a 64-bit word size (as opposed to the 32-bit word size on x86 processors), 64-bit registers, etc. Most 64-bit CPUs don't actually have a 64-bit address bus. Like you said, the Alpha's is 48-bit. This is usually done to keep the pin count down to a sane level (if you need all the physical RAM that a 64-bit address bus would provide, you need something bigger than a desktop CPU). As 64-bit chips become more common on the desktop, you can expect somebody to introduce a 64-bit CPU that has a 64-bit address bus just so that they can say, "Hey, look, we have a 64-bit address bus, the other guys only have a 48-bit one!" and (like you are trying to do) insinuate that this means the competition's CPUs aren't "true" 64-bit (even though they are).
I dunno 'bout Macs (I don't know the M68k's "bitness"), but Intel introduced the 386 (their first 32-bit CPU) in 1985. And I certainly don't think the M68k was a RISC processor.
at current prices and projected prices, 512 gigabytes of RAM will barely cost more than a couple of the fastest processors of this type.
Really? I would LOVE to be able to buy 512 gigabytes of RAM for the cost of a couple of fast desktop processors. Don't forget that the PowerPC 970 is meant to be a desktop processor.
Because that is where most of the desktop CPU money is going, some of the high end, and frighteningly enough a fair bit of embedded CPU money too.
In short, if you can navigate the patent minefield and the brutal competition minefield, and deal with the instruction set making things a royal bitch, doing an x86 CPU is a total no-brainer.
Other than needing a whole new decoding front end, and being forced to use a trace cache because decoding multiple instructions in x86 land is very hard... the instruction thing isn't a big deal. Handling the odd-ball 80 bit FP format is. So is emulating all of the trap stuff and the other little odd bits close to the instruction set (like the MMU).
A big pain. But with much of the effort not being where folks think it is!
When I had the Pentium 4, she complained about its narrowness, but it was great. With the G4 Mac, it was nicely wide, but too short, she would note. I'm sure that with the 970, I can fulfill all her dreams!
// file: mice.h
#include "frickin_lasers.h"
...an extremely wide execution core and a 16-stage (integer) pipeline that, while not as deep as the P4's, is nonetheless built for speed.
:-)
For those not planning to read the article, I wanted to mention the following so you do not get the wrong impression. The speed that the article refers to (of a long integer pipeline, like a 16-stage or like the Pentium IV's 20-stage) is clock speed, not necessarily actual performance. The P4's super long pipeline, for example, allows it to run at higher clock speeds, but less work gets done in the same number of clock cycles. This is the "brainiac" vs "speed demon" philosophy (with a high clock speed but low instructions-per-clock representing "speed demon") and neither is necessarily better than the other (though one is obviously better for the marketing dept.)
Just don't assume that "built for speed" always means "built to be fast" -- a confusing but important distinction.
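To put rough numbers on that distinction: work done per second is roughly instructions-per-clock times clock rate. The Python sketch below uses made-up figures for two hypothetical chips, picked only to show the trade-off. They are not measurements of the P4, the G4e or the 970.

```python
def instructions_per_second(ipc, clock_hz):
    """Sustained throughput = average instructions per clock * clocks per second."""
    return ipc * clock_hz


# Hypothetical figures, chosen only to illustrate the trade-off.
speed_demon = instructions_per_second(ipc=1.0, clock_hz=3.0e9)  # high clock, low IPC
brainiac = instructions_per_second(ipc=2.0, clock_hz=1.5e9)     # low clock, high IPC

print(f"speed demon: {speed_demon:.2e} instructions/s")
print(f"brainiac:    {brainiac:.2e} instructions/s")
```

Here the chip with half the clock speed gets exactly the same amount of work done, which is why the MHz number on the box says so little by itself.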
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
A lot of Intel's questionable moves (12K micro-ops instruction cache?) for the P4 were obviously not copied by AMD, and x86-64 seems to be the 64 bit desktop chip of the future.
The P4 has its flaws, but IMO caching decoded instructions isn't one of them. It shortens the pipeline, and paves the way for a true trace cache (cache of decoded basic blocks indexed by entry point; very handy for renaming and scheduling).
What I really want to know is how much this chip is going to cost. If it's cheap for Apple to put 2 or 4 of these in a machine, then how much will it matter that an expensive P4 (P5) outperforms it? Hmmm.... The current Wind-Tunnel G4s raised a few eyebrows when they first came out due to the new case design. These things were designed to dissipate heat! A HUGE (7 lbs) heat sink w/ matching fan, a small case fan, 2 fans on the power supply, and a ton of ventilation in the back. WAY more cooling than those 2 little G4s require. I think Apple is trying to avoid the fiasco it had with the Sawtooth (1st gen) G4s where they just slapped a G4 onto a G3 mobo. This time around, I believe they're releasing a new mobo first and then putting a new proc in it down the road. I've also read stuff in forums suggesting that the power supply for the Wind-Tunnel has way more juice than the system currently demands. Can anyone out there do the math on this? We know how much power the PPC 970 eats. Can we figure out how much heat the Wind-Tunnel case is designed to dissipate? What about how much power the power supply is putting out? With these numbers, can we figure out how many PPC 970s the Wind-Tunnel case could power and cool? I've been suffering with a 266MHz G3 iMac, and I refuse to upgrade until Apple comes out with a system that really is worth that premium they charge, and a G4 is not it.
" OpenMP is a specification for a set of compiler directives, library routines, and environment variables that can be used to specify shared memory parallelism in Fortran and C/C++ programs." All that would have to be added to gcc are the "compiler directives", as the "library routines" and "environment variables" aren't directly a part of the compiler.
Now, OpenMP is good for programming extremely high-performance shared-memory applications, like scientific computation applications and stuff like that. It really sounds like overkill for a desktop environment, where it's probably easier to program a multithreaded application with standard IPC mechanisms where communication is required. And really high-performance applications could also be programmed using MPI and a message passing communication scheme, which is far more widely used (compare the # of people who know about OpenMP versus those who know about MPI), probably wouldn't be much less efficient, and would quite likely scale much better than a shared memory implementation.
IT IS ONLY 40 BITS not 64
Your desire to use address pins (or is it max pinned space per process?) to measure size puts you in a distinct minority. That doesn't make you wrong. But neither does it help make you right in this particular jungle.
Systems whose physical addressing matches their claimed "bitness" are probably in the minority.
Some systems provide more physical addressing than register width (later PDP-11s, 8086, S/390), some less (68000, classic CDCs, early POWER). The 970 falls into the less category. Nothing unusual there.
Apple, like EVERY OTHER OS KNOWN, will steal a bit or two
Some bits come from physical addresses, some from virtual addresses. These should be addressed [pun slipped in, sorry] separately. AIX, btw, steals less than one bit. Linux can also be configured to steal less than one bit. (Assertions I can get away with no loss of credibility, since AC's have none to start with.) Were you frightened by a VAX in your formative years?
Why do fanboys mod stuff like this down?
Because we can't figure out why someone who needs 512GB, or 1TB, or more (which is it?) cares that a Linux process is limited to 1GB and not 2GB or 4GB.
40 bit address bus. Not a 40 bit data bus. BIG difference there.
And the 40 bit address bus is most likely a pin packaging limitation. They did not see a need to bring those extra 24 address lines out to the chip package. Internally, it is 64 bit. Much like the venerable MC68000 was 24 bit externally, but 32 bit internally.
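For a sense of scale, here is a quick Python sketch of how much memory a given number of address lines can reach. It assumes plain byte addressing. The 24-bit row is the MC68000 example above and the 40-bit row is the figure quoted in this thread for the 970; the helper names are made up for illustration.

```python
def addressable_bytes(address_bits):
    """Bytes reachable with the given number of address lines (byte addressing assumed)."""
    return 2 ** address_bits


def human(n_bytes):
    """Format a byte count using binary units, rounding down to a whole unit."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n_bytes >= 1024 and i < len(units) - 1:
        n_bytes //= 1024
        i += 1
    return f"{n_bytes} {units[i]}"


for bits in (24, 32, 40, 64):
    print(f"{bits}-bit address bus: {human(addressable_bytes(bits))}")
```

So 40 address lines already cover 1 TiB of physical memory, 256 times what a 32-bit address bus can reach.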
But seriously, in the life-span of THIS processor implementation - do you seriously see ANY desktop manufacturer even thinking about putting that much RAM in their machines?? Heck, 1GB of RAM is not 'standard' yet. Extrapolating w/ Moore's Law, we'll be approaching 40 bits in 8 years. Apple will undoubtedly have another chip before then!
If you truly need THAT much physical storage today, you'll need to shell out for a SERIOUSLY large server. IBM's high-end p690 currently maxes out at 256GB. The virtual address space is undoubtedly much higher.
Tom