IBM PowerPC 970 Architecture
riclewis writes "Hannibal from Ars Technica offers an explanation of some of the internals of the new IBM chip. It's certainly more powerful than anything on the desktop now, but by the time it's released a year from now, it looks to be middle-of-the-pack (which could still be a step up for Apple...) This excitement over the early release of hardware specs kinda reminds me of all the hype surrounding Sony's Emotion Engine when it was introduced a couple of years ago. In fact, some are suggesting the PPC 970 chip might be closely related to the PS3's 'Cell' processor..."
Depends... where do you want to go today?
Cheers,
Ian
"In fact, some are suggesting the PPC 970 chip might be closely related to the PS3's 'Cell' processor..."
Even though it's really doubtful, it'd be extremely cool to see a PS3 emulator on the mac if the processors are that closely related.
I remember running Mac OS 6.0.5 on my Atari ST. Because it had the same processor, it didn't need much to make it run.
Oh well, I can at least dream, can't I?
Not that hot NOW! They will have a lot of competition in that space with Opteron/Clawhammer and the new SPARCs.
Still, glad to see something other than incremental progress.
The law is a weapon of the government, not a protection for the likes of you. Surely you understand that.
This could help push Apple back to a respectable market share over a couple of years. A *nix box with a decent processor and lots of commercial software? Of course, Apple has proved to be just as fierce in protecting their proprietary code as Microsoft, so I wouldn't expect the price to drop significantly for every million sold. But still, alternatives (especially of this caliber) are good.
Unlike the P4, the 970 does one more trick after it has cracked the PPC instructions down into iops. The 970 divides up the iop stream into "groups" of five iops apiece. So first it cracks the PPC instructions down into iops, then it collects the iops back together into groups. The iops are placed into the group's five slots in program order, with the stipulation that all branch instructions must go in slot 4 (the last slot). Furthermore, slot 4 can hold only branch instructions and nothing else. It is these groups of five iops that are dispatched in order to the issue queues. (I haven't yet seen a functional diagram of the 970's core, so I'm not sure how many issue queues there are.)
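To make the grouping rule concrete, here is a minimal Python sketch of the slotting policy described above. It is a simplified model, not IBM's dispatch logic: the `Iop` type and `form_groups` function are hypothetical, and the sketch assumes that a branch always closes its group, and that a group whose slots 0-3 are full goes out with slot 4 empty when no branch follows.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Iop:
    """A hypothetical internal operation produced by cracking a PPC instruction."""
    name: str
    is_branch: bool = False


Group = List[Optional[Iop]]  # exactly five slots; None marks an empty slot


def form_groups(iops: List[Iop]) -> List[Group]:
    """Pack iops, in program order, into five-slot dispatch groups.

    Slots 0-3 hold non-branch iops; slot 4 holds only a branch,
    and (in this model) a branch always ends the group it lands in.
    """
    groups: List[Group] = []
    current: Group = [None] * 5
    filled = 0  # how many of slots 0-3 are in use
    for iop in iops:
        if iop.is_branch:
            current[4] = iop  # branches may only occupy the last slot
            groups.append(current)
            current, filled = [None] * 5, 0
        else:
            if filled == 4:  # slots 0-3 full and no branch arrived for slot 4
                groups.append(current)
                current, filled = [None] * 5, 0
            current[filled] = iop
            filled += 1
    if filled:  # flush a trailing partial group
        groups.append(current)
    return groups


if __name__ == "__main__":
    stream = [Iop("lwz"), Iop("add"), Iop("cmpw"), Iop("bc", is_branch=True),
              Iop("stw"), Iop("addi"), Iop("subf"), Iop("or"), Iop("lwz")]
    for g in form_groups(stream):
        print([op.name if op else "-" for op in g])
```

With the sample stream, the branch forces the first group out with an empty slot 3 (`['lwz', 'add', 'cmpw', '-', 'bc']`), which illustrates how, in this model, grouping can leave some dispatch slots unused.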
Computing in chunks... sounds a lot like a Cray. Together with the 900 MHz-effective (Jesus... that's a lot) FSB, Apple really will be selling supercomputers in the next few years.
But what do I know. I'm just looking for anonymous gay sex.
When the P4 debuted it was sort of average too... the P4's power has come from its ability to scale to higher MHz ratings pretty quickly. What kind of life are they going to get out of this chip? If it's going to top off at 2 GHz then it doesn't really seem worth it, but if the chip can get up to 3 GHz or so within a year of its release...
I mean... it's great news that Apple won't have to rely on Motorola's decidedly passive desktop chip development strategy anymore...
But man. First off, this kills any possibility of a big surprise hit. Second, this dooms Apple sales for the next year or so... who wants to buy a stagnating desktop model when the next edition has so much promise?
Then again, Apple's desktop offerings have been a little stagnant anyway... most people probably won't want to play the waiting game for as long as it'll take for these to come out.
I just hope that by the time they do, they're worth it.
Apple chips? Are those anything like banana chips? As a child I didn't like banana chips, but I slowly grew accustomed to their taste. However, I still hate them. I hope apple chips taste better.
1) Why all the hype over a chip that will be slow when it's released? I'll admit, the specs look damn impressive - a 1.6 GHz single-core Power4 has the SPECfp/SPECint scores of a 2.5 GHz P4 (500 MHz bus) - but they're not due out for a year, and the 1.6 is expected to be on the high end.
2) Why only a single-core?
3) Where's the G5? It looked similarly impressive a year ago. It still does, according to the Register's leaked spec numbers.
4) What's the advantage again of a 64-bit processor? Sure, more RAM. Is it faster? Does it do more? Anyone?
"Sometimes a woman is a kind of religion, she can save your soul & set you free from all your sins" - Bad Examples
"In fact, some are suggesting the PPC 970 chip might be closely related to the PS3's 'Cell' processor..."
Ah, so it runs on vapor instead of smoke?
*wonders if anybody'll get that.*
OK, so its SPECint and SPECfp numbers are 937 and 1051, respectively. From www.spec.org, 2002 Q3: Dell Precision WorkStation 340 (2.8 GHz P4), SPECint base is 970, peak 1010; SPECfp base is 938, peak 947. When it's actually released, if they make 2003 Q2, it won't be particularly impressive. But the current Apple G4 SPECmarks are about 35% of the 970's, so it'll look good compared to that.
That really depends on whether you're doing a cost-per-performance comparison, though. Macs are still often expensive.
A lot of Windows people I know build gaming machines, though, so I suppose if there were a comparison there (if a Mac could run all my games), then the cost of expensive video accelerators, etc. could be factored in. While I suppose Macs would factor in such costs as well, most of the Mac people I know didn't buy their systems to run Doom 3 and the newest UT.
A couple of points to throw water on this:
Apples are certainly wonderful machines, and Windows certainly is icky most of the time, but be prepared to back up any benchmark statements with actual benchmarks.
Also, PowerPC and Intel/AMD are two different types of processing, so they can't really even be compared.
Um, no.
All general-purpose microprocessors perform certain basic tasks upon which everything else is built - integer and FP math, memory access, and control flow operations. Processors take different approaches in how they implement these functions, but the interfaces presented to programmers - even assembly programmers - are very similar [and yes, I've done assembly on multiple platforms].
You can also completely ignore architecture and take test programs that you think are representative of the kinds of tasks found in different types of application, compile them for both platforms, and measure how long it takes to do the same amount of work on each machine. This is the _foundation_ of benchmarking.
If the machines were completely different, you wouldn't be able to do the same tasks on them!
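As a rough illustration of that methodology, here is a minimal Python harness: fixed workloads exercising integer math, floating-point math, and control flow, each timed several times with the best run kept. The workload functions are hypothetical stand-ins, not SPEC programs; real suites compile C or Fortran sources natively for each platform, whereas this sketch times the Python interpreter on each machine. Run the same script on both machines and compare the times.

```python
import time
from typing import Callable, Dict


def workload_int(n: int = 200_000) -> int:
    """Integer math plus a data-dependent branch: a simple 32-bit checksum loop."""
    acc = 0
    for i in range(n):
        acc = (acc * 31 + i) & 0xFFFFFFFF
        if acc & 1:
            acc ^= i
    return acc


def workload_fp(n: int = 200_000) -> float:
    """Floating-point math: partial sum of 1/i^2 (converges toward pi^2/6)."""
    total = 0.0
    for i in range(1, n + 1):
        total += 1.0 / (i * i)
    return total


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Fastest wall-clock time of several runs, in seconds (reduces noise)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def run_suite() -> Dict[str, float]:
    """Time each workload; every machine does exactly the same work."""
    return {"int": best_time(workload_int), "fp": best_time(workload_fp)}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate machine finished the same work."""
    return baseline_s / candidate_s


if __name__ == "__main__":
    for name, secs in run_suite().items():
        print(f"{name}: {secs * 1000:.1f} ms")
```

Keeping the fastest of several runs is one common way to filter out interference from other processes; averaging is another, at the cost of more noise.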
The PowerPC 970 triples the length of the PowerPC pipeline
This will give it the same issues the P4 has. Namely a large penalty for branch mispredicts, etc. Instructions per clock will decrease.
OTOH, they should be able to crank the speed!
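The trade-off in this exchange can be put into numbers with the textbook CPI model: each mispredicted branch stalls the pipeline for roughly as many cycles as it takes to refill it, so a deeper pipe raises cycles per instruction (CPI), while a higher clock raises instructions per second. The sketch below assumes the penalty scales with pipeline depth (7 vs. 21 stages); the base CPI, branch frequency, mispredict rate, and clock speeds are made-up illustrative values, not 970 or G4 measurements.

```python
def effective_cpi(base_cpi: float, branch_frac: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """Average cycles per instruction, including branch-mispredict stalls."""
    return base_cpi + branch_frac * mispredict_rate * penalty_cycles


def mips(clock_mhz: float, cpi: float) -> float:
    """Millions of instructions per second: clock rate divided by CPI."""
    return clock_mhz / cpi


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, 5% of them mispredicted.
    short_pipe = effective_cpi(0.5, 0.20, 0.05, penalty_cycles=7)   # about 0.57
    long_pipe = effective_cpi(0.5, 0.20, 0.05, penalty_cycles=21)   # about 0.71
    print(f"7-stage  @ 1000 MHz: {mips(1000, short_pipe):.0f} MIPS")
    print(f"21-stage @ 2000 MHz: {mips(2000, long_pipe):.0f} MIPS")
```

Under these invented numbers the deeper pipe loses roughly 20% in instructions per clock, yet still comes out well ahead if it actually reaches twice the clock; with a worse mispredict rate the gap narrows.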
... so could somebody who understands this processor tell me this:
Would a 3D rendering app such as LightWave potentially see a huge benefit from this processor? I understand that it's up to the developer to tune it, yadda yadda yadda; I'm concerned with potential, not real-world, numbers.
I'm trying to get an image in my mind of how the various processor descriptions (32-bit, 64-bit, AltiVec, SIMD, etc.) can radically change how an app like that would work.
Us vertex pushers have a substantial interest in machines that excel at that type of work...
Since I won't be buying a pc from Apple with this chip on it, I hope some third party such as Tyan decides to make an ATX format MB for us pc builders. There WILL be Linux ports for this chip, and existing PPC ports would probably work with it in 32 bit mode at first. Even IBM might offer an ATX evaluation board, though it would probably cost too much.
Alright, I'm sooo tired of this argument. First of all, just because it is a RISC chip doesn't mean that a 1.0 GHz Motorola chip in a Mac could even come close to outperforming a 3.06 or even 2.80 GHz Pentium 4 when combined with a 533 MHz FSB and RDRAM. Apple just recently adopted DDR RAM, but get this: the little PPC chip you have isn't even natively able to support it at DDR speed. The current batch of PPC chips can only work on one swing of the computing "cycle", not on the up and down like an Athlon can, for example. Meaning, the Motorola chips are not double pumped, so Apple is years behind AMD and Intel right now. Your argument doesn't hold water.
"Hannibal" also has an incredible knack for making the workings of microprocessors understandable to those with no hardware engineering backgrounds.
// I will show you fear in a handful of jellybeans.
Middle of the pack is not a step up for Apple... The G4 chips outperform Intel's, with their instructions interpreted into RISC micro-instructions... A lot more goes into a processor than its MHz... Take a read of Hennessy and Patterson's book Computer Architecture: A Quantitative Approach.
True, but there's still no denying that current Pentium 4s are faster. For the sake of argument, let's say that an 800MHz G4 is roughly equivalent to a 1.4GHz Pentium 4. Well, now a bottom-end $500 Dell is shipping with a 1.8GHz processor, the norm is 2-2.4GHz, and you can buy up to 2.8GHz if you really want to throw your money away.
Bottom line: Yes, the G4 is faster than most people claim, but it is still measurably slower than what Intel is currently offering.
> Instructions per clock will decrease.
Actually, IPC is *increased* from the current G4. It will now fetch 8 instructions per clock, and retire 5 per clock.
The current G4 IIRC fetches either 3 or 4 per clock. I have no idea how many it can retire at once.
This, coupled with a quick move to a .09 process, shows me that this 970 chip has legs. Another thing... IBM has *always* been conservative about what not-quite-ready chips will do as far as clock and benchmarks. I expect "Real World" [no relation to Peter Gabriel] performance to be quite good. [although I expect Peter Gabriel's performances to be fantastic =)]
Blocklevel: Practical Information Architecture
Everything you said is correct... BUT...
It doesn't matter how much work a processor does per clock if you can scale an "inferior" (according to your definition of inferior) design to a MUCH higher clock.
This may not even be an architectural flaw as much as the result of an inferior manufacturing process. If Motorola's fabs aren't as good as Intel's (I don't think they are), then the fact that the G4 is a "better" processor on paper is completely irrelevant: as far as the consumer cares, the FASTEST G4 available is slower than the fastest P4. (Currently, according to benchmarks not done by Apple, it seems that you don't even need the absolute fastest P4s to beat the fastest Macs.)
My server
Supposedly the issue with Apple's chips over the last few years was Moto's manufacturing process. Rumors say that IBM was always able to make more chips at higher speeds than Moto. The story is that, because of the contract between the three, IBM chips did not go in Apple boxes (upgrades and whatnot), and they could not outclock Moto.
Yes, that's from the rumor mill, but everyone knows Moto has been going through a lot of corporate restructuring, and who knows where they will be focusing in the next 5 years. IBM is going to make these chips (wherever they are going to be used) at a brand-new plant in New York State. They have a great rep for quality control.
A kind of creepy thing is that the articles say they will probably debut in the 2nd half of next year (Macworld NYC? One last hurrah before MW moves back to Boston?) or not till January 2004. The articles also imply that they will debut at 1.4GHz. Apple is now selling 2 x 1.25 GHz G4 chips.
Will Apple stall at or below 1.4 GHz till these new chips come out? The general upgrade cycle of Apple machines is 5 or 6 months right now. That leaves 2 possible revisions to the G4 towers before these babies are set. Now, I know that these chips will come with a super motherboard and 64-bit vs 32 and blah blah blah, but Apple fights the megahertz myth even with somewhat educated consumers. How will they be able to spin it when they have to explain it in terms of Apples vs Apples?
I guess it's a minor problem if these chips are as zippy as they say... a few benchmark tests and bar graphs should convey some message? Maybe instead of having a 12 y.o. kid set up his iMac and go online in 5 minutes, they will have a 12 y.o. kid clone his dog or something. I would be impressed.
Does it?
The EE Times story linked at the top says it's an 8-stage pipe. That doesn't mean any more or less than the ExtremeTech statement that the new pipe is triple the length (which would be 21; the current pipe is 7), since we haven't seen any actual reference docs from IBM.
Can anybody who was at the Microprocessor Forum give us more info?
Don Negro
Perl 6 will give you the big knob. -- Larry Wall
I have no idea who you are, Mononoke, but I'd wager $1000 that Hannibal Stokes knows more about chip architecture than you do. The PPC 970 will have a hard fight (both in marketing and benchmarks) against the 4+GHz x86 chips also due a year from now.
p.s. How the heck did that get rated as Insightful? I'm as rabid a Mac addict as any of you, but it's just plain wrong to mod someone up for spouting false evangelism.
Good lord, not this argument again.
First of all, whoever modded this as interesting should be poked in the eye. This statement is full of FUD to the max.
Perhaps a G4 will outperform an x86 of the same caliber, but the high-end P4 CPUs absolutely smoke the high-end G4s. The G4 architecture is so maxed out that Apple had to resort to adding a second CPU, cuz they just couldn't scale the G4 chips any higher.
When the G4 came out, it kicked the arse of all the x86 chips out there. But that was a couple years ago. As things currently stand, the best Apples are barely middle of the pack, performance-wise. Don't worry though, you'll still pay more for a Mac than a fully loaded Dell machine.
Once you move beyond 4.3 billion (2^32) into the realm of 18.4 quintillion (2^64, about 1.8 x 10^19, or seven orders of magnitude past a trillion), you can address anything for the foreseeable future (you could count every second since the Big Bang this way with room to spare, for example).
For vector operations, 64-bit words make for some fast math operations, since you can pack two 32-bit integer components into each 64-bit word.
For floating point, it means you have greater precision in hardware (allowing things like real physics and shapes to be modelled without noticeable issues caused by subtle number creep). Since most systems use IEEE 754 (64-bit double-precision floating point), it means a speedup to that software, since you're not working with it as two 32-bit operations.
In terms of storage space, it means you can address more than 2,199,023,255,552 bytes (~2 terabytes) of disk space (assuming a 512-byte sector). This is important for people with big RAID arrays today, and people with ludicrously big Maxtor drives 3-4 years from now.
For RAM, it means you don't have to worry about your server topping out at 4 gigabytes of RAM. It also means that your VM space has no effective limitation for the foreseeable future (very useful for people working on large projects, trying memory-intensive algorithmic approaches to traditionally NP-hard problems, or distributed computing problems).
I'm sure I missed a lot of the benefits even with this list. As you can see, 64-bit is not just a numbers game. 2^64 is 2^32 times larger than 2^32 (about 4.3 billion times, or nearly ten decimal orders of magnitude), meaning our grandchildren will probably still be using 64-bit machines with no limitations being apparent (unlike 16-bit to 32-bit, which only moved from about 65 thousand to 4.3 billion in terms of addressable amounts of something).
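The powers of two behind this comment are easy to check with a few lines of Python. A minimal sketch, assuming 512-byte sectors for the disk figure as the comment does (the helper name `addressable` is just for illustration):

```python
import math

SECTOR_BYTES = 512  # sector size assumed in the comment above


def addressable(bits: int) -> int:
    """Number of distinct addresses (or values) representable in `bits` bits."""
    return 2 ** bits


if __name__ == "__main__":
    for bits in (16, 32, 64):
        print(f"{bits}-bit: {addressable(bits):,}")
    # 32-bit sector numbers with 512-byte sectors cap a disk at 2 TiB
    print(f"32-bit sector limit: {addressable(32) * SECTOR_BYTES:,} bytes")
    # Going from 32 to 64 bits multiplies the range by another 2**32
    growth = addressable(64) // addressable(32)
    print(f"growth factor: {growth:,} (~10^{math.log10(growth):.2f})")
```

It prints 65,536, 4,294,967,296 and 18,446,744,073,709,551,616 for the three widths, 2,199,023,255,552 bytes for the sector limit, and a growth factor of 4,294,967,296 (about 10^9.63) between 32 and 64 bits.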
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
The 7.2 GB/sec of bandwidth is less than double that of existing P4s (P4 = 4.2 GB/sec), and since Hammer will have 6.4 GB/sec in early 2003, it should be essentially the same as competing x86 chips.
A deep unwavering belief is a sure sign you're missing something...
The 64-bit PPC uses 32-bit instructions.
;)
Basically the only real difference is in the details of some instructions, and the 64-bit registers.
Since you're using 64-bit integer registers, you can now use 64-bit addressing (pointers), which means you can calculate addresses for 64-bit address spaces, which, yes, means more RAM.
Macs are currently limited to below 4GB of RAM, which is actually a limit... I think the most significant reason to move to 64-bit PPC is to go beyond 4GB of physical RAM.
The other benefit will be the ability to handle 64-bit integers fast, as used by databases.
Another benefit will be 64-bit load/stores, which can happen in 1 cycle rather than 2.
Of course, the AltiVec unit has allowed 128-bit load/stores for a while now (and the FPU allowed 64-bit load/stores before).
Anywho, the big points of PPC64 are increased integer size and larger address space.
PPC does not use segment hacks like x86.
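To illustrate the "handle 64-bit integers fast" point, here is a minimal Python model of what a 32-bit integer unit has to do for a 64-bit add: add the low words, capture the carry, then add the high words plus that carry. On 32-bit PowerPC a compiler typically emits an `addc`/`adde` instruction pair for this, while a 64-bit integer unit needs a single add. The function names are hypothetical, and Python integers are masked to mimic fixed-width registers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_on_32bit(a: int, b: int) -> int:
    """Add two unsigned 64-bit values using only 32-bit halves.

    Two dependent adds are needed: low words first, then high words
    plus the carry out of the low add.
    """
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                  # 1 if the low add overflowed
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32   # wraps like a 64-bit register
    return (hi << 32) | lo


def add64_native(a: int, b: int) -> int:
    """What a 64-bit integer unit does in a single add."""
    return (a + b) & MASK64


if __name__ == "__main__":
    x, y = 0x00000001_FFFFFFFF, 0x00000000_00000001
    print(hex(add64_on_32bit(x, y)), hex(add64_native(x, y)))
```

Both functions print `0x200000000` for the sample values; the point is that the 32-bit path needs two dependent operations (and twice the registers) where the 64-bit path needs one.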
---
Live Long & Prosper \\//_
CYA STUX =`B^) 'da Captain,
Jedi & Last *-fytr
Actually, the DDR thing is a little misguided. The real reason DDR had no effect was that the 2.1 GB/sec of memory bandwidth was feeding into 1.3 GB/sec of processor bus bandwidth.
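The arithmetic behind this point is bus clock × transfers per clock × bytes per transfer; "double pumping" (DDR) just means two transfers per clock, and the P4's "533 MHz" bus moves four. The sketch below assumes 64-bit (8-byte) buses throughout, roughly 133 MHz double-pumped memory and a roughly 167 MHz single-pumped processor bus. Those clock figures are illustrative assumptions chosen to reproduce the 2.1 and 1.3 GB/sec numbers above, not specs for a particular Mac.

```python
def bandwidth_mb_s(clock_mhz: float, transfers_per_clock: int,
                   width_bytes: int = 8) -> float:
    """Peak bandwidth in MB/s = clock x transfers per clock x bus width."""
    return clock_mhz * transfers_per_clock * width_bytes


if __name__ == "__main__":
    ddr_memory = bandwidth_mb_s(133.33, 2)   # double-pumped memory, ~2.1 GB/s
    cpu_bus = bandwidth_mb_s(166.67, 1)      # single-pumped CPU bus, ~1.3 GB/s
    p4_fsb = bandwidth_mb_s(133.33, 4)       # quad-pumped "533 MHz" FSB, ~4.3 GB/s
    # The processor can never consume data faster than its own bus delivers it
    usable = min(ddr_memory, cpu_bus)
    print(f"memory {ddr_memory:.0f} MB/s, CPU bus {cpu_bus:.0f} MB/s, "
          f"usable by CPU {usable:.0f} MB/s, P4 FSB {p4_fsb:.0f} MB/s")
```

The `min` captures the comment's argument: faster memory only helps until it outruns the processor bus feeding the CPU.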
There is never a good time to buy a computer, and nobody in their right mind will ever buy one at all. There is always something faster coming up.
Once you get over how ludicrous that is, I say buy a computer whenever the hell you want one. And yes, your machine will be obsolete, according to all the charts and graphs and tables of benchmark numbers, almost immediately. It doesn't matter if you buy a G4 in 2003, or a 970 in 2004. It will still happen. Get over it.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
How do you make a CPU go faster without jacking up the clock? You can create a new design, but that would be a lot more expensive than jacking up the clock. It takes years to develop a new CPU design and IPC increases are usually minimal. Most new CPUs seem to be designed to run at higher frequencies to achieve better performance. The P4 actually has a lower IPC than the P3 but can operate at much higher frequencies.
Many companies are planning to use parallelism to improve performance. IBM has a CPU with two cores on one die, and Intel will introduce CPUs with two virtual processors in the very near future. But parallelism is not likely to get you a doubling of performance, especially on a desktop machine that is often running only one intensive process at a time.
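The limit on parallel speedup described here is usually expressed as Amdahl's law (the comment does not name it; this is a swapped-in textbook model): if a fraction p of the work can run in parallel on n processors, the best possible speedup is 1 / ((1 - p) + p / n). A minimal sketch with illustrative values of p; for two "virtual processors" sharing one core's execution units, the real gain is typically well below even this bound.

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Upper bound on speedup when only part of the work can be parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9, 1.0):
        print(f"p={p:.1f}: {amdahl_speedup(p, 2):.2f}x on 2 processors")
```

On two processors this gives 1.00x, 1.33x, 1.82x and 2.00x for the four fractions: only perfectly parallel work doubles, which is rarely the case for a single desktop application.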
Decodes/breaks down the native ISA, repackages them in bundles, then issues them to the execution units... A point-to-point FSB... Will have higher IPC than Athlon, but has all the same scalability limits. Hammer has the integrated memory controller and multiple hypertransport interfaces for fast IO and glueless MP. In short, PPC is similar to 7th generation x86 along with P4 and Athlon. Hammer is much more like Power4, but more highly integrated/cost-reduced. fpg
Note that your new IBM chip is doing exactly that.
Intel and AMD have repeatedly shown that they can do whatever they like to implement top-notch internal architectures, and lopping on a translation unit only adds 10-20% die area and typically a very small performance hit over a traditional sequential RISC architecture. And they're free to change the internal architecture between revisions. And both Intel and AMD sell enough chips that they can spend a lot of money on designs and make them very good and still turn a profit.
-- Erich
Slashdot reader since 1997
Contrary to some of the opinions presented recently, it is just fine for Apple to use the 970 and be behind the curve with respect to typical performance. Sure, there are specialized apps that can leverage a RISC architecture to outperform x86 or leverage AltiVec to outperform SSE, but that is a small minority. Typical performance lags behind PC a little, but we are in a situation where PCs and Macs have more performance than most people actually use. Most folks out there in the real world will get along very nicely with a 1GHz PC or an 800MHz Mac. Very few people need 2.xGHz machines, and only a few more have enough disposable income to buy those machines for Quake FPS pissing contests :).
The real Apple problem is that the gap between typical PC and typical Mac performance is starting to grow beyond the range that has historically been shown to be viable. It's not a problem today, since standard dual CPUs counter this to a degree, but it's likely to be a problem in a year or two. While the 970 may only perform like a 3GHz P4 (SPEC) and lag whatever Intel/AMD has in a year or two, it will be close enough. Apple will be back to a point where the typical performance gap is small enough. Apple has sold tens of millions of Macs that lagged PC counterparts in performance. They know that their customers are more interested in ease of use. Performance-wise, close enough is all they need.
"Mhz doesn't matter"
The MHz Myth that Apple talks about is not about trying to say that "Mhz doesn't matter", it's about the fact that MHz cannot be used as a direct comparison between architectures.
Of course MHz (brute force) matters. But what also matters is smart design.
I think showing a 333MHz G3 running faster than a 500MHz Pentium III kinda proves the MHz Myth is just that. Bear in mind that the G3 is not AltiVec-equipped, so it's not getting a huge vectorized benefit here.
If you think that's impressive, look at the G4! I can't wait to see what CPU Apple actually unleashes next.
I'm astonished that there are actually people who think MHz is THE sole number to go by.
War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
PPC chips can only work on one swing of the computing "cycle", not on the up and down like an Athlon can for example
It's called positive and negative edge triggering. It's not a new technology either; I was dealing with it in the '80s at the discrete logic level.
AGP 2x uses this and 4x uses positive, negative, high and low triggering. Certain UDMA modes make use of this clocking technique also.
Your argument doesn't hold water.
His argument DOES hold water. PPC CPUs DO outperform Intel x86 CPUs by a good margin when compared clock for clock (showing the MHz Myth for what it is). Especially the G4, and boy, when AltiVec can be and is exploited... Wow. There IS more to CPU design than smaller die and deeper pipelining for higher MHz.
As far as I can tell, Apple seems to be in a position where they have to make the best of what they can get, due to Motorola dropping the ball pretty badly.
I hope IBM comes to their rescue. How ironic.
http://www.heise.de/ct/english/02/05/182/
SPEC benchmarks for the G4 processors. (Not a synthetic benchmark issued by Apple, but by an unbiased third party, SPEC.)
G4 1 GHz SPECs at 306 integer, 187 floating-point.
Interestingly, the 1 GHz G4 was almost neck-and-neck with a 1 GHz PIII (http://www.heise.de/ct/english/02/05/182/qpic02.jpg).
http://www.spec.org/osg/cpu2000/results/cpu2000.html
A large archive of SPEC results for many CPUs, including x86.
A few choice results:
1.2 GHz Athlon (Ancient by today's standards) - 443 integer, 387 FP
Athlon XP 1700+ on an Epox EP-8KHA (happens to be my mobo; slowest Athlon XP listed for this mobo):
633 integer, 561 FP
Dell Precision Workstation 330, 1.3 GHz P4 - 474 integer, 502 FP (The P4 doesn't seem to be taking too much of a branch misprediction hit here)
So in the case of G4s, while they may be a bit more efficient MHz for MHz (and the P3 vs. G4 benchmarks show that this isn't even necessarily the case), the fact that they're so far behind on the clock speed curve hurts them badly.
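The "MHz for MHz" comparison can be made explicit by dividing each quoted SPEC score by clock speed. This sketch uses only the figures quoted in this comment (the Athlon XP 1700+ is left out because its clock isn't given). Score per GHz is a crude metric that ignores memory systems and compilers, so treat the output as a rough indication only.

```python
# (clock in GHz, SPECint, SPECfp) as quoted in the comment above
QUOTED = {
    "G4 1.0 GHz": (1.0, 306, 187),
    "Athlon 1.2 GHz": (1.2, 443, 387),
    "P4 1.3 GHz": (1.3, 474, 502),
}


def per_ghz(score: float, clock_ghz: float) -> float:
    """SPEC score per GHz of clock: a rough 'work per clock' indicator."""
    return score / clock_ghz


if __name__ == "__main__":
    for name, (ghz, spec_int, spec_fp) in QUOTED.items():
        print(f"{name:15s} int/GHz {per_ghz(spec_int, ghz):6.1f}  "
              f"fp/GHz {per_ghz(spec_fp, ghz):6.1f}")
```

On these figures the G4 comes out at about 306 SPECint per GHz against roughly 365-370 for the Athlon and P4, and further behind on floating point, which fits the comment's point that the per-clock advantage isn't clear-cut.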
If you want to see a good example of MHz not being everything, check out the benchmarks of Alpha systems - The 750 MHz ones chew even 1.2 GHz Athlons for lunch. But don't look at Apple...
Also interesting in the case of the SPEC benchmarks run by Heise - MS C pays a 10-15% performance hit over GCC in the SPEC benchmarks.
retrorocket.o not found, launch anyway?
I can't find the link anymore, but last night I saw an article by Frank Soltis, the chief scientist over the AS/400 unit. He basically laid out the evolution of the POWER architecture (not the PowerPC architecture) and how it relates to the new 970 CPU. The first POWER CPU used by IBM was derived from their work with Moto and Apple, but it couldn't be used in the AS/400 line because of limitations in the chip. So IBM came up with Power2 (PowerPC AS). This extended the functionality of the chip to where it could be used in an AS/400 environment, but it was no longer compatible with the PowerPC that Apple and Moto were selling. Then they added the POWER64 instruction set, which made the chip faster for business and HPC applications but drove it further away from the PowerPC platform. The POWER4 chip actually includes four separate instruction sets: POWER64, POWER32, PowerPC64 and PowerPC32. The 970 is essentially a POWER4 with AltiVec added and the second CPU core cut out. He didn't mention that there was really any overlap between it and the PS3 chip. POWER4 design was started in '96, so there may be some shared philosophy, but probably no real instruction matching between the two. He also said that the POWER5 (late next year) and POWER6 architectures would have some OS-dependent accelerations put in them. He specifically mentioned that the chip would have an instruction for handling TCP streams instead of having to send several instructions to the CPU at once, and that these will be fully documented so that Linux/OSS can use them. POWER6 will extend that to specific DB2 and Domino calls to accelerate those apps.
The POWER4, and presumably the 970, will also have a very, very nice branch prediction scheme. The POWER4 uses a total of three branch predictors to the Intel P4's one. The third table weighs the comparative performance of the first two tables to achieve the highest possible rate of correct branch predictions.
In addition, the PowerPC architecture includes a static branch prediction bit in its branch instructions, which allows the compiler to "hint" the likely branch direction to the processor; the x86 architecture traditionally had no equivalent feature (the P4 only recently added branch-hint prefixes).
In short, branch misprediction occurs less often with the POWER4 (and hopefully the 970) for the above reasons. In addition, the "tripling" of the G4 pipeline in the 970 is still shorter than Intel's 20-stage P4.
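A "three table" scheme like the one described is commonly called a tournament (or hybrid) predictor: two component predictors plus a chooser table that learns which of them to trust for each branch. The Python sketch below is the generic textbook version (a per-address table, a gshare-style global-history table, and a chooser) with hypothetical table sizes; it is not IBM's actual POWER4 design, whose details this comment doesn't give.

```python
class TwoBitTable:
    """Table of 2-bit saturating counters; a value of 2 or 3 predicts 'taken'."""

    def __init__(self, size: int, init: int = 1) -> None:
        self.size = size
        self.counters = [init] * size

    def predict(self, index: int) -> bool:
        return self.counters[index % self.size] >= 2

    def update(self, index: int, taken: bool) -> None:
        i = index % self.size
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


class TournamentPredictor:
    """Two component predictors plus a chooser (three tables in total)."""

    def __init__(self, size: int = 1024, history_bits: int = 10) -> None:
        self.local = TwoBitTable(size)    # table 1: indexed by branch address
        self.glob = TwoBitTable(size)     # table 2: address XOR global history
        self.chooser = TwoBitTable(size)  # table 3: >= 2 means "trust global"
        self.history = 0                  # recent branch outcomes, 1 bit each
        self.history_mask = (1 << history_bits) - 1

    def _global_index(self, pc: int) -> int:
        return pc ^ self.history          # gshare-style hash

    def predict(self, pc: int) -> bool:
        if self.chooser.predict(pc):
            return self.glob.predict(self._global_index(pc))
        return self.local.predict(pc)

    def update(self, pc: int, taken: bool) -> None:
        g_idx = self._global_index(pc)
        local_ok = self.local.predict(pc) == taken
        global_ok = self.glob.predict(g_idx) == taken
        if local_ok != global_ok:         # train chooser only on disagreement
            self.chooser.update(pc, global_ok)
        self.local.update(pc, taken)
        self.glob.update(g_idx, taken)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


if __name__ == "__main__":
    # An alternating branch (T, N, T, N, ...) defeats a per-address 2-bit
    # counter, but global history captures the pattern and the chooser
    # learns to prefer the global table.
    predictor = TournamentPredictor()
    hits = 0
    for i in range(1000):
        taken = i % 2 == 0
        hits += predictor.predict(0x40) == taken
        predictor.update(0x40, taken)
    print(f"accuracy: {hits / 1000:.1%}")
```

The chooser is trained only when the two components disagree, so it learns which predictor is better for each branch rather than drifting on branches both get right.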
Spyky
Depends: the G4e has a 7-stage pipeline, so tripling it would make it 21 stages.
There's a specific talent to buying a Mac at the right time, as performance increases happen in large steps in a few distinct instances in the year.
PCs just keep getting gradually better and better. But with a Mac you can buy a single processor machine one day only to find you could have had a dual for the same price on the next day.