PowerPC 970 Running at 2.5 GHz
kuwan writes "IBM has just released a press release that indicates they have the new PowerPC 970 running at 1.8 to 2.5 GHz making it 'the fastest PowerPC so far.' IBM's original estimates were to have the chip running at 1.4 to 1.8 GHz at introduction, so this is very good news for those of us hoping Apple will use this as their next-generation chip."
Here you can find more technical details than just the press release.
Here is the actual spec for the PowerPC 970.
Ars Technica articles. Apparently, the PPC 970 is just last year's news. The real news is just the cranked-up speed...
"First of all, what is the processor that Apple is using now? Isn't it some sort of PowerPC already? I see this one supports AltiVec and I know that G3 and G4 Apple computers have the same instruction sets. Is this just another implementation, or are the G3 and G4 relatives of this new processor?"
Apple does currently use PowerPC processors in their computers, and has for the past eight years or so. Currently they're using the G3 (the "750" family, supplied by both IBM and Motorola) and the G4 (Motorola's 74xx family).
"Second: what operating system does the IBM PowerPC run?"
The IBM machines with this series of microprocessors are things like the later-generation AS/400s and RS/6000s. There are also some workstation machines (both badged as such and badged differently) with IBM PowerPCs in them. AS/400s use OS/400. RS/6000s can run many different OSes, including Linux and AIX.
"I suspect that the article is just confusing and the processor itself is not made by IBM. Right??"
Wrong, at least on who makes the microprocessor. Motorola hasn't been doing so well lately, and even early on they had to deal with IBM to meet quota. IBM's hand in the PowerPC line is visible in Macintosh 5200s, which were common schoolroom computers that are starting to be end-of-lifed. They date back to August 1996 or so.
From reading the specs, the pipeline breaks down as:
9 Fetch, Decode Stages
5-13 OoO Execute Stages
2-3 Dispatch, Commit
So a total of 16-25 pipelined stages. I also notice that the longest (25) is for the AltiVec engine. This is very comparable to the Pentium 4, which has 26 pipelined stages, although the Pentium 4 does not have a vector engine.
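For anyone who wants to check that sum, here is a minimal sketch. The stage ranges are the ones quoted above; the dictionary keys are just illustrative names, not terms from the spec.

```python
# Stage counts as quoted above: (min, max) for each group of stages.
stages = {
    "fetch_decode": (9, 9),
    "ooo_execute": (5, 13),
    "dispatch_commit": (2, 3),
}

# The shortest path takes every minimum, the longest takes every maximum.
shortest = sum(lo for lo, _ in stages.values())
longest = sum(hi for _, hi in stages.values())

print(f"pipeline depth: {shortest} to {longest} stages")
```

The minimums add up to the low end of the range and the maximums to the high end, which is where 16-25 comes from.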
Here is some info I found that might help:c hdocs/A1387A29AC1C2AE087256C5200611780
SPECint2000
- 937 @ 1.8 GHz
SPECfp2000
- 1051 @ 1.8 GHz
Dhrystone MIPS
- 5220 @ 1.8 GHz
- 2.9 DMIPS / MHz
Additional Performance
- Peak scalar GFLOPS = 7.2
- Peak SIMD GFLOPS = 14.4
- RC5 : 18M keys/sec
Unfortunately, at the very bottom it says that some of these are estimates. Here is the link where I got the info: http://www-3.ibm.com/chips/techlib/techlib.nsf/te
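The per-clock figures can be derived from the numbers listed above. This is only a sketch using those quoted values (some of which IBM marks as estimates); the variable names are mine.

```python
# Figures as listed above for the 1.8 GHz part.
clock_mhz = 1800
dmips = 5220
peak_scalar_gflops = 7.2
peak_simd_gflops = 14.4

clock_ghz = clock_mhz / 1000

# Divide each peak figure by the clock to get work done per cycle.
dmips_per_mhz = dmips / clock_mhz
scalar_flops_per_cycle = peak_scalar_gflops / clock_ghz
simd_flops_per_cycle = peak_simd_gflops / clock_ghz

print(f"DMIPS/MHz:          {dmips_per_mhz:.1f}")
print(f"scalar flops/cycle: {scalar_flops_per_cycle:.1f}")
print(f"SIMD flops/cycle:   {simd_flops_per_cycle:.1f}")
```

So the listed 2.9 DMIPS/MHz is consistent with the Dhrystone score, and the peak GFLOPS figures work out to 4 scalar and 8 SIMD floating-point operations per cycle.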
Because we're all too lazy to copy/paste
OK, everyone who wants to understand which processor is fastest should really take a course on processors. Here's the (condensed) deal with the MHz myth:
All other things being equal, faster clock frequency = faster processor. The trick is in the magic words "all other things being equal". If I have a 1 GHz G4 and overclock it to 1.8GHz it will be faster. That's because the processor is going through the exact same steps, but each step suddenly takes less time.
The problem is that no two processor designs are the same. RISC vs CISC isn't even the only consideration. There are cache sizes/locations, number of pipeline stages, number of pipelines, processor component layout, all kinds of crap. And that's just IN the processor. Motherboard designs don't even enter into my discussion.
PPC and x86 are very different, as you well know if you are a nerd (if you aren't then what are you doing here anyway?). But even processors that run the same instruction set are different enough that clock frequency doesn't necessarily dictate relative processing speed. This is why, if you went to Tom's Hardware when the P4s first came out and looked at the benchmarks, initial P4s were rated as slower than P3s which were running at a SLOWER clock frequency. And I don't think I have to tell you about AMD vs. Intel processors at equal clock speeds.
The point is that clock frequency is a number that represents something that is actually going on inside your processor. It doesn't always accurately represent speeds relative to other processors, but it's a pretty good heuristic when used wisely. If you're comparing the speed of different P4s you wouldn't be in error if you said "I want a 2.6GHz P4 because it's faster than a 2.2GHz P4". However, you probably would be in error if you said "I want a 2.6GHz P4 because it's faster than a 2.5GHz Power5".
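To put rough numbers on that, here is a toy model where speed is just work-per-cycle times clock. The IPC values are made up purely for illustration (they are not measurements of any real chip), and real performance depends on far more than this.

```python
def relative_speed(ipc, clock_ghz):
    """Toy model: billions of instructions per second = IPC * clock in GHz."""
    return ipc * clock_ghz

# Same design at two clocks: IPC is equal, so the clock decides.
same_design_ipc = 1.0  # made-up value
slow = relative_speed(same_design_ipc, 2.2)
fast = relative_speed(same_design_ipc, 2.6)

# Two different designs: made-up IPC values, chosen only to show the effect.
chip_a = relative_speed(1.0, 2.6)  # higher clock, less work per cycle
chip_b = relative_speed(1.5, 2.5)  # lower clock, more work per cycle

print("same design, higher clock wins:", fast > slow)
print("different design, lower clock can win:", chip_b > chip_a)
```

Within one design the higher clock always comes out ahead in this model; across designs the chip doing more per cycle can win at a lower clock, which is the whole point of the myth.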
Assuming the same bus speed (which is impossible, so take these numbers to be within, say, one hundred points of reality) and linear performance progression, the 2.5GHz chip should have:
;)
SPECint2000 =
937 / 1.8 = 520.6 points/GHz; 520.6 * 2.5 = 1301
Estimated Score ~= 1300
Average P4@3.0GHz score ~= 1080 (the 970 = 20% faster)
SPECfp2000 =
1051 / 1.8 = 583.9 points/GHz; 583.9 * 2.5 = 1460
Estimated Score ~= 1460
Average P4@3.0GHz score ~= 1100 (the 970 = 33% faster)
RC5 =
18 / 1.8 = 10M keys/sec per GHz; 10 * 2.5 = 25
Estimated Score ~= 25M keys/sec
Average P4@3.0GHz score ~= 4.3M keys/sec (the 970 = 481% faster, or about 5.8x as fast)
Take these numbers with a grain of salt, but they're somewhat interesting. I like the RC5 score, especially.
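The same back-of-the-envelope extrapolation as a short script, for anyone who wants to plug in other clock speeds. It assumes perfectly linear scaling with clock, which is the same big assumption as above; the function names are mine, and the P4 figures are the rough averages quoted in this post.

```python
def scale_linearly(score, from_ghz, to_ghz):
    """Extrapolate a score assuming it scales linearly with clock speed."""
    return score / from_ghz * to_ghz

def percent_faster(a, b):
    """How much faster a is than b, in percent (100 means twice as fast)."""
    return (a / b - 1) * 100

# Published 1.8 GHz figures for the 970 and the rough P4 @ 3.0 GHz averages.
ppc970_at_1_8 = {"SPECint2000": 937, "SPECfp2000": 1051, "RC5 (M keys/sec)": 18}
p4_at_3_0 = {"SPECint2000": 1080, "SPECfp2000": 1100, "RC5 (M keys/sec)": 4.3}

for name, score in ppc970_at_1_8.items():
    estimate = scale_linearly(score, 1.8, 2.5)
    gain = percent_faster(estimate, p4_at_3_0[name])
    print(f"{name}: ~{estimate:.0f} at 2.5 GHz, {gain:.0f}% faster than the P4 figure")
```

Note that "percent faster" subtracts the baseline: 25M keys/sec against 4.3M keys/sec is about 5.8 times the speed, which is roughly 480% faster.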
Yeah, you're right, I didn't account for MMX and SSE.
However, there is little comparison.
AltiVec
# 32 separate registers
# 128 bits per register
# No interference with FP registers
# No context or mode switching
# Max throughput: 8 flops / cycle
MMX/SSE
# 8 MMX registers shared with the FPU, 8 for SSE
# 64 bits per MMX register, 128 bits per XMM register
# MMX stalls the FP registers
# Context switching required for MMX
# Max throughput: 2 flops / cycle
When you are playing a 3D game do you really want your FPU stalled for vector calculations?
To be fair, you could program your 3D game to do all FPU calculations in SSE. gcc has an option to do this automatically now. And SSE2 is one step ahead of AltiVec in one regard - it supports a few double-precision operations.
But aside from those two nitpicks, I agree completely. I've hand-optimized code for both Pentium/SSE and G4/AltiVec and there's no comparison: SSE provides a small performance boost for a lot of work, while AltiVec provides a large performance boost for a little bit of work. AltiVec has very fancy shift, rotate, and shuffle instructions that are completely lacking in SSE. These are useful for more than just RC5. They're totally necessary to vectorize many more complicated algorithms, because otherwise the overhead of putting the data in the right place eats up any potential speed gains.
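To give a feel for why the shuffle matters, here is a rough Python emulation of what a byte-permute instruction like AltiVec's vperm does: each byte of a control vector picks one of the 32 bytes in two source registers. This is only an illustration of the idea, the function and variable names are mine, and real code would use the vector intrinsics rather than anything like this.

```python
def vperm(a, b, control):
    """Emulate a byte permute over two 16-byte vectors.

    The low 5 bits of each control byte select one byte out of the
    32-byte concatenation of a and b.
    """
    assert len(a) == len(b) == len(control) == 16
    source = bytes(a) + bytes(b)
    return bytes(source[c & 0x1F] for c in control)

a = bytes(range(16))      # bytes 0..15
b = bytes(range(16, 32))  # bytes 16..31

# Reverse the byte order of a with a single permute.
reversed_a = vperm(a, b, bytes(range(15, -1, -1)))

# Interleave the first 8 bytes of a with the first 8 bytes of b.
interleave_control = bytes(i // 2 + (16 if i % 2 else 0) for i in range(16))
merged = vperm(a, b, interleave_control)

print(list(reversed_a))
print(list(merged))
```

Byte reversal, interleaving, and arbitrary gathers from two registers are all the same operation with a different control vector, so the data rearrangement costs one instruction instead of a chain of shifts and masks.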
That's why the 970 in a Mac will easily beat the P4 in a number of tests: Apple has optimized hundreds of system calls to use AltiVec already, so many programs get the speed gain automatically.