Inside the PowerPC 970
daveschroeder writes "Jon "Hannibal" Stokes has posted a long-awaited, very detailed analysis of the IBM PowerPC 970 at Ars Technica. Notable quote: 'The 970 was made for Apple'."
See here: more on the PowerPC, posted yesterday.
PAGERANK++ Robsell.com
It's like comparing apples with apples... It's a dupe.
It's ok, the last post didn't get enough comments. Please continue discussion.
Fast forward a few months....hmm...a few options:
Sun: Nice hardware, very expensive, CDE.
AMD: Commodity hardware, cheap, WinXP.
HP: Intel hardware, very expensive, CDE or WinXP.
I think I know what I'd buy.
Of course, the Athlon64/Opteron would get quite a bit of consideration due to my hobbies.
But I think it'd end up being the Mac.
... in The Matrix. That strange feeling of deja vu can only mean one thing! Either that or the /. editors are asleep at the wheel again.
Floating point ops, optimized for graphics processing and things like compression (jpeg, mpeg, mp3). If you read the Ars article he waxes on about its superiority over MMX/SSE/SSE2.
Also: "Water is special stuff that makes stuff float."
"The CPU does important stuff."
For all of your "What is AltiVec?" needs, check this out instead:
http://www.motorola.com/SPS/PowerPC/AltiVec/
Mikey-San
Karma: +Eleventy billion (mostly affected by watching Celebrity Jeopardy)
Apple'd be putting DDR400 on the G4 right now if they could. None of this (well, except the decision to go Moto) was their fault.
My real problem with the current G4e situation, aside from the 167 SDR FSB, is the fact that it's a shared bus topology, which is just ridiculous. To my knowledge, there's nothing stopping Apple from putting out a chipset that gives each G4e a dedicated FSB (even if it's still 167MHz SDR) to the chipset.
As far as the low MHz and SDR situation, I've also never been totally convinced that Apple wasn't partially to blame for this either, unless they just have zero clout with Moto SPS.
I believe that Hannibal mentions that the 970 is designed for SMP. Clearly CmdrTaco is just testing its newest feature: you click post and the operation gets carried out by both processors.
Tierce
Who sponsors your feelings?
Reading through the article, it's nice to see some real design going into a processor. Looking through Intel's last few chips, they've been upping their clock speed and packing in more cache.
Yeah, yeah, they are hog-tied because you can't easily re-compile the entire windows platform to use new instruction sets. Linux users, of course, don't have this problem (muhahahah).
Did anyone else catch the bit on the twin FPUs? I'm just imagining what this thing is going to do with vector operations and frequency transforms.
For most of you non-engineers:
-Most 3D vector operations are affine transformations. Using a 4x4 array of floating point numbers you can translate, rotate, and scale. Works beautifully, but it's a lot of calculations.
-The Fast Fourier Transform (FFT) is used a lot in signal processing. It's a floating point monster.
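To make the 4x4 point concrete, here is a minimal Python sketch of translating and scaling a 3D point in homogeneous coordinates. The helper names (mat_vec, translation, scaling) are made up for illustration and don't come from any particular graphics library.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-element vector.

    Each call costs 16 multiplications and 12 additions, which is why
    transforming many points leans so heavily on floating point hardware.
    """
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """Hypothetical helper: 4x4 matrix that moves a point by (tx, ty, tz)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def scaling(s):
    """Hypothetical helper: 4x4 matrix that scales x, y and z uniformly by s."""
    return [
        [s, 0.0, 0.0, 0.0],
        [0.0, s, 0.0, 0.0],
        [0.0, 0.0, s, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


# A point (1, 2, 3) in homogeneous form: the trailing 1.0 is what lets
# a matrix multiply express a translation.
point = [1.0, 2.0, 3.0, 1.0]
moved = mat_vec(translation(10.0, 0.0, -1.0), point)
scaled = mat_vec(scaling(2.0), point)
```

Rotation works the same way with sines and cosines in the upper-left 3x3 block; it is left out here to keep the sketch short.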
I understand that a while ago there was some competition between IBM and Motorola about whose chip would be the G5. Was Motorola ever a serious contender, and if so, has Apple decided on IBM? I haven't heard much about Motorola for some time.
Mot actually had a G5 on the roadmap. They apparently got all the way to samples, but then ditched the effort. There never was a competition per se wrt the G5 name. There was a bit of friction over AltiVec, as IBM wanted to focus on clock speed and didn't think AV was worth the complexity (and hence why Mot came out with the G4 while IBM stuck with the G3). Motorola hasn't been serious about the mainstream cpu market for a while as they've been losing money on it. They'd rather focus on things like embedded proccies and cell phones (and related chips).
I don't know which came first: Mot ditching the G5, so Apple pleads with IBM to come out with the 970, or Mot getting a whiff of the 970 and seeing a way out of doing the G5. Perhaps others more "in the know" can chime in?
This is probably true and rather unfortunate. AltiVec is important for Apple marketing because it lets them claim impressive performance figures without actually needing to push the state of the art in terms of processor design further than Intel. It's also important for a few special-purpose applications (Photoshop filters, etc.).
But the reality of regular high-end computing is that people don't have the time to optimize their software for the latest oddball hardware platform. And even something like a hand-coded vectorized BLAS library doesn't help because most scientific software still doesn't use such libraries.
I think this tradeoff doesn't even work well for Apple. Imagine how much better it would be if Apple could ship systems based on the 970 today, rather than after a few months additional delay due to AltiVec. And every dollar and watt that is shaved off the AltiVec price makes it a much more viable processor for servers and blades, which would get volume up and prices down. Gimmicks like AltiVec cost much more than they are worth, even for Apple.
Officially, the PowerPC G5 is the Motorola PowerPC 8500 chip. So this would not be it. Apple may or may not call a computer that features IBM's PowerPC 970 the PowerMac G5 or PowerBook G5, but it wouldn't be the actual G5 chip. Although I don't think this chip officially has a G* name, I'd be more inclined to designate it the G6, since the G5 was actually a 32-bit chip.
Karma: Ran over your dogma.
Try doing audio signal processing or heavy graphics/video work.
You're pretty thankful for your Altivec then...
I saw such an insane improvement in Reaktor when it got Altivec enhanced...
i don't read slashdot anymore.
Whining about dupe comments is worse than the whining in the dupe comments, and thus the point....don't bitch about the symptom, lobby to stop the source of the pain, and the whining will cease at the same time.
"But Mom, I don't want to go to France!" "Shut up and keep rowing!"
Altivec is Motorola's name for the vector processing unit. The unit handles SIMD commands. SIMD stands for Single Instruction, Multiple Data. Basically, instead of looping through a list of 50,000 values one by one and multiplying each value by PI for instance, you simply tell the CPU where the list is, and to multiply it by PI.
In a much simplified analogy, it's like lighting 200 candles with a flame thrower instead of one by one with a match.
Article X: The powers not delegated... by the Constitution...are reserved...to the people
In response to your FFT being a floating point monster... in a lot of cases, couldn't you turn it into an integer monster? I've been thinking about this, and it occurs to me that the vector can be decomposed into halves (thus the 2^x units in the FFT), but a vector at angle theta can just as easily be decomposed into two vectors half the length, one at angle phi, and the other at angle (2theta-2phi).
That holds where phi is any angle. That being the case, it seems to me that you could pick your values of phi to correspond to "perfect" triangles (3-4-5, ~37 degrees, for example), and keep your operations in the integer realm for everything except subtraction of angles.
I dunno, I haven't checked this out really thoroughly, and this is therefore probably nonsense. Last time I tried to do anything with the DFT, I thought I had something that blew the FFT away in terms of speed... precisely because I didn't understand the full FFT process, and its beautiful simplicity.
In reality, I got a very modest improvement over the FFT, not worth the extra code in my opinion.
My method was very different, involving a redefinition of the DFT matrix-vector combination, and had more work on paper, but fewer multiplications. But what I thought was (log2n)^2 instead of the DFT's N^2 order of magnitude multiplications, was really something like 0.87Nlog2N multiplications. FFT gets N*log2N multiplications.
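For a sense of scale, here is a small Python sketch that plugs one transform size into the three growth rates mentioned above. It only evaluates the formulas as quoted in the post (N^2, N*log2N, and the 0.87 factor); it does not implement any of the transforms, and the function names are made up for illustration.

```python
import math


def dft_mults(n):
    """Multiplications for a direct DFT, using the N^2 figure quoted above."""
    return n * n


def fft_mults(n):
    """Multiplications for the FFT, using the N*log2(N) figure quoted above."""
    return n * math.log2(n)


def reduced_mults(n, factor=0.87):
    """The poster's estimate for the modified method: 0.87 * N * log2(N)."""
    return factor * n * math.log2(n)


n = 1024
print(f"N = {n}")
print(f"direct DFT : {dft_mults(n):>10.0f}")
print(f"FFT        : {fft_mults(n):>10.0f}")
print(f"0.87 * FFT : {reduced_mults(n):>10.1f}")
```

At N = 1024 the jump from the DFT to the FFT is roughly a factor of 100, while the 0.87 factor saves about 13% on top of that, which fits the "very modest improvement" described.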
Essentially, when I understood the FFT well, and applied my lessons to it, I ended up showing that not all the multiplications are necessary. Some of the FFT multiplications are dupes just like this article, and there is a system for finding them, also just like this article. (Look for the multiplications posted by Taco.)
But the fact that I can make such errors means that I could be completely wrong about my supposed integer FFT.
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
"Currently AMD has the fastest commodity SIMD implementation"
You've not been looking at the distributed.net results, have you? The Altivec/VMX technology currently used by Moto and soon to be used by IBM is LEAGUES ahead.
That was classic intercourse!
well, G5 was written on a roadmap I once saw describing the Motorola MPC 85xx series - but seeing as how that series has been on sale for about a year now with no sign of a CPU variant suitable for Apple to use, I guess we can forget about the MPC 85xx being used in PowerMacs. Personally, I'd like Apple to adopt the moniker "G64" for their PPC 970 powered machines - that'd stick it to Intel alright, and the idiotic warez kids would stop comparing clock speed and start comparing word length instead of getting on with their lives.
That was classic intercourse!
I iz not very technical but. I think it means faster without a noisy fan or a burnt lap.
"AMD is delivering fast SIMD today, not next year"
What ARE you blathering about? Pentium 4 has SSE2, PowerPC has Altivec - here's a clue for you: when people code for x86 SIMD, they choose MMX, SSE and SSE2, they don't choose 3DNow!; when people code for SIMD under the PowerPC ISA, they choose Altivec. Both SSE2 and Altivec are available today, both are used in "commodity" CPU families. I think you'll find that it's "x87" FPU strength that typically marks out AMD's current CPUs, not their patchy implementation of SSE2.
That was classic intercourse!
Bullshit. When I worked for the University Daily Paper we had no problem avoiding duplicate stories all over the paper... And we ran FAR MORE THAN 30 STORIES A DAY.
In my example it was a bunch of drunk/high/rushing out to get laid coward students--Can't professionals who are being paid to do their damn job do AT LEAST as well as the wasted college kids?
Who did what now?
You just use fixed-point arithmetic instead of floating-point (i.e. a fixed 32 bits of precision, or 16 bits, or whatever). A simple way of doing this is to make INT_MAX/2 = 1.0, -INT_MAX/2 = -1.0, and everything in between scaled appropriately. (/2 to avoid overflow). Then you implement fixed-point addition, multiplication, division, and subtraction (as is commonly done in hardware DSP chips) and you've got yourself an integer-only FFT.
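A rough Python sketch of that scaling scheme, assuming a 32-bit signed INT_MAX. The names ONE, to_fixed, to_float, fx_add and fx_mul are invented for the example. Python integers never overflow, so the comments note where C code would need a wider intermediate.

```python
INT_MAX = 2**31 - 1   # assumed 32-bit signed int
ONE = INT_MAX // 2    # the integer that stands for 1.0 (1073741823)


def to_fixed(x):
    """Convert a float in roughly [-1.0, 1.0] to its fixed-point integer."""
    return int(round(x * ONE))


def to_float(f):
    """Convert a fixed-point integer back to a float, for inspection only."""
    return f / ONE


def fx_add(a, b):
    """Addition and subtraction are plain integer operations.

    With 1.0 mapped to INT_MAX/2, the sum of two in-range values still
    fits in 32 bits, which is the point of the /2 headroom.
    """
    return a + b


def fx_mul(a, b):
    """Multiply, then rescale, since a*b carries a factor of ONE twice.

    In C the product needs a 64-bit intermediate before the division.
    Floor division here rounds toward negative infinity; a real DSP
    routine would pick its rounding mode deliberately.
    """
    return (a * b) // ONE


half = to_fixed(0.5)
quarter = fx_mul(half, half)            # about 0.25
three_quarters = fx_add(quarter, half)  # about 0.75
```

An FFT butterfly built on these would need only fx_add, integer subtraction and fx_mul, with the twiddle factors precomputed through to_fixed.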
Some really old C code doing something along these lines is available here.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Well, not really, but you're close. You can't just pass the Altivec unit an array of numbers and tell it to do some operation on them. Altivec (and MMX, etc) simply allows you to process the data in bigger chunks than normal.
Altivec can process 128 bits of data at a time. For example, it can add 16 8-bit integers to another 16 8-bit integers, resulting in yet another vector of 16 8-bit integers with a single instruction, rather than doing them one at a time.
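Here is a toy Python model of that idea, purely for illustration: one call to vec_add_u8 stands in for a single vector instruction working on 16 byte-sized lanes, and the loop in add_arrays shows that the program still walks the array, just 16 elements per step. The function names are invented, and the wraparound on overflow is an assumption of the sketch (real vector units typically offer saturating variants as well).

```python
LANES = 16  # 128 bits / 8 bits per element


def vec_add_u8(a, b):
    """Model one 128-bit vector add: 16 unsigned 8-bit lanes at once.

    Each lane wraps around modulo 256 (an assumption of this sketch).
    """
    assert len(a) == LANES and len(b) == LANES
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def add_arrays(a, b):
    """Add two equal-length byte arrays, 16 elements per 'instruction'.

    The length must be a multiple of 16; real code would pad the tail
    or finish it with scalar adds.
    """
    assert len(a) == len(b) and len(a) % LANES == 0
    out = []
    for i in range(0, len(a), LANES):
        out.extend(vec_add_u8(a[i:i + LANES], b[i:i + LANES]))
    return out


a = list(range(32))
b = [250] * 32
result = add_arrays(a, b)  # 32 elements handled in 2 vector steps, not 32 scalar ones
```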
The hue of the sky is determined by a phenomenon known as the "Tyndall Effect", the scattering of light through a colloid by dust or molecules suspended in a transparent medium.
Note that the light scattering that determines what color you see isn't due to dust in the air, as some think, but rather oxygen and nitrogen molecules.
However, all we are, as Bill and Ted once pointed out, is dust in the wind, dude.
</t-i-c>
Mikey-San
Karma: +Eleventy billion (mostly affected by watching Celebrity Jeopardy)
I agree with this totally. A surprisingly large, and ever increasing, number of OS X libraries use Altivec, which means that developers using those libraries get some acceleration for free. Altivec is much easier to optimize stuff for than MMX, SSE2, etc.
For me, the most interesting part of the article is the pricing of the new machines, which is the real question. According to the author, the chip will make Apple machines technologically competitive. The question is, will Apple price them to gain market share, or continue to sell to a disappearing niche of luxury computer buyers?
Maybe Apple's concentration on developing software, and selling that software (rather than giving it away), along with its new business ventures, such as .Mac and the new iTunes online music store, point to a new business model that can afford to cut the margins on hardware.
If they don't lower the price of their machines -- the top ones, namely -- they will suffer, long-term. I don't think they need to be on par with PCs; I just think they cannot be too much more expensive than the PCs.
quiquid id est, timeo puellas et oscula dantes.
wtf? hey maclots.... Just cause someone is criticising Altivec doesn't necessarily make them a troll....
I do agree with you that clustering could be far more useful than it currently is, but as you say, anything that requires low latency is kind of problematic...
As far as clustering goes, you know you're able to put together a PC processing monster and use VST System Link ?
Been considering this to add to my TiBook...
i don't read slashdot anymore.