Inside the PowerPC 970
daveschroeder writes "Jon "Hannibal" Stokes has posted a long-awaited, very detailed analysis of the IBM PowerPC 970 at Ars Technica. Notable quote: 'The 970 was made for Apple'."
See here: more on the PowerPC was posted yesterday.
PAGERANK++ Robsell.com
It's like comparing apples with apples... It's a dupe.
It's ok, the last post didn't get enough comments. Please continue discussion.
Interesting idea to say that the vector units were "hacked" onto the POWER architecture... and that this is the reason the chip must have been designed for Apple...
||| I still can't believe Parkay's not butter.
Fast forward a few months....hmm...a few options:
Sun: Nice hardware, very expensive, CDE.
AMD: Commodity hardware, cheap, WinXP.
HP: Intel hardware, very expensive, CDE or WinXP.
I think I know what I'd buy.
Of course, the Athlon64/Opteron would get quite a bit of consideration due to my hobbies.
But I think it'd end up being the Mac.
altivec units are special 128bit registers that can be used for many different optimizations.
||| I still can't believe Parkay's not butter.
And I even e-mailed daddypants to notify him that it was a dupe.
If tits were wings it'd be flying around.
altivec units are special 128bit registers that can be used for many different optimizations.
for instance?
... in The Matrix. That strange feeling of deja vu can only mean one thing! Either that or the /. editors are asleep at the wheel again.
Floating point ops, optimized for graphics processing and things like compression (JPEG, MPEG, MP3). If you read the Ars article, he waxes on about its superiority over MMX/SSE/SSE2.
I understand that a while ago there was some competition between IBM and Motorola about whose chip would be the G5. Was Motorola ever a serious contender, and if so, has Apple decided on IBM? I haven't heard much about Motorola for some time.
Also: "Water is special stuff that makes stuff float."
"The CPU does important stuff."
For all of your "What is AltiVec?" needs, check this out instead:
http://www.motorola.com/SPS/PowerPC/AltiVec/
Mikey-San
Karma: +Eleventy billion (mostly affected by watching Celebrity Jeopardy)
Apple'd be putting DDR400 on the G4 right now if they could. None of this (well, except the decision to go Moto) was their fault.
My real problem with the current G4e situation, aside from the 167 SDR FSB, is the fact that it's a shared bus topology, which is just ridiculous. To my knowledge, there's nothing stopping Apple from putting out a chipset that gives each G4e a dedicated FSB (even if it's still 167MHz SDR) to the chipset.
As far as the low MHz and SDR situation, I've also never been totally convinced that Apple wasn't partially to blame for this either, unless they just have zero clout with Moto SPS.
I believe that Hannibal mentions that the 970 is designed for SMP.. Clearly CmdrTaco is just testing its newest feature: you click post and the operation gets carried out by both processors.
Tierce
Who sponsors your feelings?
Reading through the article, it's nice to see some real design going into a processor. Looking through Intel's last few chips, they've been upping their clock speed and packing in more cache.
Yeah, yeah, they are hog-tied because you can't easily re-compile the entire windows platform to use new instruction sets. Linux users, of course, don't have this problem (muhahahah).
Did anyone else catch the bit on the twin FPUs? I'm just imagining what this thing is going to do with vector operations and frequency transforms.
For most of you non-engineers:
-Most 3D vector operations are affine transformations. Using a 4x4 array of floating point numbers you can translate, rotate, and scale. Works beautifully, but it's a lot of calculations.
-The Fast Fourier Transform (FFT) is used a lot in signal processing. It's a floating point monster.
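To make the 4x4 point concrete, here is a minimal sketch in plain Python. The helper names (`mat_vec`, `mat_mul`, `translation`, `scaling`, `rotation_z`) are made up for illustration and do not come from the article or any library; it assumes column vectors in homogeneous coordinates (x, y, z, 1).

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-element column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def mat_mul(a, b):
    """Multiply two 4x4 matrices; the result applies b first, then a."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def translation(tx, ty, tz):
    """Matrix that moves a point by (tx, ty, tz)."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def scaling(sx, sy, sz):
    """Matrix that scales each axis independently."""
    return [[sx, 0, 0, 0],
            [0, sy, 0, 0],
            [0, 0, sz, 0],
            [0, 0, 0, 1]]

def rotation_z(theta):
    """Matrix that rotates by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0],
            [s,  c, 0, 0],
            [0,  0, 1, 0],
            [0,  0, 0, 1]]

# Scale by 2, then rotate 90 degrees about z, then translate by (10, 0, 0).
transform = mat_mul(translation(10, 0, 0),
                    mat_mul(rotation_z(math.pi / 2), scaling(2, 2, 2)))

point = [1, 0, 0, 1]               # homogeneous coordinates: (x, y, z, 1)
moved = mat_vec(transform, point)  # approximately (10, 2, 0, 1)
```

Every transformed point costs 16 multiplications and 12 additions in `mat_vec`, which is where the "lot of calculations" comes from once a scene has thousands of vertices.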
It's a really fast vector processing unit. It can do floating point manipulations blazingly quickly.
This is probably true and rather unfortunate. AltiVec is important for Apple marketing because it lets them claim impressive performance figures without actually needing to push the state of the art in terms of processor design further than Intel. It's also important for a few special-purpose applications (PhotoShop filters, etc.).
But the reality of regular high-end computing is that people don't have the time to optimize their software for the latest oddball hardware platform. And even something like a hand-coded vectorized BLAS library doesn't help because most scientific software still doesn't use such libraries.
I think this tradeoff doesn't even work well for Apple. Imagine how much better it would be if Apple could ship systems based on the 970 today, rather than after a few months additional delay due to AltiVec. And every dollar and watt that is shaved off the AltiVec price makes it a much more viable processor for servers and blades, which would get volume up and prices down. Gimmicks like AltiVec cost much more than they are worth, even for Apple.
Try doing audio signal processing or heavy graphics/video work.
You're pretty thankful for your Altivec then...
I saw such an insane improvement in Reaktor when it got Altivec enhanced...
i don't read slashdot anymore.
Eh, nevermind.
You'd think michael and Taco talked every once in a while - less than 24 hours between duping this one.
And why is the sky blue? Do I really have the floor to post a 50-page post on what AltiVec is?
||| I still can't believe Parkay's not butter.
Whining about dupe comments is worse than the whining in the dupe comments, and thus the point....don't bitch about the symptom, lobby to stop the source of the pain, and the whining will cease at the same time.
"But Mom, I don't want to go to France!" "Shut up and keep rowing!"
These guys get paid? I always assumed it was a run-from-my-mom's-basement operation - what else could explain the poor spelling, frequent dupes (god help you if it's +-5 days from April Fool's), and the ho'ing out of the site to Microsoft advertisements?
Altivec is Motorola's name for the vector processing unit. The unit handles SIMD commands. SIMD stands for Single Instruction, Multiple Data. Basically, instead of looping through a list of 50,000 values one by one and multiplying each value by PI, for instance, you simply tell the CPU where the list is, and to multiply it by PI.
In a much simplified analogy, it's like lighting 200 candles with a flame thrower instead of one by one with a match.
Article X: The powers not delegated... by the Constitution...are reserved...to the people
In response to the FFT being a floating point monster... in a lot of cases, couldn't you turn it into an integer monster? I've been thinking about this, and it occurs to me that the vector can be decomposed into halves (thus the 2^x units in the FFT), but a vector at angle theta can just as easily be decomposed into two vectors half the length, one at angle phi, and the other at angle (2theta-2phi).
That holds where phi is any angle. That being the case, it seems to me that you could pick your values of phi to correspond to "perfect" triangles (3-4-5, ~37 degrees, for example), and keep your operations in the integer realm for everything except subtraction of angles.
I dunno, I haven't checked this out really thoroughly, and this is therefore probably nonsense. Last time I tried to do anything with the DFT, I thought I had something that blew the FFT away in terms of speed... precisely because I didn't understand the full FFT process, and its beautiful simplicity.
In reality, I got a very modest improvement over the FFT, not worth the extra code in my opinion.
My method was very different, involving a redefinition of the DFT matrix-vector combination, and had more work on paper, but fewer multiplications. But what I thought was (log2n)^2 instead of the DFT's N^2 order of magnitude multiplications, was really something like 0.87Nlog2N multiplications. FFT gets N*log2N multiplications.
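For a sense of scale, here is a small sketch that just plugs numbers into the three multiplication counts quoted above. The function names are invented for illustration, and the formulas are the ones stated in this comment (N^2, N*log2N, and roughly 0.87*N*log2N), not measured operation counts of any real implementation.

```python
import math

def dft_multiplications(n):
    """Direct DFT: every output multiplies against every input (N^2)."""
    return n * n

def fft_multiplications(n):
    """FFT count as quoted in the comment (N * log2 N), n a power of two."""
    return n * int(math.log2(n))

def claimed_multiplications(n):
    """The commenter's own figure: about 0.87 * N * log2 N."""
    return 0.87 * n * math.log2(n)

for n in (64, 1024, 65536):
    print(n, dft_multiplications(n), fft_multiplications(n),
          round(claimed_multiplications(n)))
```

At N = 1024 that is 1,048,576 multiplications for the direct DFT against 10,240 for the FFT, while the 0.87 factor only trims the FFT figure to about 8,909. That is the "very modest improvement" in question.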
Essentially, when I understood the FFT well, and applied my lessons to it, I ended up showing that not all the multiplications are necessary. Some of the FFT multiplications are dupes just like this article, and there is a system for finding them, also just like this article. (Look for the multiplications posted by Taco.)
But the fact that I can make such errors means that I could be completely wrong about my supposed integer FFT.
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
"Currently AMD has the fastest commodity SIMD implementation"
You've not been looking at the distributed.net results, have you? The Altivec/VMX technology currently used by Moto and soon to be used by IBM is LEAGUES ahead.
That was classic intercourse!
Link here. In your browser, find "CmdrTaco", click on the checkbox next to it, and then go to the bottom and click "submit" (rough translation from Swahili: submit = "apply patch").
[JUUUUST kidding, don't do this or you won't see any more of CmdrTaco's articles.]
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
I really don't think it's possible for each of 30 people to be aware of all 30 other articles. Why not assign one person to read *all* articles, and flag dupes? Then everything has to be cleared by him, and we'll eliminate dupes. And if CmdrTaco or someone else has a reason that it should go up again, he can argue it out, and modify the headline accordingly, so the readers will also know why this should go up again.
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
Well, I suppose that Apple will use two of these chips in their workstations at a time, and coupled with the extensive Altivec optimisation programme that Apple has been conducting, and their ever expanding list of grade 1 in-house applications (Final Cut Pro, DVD Studio Pro, Logic, Shake, iTunes, QuickTime) that really can benefit from those optimisations, the near future looks very encouraging for media creation on the Mac platform. The 64 bit addressing made available by the 970 may yet have profound implications for Apple's viability as a server/high end workstation provider - they could certainly finish off SGI's desktop business.
That was classic intercourse!
I iz not very technical but. I think it means faster without a noisy fan or a burnt lap.
If you haven't seen it, it's new to you.
omnia tua castra sunt nobis
Well, after the first round of follow-up whinings (level 3) are logged, there is an 'incident number' (IN) reset, and the next whiner (NW) in line goes to the front of the queue. Unless of course, you have already participated in first-round whining (levels 1;2;3), in which case you have to sit out.
Whining-by-proxy, substitute whining, pitch-whiners, designated whiners, ghost whiners and stand-in whiners are all permitted (first-round whiners, all levels), but only for Rhode Island, New Jersey, some parts of lower Manhattan and the District of Columbia.
See, your friend's enemy is your friend. No wait. Your friend's friend can also be your enemy. No...
/w IBM on their new chips. No more analysis :)
Ah frig'..
Apple is going along
--
"I'm not bright. Big words confuse me. But Wanda loves me and that should be enough for you." - Cosmo
"AMD is delivering fast SIMD today, not next year"
What ARE you blathering about? Pentium 4 has SSE2, PowerPC has Altivec. Here's a clue for you: when people code for x86 SIMD, they choose MMX, SSE and SSE2, they don't choose 3DNow!; when people code for SIMD under the PowerPC ISA, they choose Altivec. Both SSE2 and Altivec are available today, and both are used in "commodity" CPU families. I think you'll find that it's "x87" FPU strength that typically marks out AMD's current CPUs, not their patchy implementation of SSE2.
That was classic intercourse!
djb (of qmail fame) has some rather uncomplimentary things to say about the accuracy of FFTW's speed claims.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Bullshit. When I worked for the University Daily Paper we had no problem avoiding duplicate stories all over the paper... And we ran FAR MORE THAN 30 STORIES A DAY.
In my example it was a bunch of drunk/high/rushing out to get laid coward students. Can't professionals who are being paid to do their damn job do it AT LEAST as well as the wasted college kids?
Who did what now?
You just use fixed-point arithmetic instead of floating-point (i.e. a fixed 32 bits of precision, or 16 bits, or whatever). A simple way of doing this is to make INT_MAX/2 = 1.0, -INT_MAX/2 = -1.0, and everything in between scaled appropriately (/2 to avoid overflow). Then you implement fixed-point addition, multiplication, division, and subtraction (as is commonly done in hardware DSP chips) and you've got yourself an integer-only FFT.
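A rough sketch of that scaling scheme, in Python for readability. The names (`ONE`, `to_fixed`, `fx_mul`, `fx_cmul`, `butterfly`) are hypothetical, and Python integers never overflow, so this only models the arithmetic; a C version would assume 32-bit values with a 64-bit intermediate for the product. It shows one FFT butterfly done entirely on integers once the inputs and the twiddle factor have been converted.

```python
INT_MAX = 2**31 - 1
ONE = INT_MAX // 2          # fixed-point representation of 1.0

def to_fixed(x):
    """Convert a float in [-1.0, 1.0] to a scaled integer."""
    return int(round(x * ONE))

def to_float(f):
    """Convert a scaled integer back to a float (for inspection only)."""
    return f / ONE

def fx_mul(a, b):
    """Fixed-point multiply: the raw product carries the scale twice,
    so divide by ONE once to restore it."""
    return (a * b) // ONE

def fx_cmul(a, b):
    """Complex multiply on (re, im) pairs of fixed-point integers."""
    return (fx_mul(a[0], b[0]) - fx_mul(a[1], b[1]),
            fx_mul(a[0], b[1]) + fx_mul(a[1], b[0]))

def butterfly(a, b, w):
    """One radix-2 FFT butterfly: returns (a + w*b, a - w*b)."""
    t = fx_cmul(b, w)
    return ((a[0] + t[0], a[1] + t[1]),
            (a[0] - t[0], a[1] - t[1]))

# Inputs 0.25 and 0.5, twiddle factor -j; the butterfly itself uses only integers.
a = (to_fixed(0.25), to_fixed(0.0))
b = (to_fixed(0.5), to_fixed(0.0))
w = (to_fixed(0.0), to_fixed(-1.0))
top, bottom = butterfly(a, b, w)
```

Because 1.0 sits at INT_MAX/2 rather than INT_MAX, the sum of two values in [-1.0, 1.0] still fits in 32 bits, which is the headroom the "/2" buys. A real FFT would still need per-stage scaling to keep longer chains of additions in range.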
Some really old C code doing something along these lines is available here.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
I like how this duplicate was posted under the 'take-a-closer-look' department.
Never hit your grandmother with a shovel, for it leaves a bad impression on her mind...
Well, not really, but you're close. You can't just pass the Altivec unit an array of numbers and tell it to do some operation on them. Altivec (and MMX, etc.) simply allows you to process the data in bigger chunks than normal.
Altivec can process 128 bits of data at a time. For example, it can add 16 8-bit integers to another 16 8-bit integers, resulting in yet another vector of 16 8-bit integers with a single instruction, rather than doing them one at a time.
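As a rough model of the "bigger chunks" idea, the sketch below imitates a 128-bit register holding 16 unsigned 8-bit lanes. It is plain Python, not AltiVec code; `vec_add_u8` and `add_arrays` are made-up names, and the wrap-around on overflow is an assumption for the example (AltiVec also has saturating adds).

```python
LANES = 16  # 128 bits / 8 bits per element

def vec_add_u8(a, b):
    """Model one vector instruction: add 16 unsigned 8-bit lanes,
    each wrapping modulo 256."""
    assert len(a) == LANES and len(b) == LANES
    return [(x + y) & 0xFF for x, y in zip(a, b)]

def add_arrays(xs, ys):
    """Add two equal-length byte arrays 16 elements per step,
    falling back to one-at-a-time code for the leftover tail."""
    out = []
    full = len(xs) - len(xs) % LANES
    for i in range(0, full, LANES):
        # One "instruction" handles a whole 16-element chunk.
        out.extend(vec_add_u8(xs[i:i + LANES], ys[i:i + LANES]))
    # Elements that do not fill a whole vector are handled individually.
    out.extend((x + y) & 0xFF for x, y in zip(xs[full:], ys[full:]))
    return out

result = add_arrays(list(range(40)), [250] * 40)
```

For 40 elements the loop issues two vector adds and then handles the last 8 elements individually, so the program still has to walk the array itself; the unit only makes each step wider.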
What is wrong with CDE? It is simple and efficient. I use it on my Alpha and I am quite satisfied with it.
The best alternatives to CDE would be Indigo Magic (used on SGI Irix systems, perhaps the best desktop environment I have ever used) or Window Maker.
The hue of the sky is determined by a phenomenon known as the "Tyndall Effect", the scattering of light through a colloid by dust or molecules suspended in a transparent medium.
Note that the light scattering that determines what color you see isn't due to dust in the air, as some think, but rather oxygen and nitrogen molecules.
However, all we are, as Bill and Ted once pointed out, is dust in the wind, dude.
</t-i-c>
Mikey-San
Karma: +Eleventy billion (mostly affected by watching Celebrity Jeopardy)
Agreed. I use Logic Audio a lot, and my 350 MHz G4 can run about 3 times more simultaneous real-time DSP plug-ins than my 500 MHz iBook. I realize the 66 MHz system bus on the iBook comes into play here (it's 100 MHz on my old G4), but my mom's iMac (400 MHz with 100 MHz system bus) performs about as well as my iBook.
I agree with this totally. A surprisingly large, and ever increasing, number of OS X libraries use Altivec, which means that developers using those libraries get some acceleration for free. Altivec is much easier to optimize stuff for than MMX, SSE2, etc.
There are also 7th graders who can spell and edit better than these guys. It's really embarrassing.
In spite of all these horrible shortcomings, we get all of the good things Slashdot provides for free. It's not perfect. They make mistakes.
Get over it. Anyone can be a sharpshooter, waiting for someone else to screw up. But it takes a lot of hard work and dedication to put something like Slashdot together. Cut the guys some slack, or create your own website and call it Gripeslash.
Read the EFF's Fair Use FAQ
For me, the most interesting part of the article is its point that the pricing of the new machines is the real question. According to the author, the chip will make Apple machines technologically competitive. The question is whether Apple will price them to gain market share or continue to sell to a disappearing niche of luxury computer buyers.
Maybe Apple's concentration on developing software, and selling that software (rather than giving it away), along with its new business ventures, such as .Mac and the new iTunes online music store, point to a new business model that can afford to cut the margins on hardware.
If they don't lower the price of their machines -- the top ones, namely -- they will suffer, long-term. I don't think they need to be on par with PC's; I just think they cannot be too much more expensive than the PC's.
quiquid id est, timeo puellas et oscula dantes.
You've not been looking at the distributed.net results, have you?
With RC5-64 that was true. Unfortunately for RC5-72, no one has written an optimized Mac core yet so the PC versions are way faster.
-You may license this sig for only $6.99.
Light travels faster than sound. This is why some people appear bright until you hear them speak.........
Yeah, macs are a bit more expensive, but not really by much if you compare to comparable brands and quality, not self assembled garage models.
And I agree in theory on your pricing opinion, but it's just that in reality Apple have been pricing their machines in pretty much this way for 20 years, and they have made a very successful business out of it. They've also continuously been pronounced on the verge of death for all those 20 years.
So I don't expect either Apple pricing or their good fortunes to change anytime soon.
Odd. All the Mac scores must be cheated then.
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck
wtf? hey maclots.... Just cause someone is criticising Altivec doesn't necessarily make them a troll....
I do agree with you that clustering could be far more useful than it currently is, but as you say, anything that requires low latency is kind of problematic...
As far as clustering goes, you know you're able to put together a PC processing monster and use VST System Link ?
Been considering this to add to my TiBook...
i don't read slashdot anymore.
If you use the Mac beta client (final release candidate, actually) you get a proper, Altivec optimised cruncher. Just as fast at RC5-72 as it was at 64.
That was classic intercourse!
My bad. I hadn't checked it in a while. Looks like they fixed that. Horrible Mac performance was a problem when RC5-72 first came out and that was the reason the tech support gave me.
-You may license this sig for only $6.99.