Inside the PowerPC 970
daveschroeder writes "Jon "Hannibal" Stokes has posted a long-awaited, very detailed analysis of the IBM PowerPC 970 at Ars Technica. Notable quote: 'The 970 was made for Apple'."
Interesting idea to say that the vector units were "hacked" onto the POWER architecture... and that this is the reason the chip is designed for Apple...
||| I still can't believe Parkay's not butter.
Whoa! A duplicate article, this I've seen before. But this is nuts!
I found the meaning of life the other day, but I had write-only access.
Try doing audio signal processing or heavy graphics/video work.
You're pretty thankful for your Altivec then...
I saw such an insane improvement in Reaktor when it got Altivec enhanced...
i don't read slashdot anymore.
But the reality of regular high-end computing is that people don't have the time to optimize their software for the latest oddball hardware platform. And even something like a hand-coded vectorized BLAS library doesn't help because most scientific software still doesn't use such libraries.
ATLAS is a BLAS implementation that is tuned for each system that it runs on. The people at MathWorks use this as the underlying BLAS system in Matlab. Mathematica, Maple, etc. use this as well. There is even a G4/AltiVec optimized version available here. This is the whole point of layered software.
AltiVec is nice for some things.
My iTunes MP3 ripping speed nearly tripled when I went from a 466 MHz G3 to a 400 MHz G4 due to iTunes being optimized for AltiVec.
Some Photoshop actions and filters see up to 800% improvements.
Running iMovie exports on a 600 MHz G3 iMac takes 200-300% longer than on a 400 or 500 MHz G4.
Don't confuse "new" with "state of the art". The former is just something that hasn't been done before. The latter is something that yields "impressive performance figures". If Altivec is competitive with Intel, then it is state of the art, by definition, even if it's 20 years old. The CPU cache is a decades old concept, yet CPUs with caches are still state of the art.
Imagine how much better it would be if Apple could ship systems based on the 970 today, rather than after a few months additional delay due to AltiVec.
Don't underestimate the cost of software. Your idea is expensive, because it requires software vendors to maintain two different versions of their code. This can lead to buggier or more expensive products, or it can lead to the "abandonment" of the G4 installed base. That could easily be worth the few months for Apple.
I did not have much trouble getting GNOME working on an HP B180 with HP-UX 10.20 (compiled with acc, HP-UX's standard compiler). BTW that is a 180MHz PA-RISC machine. It kicked a 1GHz Pentium-based workstation's butt, even after I put Gentoo on the Intel box (it originally had only Windows NT, shudder). Fast clock rates can't compensate for a moronic architecture in the hands of heavily multitasking users like me.
In response to your FFT being a floating point monster... in a lot of cases, couldn't you turn it into an integer monster? I've been thinking about this, and it occurs to me that the vector can be decomposed into halves (thus the 2^x units in the FFT), but a vector at angle theta can just as easily be decomposed into two vectors half the length, one at angle phi, and the other at angle (2theta-2phi).
That holds where phi is any angle. That being the case, it seems to me that you could pick your values of phi to correspond to "perfect" triangles (3-4-5, ~42 degrees, for example), and keep your operations in the integer realm for everything except subtraction of angles.
I dunno, I haven't checked this out really thoroughly, and this is therefore probably nonsense. Last time I tried to do anything with the DFT, I thought I had something that blew the FFT away in terms of speed... precisely because I didn't understand the full FFT process, and its beautiful simplicity.
In reality, I got a very modest improvement over the FFT, not worth the extra code in my opinion.
My method was very different, involving a redefinition of the DFT matrix-vector combination, and had more work on paper, but fewer multiplications. But what I thought was (log2 N)^2 multiplications, instead of the DFT's order-N^2 multiplications, was really something like 0.87*N*log2(N) multiplications. The FFT gets N*log2(N) multiplications.
Essentially, when I understood the FFT well, and applied my lessons to it, I ended up showing that not all the multiplications are necessary. Some of the FFT multiplications are dupes just like this article, and there is a system for finding them, also just like this article. (Look for the multiplications posted by Taco.)
But the fact that I can make such errors means that I could be completely wrong about my supposed integer FFT.
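For anyone who wants to see the multiplication counts from the post above in action, here is a rough Python sketch that counts complex multiplications in a naive DFT and in a textbook recursive radix-2 FFT. It is not the poster's own method, and the function names are made up for illustration. Note that it only counts twiddle-factor multiplications, so the FFT comes out at (N/2)*log2(N), the same order as the rounder N*log2(N) figure quoted above.

```python
import cmath

def naive_dft(x):
    """Direct DFT: one complex multiplication per (output, input) pair, so N^2."""
    n = len(x)
    mults = 0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
            mults += 1
        out.append(acc)
    return out, mults

def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x), 0
    even, m_even = fft_radix2(x[0::2])
    odd, m_odd = fft_radix2(x[1::2])
    mults = m_even + m_odd
    out = [0j] * n
    for k in range(n // 2):
        # One twiddle multiplication feeds two outputs (the butterfly).
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        mults += 1
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out, mults

# Arbitrary 16-point test signal (hypothetical data, just for counting).
signal = [complex(i % 5, 0) for i in range(16)]
dft_out, dft_mults = naive_dft(signal)
fft_out, fft_mults = fft_radix2(signal)
print("naive DFT multiplications:", dft_mults)
print("radix-2 FFT multiplications:", fft_mults)
```

Both routines produce the same spectrum up to rounding; only the amount of work differs.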
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
Bullshit. When I worked for the University Daily Paper we had no problem avoiding duplicate stories all over the paper... And we ran FAR MORE THAN 30 STORIES A DAY.
In my example it was a bunch of drunk/high/rushing-out-to-get-laid coward students. Can't professionals who are being paid to do their damn job do AT LEAST as good a job as the wasted college kids?
Who did what now?
For me, the most interesting part of the article is its point that the pricing of the new machines is the real question. According to the author, the chip will make Apple machines technologically competitive. The question is, will Apple price them to gain market share, or continue to sell to a disappearing niche of luxury computer buyers?
Maybe Apple's concentration on developing software, and selling that software (rather than giving it away), along with its new business ventures, such as .Mac and the new iTunes online music store, point to a new business model that can afford to cut the margins on hardware.
If they don't lower the price of their machines -- the top ones, namely -- they will suffer, long-term. I don't think they need to be on par with PCs; I just think they cannot be too much more expensive than PCs.
quiquid id est, timeo puellas et oscula dantes.
wtf? hey maclots.... Just cause someone is criticising Altivec doesn't necessarily make them a troll....
I do agree with you that clustering could be far more useful than it currently is, but as you say, anything that requires low latency is kind of problematic...
As far as clustering goes, you know you can put together a PC processing monster and use VST System Link?
Been considering this to add to my TiBook...
i don't read slashdot anymore.