Inside the PowerPC 970
daveschroeder writes "Jon "Hannibal" Stokes has posted a long-awaited, very detailed analysis of the IBM PowerPC 970 at Ars Technica. Notable quote: 'The 970 was made for Apple'."
Fast forward a few months....hmm...a few options:
Sun: Nice hardware, very expensive, CDE.
AMD: Commodity hardware, cheap, WinXP.
HP: Intel hardware, very expensive, CDE or WinXP.
I think I know what I'd buy.
Of course, the Athlon64/Opteron would get quite a bit of consideration due to my hobbies.
But I think it'd end up being the Mac.
Reading through the article, it's nice to see some real design going into a processor. Looking through Intel's last few chips, they've been upping their clock speed and packing in more cache.
Yeah, yeah, they are hog-tied because you can't easily re-compile the entire Windows platform to use new instruction sets. Linux users, of course, don't have this problem (muhahahah).
Did anyone else catch the bit on the twin FPUs? I'm just imagining what this thing is going to do with vector operations and frequency transforms.
For most of you non-engineers:
-Most 3D vector operations are affine transformations. Using a 4x4 array of floating point numbers you can translate, rotate, and scale. Works beautifully, but it's a lot of calculations.
-The Fast Fourier Transform (FFT) is used a lot in signal processing. It's a floating point monster.
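To put a number on "a lot of calculations" for the 4x4 affine case, here's a rough sketch in plain Python. The helper names (mat_mul, translate, scale, rotate_z, apply) are made up for illustration and aren't from any particular graphics library. Pushing one point through a 4x4 matrix costs 16 multiplies and 12 adds, and composing two matrices costs 64 multiplies.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices (64 multiplies, 48 adds)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translate(tx, ty, tz):
    """Translation lives in the fourth column."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def scale(sx, sy, sz):
    """Scaling lives on the diagonal."""
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotate_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply(m, point):
    """Transform a 3D point: 16 multiplies and 12 adds per point."""
    v = [point[0], point[1], point[2], 1.0]  # homogeneous coordinate w = 1
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return out[:3]

# Scale first, then rotate, then translate (the rightmost matrix acts first).
combined = mat_mul(translate(1.0, 2.0, 3.0),
                   mat_mul(rotate_z(math.pi / 2), scale(2.0, 2.0, 2.0)))
moved = apply(combined, (1.0, 0.0, 0.0))
```

With those inputs the point (1, 0, 0) is scaled to (2, 0, 0), rotated to roughly (0, 2, 0), and translated to roughly (1, 4, 3). Multiply the per-point cost by every vertex in a scene, every frame, and you can see why two FPUs get people excited.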
This is probably true and rather unfortunate. AltiVec is important for Apple marketing because it lets them claim impressive performance figures without actually needing to push the state of the art in terms of processor design further than Intel. It's also important for a few special-purpose applications (Photoshop filters, etc.).
But the reality of regular high-end computing is that people don't have the time to optimize their software for the latest oddball hardware platform. And even something like a hand-coded vectorized BLAS library doesn't help because most scientific software still doesn't use such libraries.
I think this tradeoff doesn't even work well for Apple. Imagine how much better it would be if Apple could ship systems based on the 970 today, rather than after a few months additional delay due to AltiVec. And every dollar and watt that is shaved off the AltiVec price makes it a much more viable processor for servers and blades, which would get volume up and prices down. Gimmicks like AltiVec cost much more than they are worth, even for Apple.
But you don't need AltiVec for a next-gen chip that's so much faster than its predecessor that no one would notice. There is a huge disparity between the performance of current G4s and whatever they're gonna call the 970-based machines.
More than 3 people have ripped music in iTunes. Then there's the tremendous acceleration it provides for encoding DVDs, Final Cut Pro's real-time effects, BLAST, and plenty more. It's not even close to just Photoshop.
How to solve most of our problems: 1.Lots of nuclear plants. 2.Cure aging.
But if you spend the same $3800 on x86 hardware, you get a small compute cluster that runs a lot of software faster and without AltiVec optimizations. For most scientific applications, as well as most video and audio applications, that's probably a better deal in terms of bang-for-the-buck, but, admittedly, it's probably not something your average Mac user wants to set up.
well, G5 was written on a roadmap I once saw describing the Motorola MPC 85xx series - but seeing as how that series has been on sale for about a year now with no sign of a CPU variant suitable for Apple to use, I guess we can forget about the MPC 85xx being used in PowerMacs. Personally, I'd like Apple to adopt the moniker "G64" for their PPC 970 powered machines - that'd stick it to Intel alright, and the idiotic warez kids would stop comparing clock speed and start comparing word length instead of getting on with their lives.
That was classic intercourse!
You just use fixed-point arithmetic instead of floating-point (i.e. a fixed 32 bits of precision, or 16 bits, or whatever). A simple way of doing this is to make INT_MAX/2 = 1.0, -INT_MAX/2 = -1.0, and everything in between scaled appropriately (the /2 is to avoid overflow). Then you implement fixed-point addition, subtraction, multiplication, and division (as is commonly done in hardware DSP chips) and you've got yourself an integer-only FFT.
Some really old C code doing something along these lines is available here.
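For anyone who'd rather see the idea than dig through old C, here's a rough Python sketch of the same scheme. It is not the linked code. The names (FRAC_BITS, to_fixed, fx_mul, fx_cmul, fft_fixed) are made up for illustration, and I'm assuming 1.0 is stored as 2**30, which is about INT_MAX/2 for 32-bit ints. The twiddle factors are computed with floats here for brevity; a real integer-only build would read them from a precomputed table.

```python
import math

FRAC_BITS = 30          # assumption: 1.0 is stored as 2**30, about INT_MAX/2 for 32-bit ints
ONE = 1 << FRAC_BITS

def to_fixed(x):
    """Convert a float in roughly [-1, 1] to fixed point (setup only)."""
    return int(round(x * ONE))

def to_float(q):
    """Convert back to float, for inspection only."""
    return q / ONE

def fx_mul(a, b):
    # The raw product carries 2*FRAC_BITS fractional bits, so shift back down.
    # In C this needs a 64-bit intermediate; Python ints never overflow.
    return (a * b) >> FRAC_BITS

def fx_cmul(ar, ai, br, bi):
    """Complex multiply built from integer multiplies only."""
    return fx_mul(ar, br) - fx_mul(ai, bi), fx_mul(ar, bi) + fx_mul(ai, br)

def fft_fixed(re, im):
    """Recursive radix-2 FFT on fixed-point data; len(re) must be a power of two.

    Each stage halves its output so values stay within [-1, 1], which means
    the result is the DFT divided by N.
    """
    n = len(re)
    if n == 1:
        return list(re), list(im)
    even_re, even_im = fft_fixed(re[0::2], im[0::2])
    odd_re, odd_im = fft_fixed(re[1::2], im[1::2])
    out_re, out_im = [0] * n, [0] * n
    half = n // 2
    for k in range(half):
        # Twiddle factor; a real integer-only build would use a lookup table.
        angle = -2.0 * math.pi * k / n
        wr, wi = to_fixed(math.cos(angle)), to_fixed(math.sin(angle))
        tr, ti = fx_cmul(wr, wi, odd_re[k], odd_im[k])
        out_re[k] = (even_re[k] + tr) >> 1
        out_im[k] = (even_im[k] + ti) >> 1
        out_re[k + half] = (even_re[k] - tr) >> 1
        out_im[k + half] = (even_im[k] - ti) >> 1
    return out_re, out_im
```

Feed it eight samples of 0.5*cos(2*pi*n/8) and bins 1 and 7 should come out close to 0.25 (that's the DFT value of 2 divided by N = 8), with everything else near zero. Addition and subtraction are just plain integer ops; only multiplication needs the shift.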
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Coding for a cluster introduces all kinds of communication and synchronisation headaches, especially since it takes such a long time to communicate between nodes (1ms is a very long time in terms of a CPU).
I am TheRaven on Soylent News
I agree with this totally. A surprisingly large, and ever-increasing, number of OS X libraries use AltiVec, which means that developers using those libraries get some acceleration for free. AltiVec is much easier to optimize for than MMX, SSE2, etc.