Inside the PowerPC 970
daveschroeder writes "Jon "Hannibal" Stokes has posted a long-awaited, very detailed analysis of the IBM PowerPC 970 at Ars Technica. Notable quote: 'The 970 was made for Apple'."
See here: more on the PowerPC was posted yesterday.
PAGERANK++ Robsell.com
It's like comparing apples with apples... It's a dupe.
It's ok, the last post didn't get enough comments. Please continue discussion.
Interesting idea to say that the vector units were "hacked" onto the POWER architecture... and that this is the reason the chip is designed for Apple...
||| I still can't believe Parkay's not butter.
and where would I want to use it?
How is it unlike normal CPU opcode processing?
Fast forward a few months....hmm...a few options:
Sun: Nice hardware, very expensive, CDE.
AMD: Commodity hardware, cheap, WinXP.
HP: Intel hardware, very expensive, CDE or WinXP.
I think I know what I'd buy.
Of course, the Athlon64/Opteron would get quite a bit of consideration due to my hobbies.
But I think it'd end up being the Mac.
And I even e-mailed daddypants to notify him that it was a dupe.
If tits were wings it'd be flying around.
... in The Matrix. That strange feeling of deja vu can only mean one thing! Either that or the /. editors are asleep at the wheel again.
I've never been one of the people that scream when a dupe is posted (the other is still on the main page!), but this is frickin' ridiculous. It happens SO OFTEN. OSDN should dock Taco and them a week of pay every time they post a dupe, two weeks if the other story is still on the main page. I realize it isn't a 'pro' news source like NYT or CNN, but there are 7th-grade geeks who could avoid posting the same story twice in 24 hours...
I understand that a while ago there was some competition between IBM and Motorola about whose chip would be the G5. Was Motorola ever a serious contender, and if so, has Apple decided on IBM? I haven't heard much about Motorola for some time.
Apple'd be putting DDR400 on the G4 right now if they could. None of this (well, except the decision to go Moto) was their fault.
My real problem with the current G4e situation, aside from the 167 SDR FSB, is the fact that it's a shared bus topology, which is just ridiculous. To my knowledge, there's nothing stopping Apple from putting out a chipset that gives each G4e a dedicated FSB (even if it's still 167MHz SDR) to the chipset.
As far as the low MHz and SDR situation, I've also never been totally convinced that Apple wasn't partially to blame for this either, unless they just have zero clout with Moto SPS.
I believe that Hannibal mentions that the 970 is designed for SMP. Clearly CmdrTaco is just testing its newest feature: you click post and the operation gets carried out by both processors.
Tierce
Who sponsors your feelings?
Reading through the article, it's nice to see some real design going into a processor. Looking through Intel's last few chips, they've been upping their clock speed and packing in more cache.
Yeah, yeah, they are hog-tied because you can't easily re-compile the entire Windows platform to use new instruction sets. Linux users, of course, don't have this problem (muhahahah).
Did anyone else catch the bit on the twin FPUs? I'm just imagining what this thing is going to do with vector operations and frequency transforms.
For most of you non-engineers:
-Most 3D vector operations are affine transformations. Using a 4x4 array of floating point numbers you can translate, rotate, and scale. Works beautifully, but it's a lot of calculations.
-The Fast Fourier Transform (FFT) is used a lot in signal processing. It's a floating point monster.
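To make the 4x4 point above concrete, here is a minimal sketch in plain Python. The helper names (`mat_mul`, `transform`, `translation`, `scaling`, `rotation_z`) are made up for illustration and are not taken from the article or from any library. It builds one combined matrix and applies it to a single point.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, point):
    """Apply a 4x4 affine matrix to a 3D point (x, y, z)."""
    x, y, z = point
    v = (x, y, z, 1.0)  # homogeneous coordinate w = 1
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return tuple(out[:3])

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def scaling(sx, sy, sz):
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

# Scale by 2, then rotate 90 degrees about z, then translate by (1, 0, 0).
# The rightmost matrix is applied to the point first.
m = mat_mul(translation(1.0, 0.0, 0.0),
            mat_mul(rotation_z(math.pi / 2), scaling(2.0, 2.0, 2.0)))
p = transform(m, (1.0, 0.0, 0.0))  # approximately (1.0, 2.0, 0.0)
```

Written this naive way, every transformed point costs 16 floating point multiplications, which is the "lot of calculations" part once you have thousands of vertices per frame.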
No, it's not really ironic.
Also, it's Ford Motor Company, not Ford Motor Works.
This is probably true and rather unfortunate. AltiVec is important for Apple marketing because it lets them claim impressive performance figures without actually needing to push the state of the art in terms of processor design further than Intel. It's also important for a few special-purpose applications (PhotoShop filters, etc.).
But the reality of regular high-end computing is that people don't have the time to optimize their software for the latest oddball hardware platform. And even something like a hand-coded vectorized BLAS library doesn't help because most scientific software still doesn't use such libraries.
I think this tradeoff doesn't even work well for Apple. Imagine how much better it would be if Apple could ship systems based on the 970 today, rather than after a few months additional delay due to AltiVec. And every dollar and watt that is shaved off the AltiVec price makes it a much more viable processor for servers and blades, which would get volume up and prices down. Gimmicks like AltiVec cost much more than they are worth, even for Apple.
I'm assuming that the new procs must have some kind of support for the evil bit.
speaking of the matrix.....things are not as they seem!!!! bad guys are good good guys are bad.
I am the Alpha and the Omega-3
Try doing audio signal processing or heavy graphics/video work.
You're pretty thankful for your Altivec then...
I saw such an insane improvement in Reaktor when it got Altivec enhanced...
i don't read slashdot anymore.
Unfortunately, the vector performance of the G4e has been consistently bottlenecked by Apple's lackluster motherboard and chipset designs--specifically the anemic frontside bus and memory subsystems that Apple has saddled the PowerMac line with.
This implies that the decision of how much bus bandwidth to give the G4e was up to Apple - which it was not. Motorola designed the processor (for Cisco, depending on who you believe), and Apple made do with the anemic MaxBus at 133 MHz that they got from Motorola.
*****
Why this had to be posted the morning before my presentation to my supervisor is a clear indication that the universe is against me.
Time to hide my network cable until the end of the day.
*****
Who knows whether it will still be competitive in several months when they actually want to offer it.
On the other hand, Apple users won't have much of a choice, and neither will Apple.
*****
The PPC 970 will not really make the Macintosh competitive with modern PC's. It will make it competitive with PC's from the beginning of this year, which are not the fastest available any more, and will be even slower when compared to the machines that are available when the PPC 970 ships, which is the very earliest that Apple machines based on it can ship. It will however go a long way to catching up, and take off a lot of the pressure caused by the abominable performance of today's dual processor G4 machines when compared to even inexpensive PC's.
The other unknown in this is the price. PPC 970 based Apple computers may be significantly more expensive. Motorola loses hundreds of millions of dollars each year on their semiconductor business, and IBM does as well. Still, IBM may want to look at Apple and the PPC 970 as a PROFIT center, rather than a LOSS center, like Motorola does with Apple and the G4.
The PPC 970 is great news for Apple, but it is still a bone thrown to them while the x86 PC is feasting on the meat of the Intel and AMD processors.
*****
Eh, nevermind.
You'd think michael and Taco talked every once in a while - less than 24 hours between duping this one.
Whining about dupe comments is worse than the whining in the dupe comments, and thus the point....don't bitch about the symptom, lobby to stop the source of the pain, and the whining will cease at the same time.
"But Mom, I don't want to go to France!" "Shut up and keep rowing!"
I fail to see how the parent post was at all "redundant."
This is the second time it's posted. Google News could almost do better than the /. editors.
A quick google for SPEC(int|fp)2000 values for Intel P4 (http://www.aceshardware.com/SPECmine/) shows the P4 3000's SPEC numbers are around 1200. So. What's so cool about this?
Computers are like air conditioners.
- They stop working when you open Windows.
In response to your FFT being a floating point monster... in a lot of cases, couldn't you turn it into an integer monster? I've been thinking about this, and it occurs to me that the vector can be decomposed into halves (thus the 2^x units in the FFT), but a vector at angle theta can just as easily be decomposed into two vectors half the length, one at angle phi, and the other at angle (2theta-2phi).
That holds where phi is any angle. That being the case, it seems to me that you could pick your values of phi to correspond to "perfect" triangles (3-4-5, ~37 degrees, for example), and keep your operations in the integer realm for everything except subtraction of angles.
I dunno, I haven't checked this out really thoroughly, and this is therefore probably nonsense. Last time I tried to do anything with the DFT, I thought I had something that blew the FFT away in terms of speed... precisely because I didn't understand the full FFT process, and its beautiful simplicity.
In reality, I got a very modest improvement over the FFT, not worth the extra code in my opinion.
My method was very different, involving a redefinition of the DFT matrix-vector combination, and had more work on paper, but fewer multiplications. But what I thought was (log2 N)^2 multiplications, instead of the DFT's N^2, was really something like 0.87*N*log2 N. The FFT needs on the order of N*log2 N multiplications.
Essentially, when I understood the FFT well, and applied my lessons to it, I ended up showing that not all the multiplications are necessary. Some of the FFT multiplications are dupes just like this article, and there is a system for finding them, also just like this article. (Look for the multiplications posted by Taco.)
But the fact that I can make such errors means that I could be completely wrong about my supposed integer FFT.
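For anyone following the multiplication counts above, here is a rough sketch comparing a direct DFT with a textbook recursive radix-2 FFT. This is the standard decimation-in-time algorithm, not the modified method described in this comment, and the function names and the sample signal are made up for illustration.

```python
import cmath

def dft(x):
    """Direct DFT: one complex multiplication per (k, t) pair, N^2 in total."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle multiplication
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft_twiddle_mults(n):
    """Count the twiddle multiplications fft() performs for a length-n input."""
    return 0 if n == 1 else 2 * fft_twiddle_mults(n // 2) + n // 2

signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, -2.0]
direct = dft(signal)
fast = fft(signal)
max_err = max(abs(p - q) for p, q in zip(direct, fast))
dft_mults = len(signal) ** 2                 # 64 for N = 8
fft_mults = fft_twiddle_mults(len(signal))   # 12 for N = 8
```

For N = 8 the direct form does 64 complex multiplications and this FFT does 12, that is (N/2)*log2 N. Some of those 12 are by trivial twiddle factors such as 1 (k = 0) or -i (k = N/4), which fits the observation that not every multiplication in the plain FFT is really needed.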
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
Link here. In your browser, find "CmdrTaco", click on the checkbox next to it, and then go to the bottom and click "submit" (rough translation from Swahili: submit = "apply patch").
[JUUUUST kidding, don't do this or you won't see any more of CmdrTaco's articles.]
I really don't think it's possible for each of 30 people to be aware of all 30 other articles. Why not assign one person to read *all* articles, and flag dupes? Then everything has to be cleared by him, and we'll eliminate dupes. And if CmdrTaco or someone else has a reason that it should go up again, he can argue it out, and modify the headline accordingly, so the readers will also know why this should go up again.
If you haven't seen it, it's new to you.
omnia tua castra sunt nobis
Well, after the first round of follow-up whinings (level 3) are logged, there is an 'incident number' (IN) reset, and the next whiner (NW) in line goes to the front of the queue. Unless of course, you have already participated in first-round whining (levels 1;2;3), in which case you have to sit out.
Whining-by-proxy, substitute whining, pitch-whiners, designated whiners, ghost whiners and stand-in whiners are all permitted (first-round whiners, all levels), but only for Rhode Island, New Jersey, some parts of lower Manhattan and the District of Columbia.
See, your friend's enemy is your friend. No wait. Your friend's friend can also be your enemy. No...
/w IBM on their new chips. No more analysis :)
Ah frig'..
Apple is going along
--
"I'm not bright. Big words confuse me. But Wanda loves me and that should be enough for you." - Cosmo
djb (of qmail fame) has some rather uncomplimentary things to say about the accuracy of FFTW's speed claims.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Bullshit. When I worked for the University Daily Paper we had no problem avoiding duplicate stories all over the paper... And we ran FAR MORE THAN 30 STORIES A DAY.
In my example it was a bunch of drunk/high/rushing out to get laid coward students--Can't professionals who are being paid to do their damn job do AT LEAST as well as the wasted college kids?
Who did what now?
You just use fixed-point arithmetic instead of floating-point (i.e. a fixed 32 bits of precision, or 16 bits, or whatever). A simple way of doing this is to make INT_MAX/2 = 1.0, -INT_MAX/2 = -1.0, and everything in between scaled appropriately. (/2 to avoid overflow). Then you implement fixed-point addition, multiplication, division, and subtraction (as commonly done in hardware DSP chips) and you've got yourself an integer-only FFT.
Some really old C code doing something along these lines is available here.
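A minimal sketch of the scaling idea, in Python rather than C, and not the linked code. It assumes a power-of-two scale of 2^30 (one more than INT_MAX/2 for a 32-bit int) so that rescaling after a multiply is a plain shift. All names here are made up for illustration.

```python
FRAC_BITS = 30
ONE = 1 << FRAC_BITS  # the integer that represents 1.0 (about INT_MAX/2 for 32 bits)

def to_fixed(x):
    """Convert a float in roughly [-2.0, 2.0) to its fixed-point integer."""
    return int(round(x * ONE))

def to_float(a):
    """Convert a fixed-point integer back to a float, for inspection only."""
    return a / ONE

def fx_mul(a, b):
    """Fixed-point multiply: the raw product carries 2*FRAC_BITS fraction bits,
    so shift right to get back to FRAC_BITS. In C the product needs a 64-bit
    intermediate; Python integers do not overflow."""
    return (a * b) >> FRAC_BITS

def fx_butterfly(a_re, a_im, b_re, b_im, w_re, w_im):
    """One radix-2 FFT butterfly (a + w*b, a - w*b) using only integer ops.
    Addition and subtraction are ordinary integer add and subtract."""
    t_re = fx_mul(w_re, b_re) - fx_mul(w_im, b_im)
    t_im = fx_mul(w_re, b_im) + fx_mul(w_im, b_re)
    return (a_re + t_re, a_im + t_im, a_re - t_re, a_im - t_im)

half = to_fixed(0.5)
quarter = fx_mul(half, half)  # 0.5 * 0.5 in fixed point

# Butterfly with a = 0.25, b = 0.5 and twiddle w = -i.
result = fx_butterfly(to_fixed(0.25), 0, to_fixed(0.5), 0, 0, to_fixed(-1.0))
as_floats = tuple(to_float(v) for v in result)
```

The headroom above 1.0 is what the "/2 to avoid overflow" buys you: a butterfly adds two values, so the sum can go past 1.0 and still fit in a 32-bit register. A real integer FFT would also scale down between stages, which this sketch leaves out.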
Motorola is much better than IBM as far as chip companies go, and I thought you were going to buy MOTOROLA
If "Makes crap chips like G4e's and doesn't bring them into the 21st century" is much better than "Makes awesome chips like 64bit 970s" then you have a pretty fucked up definition of "much better"
I like how this duplicate was posted under the 'take-a-closer-look' department.
Never hit your grandmother with a shovel, for it leaves a bad impression on her mind...
What is wrong with CDE? It is simple and efficient. I use it on my Alpha and I am quite satisfied with it.
The best alternatives to CDE would be Indigo Magic (used on SGI Irix systems, perhaps the best desktop environment I have ever used) or Window Maker.
Sun: crappy hardware; moderately priced; Solaris/ free UNIX
AMD (multiple vendors): decent hardware; relatively inexpensive; Windows/ free UNIX
Intel (multiple vendors): good hardware; moderately priced; Windows/ HP/UX/ free UNIX
IBM: good hardware; expensive; AIX/ free UNIX
Apple: decent hardware; unknown price (probably inexpensive to moderately priced); OS X/ free UNIX
It really depends on your goals and your applications. If you just want the cheapest, Apple or AMD hardware would be the best bet. Keep in mind that AMD hardware won't really be cheap until the Athlon 64 is introduced later this year; right now it is still inexpensive only in comparison to other 64-bit workstations. Apple pricing is unknown and will likely depend somewhat upon the prices that IBM gives them. Motorola bleeds lots of cash on their behalf; IBM may not be willing to do the same.
On the other hand if you were looking for a 64-bit workstation, you would probably consider Sun if your company already has lots of Sun systems, and the other vendors depending upon which one excels in benchmarks related to your application (AMD, IBM and Intel processors all have slightly different strong points; you really need to benchmark to find out which is best for you) and the money available.
Apple may be a good choice for content creation. It is likely to be less expensive, but also slower, than the competition from AMD, IBM and Intel. Lack of workstation applications may be a barrier for adoption to other workstation tasks.
Agreed. I use Logic Audio a lot, and my 350 MHz G4 can run about 3 times more simultaneous real-time DSP plug-ins than my 500 MHz iBook. I realize the 66 MHz system bus on the iBook comes into play here (it's 100 MHz on my old G4), but my mom's iMac (400 MHz with 100 MHz system bus) performs about as well as my iBook.
I agree with this totally. A surprisingly large, and ever increasing, number of OS X libraries use AltiVec, which means that developers using those libraries get some acceleration for free. AltiVec is much easier to optimize stuff for than MMX, SSE2, etc.
I just had a thought that would probably reduce the number of duplicates significantly. At the point where the author hits submit, the story checks all of the links in the article and then does a comparison of those against the complete story archive. At that point if the article has a duplicate it will find it and let them know.
A problem with this method is what may happen with this story. Editors typically link directly to the story and then also link to the referring domain. A way around this: if there is more than one link in the story, ignore any link to a base-level site (i.e. www.arstechnica.com) and compare only the links that point into a site's pages and folders.
I may later play around with a couple of searches of the database to see how well this method works, but I think it would improve things significantly.
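A rough sketch of what that check could look like, assuming stories are stored as HTML with ordinary `href="..."` links. The function names, the `archive` dictionary, and the example.com URLs are all made up for illustration; this says nothing about how Slashdot's actual code or database works.

```python
import re
from urllib.parse import urlparse

LINK_RE = re.compile(r'href="([^"]+)"')

def is_base_level(url):
    """True for a link to a site's front page, e.g. http://example.com/."""
    parts = urlparse(url)
    return parts.path in ("", "/") and not parts.query

def normalize(url):
    """Reduce a URL to (host, path, query) so trivial variations still match."""
    parts = urlparse(url)
    return (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)

def significant_links(html):
    """Links worth comparing: with more than one link, ignore base-level ones."""
    links = LINK_RE.findall(html)
    if len(links) > 1:
        deep = [u for u in links if not is_base_level(u)]
        if deep:
            links = deep
    return {normalize(u) for u in links}

def find_duplicates(submission_html, archive):
    """Return ids of archived stories sharing a significant link with the submission.
    `archive` is assumed to map story id -> story HTML."""
    new_links = significant_links(submission_html)
    return [story_id for story_id, html in archive.items()
            if new_links & significant_links(html)]

archive = {
    1: '<a href="http://example.com/cpu/ppc970.html">analysis</a> at '
       '<a href="http://example.com/">Example</a>',
    2: '<a href="http://example.com/">Example</a> posted '
       '<a href="http://example.com/other.html">something else</a>',
}
submission = ('<a href="http://example.com/cpu/ppc970.html">long-awaited</a> at '
              '<a href="http://example.com">Example</a>')
dupes = find_duplicates(submission, archive)  # only story 1 shares a deep link
```

Scanning the whole archive on every submit would be slow for a real site, so an index from normalized link to story id would probably be the first thing to add.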
There are also 7th graders who can spell and edit better than these guys. It's really embarrassing.
In spite of all these horrible shortcomings, we get all of the good things Slashdot provides for free. It's not perfect. They make mistakes.
Get over it. Anyone can be a sharpshooter, waiting for someone else to screw up. But it takes a lot of hard work and dedication to put something like Slashdot together. Cut the guys some slack, or create your own website and call it Gripeslash.
Read the EFF's Fair Use FAQ
For me, the most interesting part of the article is where it treats the pricing of the new machines as the real question. According to the author, the chip will make Apple machines technologically competitive. The question is, will Apple price them to gain market share, or continue to sell to a disappearing niche of luxury computer buyers?
Maybe Apple's concentration on developing software, and selling that software (rather than giving it away), along with its new business ventures, such as .Mac and the new iTunes online music store, point to a new business model that can afford to cut the margins on hardware.
If they don't lower the price of their machines -- the top ones, namely -- they will suffer, long-term. I don't think they need to be on par with PC's; I just think they cannot be too much more expensive than the PC's.
quiquid id est, timeo puellas et oscula dantes.
Yeah, macs are a bit more expensive, but not really by much if you compare to comparable brands and quality, not self assembled garage models.
And I agree in theory on your pricing opinion, but it's just that in reality Apple have been pricing their machines in pretty much this way for 20 years, and they have made a very successful business out of it. They've also continuously been pronounced on the verge of death for all those 20 years.
So I don't expect either Apple pricing or their good fortunes to change anytime soon.
wtf? hey maclots.... Just cause someone is criticising Altivec doesn't necessarily make them a troll....
I do agree with you that clustering could be far more useful than it currently is, but as you say, anything that requires low latency is kind of problematic...
As far as clustering goes, you know you're able to put together a PC processing monster and use VST System Link?
Been considering this to add to my TiBook...
...and I reply with the same comment.
I went to Dell and configured a PWS 450 (I think that's what it was called) with almost all the bells and whistles of the top-end G4. I didn't include a SuperDrive with the G4 and went DVD/CD-R because I couldn't tell which Dell option was closest to the SuperDrive. I always chose the cheapest component. I also left out the monitors and tried to get the same specs for the video card.
A dual processor 2.0 GHz PWS 450 from Dell rang in around $3200. The Dual 1.42GHz PowerMac G4 rang in at $3500.
Personally, I don't think 10% ($300) is too much more expensive for a Mac. Do you?
Do the math. I have IBM chips from the 601 down to now, and the 64-bit vectoring is not as good as AltiVec's 128-bit. Hmm, 128 or 64 bit. I choose the Motorola-built Apple. Yes, and even gcc now has very good AltiVec support. The last time I checked, 128 bit is twice that of 64 bit; you just have to use the functionality. If you like IBM so much, go work for them and you will see how evil a company they are. Hey, but maybe you like Clear Channel too (Clear Channel is the government radio in USA). No bull, Motorola is better.