Ars Technica's Hannibal on IBM's Cell
endersdouble writes "Ars Technica's Jon "Hannibal" Stokes, known for his many articles on CPU technology, has posted a new article on IBM's new Cell processor. This one is the first part of a series, and covers the processor's approach to caching and control logic. Good read."
Part II is up as well.
" Last fall, IBM and Sony said they were developing a workstation based on Cell chips, which is the first product IBM will ship based on Cell."
Regardless of whether this is the first product shipped, a workstation is coming. I can't see it running anything but Linux. Given the mass-market targeting of the Cell, I hope Sony makes a strong go at grabbing the market with cheap hardware, rather than trying to milk the high-end content creation market first.
"A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
e.g. 234 M transistors (!) That's why I don't think this will be replacing the G5 any time soon. The die size (at the current prototype's 90 nm) is over 200 mm².
It'll have to get a fair bit smaller/cheaper before the PS3 can use it without major subsidies, and I don't know why they think general consumer devices will want it. God knows how much power it dissipates with all 8 SPEs clocking over at 4 GHz...
Why would anyone engrave "Elbereth"?
Thank god. I've enjoyed his articles in the past, and if experience is any indication, I will have the false impression that I understand this stuff in a nontrivial way for up to three hours. This is not meant to rag on Hannibal, BTW.
With a name like that, I expect to see pictures of him eating those Cell processors, and describing how they taste.
// file: mice.h
#include "frickin_lasers.h"
Is that the 386 instruction set and architecture is so non-proprietary. What made it so popular certainly wasn't that it was better. If I had the dough, I could literally make one, and my own fab, without asking a single soul. A lot of times it seems companies try to gather into consortiums to mimic the same effect and gather market momentum, but these are doomed to failure because the more valuable the technology becomes, the greater the pressure to differentiate and fence off some "territory" for themselves. We saw this happen firsthand with UNIX, where all the flavors would constantly try to group under these unified standards, and they made little progress until Linux came along. The CPU world needs something similar to protect people from patent harassment over design, cores, and fabrication.
Oh, yeah? Well I have Windows XP on my IBM 5150. Well, not so much Windows XP as MS-DOS. Version 2.0. But I have two floppy drives, so I don't even have to take out the system disk to play Oregon Trail!
From the article:
The Cell and Apple
Finally, before signing off, I should clarify my earlier remarks to the effect that I don't think that Apple will use this CPU. I originally based this assessment on the fact that I knew that the SPUs would not use VMX/Altivec. However, the PPC core does have a VMX unit. Nonetheless, I expect this VMX to be very simple, and roughly comparable to the Altivec unit of the first G4. Everything on this processor is stripped down to the bare minimum, so don't expect a ton of VMX performance out of it, and definitely not anything comparable to the G5. Furthermore, any Altivec code written for the new G4 or G5 would have to be completely reoptimized due to the in-order nature of the PPC core's issue.
So the short answer is, Apple's use of this chip is within the realm of conceivability, but it's extremely unlikely in the short- and medium-term. Apple is just too heavily invested in Altivec, and this processor is going to be a relative weakling in that department. Sure, it'll pack a major SIMD punch, but that will not be a double-precision Altivec-type punch.
What I find interesting is that the vector processors are restricted to single-precision floating-point calculations.
This isn't terribly useful for scientific computations (GPUs have the same problem): currently the IEEE is working on a standard for 128-bit precision floating-point calculations!
Of course for 3D, video and sound, 32-bit precision is good enough, and *if* programmers (a big if) manage to overcome the pain of 'parallel programming' then it could be a big success.
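To make the single- vs double-precision point concrete, here is a rough Python sketch. It is an illustration only and has nothing to do with actual SPE code: it emulates 32-bit floats by round-tripping values through the standard `struct` module, and the helper names (`to_float32`, `sum_float32`, `sum_float64`) are made up for this example.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_float32(values):
    """Accumulate values, rounding every partial sum to single precision."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


def sum_float64(values):
    """Accumulate values in ordinary double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    values = [0.1] * 100_000
    # The single-precision total drifts visibly away from 10000,
    # while the double-precision total stays very close to it.
    print("float32 accumulate:", sum_float32(values))
    print("float64 accumulate:", sum_float64(values))
```

The drift is harmless for a pixel or an audio sample, but it is the kind of error that piles up over a long-running simulation.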
The architecture of the Cell looks like a much-improved PS2 system, with the PS2's vu0 and vu1 (vector units 0 and 1) replaced by 8 SPEs. Also, the programmable DMA (with chaining ability, allowing it to sequence multiple DMA events one after the other etc.) looks very similar to the PS2's.
If that turns out to be the case, then PS2 programming is a hint towards how it'll work. On the PS2, you generally configured the DMA controller to upload mini programs to the vector units, then DMA-chained data as streams from RAM through the just-uploaded program and onto the destination (usually the GS which rasterised the display).
On the Cell, it looks as though you can DMA-chain code & data through multiple SPEs and ultimately back to RAM/the PPC core/whatever is memory mapped. This is cool - it's software pipelining.
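As a rough picture of that kind of chaining, here is a small Python sketch. It is a toy model under loud assumptions: generators stand in for DMA chains, plain functions stand in for the mini programs uploaded to each vector unit, and every name (`stream_from_ram`, `spe_stage`, `run_pipeline`, `scale`, `offset`) is invented for illustration, not taken from any Sony or IBM API.

```python
from typing import Callable, Iterable, Iterator, List

Chunk = List[float]
Kernel = Callable[[Chunk], Chunk]


def stream_from_ram(data: List[float], chunk_size: int) -> Iterator[Chunk]:
    """Stand-in for a DMA chain feeding fixed-size chunks out of main memory."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def spe_stage(kernel: Kernel, upstream: Iterable[Chunk]) -> Iterator[Chunk]:
    """Stand-in for one vector unit: run the uploaded kernel on each chunk as it arrives."""
    for chunk in upstream:
        yield kernel(chunk)


def run_pipeline(data: List[float], kernels: List[Kernel], chunk_size: int = 4) -> List[float]:
    """Chain the stages so each chunk flows through every kernel in order, then back to 'RAM'."""
    stream: Iterable[Chunk] = stream_from_ram(data, chunk_size)
    for kernel in kernels:
        stream = spe_stage(kernel, stream)
    result: List[float] = []
    for chunk in stream:
        result.extend(chunk)
    return result


def scale(chunk: Chunk) -> Chunk:
    """Hypothetical first-stage kernel: double every value."""
    return [2.0 * x for x in chunk]


def offset(chunk: Chunk) -> Chunk:
    """Hypothetical second-stage kernel: add one to every value."""
    return [x + 1.0 for x in chunk]


if __name__ == "__main__":
    print(run_pipeline([1.0, 2.0, 3.0, 4.0, 5.0], [scale, offset], chunk_size=2))
```

Each chunk passes through every stage before the next one is pulled from "RAM", so no stage ever holds more than one chunk at a time. That is the property that matters when local memory is tiny.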
So, my guess is that the PPC acts as a (DMA, IO, etc.) controller (much like the MIPS chip did in the PS2), and the heavy lifting goes on in the vector units, with code and data being streamed in on demand.
It's a different model to normal programming, and as far as I can see it encourages you to be closer to the metal (i.e. it's harder, I normally expect my L1 cache to take care of itself...), but assuming they release/port gcc for the SPEs, it might not be too hard if you're used to event-driven highly-threaded programming. Let's just hope they release a Linux port and 'vcl' so we can do something useful with the vector units...
Oh, and if the Xbox was a target for a self-hosting Linux solution, I think the Cell will be irresistible.
Simon
Physicists get Hadrons!
My old 600 MHz G3 iBook runs Panther, Safari, QuickTime, iPhoto, iTunes and everything else I need on a daily basis pretty well. Try saying that about a five-year-old PC.
Five years old? Your 600 MHz G3 iBook came out in October 2001. That machine is just a few months over three years old.
In October of 2001, the P4 was at 2.0 GHz, and the Athlon 2000+ was just coming out. Are you going to tell me that a 2 GHz P4 isn't adequate for browsing the web, listening to MP3s and importing digital photos?!
We'd do our skeletal animation skinning with this. DMA a bunch of verts to scratchpad, transform and weight them on the VU, DMA back to a display list. The thing is, there's really no high-level language support for this... the onus is on the programmer to schedule and memory map everything, mostly in assembly.
The design of the Cell is incredible. It's every game programmer's wet dream. I just don't see how it's going to be as useful in other areas though. It's going to be a compiler-writer's nightmare, and getting real performance from the SPEs is going to take a lot of assembly or a high-level language construct that I haven't seen yet.
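For anyone who hasn't done it, the skinning loop described above (copy a batch of verts in, transform and weight them, copy the results back out) looks roughly like this in Python. It is a sketch under assumptions: a plain list stands in for the scratchpad, list slicing stands in for the DMA transfers, bones are 3x4 matrices, and all names (`transform`, `skin_vertex`, `skin_in_batches`) are made up for illustration. The real thing was hand-scheduled VU assembly, as noted.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3x4 = Sequence[Sequence[float]]   # three rows of [r0, r1, r2, translation]
Influence = Tuple[int, float]        # (bone index, weight)


def transform(m: Mat3x4, v: Vec3) -> Vec3:
    """Apply a 3x4 bone matrix (rotation plus translation) to a vertex."""
    x, y, z = v
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in m)


def skin_vertex(v: Vec3, influences: List[Influence], bones: Sequence[Mat3x4]) -> Vec3:
    """Blend the vertex as transformed by each influencing bone, scaled by its weight."""
    out = [0.0, 0.0, 0.0]
    for bone_index, weight in influences:
        p = transform(bones[bone_index], v)
        for i in range(3):
            out[i] += weight * p[i]
    return tuple(out)


def skin_in_batches(verts, influences, bones, batch_size):
    """Process the mesh one scratchpad-sized batch at a time."""
    display_list = []
    for start in range(0, len(verts), batch_size):
        end = start + batch_size
        # "DMA in": copy one batch into the stand-in scratchpad.
        scratch = list(zip(verts[start:end], influences[start:end]))
        # Compute: transform and weight every vertex in the batch.
        skinned = [skin_vertex(v, inf, bones) for v, inf in scratch]
        # "DMA out": append the results to the display list.
        display_list.extend(skinned)
    return display_list
```

The batch size is the part that has to be tuned by hand on real hardware, since the batch, the bone matrices and the program itself all have to share the same small memory.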
Another article on the Cell design at http://www.theregister.co.uk/2005/02/03/cell_analysis_part_two/ seems to indicate that there is some sort of DRM built in.
Hannibal doesn't say anything about this (that I noticed) - anyone have more info?
Don't save Windows XP! http://www.petitiononline.com/jjw1xp/petition.html
A proposal for Apple
I don't have an account, but this is an honest idea.
Why doesn't Apple include a PlayStation 2 support card in their Macintosh line?
Problem: The OSX platform has almost no games. I own several Macs, I love my Macs, and I sincerely enjoy OSX. But it has no games, and that will never get better, especially as simpler games migrate to the web and the complex ones bail for the console market. The PC gaming market has essentially peaked.
Solution: Embed (or include as a BTO option) a PS2 chipset in a Macintosh. Run the generated display straight through to the graphical overlay plane. Done.
Everything works. The controllers are trivially converted to use USB. The DVD drive is already there. The display is already there. The USB and FireWire ports are already there. The hard drive is already there. The "memory cards" are already there.
Reason: The Macintosh game library explodes instantly to encompass something like 3,000 PS1 and PS2 games. With no need for emulation, the games are guaranteed to work out of the box and provide the Apple ease of use everyone loves. Sony increases their marketshare, Apple gets a viable expanding game library, and users get a vastly better gaming experience on OSX for maybe $40 of parts and engineering.
Why won't this work?
A budget-class PC laptop of that time might have been about 900 MHz to 1.1 GHz. I wouldn't consider such a laptop anything near usable. They tended to have poor-quality sound systems that bottlenecked the processor and atrociously short battery life. The iBook was legendary for its excellent battery performance.
Get off what you 'assume'; assumption is just intuition for idiots.
We have test 200 MHz laptops with 80 MB of RAM and 5 GB hard drives, released in 1997, all running Windows XP Professional (yes, even with the themes turned on), and they benchmark faster than they did when they shipped with Windows 95.
Secondly, they can do full 30 fps video as long as it is uncompressed AVI or even WMV 9. QuickTime (MPEG-4), MPEG-2, and Real stutter horribly on video playback, unfortunately.
As for battery, I don't know; these laptops last 3 hours on a single charge, and yes, techs are REQUIRED to use them daily in test scenarios and have no problems doing so.
Now if you really want to compare laptops to laptops, why don't I show you our 900 MHz AMD Compaq laptops? They have JBL sound systems in them, and there isn't a single feature they cannot perform, with the exception of running a T&L-based video game, as the integrated video doesn't handle it. Oh wait, the 900 MHz PowerBook video didn't support such features either. (BTW, this is not to say that there are not several 900-1000 MHz class laptops that have upper-end video features; I am just using what we have in our test labs for comparison.)
The 900 MHz laptop has a DVD/CD-RW drive and came out in late 2000 or early 2001 (I'm trying to remember if we got them before the holidays or not). They do full software DVD decoding with less than 20% CPU utilization and pretty much do anything we throw at them fairly fast. We even have a beta version of Windows Server 2003 running on one with 256 MB of RAM. (Yes, we are always pushing the limits, but it works as fast as the Windows XP Pro version of the machine sitting next to it.)
Now off my rant... Macs truly are great, and the PowerBooks of the time were great, but that DOES NOT MEAN they were the BEST, WILL ALWAYS BE THE BEST, or you should be complacent listening to Apple tell you what you are getting is the best when it might not be. It is time for us as MAC users to stand up and DEMAND that technology becomes as much a part of what a MAC is as the EASE of USE in the Interface.
The time is now. We need to STOP accepting what they tell us and give us, and force them to truly give us the LATEST technological concepts, not just the above-average concepts when compared to the PC world. These are Macs, they SHOULD BE BETTER. It shouldn't even be subject to debate; they should be so far advanced that a debate should not be possible. PERIOD.
Sadly, it just isn't true now, and has not been for many years. OS X has given the Mac world some credibility in OS technology, but now Apple needs to take Macs to the next level.
Even if my comment inspires one Mac user to say hey Apple, we want better, then maybe we all can be the symbolic person with the hammer from their 1984 video and WAKE THEM UP this time.
"Starting today, the performance lunch isn't free any more. Sure, there will continue to be generally applicable performance gains that everyone can pick up, thanks mainly to cache size improvements. But if you want your application to benefit from the continued exponential throughput advances in new processors, it will need to be a well-written concurrent (usually multithreaded) application. And that's easier said than done, because not all problems are inherently parallelizable and because concurrent programming is hard."
Obviously, it's not clear whether this is directly relevant to Cell processors, but I think it's at least of passing interest. It's also worth considering whether concurrency-oriented languages like Erlang and Oz could become more important with these sorts of processors (not for games but possibly for scientific work).
See also the discussion of this article on Lambda.
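One way to put a number on "not all problems are inherently parallelizable" is Amdahl's law, which the quoted article does not name but which captures the same limit: if a fraction p of the work can be spread over n workers, the best possible speedup is 1 / ((1 - p) + p / n). A tiny Python sketch, with the function name `amdahl_speedup` chosen just for this example:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a job can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Eight workers, to match the number of SPEs discussed in the thread.
    for p in (0.5, 0.9, 0.99):
        print(f"parallel fraction {p}: at most {amdahl_speedup(p, 8):.2f}x")
```

With half the work stuck in serial code, eight workers buy less than a 2x speedup, which is why the serial part matters so much on a chip like this.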
Not quite. The Cell is 9 complete yet simple CPUs in one. Each handles its own tasks with its own memory. Imagine 9 computers each with a really fast network connection to the other 8. You could probably treat them as extra vector processors, but you'd then miss out on a lot of potential applications. For instance, the small processors can talk to each other rather than work with the PowerPC at all.
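The "small computers on a fast network" picture can be sketched in a few lines of Python. This is only an analogy under stated assumptions: threads stand in for the small processors, `queue.Queue` mailboxes stand in for the interconnect, and the names (`squarer`, `accumulator`, `run`) are invented here. The point is that the two workers hand data to each other directly and keep their state private, and the controller only feeds input and collects the final answer.

```python
import queue
import threading

STOP = object()  # sentinel meaning "no more work"


def squarer(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Worker 1: square each incoming number and pass it straight to worker 2."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(item * item)


def accumulator(inbox: queue.Queue, result_box: queue.Queue) -> None:
    """Worker 2: keep a private running total, report it once the stream ends."""
    total = 0
    while True:
        item = inbox.get()
        if item is STOP:
            result_box.put(total)
            return
        total += item


def run(values):
    """The 'controller' only feeds input and collects the final answer."""
    to_squarer, to_accumulator, results = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=squarer, args=(to_squarer, to_accumulator)),
        threading.Thread(target=accumulator, args=(to_accumulator, results)),
    ]
    for w in workers:
        w.start()
    for v in values:
        to_squarer.put(v)
    to_squarer.put(STOP)
    for w in workers:
        w.join()
    return results.get()


if __name__ == "__main__":
    print(run([1, 2, 3]))
```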
Hardly. Sony is following the same game plan as they did with their Emotion Engine in the PS2. Everyone thought that they were losing 100-200 bucks per machine at launch, but financial records have shown that besides the initial R&D (the cost of which is hard to figure out), they were only selling the PS2 at a small loss initially, and were breaking even by the end of the first year. By fabbing their own units, they took a huge risk, but they reaped huge benefits. Their risk and reward is roughly the same now as it was then.
Doubtful. The problem is that though the main CPU is PowerPC-based like current Apple chips, it is stripped down, and the Altivec support will be much lower than in current G5s. Unoptimized, Apple code would run like a G4 on this hardware. They would have to commit to a lot of R&D for their OS to use the additional 8 processors on the chip, and redesign all their tweaked Altivec code. It would not be a simple port. A couple of years to complete, at least.
This is half-true. While it will be hard, most game logic will be performed on the traditional PowerPC part of the Cell, and thus normal to program. The difficult part will be concentrated in specific algorithms, like a physics engine, or certain AI. The modular nature of this code will mean that you could buy a physics engine already designed to fit into the 128k limitation of the subprocessor, and add the hooks into your code. Easy as pie.
Bwahahaha! No way. This is a delicate bit of coding that is going to need to be tweaked by highly-paid coders for every single game. Letting an OS predictively determine what code needs to get sent to what processor to run is insane in this case. The cost of switching out instructions is going to be very high, so any switch will need to be carefully considered by the designer, or the frame-rate will hit rock-bottom.
This is one myth that could be correct. The Cell is huge (relatively), and given IBM's problems in the recent past with making large, fast PowerPC chips, it's a huge gamble on the part of all parties involved that they can fab enough of these things.