A Look Into The Cell Architecture
ball-lightning writes "This article attempts to decipher the patent filed by the STI group (IBM, Sony, and Toshiba) on their upcoming Cell technology (most notably going to be used in the PS3). If it's as good as this article claims, the Cell chip could eventually take over the PC market."
Posted only a couple of days ago too.
Timothy do you actually read Slashdot?
Only if it complies with x86. Seriously, x86 will be around for a century.
"Timothy do you actually read Slashdot?"
Here's a better question. If he will not, why should we?
The original PS2 design was for a dataflow architecture - the Cell is a continuation (and significant evolution) of the theme. Interestingly enough, if this *does* take off it may be that the best programmers of tomorrow turn out to be the PS2 low-level guys, who've already written the algorithms that are about to be important.
In the PS2, the MIPS chip was there mainly to do the simple stuff; all the heavy lifting was done on the two vector processors, which were designed to have programs uploaded into them and data streamed through them using a very flexible (chainable) DMA engine. That sounds similar (if in a limited sense) to the Cell chip itself.
Simon.
Physicists get Hadrons!
The last time I read about a revolutionary chip that would forever change the world, from a company so great it even had the creator of Linux on staff, it turned out to be not much more than a loud fart in the wind. (Enter Transmeta.)
This is a distributed-processing-capable chip. They're moving software into the chip, doing what software can do in a more compact and probably more efficient way. There's nothing revolutionary here, and besides being a dupe story, it's way overrated. The only attraction here is the fact that the PS3 will use it instead of embedding something open, like Mosix.
And no it won't "eventually take over the PC market."
Broken Hearts are for Assholes. - Frank Zappa
It's very rare for a system to be completely parallelisable.
There will always be "critical sections", data which can only be used by one thread at a time, which limits how much the work can be split up. Then you have programs which can't be split at all. I mean, you can easily split a game, for instance, into sound, video, and keyboard threads, but really utilising parallel processing takes a massive amount of code, which, with current languages, makes a massive increase seem a bit implausible.
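The standard way to put a number on this point is Amdahl's law: if a fraction p of a program's work can run in parallel and the rest (critical sections, inherently sequential code) cannot, the ideal speedup on n processors is 1 / ((1 - p) + p/n). A minimal Python sketch, assuming the parallel part scales perfectly and communication costs nothing:

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work can be spread
    across `processors` units; the remainder stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # 90% parallel work: the 10% serial part dominates as units are added.
    for n in (1, 2, 8, 32):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

Even with 90% of the work parallel, eight units give under 5x, and no number of units can exceed 10x.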
It should also be remembered that G5s and G4s already have AltiVec, and even though this is on a much grander scale, there will always be bottlenecks that slow it down, preventing 99% of commonly used apps from getting a significant increase.
Okay, who was down for Timothy on Saturday night for the /. Dupe Pool?
Do not anger the worm.
A measly 68k CPU with hardware that was autonomous.
A measly MIPS with hardware that is autonomous.
The only thing they need is to sync to the TV set.
The message on the other side of this sig is false.
Well, I think we all recognized that the article was a little overenthusiastic, but it does suggest some interesting possibilities.
First of all, I want to say I think it is completely possible to make a processor with 8 APUs and so forth. For starters, PowerPC chips already have several separate execution units on them, and I think they use fewer transistors than Intel chips. Moreover, a huge chunk of the transistor budget goes to things like cache consistency or complicated instruction prediction, which is probably not used on the much simpler APUs.
Of course, it seems like this is primarily of interest to game systems or signal-processing applications (note that "4 threads with 32 stream processors" is just another way of saying 4 Cell processors, each with a PPC core and 8 APUs). However, I would not be so quick to dismiss this for the PC market. While it may be true that many individual applications may not easily multi-thread, it seems we are approaching a point where the biggest complaint is not the maximum processing rate in one application but the ability to run multiple applications at once. On my computers I'm rarely if ever frustrated at the speed of any one program; what frustrates me is the slowdown in other programs when I run a processor-intensive job or turn on a video. So while drawing a single webpage may not be sped up by this processor, drawing several webpages at the same time will be, and that is the sort of thing which makes a big difference for the end user.
Also, a processor like this offers great possibilities for JIT and VM code. The main thread can dispatch instructions and threads to the APUs dynamically based on what is happening in the system. I also find it interesting that IBM is going the same way as Intel in pushing all the complexity onto the compiler. It makes one wonder if Itanium is really as dead as everyone thinks. Perhaps in 4 years, when AMD can't squeeze anything more out of x86, Intel will be ready to jump in, having worked out all the bugs in their new chip.
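To make "dispatch to the APUs dynamically" concrete, here is a minimal sketch using a classic greedy list-scheduling heuristic: each job goes to whichever unit will be free first. The function name, the eight-unit default, and the integer job costs are all hypothetical; this is not a Cell API and not how any particular JIT actually schedules work.

```python
import heapq
from typing import List, Tuple


def dispatch(task_costs: List[int], num_units: int = 8) -> Tuple[List[List[int]], int]:
    """Hypothetical greedy dispatcher (not a Cell API).

    Each task, in arrival order, goes to whichever unit becomes free first.
    Returns (task ids per unit, time at which the last unit finishes).
    """
    if num_units < 1:
        raise ValueError("num_units must be at least 1")
    # Min-heap of (time the unit becomes free, unit id); ties go to the lower id.
    free_at: List[Tuple[int, int]] = [(0, unit) for unit in range(num_units)]
    heapq.heapify(free_at)
    assignment: List[List[int]] = [[] for _ in range(num_units)]
    for task_id, cost in enumerate(task_costs):
        start, unit = heapq.heappop(free_at)
        assignment[unit].append(task_id)
        heapq.heappush(free_at, (start + cost, unit))
    finish = max(t for t, _ in free_at)
    return assignment, finish


if __name__ == "__main__":
    # Eight "APUs" fed a burst of uneven jobs from the main core.
    jobs = [5, 3, 2, 4, 7, 1, 6, 2, 3, 4]
    per_unit, finish = dispatch(jobs, num_units=8)
    print(per_unit, finish)
```

Greedy assignment is cheap enough to run on the main core at dispatch time, which is the property the comment's JIT/VM idea would need; it does not guarantee an optimal schedule.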
All the programs that run on PC architectures expect certain things to be in place - they expect a single fast central CPU. They expect that good cache usage is important for performance. They expect to have access to gobs of RAM. Etc. Etc. The PS2 (and by extension the Cell) is completely different.
Consider a different architecture. You have a job that consists of multiple things to do. Some of these can be easily parallelised, others are mainly sequential. Divide it up so the parallel ones are coded separately, maybe with some IPC to synchronise to some clock.
For a sequential part (say rendering the object list of a scene back to front to gain occlusion) the approach that worked for me on the PS2 (which is logically similar, if significantly less powerful) was to divide the job into tasks. Each task (say, one per object in the above) gets its own bit of code and knows about the data that it needs to perform its task.
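The task-per-object idea can be sketched as follows. Everything here is a hypothetical stand-in: `Task`, `draw_mesh`, and the depth values are illustrative names, and the "program" is a plain Python function rather than uploaded vector-unit microcode. The ordering is the painter's algorithm the comment refers to (draw back to front so nearer objects cover farther ones).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """One unit of work: the code to run plus only the data that code needs."""
    name: str
    depth: float                     # distance from the camera; larger means farther away
    program: Callable[[Dict], str]   # stand-in for an uploaded vector-unit program
    data: Dict


def draw_mesh(data: Dict) -> str:
    # Hypothetical "program": a real one would transform and rasterise vertices.
    return f"draw {data['mesh']}"


def build_tasks(objects: List[Dict]) -> List[Task]:
    # One task per object, each carrying its own program and private data.
    return [Task(o["name"], o["depth"], draw_mesh, {"mesh": o["name"]}) for o in objects]


def render_back_to_front(tasks: List[Task]) -> List[str]:
    # Painter's order: farthest first, so nearer objects are drawn over (occlude) them.
    ordered = sorted(tasks, key=lambda t: t.depth, reverse=True)
    return [t.program(t.data) for t in ordered]


if __name__ == "__main__":
    scene = [
        {"name": "tree", "depth": 5.0},
        {"name": "mountain", "depth": 100.0},
        {"name": "player", "depth": 1.0},
    ]
    print(render_back_to_front(build_tasks(scene)))
    # prints ['draw mountain', 'draw tree', 'draw player']
```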
The key thing is that on a PS2 the Harvard separation of code and data just isn't there. You set up a DMA chain that loads the program into the processor, then streams the data through the program on the processor; lather, rinse, repeat. Make the chain self-submitting and you can effectively forget about that chunk of code; it'll just happen.
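A toy model of that load-then-stream pattern, assuming a chain is just an ordered list of "upload this program" and "stream this data" entries. `VectorUnit`, `run_chain`, and the entry tuples are hypothetical; this does not model the real PS2 DMA tag format, only the idea that code arrives over the same path as data and that a replayed chain needs no further work from the main CPU.

```python
from typing import Callable, List, Optional, Tuple, Union

Program = Callable[[int], int]
# A chain entry either uploads a program or streams a block of data through it.
Entry = Tuple[str, Union[Program, List[int]]]


class VectorUnit:
    """Toy processor whose code arrives over the same path as its data."""

    def __init__(self) -> None:
        self.program: Optional[Program] = None
        self.output: List[int] = []

    def execute(self, entry: Entry) -> None:
        kind, payload = entry
        if kind == "load":
            self.program = payload            # code is just another DMA payload
        elif kind == "stream":
            if self.program is None:
                raise RuntimeError("data streamed before any program was loaded")
            self.output.extend(self.program(x) for x in payload)
        else:
            raise ValueError(f"unknown chain entry {kind!r}")


def run_chain(unit: VectorUnit, chain: List[Entry], frames: int) -> None:
    # Stand-in for a self-submitting chain: the same list is replayed every
    # frame without the main CPU building anything new.
    for _ in range(frames):
        for entry in chain:
            unit.execute(entry)


if __name__ == "__main__":
    unit = VectorUnit()
    chain: List[Entry] = [
        ("load", lambda x: x * 2),     # upload program A
        ("stream", [1, 2, 3]),         # stream data through A
        ("load", lambda x: x + 1),     # upload program B
        ("stream", [10]),              # stream data through B
    ]
    run_chain(unit, chain, frames=2)
    print(unit.output)  # prints [2, 4, 6, 11, 2, 4, 6, 11]
```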
This is still doing things sequentially (but we've agreed that this is a sequential task, right?) - the point is that it's being done highly efficiently within the architectural constraints. You have a dataflow architecture and even sequential code can hit the performance limits if you code to the architecture.
The Cell looks even more powerful, in that you can chain execution modules together, so you can load code into APUs 1, 2, 3, 4 and stream the data through 1, 2, 3, 4 automatically before it's considered 'done'. This was possible on the PS2, but
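The chained-APU idea is essentially a dataflow pipeline: every element passes through stage 1, then 2, 3, 4. A minimal sketch with Python generators, where the four lambdas are hypothetical stand-ins for the programs loaded into each APU. Note that Python runs the stages one element at a time on a single core; real hardware would overlap the stages, so this shows only the routing, not the parallelism.

```python
from typing import Callable, Iterable, Iterator, List

Stage = Callable[[float], float]


def _stage(fn: Stage, upstream: Iterable[float]) -> Iterator[float]:
    # A helper function binds `fn` per stage. A generator expression built in
    # a loop would look `fn` up lazily, so every stage would call the last one.
    for x in upstream:
        yield fn(x)


def chain_stages(stages: List[Stage], stream: Iterable[float]) -> Iterator[float]:
    """Route every element through stages[0], stages[1], ... in order."""
    it: Iterator[float] = iter(stream)
    for fn in stages:
        it = _stage(fn, it)
    return it


if __name__ == "__main__":
    # Hypothetical programs for "APUs 1, 2, 3, 4".
    apus = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x / 2]
    print(list(chain_stages(apus, [1.0, 2.0])))  # prints [0.5, 1.5]
```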
Simon
Paper Details:
I've long had the suspicion that the XBox was basically just a big blindside at Sony. The XBox loses a huge amount of money, and looks as if it will continue to lose a huge amount of money right into the XBox 2 line; Microsoft must be doing this for some reason. My personal theory for a while has been that at least one of Microsoft's motivations in spending all this money is that they see the Playstation as a potential future threat; i.e., they feared and fear that at some point the Playstation 2 or 3 or 4 will become so close in power and functionality to a PC that it will begin to supplant the PC for common tasks. This would be disastrous for Microsoft; their lockdown on the PC market is complete, but this doesn't protect them from the PC market itself being slowly eaten away from the bottom by consumer electronics like the ones Sony makes. So to stave off this threat, Microsoft instead begins to grow the PC market it monopolizes downward, so that the PC (as it becomes the "Windows Media Center") slowly begins to suck up the consumer electronics market, competing directly with the Playstation and bringing the fight to Sony's door instead of Microsoft's. Since consumers wouldn't on their own be interested in a PC that supplants consumer electronics, Microsoft basically bribes them into being interested with subsidized hardware; they make a big money black hole out of the XBox to undercut Sony's ability to maneuver with the Playstation, the way the money black hole that was MSIE undercut Netscape's ability to maneuver.
This is, of course, all just conjecture.
But when I begin to see people seriously talking about the chip from the Playstation 3 eventually potentially being used in PC hardware, I begin to wonder if it's maybe reasonable conjecture...
Irritable, left-wing and possibly humorous bumper stickers and t-shirts
It's been said before, but mature industries tend towards three of something, such as GM-Ford-Chrysler. For CPUs, it has to be AMD64/ia32e, PowerPC, and SPARC. They're the only ones with any high-volume prospects. SPARC will certainly be in third place, with AMD64/ia32e and PowerPC duking it out for one and two. The fact of the matter is that Itanium won't be a mainstream processor, and PA-RISC, Alpha, and MIPS are all more-or-less EOL.
For operating systems it will still be Windows, Linux, and UNIX (predominantly Mac OS and Solaris). Okay, that's four, but the other historical major players are all becoming niche legacy platforms.
For office suites, it'll be MS Office, StarOffice/OpenOffice.org, and iWork. The others are all niche players.
For browsers it'll be IE, Firefox, and Safari.
At least this will tend to simplify some things, because the non-Microsoft platforms will be fewer making supporting them easier. This is a good thing, IMO.
-- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
The author had a good grasp of the high level architecture, but beyond that was clueless. His interpretation of the design is way off the mark.
He seemed astonished by the 1024-bit-wide data paths. The Power family is designed around 128-byte (1024-bit) cache lines, so, for instance, the G5 L2 cache already fetches 128 bytes into cache on each main memory read.
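The arithmetic behind this point, as a quick check (the 16-byte width of an AltiVec vector register is standard; the 128-byte line size is from the comment):

```latex
1024\ \text{bits} \div 8\ \tfrac{\text{bits}}{\text{byte}} = 128\ \text{bytes} = 8 \times 16\ \text{bytes}
```

So a 1024-bit path moves exactly one 128-byte cache line, which is eight 128-bit AltiVec vectors' worth of data.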
Similarly, all the talk about doing away with cache and VM is bullshit. Instead of having each vector unit interfere with a shared cache as is done today, they've simply added smaller per-ALU caches to the design and complemented them with a device that is a souped-up cache controller/MMU (the DMAC). The DMAC apparently will be able to address both memory and other hardware through a virtual address layer, so it can reference remote Cell units as well as local physical hardware. The 64 MB of high-speed Rambus memory may be all that is required for a PS3, but in a workstation implementation that memory would act as L3 cache.
AltiVec currently has 32 vector registers. Each ALU has 128. It is highly likely that the core opcode architecture will remain similar; the most likely change will be the addition of a few flow-control instructions to the existing mix.
AltiVec is already powerful, but the biggest limiting factor is latency. AltiVec can perform one instruction per clock on the G5; however, the pipeline is 8 stages deep, so the overhead involved in fetching data, loading registers, performing a calculation among 1-3 registers, and getting a result is prohibitively expensive for a single isolated calculation. If, on the other hand, you can arrange to submit 8 calculations (or more) in rapid sequence, you can keep AltiVec and the CPU busy and reap great benefits.
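An idealized cycle count makes the latency argument concrete. This sketch assumes only the figures in the comment (an 8-stage pipeline, one issue per clock) and ignores loads, register pressure, and dispatch limits, so the numbers illustrate the ratio rather than real G5 timings.

```python
def cycles_dependent(n_ops: int, depth: int = 8) -> int:
    """Each op needs the previous result, so it waits out the full pipeline."""
    if n_ops < 0:
        raise ValueError("n_ops must be non-negative")
    return n_ops * depth


def cycles_independent(n_ops: int, depth: int = 8) -> int:
    """One op issued per clock: the first result appears after `depth`
    cycles, then one more result completes every cycle."""
    if n_ops < 0:
        raise ValueError("n_ops must be non-negative")
    return 0 if n_ops == 0 else depth + (n_ops - 1)


if __name__ == "__main__":
    for n in (1, 8, 64):
        print(n, cycles_dependent(n), cycles_independent(n))
```

Under these assumptions eight dependent operations take 64 cycles while eight independent ones take 15, which is why keeping 8 or more calculations in flight pays off.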
The beauty of Cell will be in providing the ALUs with a bit more autonomy (though not much more; they are still basically vector units) and enabling the main CPU to keep doing useful work while a number of ALUs are cranking away. Other novel design features provide for communication and synchronization with other units via remote addressing and timing (that's what those real-time clock signals are all about).
This will be very fast and very cheap. However, all the hand-waving and theorizing this guy does about both hardware and software reads like patent bullshit.