How Sony's Development of the Cell Processor Benefited Microsoft
The Wall Street Journal is running an article about a recently released book entitled "The Race for a New Game Machine" which details Sony's development of the Cell processor, written by two of the engineers who worked on it. They also discuss how Sony's efforts to create a next-gen system backfired by directly helping Microsoft, one of their main competitors. Quoting:
"Sony, Toshiba and IBM committed themselves to spending $400 million over five years to design the Cell, not counting the millions of dollars it would take to build two production facilities for making the chip itself. IBM provided the bulk of the manpower, with the design team headquartered at its Austin, Texas, offices. ... But a funny thing happened along the way: A new 'partner' entered the picture. In late 2002, Microsoft approached IBM about making the chip for Microsoft's rival game console, the (as yet unnamed) Xbox 360. In 2003, IBM's Adam Bennett showed Microsoft specs for the still-in-development Cell core. Microsoft was interested and contracted with IBM for their own chip, to be built around the core that IBM was still building with Sony. All three of the original partners had agreed that IBM would eventually sell the Cell to other clients. But it does not seem to have occurred to Sony that IBM would sell key parts of the Cell before it was complete and to Sony's primary videogame-console competitor. The result was that Sony's R&D money was spent creating a component for Microsoft to use against it."
This is really kind of misleading. The PowerPC, which is at the core of the Cell and is what MS uses as the cores of the Xbox 360, has been IBM's baby for years.
The Xbox 360 uses 3 of the cores. The Cell uses one of the cores plus 8 SPEs (6 of which you can actually use in a game). If you recall, the Wii uses a PowerPC too, a slightly beefed-up GameCube CPU which IBM made for Nintendo even before they made the Cell. And of course Apple used to use PowerPCs (and IBM itself did and does, for servers).
Anyhow, without the Cell's SPEs, there's not a lot to really 'steal'. The lack of SPEs is what makes the Xbox 360 so easy to program for, but the SPEs are what really define the Cell and make it such a floating-point-crunching monster (better suited to supercomputing than to video games, in my opinion, and that's not intended as a dis here).
The chip they sold to Microsoft in the end is more or less the same design as the PPU core in the Cell, but that, in turn, is an in-order variant of the 970 with a few bits from the POWER4 re-added that had originally been dropped (the 970 itself was a cut-down POWER4 with a VMX unit bolted on).
IBM would be crazy not to reuse parts of old designs on any new one. They've spent hundreds of millions of dollars creating a library of CPU designs that fit anywhere from a mobile phone to a supercomputer. You're very unlikely to have a set of requirements that they can't meet with a tweaked version of one of their existing designs, and if you really need them to work from scratch then you probably can't afford the final product.
For example, in a console you normally want all the RAM shared between GPU and CPU. There's no reason for them to have separate RAM modules. The Xbox 360 does this: there's 512MB of RAM that is usable in general. The PS3 doesn't; it has 256MB each for the CPU and the GPU. The reason is that's how nVidia GPUs work in PCs, and that's where it came from. nVidia didn't have the time to make them a custom one for the console, as ATi did for Microsoft. This leads to situations where the PS3 runs out of memory for textures and the 360 doesn't. It also means that the Cell can't fiddle with video RAM directly. Its power could perhaps be better used if it could directly do operations at full speed on data in VRAM, but it can't.
Being a (former) PS3 and 360 dev, I have to say this is not true. Let's start with the memory split. Both consoles have about 20GB/s of memory bandwidth per memory system. Only the PS3 has two of them, giving it twice the memory bandwidth. The 360 compensates for that by having EDRAM attached to the GPU, which removes the ROPs' share from your bandwidth budget. Especially with a lot of overdraw, the bandwidth needed by the ROPs can get huge (20GB/s, anyone?), so this would be a nice solution were it not for two things: the limited EDRAM size and the cost of resolving from EDRAM to DRAM.
The RSX can also read and write both to XDR (main memory) and DDR (VRAM), giving it access to all of memory. The reason it is tighter on texture memory is because the OS is heavier.
About access to VRAM, it is true that reading from VRAM is something you don't want the Cell to do. It's a 14MB/s bus, so it is of no practical use for texture data. Writing into VRAM is actually pretty ok, as it's at 5GB/s, which is more or less achievable without trouble. At 60fps that's more than 80MB per frame.
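As a rough back-of-the-envelope check on those figures, here is a minimal sketch that turns the two quoted bus speeds into per-frame budgets. It assumes decimal units (1 GB = 1000 MB) and a steady 60fps; the function and variable names are made up for illustration.

```python
# Per-frame data budgets derived from the bus speeds quoted above.
# Assumptions: decimal units (1 GB = 1000 MB) and a constant 60 fps.

def mb_per_frame(bandwidth_mb_per_s: float, fps: float = 60.0) -> float:
    """Megabytes that can cross a bus during one frame at the given frame rate."""
    return bandwidth_mb_per_s / fps

cell_write_to_vram = mb_per_frame(5_000)  # 5 GB/s write path
cell_read_from_vram = mb_per_frame(14)    # 14 MB/s read path

print(f"write: {cell_write_to_vram:.1f} MB per frame")
print(f"read:  {cell_read_from_vram:.3f} MB per frame")
```

The write budget comes out a little above 80MB per frame, while the read budget is well under 1MB per frame, which is why reading texture data back over that path is impractical.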
In general, both design teams made sound decisions. The 360 has a significant ease-of-use advantage for PC developers with DirectX experience. The PS3, on the other hand, is a lot more to-the-metal, but allows for some pretty crazy stuff. Sadly, most development these days is cross-platform, so you won't see a lot of Cell-specific solutions. It's just not cost-effective.
Slashdot users, read and learn, because anyone who fails to understand the following is uninformed:
The SPUs on the Cell and the PPC Altivec unit on the Xenon (Xbox 360) are very closely related. Never before has IBM done a 128-register, 128-bit Altivec unit. The 128-bit x 128-register VMX128 unit on the Xenon is the best of any CPU, and it is also an almost perfect subset, or cut-down version, of the Cell's SPU.
In non-branching calculations, and assuming no cache misses, VMX128 performance is equal to the SPU's performance. This is not a coincidence: it's a newly shared design feature in both the instruction sets and the silicon fab, and it clearly shows the CPU designers shared a lot.
The older VMX has only 32 registers. Only the Xenon PPC cores and the Cell's SPUs have this new VMX128-type arrangement with 128 SIMD registers, especially enhanced for multimedia and gaming.
Sony's payback comes when PlayStation 3 programmers learn to fully utilize the Cell architecture.
As someone else pointed out, if that was going to happen, it would have happened by now.
The fundamental problem with the Cell is that each SPU only has 256KB of RAM. (Not 256MB, 256KB.) Data can be moved in and out of main memory in the background with explicit DMA-like operations. Given that model, you have to turn your problem into a data-flow problem, where a data set is pumped sequentially through a Cell processor. The audio guys love this. It's useful for compression and decompression. It's a pain for everything else.
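To make the data-flow model concrete, here is a minimal sketch of pumping a large buffer through a small local store in chunks, with a second buffer so the next chunk can be fetched while the current one is processed. It is only an illustration: plain Python slicing stands in for the DMA transfers, the 64KB chunk size is an assumed split of the 256KB, and `dma_get`, `process`, and `stream` are hypothetical names, not Cell SDK calls.

```python
# Streaming sketch: process a buffer far larger than the local store by
# moving it through in fixed-size chunks. Slicing stands in for DMA here;
# nothing below is a real Cell SDK call.

LOCAL_STORE_BYTES = 256 * 1024
# Assumed split: two 64 KB data buffers, leaving the rest of the local
# store for code, stack, and results.
CHUNK_BYTES = LOCAL_STORE_BYTES // 4

def dma_get(main_memory: bytes, offset: int, size: int) -> bytes:
    """Stand-in for an asynchronous DMA read from main memory."""
    return main_memory[offset:offset + size]

def process(chunk: bytes) -> int:
    """Stand-in kernel: sum the bytes in one chunk."""
    return sum(chunk)

def stream(main_memory: bytes) -> int:
    """Sum all bytes while holding at most two chunks at a time."""
    total = 0
    offset = 0
    # Prefetch the first chunk before entering the loop.
    current = dma_get(main_memory, offset, CHUNK_BYTES)
    while current:
        offset += len(current)
        # Request the next chunk, then work on the one already present.
        pending = dma_get(main_memory, offset, CHUNK_BYTES)
        total += process(current)
        current = pending
    return total

data = bytes(range(256)) * 4096  # 1 MiB of sample data, 16 chunks
result = stream(data)
```

In this toy version the "transfer" is synchronous, so nothing actually overlaps; on real hardware the point of the second buffer would be to hide transfer latency behind computation. The structure also shows why the model suits sequential workloads such as audio or compression and is awkward for anything that needs random access to a large data set.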
It's not good for graphics. There's not enough memory for a full frame, not enough memory for textures, not enough memory for the geometry, and not enough processors to divide the frame up into squares or bands. Sony had to hang a conventional nVidia GPU on the back to fix that. It's useful for particle systems. If you need snow, or waves, or grenade fragments, the Cell is helpful, because that's a pipelineable problem.
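A quick sketch of the "not enough memory for a full frame" arithmetic. The 1280x720 resolution and 4 bytes per pixel are assumptions chosen for illustration, as are the names used; only the 256KB figure comes from the discussion above.

```python
# How many local-store-sized pieces would one frame need?
# Assumptions: 1280x720 frame, 4 bytes per pixel (32-bit color).

LOCAL_STORE_BYTES = 256 * 1024  # per-SPU memory figure quoted above

def framebuffer_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Size in bytes of a single uncompressed color buffer."""
    return width * height * bytes_per_pixel

frame = framebuffer_bytes(1280, 720)
# Ceiling division: pieces needed even if the ENTIRE local store held pixels.
pieces_needed = -(-frame // LOCAL_STORE_BYTES)

print(f"frame size: {frame} bytes")
print(f"local-store-sized pieces: {pieces_needed}")
```

Even with every byte of local store given over to pixels, which leaves nothing for code, textures, or geometry, a single frame under these assumptions would have to be split into more pieces than there are SPUs.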
There are some other special-purpose situations where a Cell SPU is useful. But not many. If each SPU had, say, 16MB, the things might be more useful. But at 256KB, it's like having a DSP chip. The Cell part belongs in a cell phone tower, processing signal streams, not in a game machine. It's a great cryptanalysis engine, though. Cryptanalysis is all crunch, with little intercommunication, so that fits the Cell architecture.
We're back to a historical truth about multi-CPU architecture: there are only two things that work. Shared-memory multiprocessors ("multi-core" CPUs, or the Xbox 360) work; they're well understood and straightforward to program. Clusters, like Google/Amazon/any web farm, also work; each machine has enough resources to do its own work and can live with limited intercommunication. Everything in between those extremes has historically been a flop: SIMD machines (Illiac IV through Thinking Machines), dataflow machines (tried in the 1980s), and mesh machines (nCube, BBN Butterfly). The only exceptions to this are graphics processors and supercomputers derived from them. That, not the Cell, is cutting-edge architecture.
I've met one of the architects of the Cell processor, and his attitude was "build it and they will come". They didn't.