How Sony's Development of the Cell Processor Benefited Microsoft


Posted by Soulskill on Wednesday December 31, 2008 @09:48PM from the not-the-outcome-they'd-planned dept.

The Wall Street Journal is running an article about a recently released book entitled "The Race for a New Game Machine" which details Sony's development of the Cell processor, written by two of the engineers who worked on it. They also discuss how Sony's efforts to create a next-gen system backfired by directly helping Microsoft, one of their main competitors. Quoting: "Sony, Toshiba and IBM committed themselves to spending $400 million over five years to design the Cell, not counting the millions of dollars it would take to build two production facilities for making the chip itself. IBM provided the bulk of the manpower, with the design team headquartered at its Austin, Texas, offices. ... But a funny thing happened along the way: A new 'partner' entered the picture. In late 2002, Microsoft approached IBM about making the chip for Microsoft's rival game console, the (as yet unnamed) Xbox 360. In 2003, IBM's Adam Bennett showed Microsoft specs for the still-in-development Cell core. Microsoft was interested and contracted with IBM for their own chip, to be built around the core that IBM was still building with Sony. All three of the original partners had agreed that IBM would eventually sell the Cell to other clients. But it does not seem to have occurred to Sony that IBM would sell key parts of the Cell before it was complete and to Sony's primary videogame-console competitor. The result was that Sony's R&D money was spent creating a component for Microsoft to use against it."

13 of 155 comments

I have altered the deal by symbolset · 2008-12-31 21:53 · Score: 4, Funny

Pray I do not alter it further.

--
Help stamp out iliturcy.
And they both stole from Apple and Nintendo? by Sarusa · 2008-12-31 22:19 · Score: 5, Insightful

This is really kind of misleading. The PowerPC, which is at the core of the Cell and is what MS uses as the cores of the Xbox 360, has been IBM's baby for years.
The Xbox 360 uses 3 of the cores. The Cell uses one of the cores plus 8 SPEs (6 of which you can actually use in a game). If you will recall, the Wii uses a PowerPC too, a slightly beefed up Gamecube CPU which IBM made for Nintendo even before they made Cell. And of course Apple used to use PowerPCs (and IBM itself did and does, for servers).
Anyhow, without the Cell's SPEs, there's not a lot to really 'steal'. The lack of SPEs is what makes the Xbox 360 so easy to program for, but the SPEs are what really define the Cell and make it such a floating point crunching monster (better suited for supercomputing than writing video games for in my opinion, and that's not intended as a dis here).
It also helped MS by Sycraft-fu · 2008-12-31 22:22 · Score: 4, Informative

Because it was a really misdirected effort when it came to a console. Sony really had no idea what the hell they were doing as far as making a chip for their console. Originally, they thought the Cell would be the graphics chip. Ya well turned out not to be near powerful enough for that, so late in the development cycle they went to nVidia to get a chip. Problem was, with the time frame they needed, they couldn't get it very well customized.
For example in a console, you normally want all the RAM shared between GPU and CPU. There's no reason to have them have separate RAM modules. The Xbox 360 does this, there's 512MB of RAM that is usable in general. The PS3 doesn't, it has 256MB each for the CPU and GPU. Reason is that's how nVidia GPUs work in PCs and that's where it came from. nVidia didn't have the time to make them a custom one for the console, as ATi did for Microsoft. This leads to situations where the PS3 runs out of memory for textures and the 360 doesn't. It also means that the Cell can't fiddle with video RAM directly. Its power could perhaps be better used if it could directly do operations at full speed on data in VRAM but it can't.
So what they ended up with is a neat processor that is expensive, and not that useful. The SPEs that make up the bulk of the Cell's muscle are hard to use in games given the PS3's setup, and often you are waiting on the core to get data to and from them.
It's a neat processor, but a really bad idea for a video game console. Thus despite the cost and hype, the PS3 isn't able to outdo the 360 in terms of graphics (in some games it even falls behind).
I really don't know what the hell Sony was thinking with putting a brand new kind of processor in a console. I'm willing to bet in 10 years there are compilers and systems out there that make real good use of the Cell. However that does you no good with games today.
Thus we see the current situation of the PS3 having weak sales as compared to the 360 and Wii. It is high priced, with the idea that it brings the best performance, but that just doesn't bear out in reality.
1. Re:It also helped MS by Sycraft-fu · 2008-12-31 23:39 · Score: 4, Informative
  
  Specs on it I see show the system bus as being around 2GB/sec. That's comparable to PCIe (about the same as an 8x connection). While that's fine, it isn't really enough to do much in terms of back and forth operations. You'll find on a PC if you try that things get real slow. You need to send the data to the graphics card and have it work on its own RAM.
  Now that isn't to say that you can't do things to the data before you send it, but then that's of limited use. What I'm talking about is doing things like, say, you write code that handles some dynamic lighting that the CPU does. So it goes in and modifies the texture data directly in VRAM. Well you can't really do that over a bus that slow. 2GB sounds like a lot but it is an order of magnitude below the speed that the VRAM works at. It is too slow to be doing the "read data, run math, write data, repeat a couple million times a frame" sort of thing that you'd be talking about.
  You see the same sort of thing on a PC. While in theory PCIe lets you use system memory for your GPU transparently, in reality you take a massive hit if you do. The PCIe bus is just way too slow to keep up with higher resolution, high frame rate rendering.
  So while it's fine in terms of the processor getting the data ready and sending it over to the GPU (which is what is done) it isn't a fast enough bus to have the SPEs act as additional shaders, which is what they'd probably be the most useful for.
2. Re:It also helped MS by Zixx · 2009-01-01 00:04 · Score: 5, Informative
  
  For example in a console, you normally want all the RAM shared between GPU and CPU. There's no reason to have them have separate RAM modules. The Xbox 360 does this, there's 512MB of RAM that is usable in general. The PS3 doesn't, it has 256MB each for the CPU and GPU. Reason is that's how nVidia GPUs work in PCs and that's where it came from. nVidia didn't have the time to make them a custom one for the console, as ATi did for Microsoft. This leads to situations where the PS3 runs out of memory for textures and the 360 doesn't. It also means that the Cell can't fiddle with video RAM directly. Its power could perhaps be better used if it could directly do operations at full speed on data in VRAM but it can't.
  Being a (former) PS3 and 360 dev, I have to say this is not true. Let's start with the memory split. Both consoles have about 20GB/s of memory bandwidth per memory system. Only the PS3 has two of them, giving it twice the memory bandwidth. The 360 compensates for that by having EDRAM attached to the GPU, which removes the ROPs' share from your bandwidth budget. Especially with a lot of overdraw, the bandwidth needed by the ROPs can get huge (20GB/s, anyone?), so this would be a nice solution were it not for two things: the limited EDRAM size and the cost of resolving from EDRAM to DRAM.
  The RSX can also read and write both to XDR (main memory) and DDR (VRAM), giving it access to all of memory. The reason it is tighter on texture memory is because the OS is heavier.
  About access to VRAM, it is true that reading from VRAM is something you don't want the Cell to do. It's a 14MB/s bus, so it is of no practical use for texture data. Writing into VRAM is actually pretty ok, as it's at 5GB/s, which is more or less achievable without trouble. At 60fps that's more than 80MB per frame.
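The per-frame figure above is plain division, and it can be checked with a few lines. This is only a rough sketch: it assumes decimal units (1GB = 1000MB) and takes the quoted bus speeds at face value; the function and constant names are made up for illustration.

```python
def megabytes_per_frame(bandwidth_mb_per_s: float, fps: float) -> float:
    """Amount of data that can cross a bus during one frame, in MB."""
    return bandwidth_mb_per_s / fps


# Bus speeds as quoted in the comment above (decimal units assumed).
CELL_WRITE_TO_VRAM_MB_S = 5000.0  # "5GB/s"
CELL_READ_FROM_VRAM_MB_S = 14.0   # "14MB/s"

write_budget = megabytes_per_frame(CELL_WRITE_TO_VRAM_MB_S, 60)
read_budget = megabytes_per_frame(CELL_READ_FROM_VRAM_MB_S, 60)

print(f"Cell -> VRAM write budget: {write_budget:.1f} MB per frame")
print(f"VRAM -> Cell read budget:  {read_budget:.2f} MB per frame")
```

At 60fps the write side comes out a little over 83MB per frame, while the read side is well under a quarter of a megabyte, which is why reading texture data back is impractical.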
  In general, both design teams made sound decisions. The 360 has a significant ease-of-use advantage to PC developers with DirectX experience. The PS3 on the other hand is a lot more to-the-metal, but allows for some pretty crazy stuff. Sadly, most development these days is cross-platform, so you won't see a lot of Cell-specific solutions. It's just not cost-effective.
3. Re:It also helped MS by xero314 · 2009-01-01 00:06 · Score: 4, Interesting
  
  There are a number of errors in the comment above and a number of oversights.
  
  First, it is true that the graphics processing of the PS3 was originally intended to be handled by a Cell processor, but this is not the same as saying the Cell processor was built to be a graphics processor. The original specs for the PS3 included 4 fully functional Cell processors. This would have meant that there would be no need for a dedicated GPU. Time and cost made this configuration prohibitive.
  
  The reason the PS3 does not have dedicated memory is because it is a very different design. First, the PS3 contains a very high speed data bus, which allows the system to keep its lower amount of memory full of the data it needs at any given time, with no need to store data not actively in use. Secondly, the GPU in the PS3 has direct access to almost all of the memory in the system (480MB to be exact). It's just not the same picture that some people would like to paint. Dedicated memory has its advantages (which is why all high end PC GPUs have such).
  
  Now the reason that Sony, Toshiba and IBM designed the Cell and crammed it into a PS3, prematurely, is ingenious, but we won't see this for a number of years. The Cell processor is designed from the ground up to work effectively as a single node of a multi-processor system. This means that you can include more than one, utilize the same code, and get a much faster program rate. What this means is that for computing today you can use a single Cell processor and have a fast machine. In the future you can have a machine with 4, 8, 16, or more Cell processors and have an unbelievably fast machine. On top of that speed you also get a very energy efficient machine. Take a look at the top 500 supercomputer list to see what a difference the Cell processor makes. Putting it in the PS3, on the other hand, was a good move because it meant mass production and greatly reduced costs, so that they can finally build the system they want in the next console generation.
  
  Ok I'm too tired to finish this, but as you can see if you look, the Cell is an interesting chip with great potential, and has already surpassed other chips in a number of applications.
Hmm, really? by Ecuador · 2008-12-31 22:26 · Score: 4, Interesting

Maybe I have to read the book to get a better picture; it is possible that the article blows things out of proportion. So, I thought that the whole "deal" about the Cell is the SPEs. The Xenon CPU that powers the Xbox 360 is just a custom-made triple core PowerPC. Now, I guess the "customization" of that core is similar to what is done for the PPE of the Cell, so research there could have overlapped, but I would not think that the PPE is the "essence" of the Cell - at least that is what Sony's and IBM's own claims have made me believe.
Additionally, I have to admit that I always thought the usage of the Cell processor a very bad (or, more precisely, very arrogant) decision. It is not just that it has many "cores"; the fact that they are asymmetric and that SPE's are not your usual general-purpose cores, was bound to make it very hard for developers to utilize them. If you wanted to develop for many platforms there is no way you would want to optimize for the SPE's when all other architectures (PC, Xbox...) use symmetric, general purpose cores. So, in my book, the Microsoft engineers knew much better what they were doing than the Sony ones. I guess they are not the same engineers responsible for gems like Me, Vista or Zune firmware.
What I would like to know is: what are the differences between the modified core and a "classic" PowerPC core? So, if MS had not benefited at all from Cell research and got a triple-core whose cores were closer to the original PowerPC, would it be a much different CPU? Does anybody know? If the answer is no, the whole discussion about MS benefiting from Sony is moot...

--
Violence is the last refuge of the incompetent. Polar Scope Align for iOS
Re:I don't think it's quite as they tell it by Bastard+of+Subhumani · 2008-12-31 22:46 · Score: 4, Funny

Sony's payback comes when Playstation3 programmers learn to fully utilize the Cell architecture.
It has direct hardware support for rootkits.

--
Only three things are certain; death, taxes, and apocryphal quotations - Ben Franklin.
Re:a few facts please? by TheRaven64 · 2008-12-31 23:56 · Score: 5, Interesting

Even the SPEs aren't exactly built from scratch. They're based on the VMX units from the PowerPC 970 with widened register sets and a modified memory architecture with explicit DMA commands. If the meeting in question took place, I'd imagine IBM showed Microsoft the Cell, the PowerPC 970MP, the 40x, and said 'we can do anything on this spectrum - what requirements do you have?'.
The chip they sold to Microsoft in the end is more or less the same design as the PPU core in the Cell, but that, in turn, is an in-order variant of the 970 with a few bits from the POWER4 that were originally dropped (the 970 itself was a cut-down POWER4 with a VMX unit bolted on) re-added.
IBM would be crazy not to reuse parts of old designs on any new one. They've spent hundreds of millions of dollars creating a library of CPU designs that fit anywhere from a mobile phone to a supercomputer. You're very unlikely to have a set of requirements that they can't meet with a tweaked version of one of their existing designs, and if you really need them to work from scratch then you probably can't afford the final product.

--
I am TheRaven on Soylent News
Re:a few facts please? by TheRaven64 · 2009-01-01 00:53 · Score: 4, Informative

They are very different approaches. The 360's CPU is basically a 3-core, 6-context, in-order variant of the POWER4 with a vector unit. In terms of pure number crunching ability, it's pretty pathetic next to the Cell. On the other hand, it is based on a model that we have spent 30 years building compilers for. You only need to write slightly-parallel, conventional code to get close to 100% of the theoretical performance out of it.
In contrast, the Cell has one PPU which is roughly equivalent to one context on the 360's CPU (somewhere between 1/3 and 1/6 of the speed). It also has 7 SPUs. These are very fast, but they're basically small DSPs. They have very wide vector units and are limited to working on 256KB of data at a time. You can use them to implement accelerator kernels for certain classes of algorithm, but getting good performance out of them is hard.
In terms of on-paper performance, the Cell is a long way out in front, but it is a long way behind in ease of programming, meaning that you generally get a much smaller fraction of the maximum throughput.

--
I am TheRaven on Soylent News
VMX128 in Xenon is borrowed from the Cell SPU's ! by stephen70 · 2009-01-01 01:33 · Score: 5, Interesting

Slashdot users, read and learn, because anyone who fails to understand the following is uninformed:
The SPUs on the Cell and the PPC Altivec unit on the Xenon (X360) are very closely associated. Never before has IBM done a 128-register, 128-bit Altivec unit. The 128-bit x 128-register Altivec VMX128 unit on the Xenon is the best of any CPU; it is also an almost perfect subset, or cut-down version, of the Cell's SPU!
In non-branching calculations, and assuming no cache misses, VMX128 performance is equal to the SPU's performance. This is not a coincidence: it's a newly shared design feature in both the instruction sets and the silicon fab, and it clearly shows the CPU designers shared a lot.
The older VMX has only 32 registers. Only the Xenon PPC cores and the Cell's SPUs have this new VMX128-type arrangement with 128 SIMD registers - especially enhanced for multimedia and gaming.
The Cell architecture just isn't that useful by Animats · 2009-01-01 04:22 · Score: 5, Interesting

Sony's payback comes when Playstation3 programmers learn to fully utilize the Cell architecture.
As someone else pointed out, if that was going to happen, it would have happened by now.
The fundamental problem with the Cell is that each SPU only has 256KB of RAM. (Not 256MB, 256KB.) Data can be moved in and out of main memory in the background with explicit DMA-like operations. Given that model, you have to turn your problem into a data-flow problem, where a data set is pumped sequentially through a Cell processor. The audio guys love this. It's useful for compression and decompression. It's a pain for everything else.
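To make the data-flow constraint concrete, here is a rough sketch of what "pump the data set through in pieces" looks like. It is plain Python, not real SPU code: the slice stands in for a DMA transfer into local store, `stream_process` and `gain` are hypothetical names, and the choice to use only a quarter of the 256KB for the working buffer (leaving room for code, stack and a second buffer) is an assumption.

```python
from array import array

LOCAL_STORE_BYTES = 256 * 1024  # per-SPU local store size quoted above


def stream_process(data, kernel, working_bytes=LOCAL_STORE_BYTES // 4):
    """Pump `data` through `kernel` one small chunk at a time.

    The kernel only ever sees a chunk that fits in `working_bytes`,
    mimicking code that cannot address main memory directly.
    """
    chunk_len = working_bytes // data.itemsize
    out = array(data.typecode)
    for start in range(0, len(data), chunk_len):
        chunk = data[start:start + chunk_len]  # stands in for a DMA "get"
        out.extend(kernel(chunk))              # compute on the local copy, then "put" the result
    return out


def gain(chunk):
    """Example kernel: halve every sample (an audio-style operation)."""
    return array(chunk.typecode, (x * 0.5 for x in chunk))


samples = array("f", range(200_000))  # about 800KB, far more than one local store
result = stream_process(samples, gain)
print(len(result), result[3])
```

This works nicely because each output sample depends only on one input sample. Anything that needs random access across the whole data set (a scene graph, a full texture) does not fit this shape without a lot of restructuring.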
It's not good for graphics. There's not enough memory for a full frame, not enough memory for textures, not enough memory for the geometry, and not enough processors to divide the frame up into squares or bands. Sony had to hang a conventional nVidia GPU on the back to fix that. It's useful for particle systems. If you need snow, or waves, or grenade fragments, the Cell is helpful, because that's a pipelineable problem.
There are some other special-purpose situations where a Cell SPU is useful. But not many. If each SPU had, say, 16MB, the things might be more useful. But at 256KB, it's like having a DSP chip. The Cell part belongs in a cell phone tower, processing signal streams, not in a game machine. It's a great cryptanalysis engine, though. Cryptanalysis is all crunch, with little intercommunication, so that fits the Cell architecture.
We're back to a historical truth about multi-CPU architecture - there are only two things that work. Shared-memory multiprocessors ("multi-core" CPUs, or the Xbox 360) work; they're well understood and straightforward to program. Clusters, like Google/Amazon/any web farm, also work; each machine has enough resources to do its own work and can live with limited intercommunication. Everything in between those extremes has historically been a flop: SIMD machines (Illiac IV through Thinking Machines), dataflow machines (tried in the 1980s), and mesh machines (nCube, BBN Butterfly). The only exceptions to this are graphics processors and supercomputers derived from them. That, not the Cell, is cutting edge architecture.
I've met one of the architects of the Cell processor, and his attitude was "build it and they will come". They didn't.
Re:a few facts please? by Michael+Hunt · 2009-01-01 13:13 · Score: 4, Informative

Do you have even a vague understanding of what 'transform' and 'lighting' actually mean? Allow me to elucidate.
'transform' refers to the act of converting vertex positions in model space (the coordinate system used in the vertex buffers) to clip (camera) space. This is typically one matrix * vector multiplication per vertex; the vertex's position in model space is multiplied by the world-view-projection matrix. On modern hardware, this is generally done in the vertex program (other things may be done to the vertex's position before or after the co-ord transform, mind you, such as multiplication by a set of bone matrices for hardware animation, etc.)
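As a rough illustration of that one multiplication per vertex, here is a minimal sketch in plain Python. The matrix below is only a stand-in (a simple translation, not a real world-view-projection matrix), and the function name is made up for the example.

```python
def mat4_mul_vec4(m, v):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return [sum(m[row][col] * v[col] for col in range(4)) for row in range(4)]


# Stand-in for a world-view-projection matrix: translate by (2, 0, -5).
wvp = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -5.0],
    [0.0, 0.0, 0.0, 1.0],
]

model_pos = [1.0, 1.0, 1.0, 1.0]  # vertex position in model space, w = 1
clip_pos = mat4_mul_vec4(wvp, model_pos)
print(clip_pos)  # [3.0, 1.0, -4.0, 1.0]
```

A vertex program does exactly this for every vertex in the buffer, which is why the work maps so well onto many small parallel pipelines.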
'lighting' refers to the process of deciding the colour of each fragment ('potential pixel'). Before programmable graphics hardware, this was done by taking the dot product of the vertex normal and the light vector (or position, depending on light type), and multiplying it by the light's diffuse colour. The resulting colour intensity was then linearly interpolated across the face between vertices, and used to light the texture in conjunction with the ambient term. With modern programmable hardware, lighting is usually done per-fragment based on a normal map, which is input as a second texture to the fragment program. The light position is converted from object (or world) space into 'tangent' space, which is a coordinate system whose basis vectors are parallel and orthogonal to the plane being lit, and the surface is lit based on the dot product of the light vector in tangent space and the normal from the normal map.
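The dot-product step is the same whether the normal comes from a vertex or from a normal map, and can be sketched as follows. This is a bare-bones diffuse term only; clamping negative values to zero is a common convention added here as an assumption, and the helper names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def diffuse(normal, to_light, light_colour):
    """Diffuse term: N.L (clamped at zero) times the light's diffuse colour."""
    n_dot_l = max(0.0, dot(normalize(normal), normalize(to_light)))
    return [n_dot_l * c for c in light_colour]


# Surface facing straight at the light: full intensity.
lit = diffuse([0.0, 0.0, 1.0], [0.0, 0.0, 2.0], [1.0, 0.5, 0.25])
# Surface facing away from the light: nothing.
unlit = diffuse([0.0, 0.0, 1.0], [0.0, 0.0, -1.0], [1.0, 1.0, 1.0])
print(lit, unlit)
```

For per-fragment lighting the only change is where the inputs come from: the normal is sampled from the normal map and the light vector is expressed in tangent space, as described above.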
Back in the bad old days, when men with beards owned IRIX boxes and everybody else had a TNT2 or worse, transform and lighting were done in software for most folks, by a client of the rendering system, before the primitives were submitted as draw calls to the rendering system. Around 1999-2000, cards with hardware T&L, such as the GeForce 256, showed up in the PC space. These cards were the first consumer 3D hardware to perform fixed-function transform and lighting (roughly as I've described it above) in silicon. The API didn't change much, although there was a DirectX version bump (6 to 7). OpenGL programmers didn't really notice; the library itself, obviously, had to know if it was talking to a fixed-function card or a dumb card, but most OpenGLs were provided by hardware vendors in any event, so this wasn't a factor.
Fast forward to today, everybody's using hardware which allows parts (most, these days) of the rendering pipeline to be replaced entirely with programs written by the engine developer (or even the artist, in some cases.) Transforming vertices can be done in conjunction with all manner of other crap, and lighting can be done using whatever model the programmer/artist desire. Regardless, however, it's all done in the same pipeline on the GPU. If the SPUs, as you suggest, were pre-transforming and pre-lighting vertices before writing them to 3d hardware's vertex buffers, then all you'd get is some really confused 3d hardware. RSX (the 3D chip loosely based on nvidia's G70 architecture) has 8 vertex pipelines and 24 fragment pipelines, all programmable. This is more than enough power to do significantly more with each vertex than simple transformation, and enough power to perform even complex effects such as steep parallax-mapped lighting in the fragment pipeline.
In conclusion, while Xenos (360's GPU) may or may not be better than RSX, RSX is CERTAINLY more than powerful enough to handle its own T&L. Cell's SPEs are, at least on some level, a compromise between the massively data-parallel yet somewhat braindead pipelines of a GPU and the more-or-less serial yet significantly intelligent threads of a modern CPU. They'd be great for accelerating physics (Bullet, I believe, has a Cell backend) or AI, but really add fuck all to the 3D rendering side of the console.

--
You're doing it wrong.