How Sony's Development of the Cell Processor Benefited Microsoft
The Wall Street Journal is running an article about a recently released book entitled "The Race for a New Game Machine" which details Sony's development of the Cell processor, written by two of the engineers who worked on it. They also discuss how Sony's efforts to create a next-gen system backfired by directly helping Microsoft, one of their main competitors. Quoting:
"Sony, Toshiba and IBM committed themselves to spending $400 million over five years to design the Cell, not counting the millions of dollars it would take to build two production facilities for making the chip itself. IBM provided the bulk of the manpower, with the design team headquartered at its Austin, Texas, offices. ... But a funny thing happened along the way: A new 'partner' entered the picture. In late 2002, Microsoft approached IBM about making the chip for Microsoft's rival game console, the (as yet unnamed) Xbox 360. In 2003, IBM's Adam Bennett showed Microsoft specs for the still-in-development Cell core. Microsoft was interested and contracted with IBM for their own chip, to be built around the core that IBM was still building with Sony. All three of the original partners had agreed that IBM would eventually sell the Cell to other clients. But it does not seem to have occurred to Sony that IBM would sell key parts of the Cell before it was complete and to Sony's primary videogame-console competitor. The result was that Sony's R&D money was spent creating a component for Microsoft to use against it."
What parts of the processor did IBM pass on to Microsoft? The Xbox 360 processor, Xenon, is basically a three-core hyperthreaded PowerPC. The PlayStation 3 has a single PowerPC core (not hyperthreaded) and 7 (or 8) simpler SPU processors.
in so many levels.
Can't believe Sony would be so negligent as not to obtain any exclusivity agreements against its competitors.
Can't believe IBM would permit such an arrangement, and carry out the release of the Cell processor designs without Sony's and Toshiba's willing consent. Bad practice, bad PR. I don't give a rip about making money at any cost. Now which major Japanese company would be foolish enough to approach IBM's hardware team after this?
That's what happens when you delegate too much R&D, I guess...
Because it was a really misdirected effort when it came to a console. Sony really had no idea what the hell they were doing as far as making a chip for their console. Originally, they thought the Cell would be the graphics chip. Yeah, well, it turned out not to be nearly powerful enough for that, so late in the development cycle they went to nVidia to get a chip. Problem was, in the time frame they needed, they couldn't get it customized very well.
For example, in a console you normally want all the RAM shared between GPU and CPU. There's no reason for them to have separate RAM modules. The Xbox 360 does this: there's 512MB of RAM that is usable in general. The PS3 doesn't: it has 256MB each for the CPU and the GPU. The reason is that's how nVidia GPUs work in PCs, and that's where the chip came from. nVidia didn't have the time to make them a custom one for the console, as ATi did for Microsoft. This leads to situations where the PS3 runs out of memory for textures and the 360 doesn't. It also means that the Cell can't fiddle with video RAM directly. Its power could perhaps be better used if it could directly do operations at full speed on data in VRAM, but it can't.
So what they ended up with is a neat processor that is expensive, and not that useful. The SPEs that make up the bulk of the Cell's muscle are hard to use in games given the PS3's setup, and often you are waiting on the core to get data to and from them.
It's a neat processor, but a really bad idea for a video game console. Thus despite the cost and hype, the PS3 isn't able to outdo the 360 in terms of graphics (in some games it even falls behind).
I really don't know what the hell Sony was thinking with putting a brand new kind of processor in a console. I'm willing to bet in 10 years there are compilers and systems out there that make real good use of the Cell. However that does you no good with games today.
Thus we see the current situation of the PS3 having weak sales as compared to the 360 and Wii. It is high priced, with the idea that it brings the best performance, but that just doesn't bear out in reality.
Actually, PowerPC is a descendant of IBM's POWER2 processor. POWER and POWER2 processors were used in supercomputers and servers.
The Xbox 360 cores are in-order designs: they don't have the out-of-order features of a regular desktop core, things like instruction re-ordering or aggressive speculative execution. That means they use much less power than a regular core (and so generate less heat), but only run branchy game-logic type code at around half the speed.
They are very different approaches. The 360's CPU is basically a 3-core, 6-context, in-order variant of the POWER4 with a vector unit. In terms of pure number crunching ability, it's pretty pathetic next to the Cell. On the other hand, it is based on a model that we have spent 30 years building compilers for. You only need to write slightly-parallel, conventional code to get close to 100% of the theoretical performance out of it.
In contrast, the Cell has one PPU which is roughly equivalent to one context on the 360's CPU (somewhere between 1/3 and 1/6 of the speed). It also has 7 SPUs. These are very fast, but they're basically small DSPs. They have very wide vector units and are limited to working on 256KB of data at a time. You can use them to implement accelerator kernels for certain classes of algorithm, but getting good performance out of them is hard.
In terms of on-paper performance, the Cell is a long way out in front, but it is a long way behind in ease of programming, meaning that you generally get a much smaller fraction of the maximum throughput.
It was the same problem with the PS2. It took developers a few good years to really start to push the hardware. Look at some of the later games that really push the envelope, like, say, Final Fantasy XII or Shadow of the Colossus. The PS2 was certainly capable of some nice visuals, but the other consoles were ultimately superior while basically using off-the-shelf hardware. Developers were pushing the Xbox and the GameCube almost from day one. I think the Cell has backfired, but not for the reason that Microsoft shares aspects of its core. Parallel processing is indeed the future, but not in the form of vector units, but rather general-purpose chips. The one-size-fits-all approach is inefficient, but at the same time it has been the approach that has worked to fit the needs of modern computer users. Hardware should get easier to program over time, certainly not harder. What happened to those predictions that in the future the average user will be able to code just by throwing some GUI elements together and maybe even describing the program to the computer a bit and having it generate the program for you? How far away are we from that day? (It seems an awfully long way away, and the visual IDE is not the same as what I am describing here.)
Indeed. There's really no reason why the 256 KB of memory should be any particular obstacle, as long as you have decent access to main memory, and if you program correctly. It's bigger than a lot of L1 caches, after all, and it's on chip so it's very high speed memory.
Yes, you'll sacrifice some performance over purely streamable problems, but that would happen anyway. It's just making the trade-off explicit.
The problem, of course, is that you have to do your own cache management, and most programmers haven't had to think about that since the day hardware caches were invented.
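To make "doing your own cache management" concrete, here is a minimal sketch in Python rather than real SPU code. It only mimics the pattern: copy a chunk that fits in the local store, work on it, and copy the result back. The 256 KB figure is from the discussion above; the function name, the 4-byte element size, and the choice to reserve half the local store for code and stack are assumptions made for illustration.

```python
LOCAL_STORE_BYTES = 256 * 1024   # local store size mentioned in the thread
BYTES_PER_FLOAT = 4              # assumption: single-precision values
# Assumption: half the local store is kept free for code and stack.
CHUNK_ELEMS = (LOCAL_STORE_BYTES // 2) // BYTES_PER_FLOAT


def scale_in_chunks(main_memory, factor, chunk_elems=CHUNK_ELEMS):
    """Scale every value, touching only one local-store-sized chunk at a time."""
    result = []
    for start in range(0, len(main_memory), chunk_elems):
        # "DMA in": explicitly copy one chunk from main memory to local storage.
        local = main_memory[start:start + chunk_elems]
        # Compute using only the data that is currently local.
        local = [x * factor for x in local]
        # "DMA out": explicitly write the finished chunk back.
        result.extend(local)
    return result


if __name__ == "__main__":
    data = [float(i) for i in range(10)]
    print(scale_in_chunks(data, 2.0, chunk_elems=4))
```

On real hardware the transfers would be asynchronous and typically double-buffered so that the next chunk loads while the current one is being processed; this sketch leaves that out to keep the explicit trade-off visible.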
At first glance, the Xbox CPU doesn't really resemble Cell, but if you just compare Cell's PPE to one of Xenon's three cores the similarity is striking: Xenon, Cell
Is the programming issue a result of the development tools? I have been a Symbian user since the Nokia 7650 (the first S60 phone), and I keep getting amazed at developers' love for the iPhone, at how a very advanced application like Fring can ship in a matter of months without any kind of help from Apple, and at how wisely OpenGL (ES) acceleration is used there, while it was ignored on my poor UIQ3 Sony Ericsson P1i for years until the Opera 9.5 beta.
People say SSE could only reach the level of AltiVec with the new Xeons, and yet as a G5 owner I kept wondering why AltiVec so often went unused, even by Apple themselves in certain parts. Or take SMP (I have a Quad G5), whose full potential is only being seen after OS X Leopard. There is an easy answer: Intel and AMD offer great support to developers, to the entire GNU compiler family, and to OS developers.
If it is a programming issue and both IBM and Sony are involved, I would look at the development tools. Somehow I suspect the development tools, and the support offered for them, are way better on the Xbox 360. Compare the Symbian UIQ3 market to the way more premature (in terms of coding/UI) Nokia S60, and finally compare Symbian S60 to the iPhone. Development tools really make a huge difference, and Sony is a hardware company, while IBM doesn't really have a clue about end users.
Do you have even a vague understanding of what 'transform' and 'lighting' actually mean? Allow me to elucidate.
'transform' refers to the act of converting vertex positions in model space (the coordinate system used in the vertex buffers) to clip (camera) space. This is typically one matrix * vector multiplication per vertex; the vertex's position in model space is multiplied by the world-view-projection matrix. On modern hardware, this is generally done in the vertex program (other things may be done to the vertex's position before or after the co-ord transform, mind you, such as multiplication by a set of bone matrices for hardware animation, etc.)
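As a rough illustration of that one matrix * vector multiplication per vertex, here is a sketch in plain Python. The helper names are made up for this example, and a simple translation matrix stands in for a real world-view-projection matrix, which would also fold in rotation, the camera, and the perspective projection.

```python
def mat4_mul_vec4(m, v):
    """Multiply a row-major 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """Build a 4x4 matrix that moves a point by (tx, ty, tz)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


if __name__ == "__main__":
    # Assumption: this translation plays the role of the combined
    # world-view-projection matrix.
    wvp = translation(1.0, 2.0, 3.0)
    # Model-space position in homogeneous coordinates (w = 1 for a point).
    model_space_vertex = [1.0, 1.0, 1.0, 1.0]
    print(mat4_mul_vec4(wvp, model_space_vertex))
```

A vertex program does exactly this kind of multiply for every vertex in the buffer, which is why the operation maps so well onto wide parallel hardware.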
'lighting' refers to the process of deciding the colour of each fragment ('potential pixel'). Before programmable graphics hardware, this was done by taking the dot product of the vertex normal and the light vector (or position, depending on light type), and multiplying it by the light's diffuse colour. The resulting colour intensity was then linearly interpolated across the face between vertices, and used to light the texture in conjunction with the ambient term. With modern programmable hardware, lighting is usually done per-fragment based on a normal map, which is input as a second texture to the fragment program. The light position is converted from object (or world) space into 'tangent' space, which is a coordinate system whose basis vectors are parallel and orthogonal to the plane being lit, and the surface is lit based on the dot product of the light vector in tangent space and the normal from the normal map.
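The fixed-function diffuse term described above can be sketched like this in plain Python. The function names are hypothetical, and adding the ambient term to the diffuse term and clamping at 1.0 is one common convention assumed here, not something the description above pins down.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def diffuse_colour(normal, light_vector, light_colour, ambient):
    """Dot the normal with the light vector, then scale the light's colour."""
    intensity = max(0.0, dot(normalize(normal), normalize(light_vector)))
    # Assumption: ambient is added per channel and the sum is clamped to 1.0.
    return [min(1.0, a + intensity * c) for a, c in zip(ambient, light_colour)]


if __name__ == "__main__":
    # Light shining straight along the surface normal: full diffuse colour.
    print(diffuse_colour([0.0, 0.0, 1.0], [0.0, 0.0, 1.0],
                         [1.0, 0.5, 0.25], [0.0, 0.0, 0.0]))
```

Per-fragment lighting with a normal map uses the same dot product; the difference is that the normal comes from a texture lookup per fragment instead of being interpolated between vertices.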
Back in the bad old days, when men with beards owned IRIX boxes and everybody else had a TNT2 or worse, transform and lighting were done in software for most folks, by a client of the rendering system, before the primitives were submitted as draw calls to the rendering system. From about 1999 onward, cards with hardware T&L, such as the GeForce 256, showed up in the PC space. These cards were the first consumer 3D hardware to perform fixed-function transform and lighting (roughly as I've described it above) in silicon. The API didn't change much, although there was a DirectX version bump (6 to 7). OpenGL programmers didn't really notice; the library itself, obviously, had to know if it was talking to a fixed-function card or a dumb card, but most OpenGLs were provided by hardware vendors in any event, so this wasn't a factor.
Fast forward to today: everybody's using hardware which allows parts (most, these days) of the rendering pipeline to be replaced entirely with programs written by the engine developer (or even the artist, in some cases). Transforming vertices can be done in conjunction with all manner of other crap, and lighting can be done using whatever model the programmer/artist desires. Regardless, however, it's all done in the same pipeline on the GPU. If the SPUs, as you suggest, were pre-transforming and pre-lighting vertices before writing them to the 3D hardware's vertex buffers, then all you'd get is some really confused 3D hardware. RSX (the 3D chip loosely based on NVIDIA's G70 architecture) has 8 vertex pipelines and 24 fragment pipelines, all programmable. This is more than enough power to do significantly more with each vertex than simple transformation, and enough power to perform even complex effects such as steep parallax-mapped lighting in the fragment pipeline.
In conclusion, while Xenos (the 360's GPU) may or may not be better than RSX, RSX is CERTAINLY more than powerful enough to handle its own T&L. Cell's SPEs are, at least on some level, a compromise between the massively data-parallel yet somewhat braindead pipelines of a GPU and the more-or-less serial yet significantly intelligent threads of a modern CPU. They'd be great for accelerating physics (Bullet, I believe, has a Cell backend) or AI, but really add fuck all to the 3D rendering side of the console.
You're doing it wrong.
While it certainly sounds like you know what you're talking about, it's pretty clear to anyone with a game-dev background you do not.
Cell's SPEs are actually PRIMARILY used as aids to graphics processing (T&L) by most developers. Look into how games like Heavenly Sword use the SPEs as part of their "faux" HDR, or how games like Killzone 2 use SPEs to implement deferred rendering for awesome smoke effects. The SPEs are, in PRACTICAL TERMS to PS3 game developers, essential to the 3D rendering side of the console.
While RSX is "powerful enough" to do its own T&L, it cannot compare to the standalone power of the 360's Xenos chip. There are many reasons for this, from the 6 fixed vertex shaders on RSX vs the unified shaders on the 360 (which permit far higher vertex workloads), to the RSX's limited bandwidth vs the 360's eDRAM bandwidth, to triangle setup rates. On the PS3, developers need to leverage Cell in intelligent ways to draw graphics comparable to the 360's. If an intelligent and determined PS3 developer really leverages Cell, it can produce graphics unparalleled in the console world. The problem is, it costs a fortune in time and money to do it, and very few developers can. It's simply not worth it for most developers to even attempt it.
As a sidenote, Cell is not at all good for most game AI for many reasons (not the least of which is the lack of branch predictors in the SPEs).
Additionally, people keep making the mistake of assuming the PPU in the Cell is basically the same as each core in the 360's CPU. That's not at all true. There are some significant differences, ranging from native Direct3D format support in the 360's CPU to the new VMX128 vector units (which have 128 registers per context per core [6 total], vs 32 on the PPU), as well as additional instructions specifically tailored towards 3D games (like single-cycle dot-product instructions). The combined triple VMX128 units on the 360 are still faster than most quad-core Core i7s in vector processing, so I'm perplexed by the notion, which I've read from some people, that it's somewhat slow or underpowered.
If you're truly interested in how PS3 games use Cell, check out the Beyond3D community where PS3 developers post in detail about how they do what they do. And Cell is a major factor in 3D rendering on the PS3. It has to be.