The Art of PS3 Programming
The Guardian Gamesblog has a longish piece talking with Volatile Games, developers of the title Possession for the PS3, about what it's like to make a game for Sony's next-gen console. From the article: "At the end of the day it's just a multi-processor architecture. If you can get something running on eight threads of a PC CPU, you can get it running on eight processors on a PS3 - it's not massively different. There is a small 'gotcha' in there though. The main processor can access all the machine's video memory, but each of the seven SPE chips has access only to its own 256k of onboard memory - so if you have, say, a big mesh to process, it'll be necessary to stream it through a small amount of memory - you'd have to DMA it up to your cell chip and then process a little chunk, then DMA the next chunk, so you won't be able to jump around the memory as easily, which I guess you will be able to do on the Xbox 360."
Because it's unproven, not *actually* available now and there's no laptop/low power version even planned??
I am a viral sig. Please copy me and help me spread. Thank you.
1) The API is well known by developers, and has remained stable from version to version. This reduces the amount of R&D and training that need to be done for a game.
;-)
Uh most games nowadays use D3D.
2) Use of OpenGL allows for portable code. While you can't completely get away with writing the same code between a PC version and a Console version, much of the rendering engine at least has a chance of getting reused.
If you write a flexible enough rendering engine this wont matter so much.
3) Carmack says so.
yeah, ok. good reason
4) New features actually go through a standards process, meaning that they get more documentation than just "whatever Microsoft feels like telling you".
Which also means it takes long YEARS for a new version to come out, how long have we been waiting on OpenGL 2.0? Some cool things have come out since and OpenGL is always playing catch now.
5) DirectX is a non-portable skill. It ties you to Windows and the X-Box(s). OpenGL "ties" you to the Gamecube, Windows, PS2, PS3, Linux, Macintosh, etc.
Graphics Programming is a portable skill, I've never met a good graphics programmer who couldnt switch between the two on the fly. Honestly if you can only do graphics in 1 or the other that's pretty worthless.
I'm sorry the whole DX vs OGL war is really old and really lame, Neither are a "god-send". They are both tools, use the one that is best for the job.
What I find interesting about the question of "What can I do with 8 threads?" is that most people seem to assume that you can only have one graphics thread. Why not have 2? Or 3? Or 6? The Emotion Engine's core design is based around having two parallel programmable units handling graphics at the same time, for example one animates the surface of a lake while the other makes the pretty refracted light patterns on the bottom. Yes, it's nastier to program than standard single-thread-for-each-task programming, but it makes for a very powerful architecture when used properly. Similar things can be done with other parts of a game, and if you design your data layout and flow correctly you minimise the need for synchronisation. You could draw your frame with 7 parallel threads, then flip all the SPEs over to handle the physics, input, etc update for the next frame. It's all just a matter of thinking about how you design your game.
(If the OS analogy is flawed, sorry).
[Retained Mode] was stupid to begin with, yet Microsoft kept pushing it version after version. I know it was still there at least as high as DirectX 5.0.
Look, you're welcome to hate Microsoft if you choose, but your memory is rather inaccurate.
Get this: Retained mode was not meant for games. Microsoft never "pushed" it for games. Immediate mode was always there for games to use. Games were always supposed to use immediate mode.
It's been a while since I read the documentation for ancient DirectX versions, but IIRC it actually said, right there, quite explicitly, in the documentation, that retained mode was not meant for high-performance graphics and that games should use immediate mode.
The idea of retained mode was that it provided a much simpler interface. It was intended for use by multimedia applications that did not require the power and flexiblity of immediate mode, but just wanted to throw a few 3D meshes on screen and move them about a bit, without all the hassle of coding all the data structures and transformations by hand. It didn't catch on, and it eventually died, but it wasn't stupid by any means, and something very similar will be making a comeback in Windows Vista.
At least, I say it wasn't stupid. Maybe it was stupid. I don't see how providing a simplified API for simple applications, and a complex API for complex applications, is "stupid", but then I use Microsoft software out of choice, so clearly I don't hate Microsoft badly enough yet for me to be able to judge their decisions objectively.
The ps3 will surely have way more bus contention issues than a PC
You do realize that the memory in each SPE is *local* memory, right? The 7 SPEs can all run flat out without creating any bus contention whatsoever on memory access.
And as for the DMA hardware and interconnect buses, those have immense bandwidth. I really wouldn't be concerned about contention on DMA to main memory.
The more likely problem is DMA latency, since small DMA requests may get delayed by longer ones. However, even that is unlikely unless the designers have chopped up the tasking extremely finely and thus made communication time significant compared to processing time, which is never a good idea.
The limiting factor on computing speed in the last several years has not been processor design or clock speed, but memory speed. Normal architectures feature two levels of fast SRAM to insulate the processor from the latencies inherent with accessing DRAM over a shared bus. That doesn't get rid of multi-cycle delays, it just tries to reduce their likelihood. Data cache misses are expensive, but instruction cache misses are even more expensive -- all the pipelining that modern processors use to handle large workloads efficiently will break down every time the processor stalls loading instructions from main memory.
The PS3's Cell processor offers a different solution to the problem -- sub-processors with fast local memory, and an explicitly programmed way to copy memory areas between processors (the "DMA" that the article mentions). The SPEs allow significant chunks of the batch-processing-style parts of a game to run on a processor that has no memory latencies, for data or instructions. Since memory-stall delays can run into the double digits, you can expect the performance increase from fast memory to be in the double digit range too. I've seen a public demo of some medical-imaging software that ran ~50x faster when rewritten for Cell. (The private demos I've seen are similarly impressive, but I can't describe those in detail. :-)
A traditional multi-processing architecture, like the 3 dual-core chips in the X360, has no such escape from the memory latencies. All coordination of memory state between processors (i.e. through the level 2 cache) is done on demand, when a processor suddenly finds it has a need for it. Prefetching is of course possible, but the minor efficiency gains to be made from prefetching (when they can be found at all) is vastly outweighed by the inherent efficiency of explicitly-programmed DMA transfers. Multi-buffering the DMA transfers allows the SPEs to run uninterrupted, without having to wait for the next batch of data to arrive -- something that isn't really possible with a traditional level-2-cache in a traditional multiprocessing system.
In short, the very nontraditional setup of the PS3's Cell chip is capable of vastly outpowering the traditional multiprocessor setup of the X360, mostly due to successfully eliminating memory latency.
Yes, writing code that can run like this is a major freaking pain in the ass. But so what? The biggest reason most code is hard to run on such an architecture is that the code was poorly thought out, poorly designed, and not documented. Any decently-written application can be re-factored to run like this. Besides, this is the future: Cell really does seem to solve the memory latency problem that's crippling traditional computing architectures, and the performance difference is astounding. If you can't rise to the level of code written for such a complex architecture, then your job is in danger of getting outsourced to Third World nations for $5 an hour...as it should be. So quit your whining.
(First post in ten months. Feels good!)
"Once we've identified and embraced our sickness, we'll have strength...and that's when we get dangerous." - John Waters
The Cell has been available for programming for a while now. I think reference platforms (i.e. other than PS3 prototypes) might even be available. Cell is being used for far more than the PS3. Also, sure the PS3 might run faster than 3.2 GHz, but you make that sound like a bad thing!
Between them, they have 2 MB of high-speed memory, which (as you say) is becoming fairly common for L2 cache sizes, plus it's got a traditional L2 cache. So I'm not sure what you mean by "crippled". There are plenty of computing problems (including video game development) that can fit into this sort of sub-processor/DMA-communication model. Anyone that's programmed a PS2 knows this (and you sound like a video game programmer). The Cell just pushes it further.
There are plenty of tasks that can be run independently with double-buffered batches of data, and not just scientific computing, but the sorts of tasks that are bound to be prevalent in next-generation video games. Physics simulation, whether for gameplay or weather/cloth/fur/etc. effects, can be made parallel & batchable after broadphase collision. Graphics transformation can be, as it is on the PS2.
"Complicated logic" can communicate between processors using ring buffers and short DMA messages. But that's only if the logic is truly complicated...this doesn't apply if the code is complicated because it's the usual not-designed, poorly-thought-out, uncommented, global/singleton-happy, spaghetti code, which is the real problem most of the time. The only thing that's going to hold up the software industry taking advantage of the Cell processor's capabilities is our own collective lameness.
"Once we've identified and embraced our sickness, we'll have strength...and that's when we get dangerous." - John Waters