The Art of PS3 Programming
The Guardian Gamesblog has a longish piece talking with Volatile Games, developers of the title Possession for the PS3, about what it's like to make a game for Sony's next-gen console. From the article: "At the end of the day it's just a multi-processor architecture. If you can get something running on eight threads of a PC CPU, you can get it running on eight processors on a PS3 - it's not massively different. There is a small 'gotcha' in there though. The main processor can access all the machine's video memory, but each of the seven SPE chips has access only to its own 256k of onboard memory - so if you have, say, a big mesh to process, it'll be necessary to stream it through a small amount of memory - you'd have to DMA it up to your cell chip and then process a little chunk, then DMA the next chunk, so you won't be able to jump around the memory as easily, which I guess you will be able to do on the Xbox 360."
Apparently, the machine's use of Open GL as its graphics API means that anyone who's ever written games for the PC will be intimately familiar with the set-up.
As a programmer, I can attest to OpenGL being a God-send. Not only are programmers intimately familiar with the technology, but it was designed from the beginning with portability in mind. Direct3D, OTOH, tends to follow Microsoft's practices of hiding what's really going on behind the scenes. It's been a little while since I've bothered with Direct3D, but one of Microsoft's biggest features used to be their own SceneGraph known as "Retained Mode". For some reason, Microsoft believed that everyone would want to use their Scenegraph only and damn technological progress. Most programmers who were in the know immediately bypassed this ridiculousness and went straight for the "Immediate Mode" APIs, which weren't as well documented. (Thanks Microsoft)
Wikipedia has a comparison of Direct3D vs. OpenGL here: http://en.wikipedia.org/wiki/Direct3D_vs._OpenGL
Other than that, a computer is a computer, and game programming has always required a strong knowledge of how computers operate. So it's not too surprising that it would be "just like any other programming +/- a few gotchas".
Javascript + Nintendo DSi = DSiCade
Keep in mind that all the "extra" cores are special-purpose cores that can only execute code specifically written for them. They are not general-purpose cores so you can run 16 applications simultaneously. Also consider that the CPUs for the new consoles are targeted at consoles and not multitasking operating systems with lots of context switching. There's also the roadmap issue. Sure, this one processor will be available, but what about speed bumps and future generations?
I'm still baffled into how you can efficiently break up a game into 8 threads.
.... woops problem...need critical sections for this to operate with the graphics thread.
.... woops problem, need ritical sections for this to operate with the physics thread..
ok controller input on one..
graphics on another..
physics on a third
networking on a fourth
sound... ok no problems here, thats 5.
See, even dividing it up into 5 threads causes problems, you need to make sure that you are allowed to do something on one processor and if not you must wait on another processor to finish. critical sections are something that can ultimately cause your code to run slower than if it was not multithreaded in the first place.
More info on critical sections, and other issues involved with programming multithreaded apps can be found here
Kent Simon Multitheft Auto
There are other ways to divy up work.
If your intention is to put independent tasks out to different processors, you will run into huge issues like the ones you describe.
Instead, consider the beginning of each logical step in the game loop as a "constriction/delegation" point: You constrict, meaning that only one thread is running right now. Then, say, it's time for particles. You now wake up your eight particle worker threads, divy up the gargantuan 2000 particle emitter loop into 250 emitters each. You then instruct each particle thread to work through the 250 emitters and wait for them all to finish.
Naturally your real performance won't be as if you only had to process 250 emitters, but let's say you lose 50% due to internal synchronization, you've still processed all your particles in 25% of the time.
Another way is to pipeline the tasks: You know that all your game gizmos have to first do this, then that and then the other. You create three task threads, one that does "this", one that does "that" and one that does "the other". You feed the first gizmo to the "this" thread. When it is done, it will feed the gizmo on towards the "that"-thread. When the "that"-thread is done, it will in lastly feed the gizmo on the "the other"-thread.
But once the first thread (the "this" task) is done, it can accept a new gizmo while the "that"-thread munches on the first.
Advantage to this scheme is better memory locality (which seems like it is more important on PS3 that, say, PC) that the divide'n'conquer approach described first. Of course, individual game gizmos may have dependencies in between them, so you need a proper dependency graph to feed gizmos off the right order.
It's doable, as long as you don't think 8 threads have to independently work on completely different tasks at the same time.
A lot of people seem to be approaching the concept of the Cell processor improperly. The chip itself is not designed for the "Design a game in 8 threads" approach people seem to be thinking of. It's designed based on a forman/worker metaphore. The main chip handles the work of figuring out what comes next, the SPE's do the heavy lifting.
...
Don't think
Processor 1 = AI
Processor 2 = Physics
Processor 3 =
etc.
Instead picture the main CPU going through a normal game loop (simplified here)
Step 1: Update positions
Step 2: Check for collisions
Step 3: Perform motion caluclations
Step 4: AI
At the beginning of each step the main CPU farms out the work to the SPE's. So, you have a burst of activity in the SPE's for each step, thun a lull as the main core figures out what to do next.