The Art of PS3 Programming
The Guardian Gamesblog has a longish piece talking with Volatile Games, developers of the title Possession for the PS3, about what it's like to make a game for Sony's next-gen console. From the article: "At the end of the day it's just a multi-processor architecture. If you can get something running on eight threads of a PC CPU, you can get it running on eight processors on a PS3 - it's not massively different. There is a small 'gotcha' in there though. The main processor can access all the machine's video memory, but each of the seven SPE chips has access only to its own 256k of onboard memory - so if you have, say, a big mesh to process, it'll be necessary to stream it through a small amount of memory - you'd have to DMA it up to your cell chip and then process a little chunk, then DMA the next chunk, so you won't be able to jump around the memory as easily, which I guess you will be able to do on the Xbox 360."
Apparently, the machine's use of Open GL as its graphics API means that anyone who's ever written games for the PC will be intimately familiar with the set-up.
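For anyone who wants to see the shape of the "DMA a chunk, process it, DMA the next chunk" pattern from the quote, here is a rough sketch in Python. It is only an illustration of the access pattern, not PS3 code: `stream_chunks`, `checksum_streamed` and `LOCAL_STORE_BYTES` are made-up names, a plain slice stands in for the DMA transfer, and on real hardware the usable chunk would be smaller than 256k because code has to live in that memory too.

```python
# Hypothetical sketch: stream a large buffer through a small "local store".
LOCAL_STORE_BYTES = 256 * 1024  # the 256k figure quoted in the article


def stream_chunks(data: bytes, chunk_size: int = LOCAL_STORE_BYTES):
    """Yield successive slices of data, each no larger than chunk_size."""
    for offset in range(0, len(data), chunk_size):
        # On real hardware this copy would be a DMA transfer; here it is a slice.
        yield data[offset:offset + chunk_size]


def checksum_streamed(data: bytes, chunk_size: int = LOCAL_STORE_BYTES) -> int:
    """Process the whole buffer while only ever holding one chunk at a time."""
    total = 0
    for chunk in stream_chunks(data, chunk_size):
        total = (total + sum(chunk)) % 65536
    return total
```

The point is that the processing code never indexes into the full buffer, which is exactly the "can't jump around the memory as easily" restriction.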
As a programmer, I can attest to OpenGL being a godsend. Not only are programmers intimately familiar with the technology, but it was designed from the beginning with portability in mind. Direct3D, OTOH, tends to follow Microsoft's practice of hiding what's really going on behind the scenes. It's been a little while since I've bothered with Direct3D, but one of Microsoft's biggest features used to be their own scene graph, known as "Retained Mode". For some reason, Microsoft believed that everyone would want to use their scene graph only, and damn technological progress. Most programmers who were in the know immediately bypassed this ridiculousness and went straight for the "Immediate Mode" APIs, which weren't as well documented. (Thanks, Microsoft.)
Wikipedia has a comparison of Direct3D vs. OpenGL here: http://en.wikipedia.org/wiki/Direct3D_vs._OpenGL
Other than that, a computer is a computer, and game programming has always required a strong knowledge of how computers operate. So it's not too surprising that it would be "just like any other programming +/- a few gotchas".
Javascript + Nintendo DSi = DSiCade
Keep in mind that all the "extra" cores are special-purpose cores that can only execute code specifically written for them. They are not general-purpose cores that would let you run 16 applications simultaneously. Also consider that the CPUs for the new consoles are targeted at consoles, not at multitasking operating systems with lots of context switching. There's also the roadmap issue. Sure, this one processor will be available, but what about speed bumps and future generations?
TFA says they are contemplating a job-queue organization, with cores taking jobs as they become available. Provided the size of each 'job' is limited so that it fits comfortably within the overall time it takes to calculate a frame, it should work fairly well. A lot of physical-simulation problems are close to 'embarrassingly parallel', anyway.
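A minimal sketch of that job-queue idea, in Python rather than anything PS3-specific. The function name `run_jobs` and the default of six workers are my own assumptions, not anything from TFA, and Python threads won't give you real parallel speedup for CPU-bound work; this only shows the structure of workers pulling jobs as they become free.

```python
import queue
import threading


def run_jobs(jobs, num_workers=6):
    """Run callables from a shared queue on worker threads; results keep job order."""
    jobs = list(jobs)
    pending = queue.Queue()
    for index, job in enumerate(jobs):
        pending.put((index, job))

    results = [None] * len(jobs)

    def worker():
        while True:
            try:
                index, job = pending.get_nowait()
            except queue.Empty:
                return  # no work left, so this worker retires
            results[index] = job()  # each slot is written by exactly one worker

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the frame is done only when every worker has finished
    return results
```

Keeping each job small matters here for the same reason given above: a worker stuck on one huge job leaves the others idle at the end of the frame.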
PHEM - party like it's 1997-2003!
6) Monsters
7) Aliens
8) Baddies
"Wise men talk because they have something to say; fools, because they have to say something" - Plato
One thing to think about though, regarding threading.
Just because you have critical sections in one thread that may have to hang out waiting for another thread doesn't mean that the two threads can never execute simultaneously at times when they don't need data from one another. At times like that, you get speedup (especially since you have separate cores/processing units/whatever).
Karnal
What I find interesting about the question of "What can I do with 8 threads?" is that most people seem to assume that you can only have one graphics thread. Why not have 2? Or 3? Or 6? The Emotion Engine's core design is based around having two parallel programmable units handling graphics at the same time, for example one animates the surface of a lake while the other makes the pretty refracted light patterns on the bottom. Yes, it's nastier to program than standard single-thread-for-each-task programming, but it makes for a very powerful architecture when used properly. Similar things can be done with other parts of a game, and if you design your data layout and flow correctly you minimise the need for synchronisation. You could draw your frame with 7 parallel threads, then flip all the SPEs over to handle the physics, input, etc update for the next frame. It's all just a matter of thinking about how you design your game.
(If the OS analogy is flawed, sorry).
A lot of people seem to be approaching the concept of the Cell processor improperly. The chip itself is not designed for the "Design a game in 8 threads" approach people seem to be thinking of. It's designed around a foreman/worker metaphor. The main core handles the work of figuring out what comes next; the SPEs do the heavy lifting.
...
Don't think
Processor 1 = AI
Processor 2 = Physics
Processor 3 =
etc.
Instead picture the main CPU going through a normal game loop (simplified here)
Step 1: Update positions
Step 2: Check for collisions
Step 3: Perform motion calculations
Step 4: AI
At the beginning of each step the main CPU farms out the work to the SPEs. So, you have a burst of activity in the SPEs for each step, then a lull as the main core figures out what to do next.
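Roughly, the foreman/worker loop described above could look like the following Python sketch. Everything here is hypothetical: `run_frame`, `update_position` and `apply_drag` are illustrative names, a thread pool stands in for the SPEs, and the entities are plain dicts.

```python
from concurrent.futures import ThreadPoolExecutor


def update_position(entity, dt=1.0):
    """Hypothetical per-entity work item: advance x by velocity * dt."""
    return {**entity, "x": entity["x"] + entity["vx"] * dt}


def apply_drag(entity, factor=0.5):
    """Hypothetical second step: scale the velocity down."""
    return {**entity, "vx": entity["vx"] * factor}


def run_frame(entities, steps, num_workers=6):
    """Foreman loop: for each step, farm out every entity, then wait for all of them."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for step in steps:
            # pool.map returns results in input order; list() blocks until the
            # whole step is done, which is the "lull" before the next burst.
            entities = list(pool.map(step, entities))
    return entities
```

Calling `run_frame(entities, [update_position, apply_drag])` runs the steps strictly in order, with all the workers busy inside each step and none of them tied to a single subsystem.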
At the end of the day, people who say "at the end of the day" just REALLY need to stop saying "at the end of the day".
The limiting factor on computing speed in the last several years has not been processor design or clock speed, but memory speed. Normal architectures feature two levels of fast SRAM to insulate the processor from the latencies inherent with accessing DRAM over a shared bus. That doesn't get rid of multi-cycle delays, it just tries to reduce their likelihood. Data cache misses are expensive, but instruction cache misses are even more expensive -- all the pipelining that modern processors use to handle large workloads efficiently will break down every time the processor stalls loading instructions from main memory.
The PS3's Cell processor offers a different solution to the problem -- sub-processors with fast local memory, and an explicitly programmed way to copy memory areas between processors (the "DMA" that the article mentions). The SPEs allow significant chunks of the batch-processing-style parts of a game to run on a processor that has no memory latencies, for data or instructions. Since memory-stall delays can run into the double digits, you can expect the performance increase from fast memory to be in the double digit range too. I've seen a public demo of some medical-imaging software that ran ~50x faster when rewritten for Cell. (The private demos I've seen are similarly impressive, but I can't describe those in detail. :-)
A traditional multi-processing architecture, like the three dual-threaded cores in the X360, has no such escape from the memory latencies. All coordination of memory state between processors (i.e. through the level 2 cache) is done on demand, when a processor suddenly finds it has a need for it. Prefetching is of course possible, but the minor efficiency gains to be made from prefetching (when they can be found at all) are vastly outweighed by the inherent efficiency of explicitly-programmed DMA transfers. Multi-buffering the DMA transfers allows the SPEs to run uninterrupted, without having to wait for the next batch of data to arrive -- something that isn't really possible with a traditional level-2-cache in a traditional multiprocessing system.
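To make the multi-buffering point concrete, here is a small Python sketch of the idea, assuming a fetcher thread as a stand-in for asynchronous DMA. The name `double_buffered_sum` and the `depth` parameter are invented for illustration; real SPE code would use the DMA hardware, not threads and queues.

```python
import queue
import threading


def double_buffered_sum(data, chunk_size, depth=2):
    """Sum data chunk by chunk while a fetcher thread runs ahead of the consumer."""
    buffers = queue.Queue(maxsize=depth)  # bounded: at most `depth` fetched chunks waiting
    done = object()  # sentinel marking the end of the stream

    def fetcher():
        for offset in range(0, len(data), chunk_size):
            # Stand-in for an asynchronous DMA transfer into a free buffer.
            buffers.put(data[offset:offset + chunk_size])
        buffers.put(done)

    threading.Thread(target=fetcher, daemon=True).start()

    total = 0
    while True:
        chunk = buffers.get()  # blocks only if the fetcher has fallen behind
        if chunk is done:
            return total
        total += sum(chunk)  # compute overlaps with the fetch of the next chunk
```

With `depth=2` this is classic double buffering: one chunk is being processed while the next is already on its way, so the consumer only stalls if the transfers can't keep up.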
In short, the very nontraditional setup of the PS3's Cell chip is capable of vastly outpowering the traditional multiprocessor setup of the X360, mostly due to successfully eliminating memory latency.
Yes, writing code that can run like this is a major freaking pain in the ass. But so what? The biggest reason most code is hard to run on such an architecture is that the code was poorly thought out, poorly designed, and not documented. Any decently-written application can be re-factored to run like this. Besides, this is the future: Cell really does seem to solve the memory latency problem that's crippling traditional computing architectures, and the performance difference is astounding. If you can't rise to the level of code written for such a complex architecture, then your job is in danger of getting outsourced to Third World nations for $5 an hour...as it should be. So quit your whining.
(First post in ten months. Feels good!)
"Once we've identified and embraced our sickness, we'll have strength...and that's when we get dangerous." - John Waters