The Art of PS3 Programming

Posted by Zonk on Friday January 27, 2006 @07:41AM from the zombies-ftw dept.

The Guardian Gamesblog has a longish piece talking with Volatile Games, developers of the title Possession for the PS3, about what it's like to make a game for Sony's next-gen console. From the article: "At the end of the day it's just a multi-processor architecture. If you can get something running on eight threads of a PC CPU, you can get it running on eight processors on a PS3 - it's not massively different. There is a small 'gotcha' in there though. The main processor can access all the machine's video memory, but each of the seven SPE chips has access only to its own 256k of onboard memory - so if you have, say, a big mesh to process, it'll be necessary to stream it through a small amount of memory - you'd have to DMA it up to your cell chip and then process a little chunk, then DMA the next chunk, so you won't be able to jump around the memory as easily, which I guess you will be able to do on the Xbox 360."
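The streaming pattern described in the quote can be sketched roughly as follows. This is a plain Python illustration, not Cell SDK code: `CHUNK_BYTES`, `dma_get`, and `process_mesh` are hypothetical names, the vertex layout is assumed, and slicing a byte string merely stands in for a real DMA transfer into local memory.

```python
import struct

LOCAL_STORE_BYTES = 256 * 1024      # per-SPE local memory, per the article
CHUNK_BYTES = 16 * 1024             # assumed working-buffer size (hypothetical)
VERTEX = struct.Struct("<3f")       # assumed layout: three little-endian floats


def dma_get(main_memory: bytes, offset: int, size: int) -> bytes:
    """Stand-in for a DMA transfer: copy one slice into 'local' memory."""
    return main_memory[offset:offset + size]


def process_mesh(main_memory: bytes) -> float:
    """Sum the x coordinates of a mesh, touching one chunk at a time."""
    # Keep chunks a whole number of vertices so no vertex straddles two chunks.
    chunk = CHUNK_BYTES - (CHUNK_BYTES % VERTEX.size)
    assert chunk <= LOCAL_STORE_BYTES
    total = 0.0
    for offset in range(0, len(main_memory), chunk):
        local = dma_get(main_memory, offset, chunk)
        for (x, _y, _z) in VERTEX.iter_unpack(local):
            total += x
    return total


mesh = b"".join(VERTEX.pack(float(i), 0.0, 0.0) for i in range(10_000))
print(process_mesh(mesh))
```

The point of the sketch is only that the inner loop never sees more than one chunk at a time, which is why random access across the whole mesh becomes awkward.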

It uses OpenGL by AKAImBatman · 2006-01-27 07:53 · Score: 5, Informative

Apparently, the machine's use of Open GL as its graphics API means that anyone who's ever written games for the PC will be intimately familiar with the set-up.

As a programmer, I can attest to OpenGL being a God-send. Not only are programmers intimately familiar with the technology, but it was designed from the beginning with portability in mind. Direct3D, OTOH, tends to follow Microsoft's practices of hiding what's really going on behind the scenes. It's been a little while since I've bothered with Direct3D, but one of Microsoft's biggest features used to be their own scene graph, known as "Retained Mode". For some reason, Microsoft believed that everyone would want to use only their scene graph, and damn technological progress. Most programmers who were in the know immediately bypassed this ridiculousness and went straight for the "Immediate Mode" APIs, which weren't as well documented. (Thanks, Microsoft.)

Wikipedia has a comparison of Direct3D vs. OpenGL here: http://en.wikipedia.org/wiki/Direct3D_vs._OpenGL

Other than that, a computer is a computer, and game programming has always required a strong knowledge of how computers operate. So it's not too surprising that it would be "just like any other programming +/- a few gotchas".

--
Javascript + Nintendo DSi = DSiCade
1. Re:It uses OpenGL by HeavyMS · 2006-01-27 08:08 · Score: 2, Informative
  
  "It's been a little while since I've bothered with Direct3D"
  We can tell. Immediate Mode/Retained Mode is ancient history.
2. Re:It uses OpenGL by amliebsch · 2006-01-27 08:11 · Score: 4, Informative
  
  The article you cite doesn't really support your conclusion of OpenGL being a "god-send." Instead, the article seems to conclude that at this stage, for all intents and purposes, the two APIs are functionally equivalent.
  
  --
  If you don't know where you are going, you will wind up somewhere else.
3. Re:It uses OpenGL by AKAImBatman · 2006-01-27 08:26 · Score: 3, Informative
  
  The article you cite doesn't really support your conclusion of OpenGL being a "god-send."
  
  OpenGL is a God-send for a couple of reasons, IMHO:
  
  1) The API is well known by developers, and has remained stable from version to version. This reduces the amount of R&D and training that need to be done for a game.
  
  2) Use of OpenGL allows for portable code. While you can't completely get away with writing the same code between a PC version and a Console version, much of the rendering engine at least has a chance of getting reused.
  
  3) Carmack says so. ;-)
  
  4) New features actually go through a standards process, meaning that they get more documentation than just "whatever Microsoft feels like telling you".
  
  5) DirectX is a non-portable skill. It ties you to Windows and the X-Box(s). OpenGL "ties" you to the Gamecube, Windows, PS2, PS3, Linux, Macintosh, etc.
  
  That's my opinion, for what it's worth. That and 50 cents will get you a cup of coffee, so take it as you will.
  
  --
  Javascript + Nintendo DSi = DSiCade
4. Re:It uses OpenGL by jinzumkei · 2006-01-27 08:46 · Score: 5, Insightful
  
  1) The API is well known by developers, and has remained stable from version to version. This reduces the amount of R&D and training that need to be done for a game.
  Uh, most games nowadays use D3D.
  
  2) Use of OpenGL allows for portable code. While you can't completely get away with writing the same code between a PC version and a Console version, much of the rendering engine at least has a chance of getting reused.
  If you write a flexible enough rendering engine, this won't matter so much.
  
  3) Carmack says so. ;-)
  Yeah, OK. Good reason.
  
  4) New features actually go through a standards process, meaning that they get more documentation than just "whatever Microsoft feels like telling you".
  Which also means it takes YEARS for a new version to come out. How long have we been waiting on OpenGL 2.0? Some cool things have come out since, and OpenGL is always playing catch-up now.
  
  5) DirectX is a non-portable skill. It ties you to Windows and the X-Box(s). OpenGL "ties" you to the Gamecube, Windows, PS2, PS3, Linux, Macintosh, etc.
  Graphics programming is a portable skill. I've never met a good graphics programmer who couldn't switch between the two on the fly. Honestly, if you can only do graphics in one or the other, that's pretty worthless.
  
  I'm sorry, but the whole DX vs. OGL war is really old and really lame. Neither is a "god-send". They are both tools; use the one that is best for the job.
5. Re:It uses OpenGL by Haeleth · 2006-01-27 09:36 · Score: 4, Insightful
  
  [Retained Mode] was stupid to begin with, yet Microsoft kept pushing it version after version. I know it was still there at least as high as DirectX 5.0.
  
  Look, you're welcome to hate Microsoft if you choose, but your memory is rather inaccurate.
  
  Get this: Retained mode was not meant for games. Microsoft never "pushed" it for games. Immediate mode was always there for games to use. Games were always supposed to use immediate mode.
  
  It's been a while since I read the documentation for ancient DirectX versions, but IIRC it actually said, right there, quite explicitly, in the documentation, that retained mode was not meant for high-performance graphics and that games should use immediate mode.
  
  The idea of retained mode was that it provided a much simpler interface. It was intended for use by multimedia applications that did not require the power and flexibility of immediate mode, but just wanted to throw a few 3D meshes on screen and move them about a bit, without all the hassle of coding all the data structures and transformations by hand. It didn't catch on, and it eventually died, but it wasn't stupid by any means, and something very similar will be making a comeback in Windows Vista.
  
  At least, I say it wasn't stupid. Maybe it was stupid. I don't see how providing a simplified API for simple applications, and a complex API for complex applications, is "stupid", but then I use Microsoft software out of choice, so clearly I don't hate Microsoft badly enough yet for me to be able to judge their decisions objectively.
6. Re:It uses OpenGL by Trinn · 2006-01-27 10:02 · Score: 2, Informative
  
  As someone who has done a lot of programming in SDL, yes it has an input layer that is fairly good, integrated into the main SDL event loop. The major missing piece as far as myself and my fellow developers are concerned is a library for mapping SDL input events to some sort of internal game events (control customization). Eventually we plan to write one, but we are...lazy.
Re:Yet another reason by BillBrasky · 2006-01-27 08:06 · Score: 3, Informative

Keep in mind that all the "extra" cores are special-purpose cores that can only execute code specifically written for them. They are not general-purpose cores that would let you run 16 applications simultaneously. Also consider that the CPUs for the new consoles are targeted at consoles and not multitasking operating systems with lots of context switching. There's also the roadmap issue. Sure, this one processor will be available, but what about speed bumps and future generations?
8 Threads? by Kent+Simon · 2006-01-27 08:20 · Score: 2, Informative

I'm still baffled as to how you can efficiently break up a game into 8 threads.

OK, controller input on one...
graphics on another...
physics on a third... whoops, problem: need critical sections for this to operate with the graphics thread.
networking on a fourth... whoops, problem: need critical sections for this to operate with the physics thread.
sound... OK, no problems here, that's 5.

See, even dividing it up into 5 threads causes problems. You need to make sure that you are allowed to do something on one processor, and if not, you must wait for another processor to finish. Critical sections are something that can ultimately cause your code to run slower than if it was not multithreaded in the first place.
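For readers unfamiliar with the term, here is a minimal sketch of a critical section, assuming a shared position that a physics thread updates. The names `position`, `position_lock`, and `physics_thread` are made up for illustration, and Python threads are used only to show the structure.

```python
import threading

position = [0.0]                 # shared state another thread might read
position_lock = threading.Lock()


def physics_thread(steps):
    for _ in range(steps):
        # Critical section: only one thread may update the position at a time.
        with position_lock:
            position[0] += 1.0


threads = [threading.Thread(target=physics_thread, args=(1000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(position[0])
```

Every pass through the `with` block serializes the two threads, which is the waiting cost described above.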

More info on critical sections, and other issues involved with programming multithreaded apps can be found here

--
Kent Simon Multitheft Auto
1. Re:8 Threads? by Dr.+Manhattan · 2006-01-27 08:27 · Score: 4, Informative
  
  I'm still baffled as to how you can efficiently break up a game into 8 threads.
  TFA says they are contemplating a job-queue organization, with cores taking jobs as they become available. Provided the size of the 'jobs' is limited so they fit comfortably within the overall time it takes to calculate a frame, it should work fairly well. A lot of physical-simulation problems are close to 'embarrassingly parallel', anyway.
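A job-queue organization along those lines might look roughly like this. It is a sketch under assumptions, not what Volatile Games describes in detail: `run_jobs` and `NUM_WORKERS` are hypothetical, and ordinary Python threads stand in for cores.

```python
import queue
import threading

NUM_WORKERS = 7   # assumed: one worker per SPE mentioned in the article


def run_jobs(jobs, num_workers=NUM_WORKERS):
    """Run callables from a shared queue; return their results in job order."""
    pending = queue.Queue()
    for index, job in enumerate(jobs):
        pending.put((index, job))
    results = [None] * len(jobs)

    def worker():
        while True:
            try:
                index, job = pending.get_nowait()
            except queue.Empty:
                return            # no jobs left for this frame
            results[index] = job()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                  # frame barrier: wait for every job
    return results


frame_jobs = [lambda n=n: n * n for n in range(20)]
print(run_jobs(frame_jobs))
```

Because idle workers simply pull the next job, small jobs balance themselves across workers without any static assignment of tasks to cores.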
  
  --
  PHEM - party like it's 1997-2003!
2. Re:8 Threads? by hobbit · 2006-01-27 08:31 · Score: 3, Funny
  
  6) Monsters
  7) Aliens
  8) Baddies
  
  --
  "Wise men talk because they have something to say; fools, because they have to say something" - Plato
3. Re:8 Threads? by karnal · 2006-01-27 08:31 · Score: 4, Informative
  
  One thing to think about though, regarding threading.
  
  Just because one thread has critical sections where it may have to hang around waiting for another thread doesn't mean the two threads can't execute simultaneously at other times, when they don't need data from one another. At times like that, you get a speedup (especially since you have separate cores/processing units/whatever).
  
  --
  Karnal
4. Re:8 Threads? by AKAImBatman · 2006-01-27 08:33 · Score: 2, Informative
  
  I can think of a few ways off the top of my head, but none I'd actually like to try coding. For example, you can divvy up the collision detection process across different threads to have each processor test a given percentage of objects. Similarly, you can assign the physics handling for different objects across different processors.
  
  The article suggests that this be done by having a single "controller" processor rapid-fire the tasks to the other processors. While this would work, it's also less efficient than a true parallel scheme. The article also mentions this, and comments that the scheme could result in poor utilization of the system's processors. But if a true parallel scheme is used, then it's difficult to reassign the processors in case a sudden jump in processing ability is required for, say, graphics over physics.
  
  So it would seem that there's still some question as to how useful the multi-processor concept actually is in games. At least until new methodologies emerge. :-)
  
  --
  Javascript + Nintendo DSi = DSiCade
5. Re:8 Threads? by AuMatar · 2006-01-27 08:40 · Score: 2, Interesting
  
  It's worse than that.
  
  Controllers- no reason to have a thread, you use interrupts and wake up once a second when an input changes. No thread needed.
  
  Sound- unless you're doing heavy duty software mixing (not hardware mixing or channels), you don't need a thread, it's all interrupt driven.
  
  Network IO- a thread probably isn't the best way to do it for the client. Just poll for IO once a frame. Or use select to sleep if you have no other processing to do.
  
  AI- this one makes more sense, a smart AI can try and predict moves and counters ahead of time.
  
  I really don't see most games making use of 8 threads. Most games now are still single threaded.
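The "poll for IO once a frame, or use select" suggestion for network IO could be sketched as below. `poll_network` is a hypothetical helper; `select.select` and `socket.socketpair` are standard library calls, and the socket pair only stands in for a real connection to a game server.

```python
import select
import socket


def poll_network(sock, timeout=0.0):
    """Return pending bytes, or None if nothing has arrived yet."""
    # timeout=0.0 polls without blocking; a larger timeout sleeps in select.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    return sock.recv(4096)


server, client = socket.socketpair()
print(poll_network(client))                # nothing sent yet
server.sendall(b"state")
print(poll_network(client, timeout=1.0))   # sleeps until data is readable
server.close()
client.close()
```

Called once per frame with a zero timeout, this keeps networking on the main loop with no extra thread.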
  
  --
  I still have more fans than freaks. WTF is wrong with you people?
6. Re:8 Threads? by astromog · 2006-01-27 08:55 · Score: 5, Insightful
  
  What I find interesting about the question of "What can I do with 8 threads?" is that most people seem to assume that you can only have one graphics thread. Why not have 2? Or 3? Or 6? The Emotion Engine's core design is based around having two parallel programmable units handling graphics at the same time, for example one animates the surface of a lake while the other makes the pretty refracted light patterns on the bottom. Yes, it's nastier to program than standard single-thread-for-each-task programming, but it makes for a very powerful architecture when used properly. Similar things can be done with other parts of a game, and if you design your data layout and flow correctly you minimise the need for synchronisation. You could draw your frame with 7 parallel threads, then flip all the SPEs over to handle the physics, input, etc update for the next frame. It's all just a matter of thinking about how you design your game.
Pipeline them or divvy up loops by sunbeam60 · 2006-01-27 08:54 · Score: 2, Informative

There are other ways to divvy up work.

If your intention is to put independent tasks out to different processors, you will run into huge issues like the ones you describe.

Instead, consider the beginning of each logical step in the game loop as a "constriction/delegation" point: You constrict, meaning that only one thread is running right now. Then, say, it's time for particles. You now wake up your eight particle worker threads and divvy up the gargantuan loop over 2000 particle emitters into 250 emitters each. You then instruct each particle thread to work through its 250 emitters and wait for them all to finish.

Naturally your real performance won't be as if you only had to process 250 emitters, but even if you lose 50% due to internal synchronization, you've still processed all your particles in 25% of the time.
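As a rough sketch of that constrict/delegate step, the following splits 2000 emitters across eight workers and waits for all of them. `update_emitters` and `update_all` are hypothetical, and each emitter is reduced to a single age value for brevity. The last lines reproduce the 25% estimate.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_EMITTERS = 2000
NUM_WORKERS = 8


def update_emitters(emitters, dt):
    """Advance one slice of emitters; each entry is a hypothetical age value."""
    return [age + dt for age in emitters]


def update_all(emitters, dt, workers=NUM_WORKERS):
    if not emitters:
        return []
    size = (len(emitters) + workers - 1) // workers   # 2000 / 8 = 250
    slices = [emitters[i:i + size] for i in range(0, len(emitters), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves slice order; collecting the results is the
        # "wait for them all to finish" step.
        parts = list(pool.map(update_emitters, slices, [dt] * len(slices)))
    return [age for part in parts for age in part]


emitters = update_all([0.0] * NUM_EMITTERS, dt=0.5)

# The estimate above: an ideal 8-way split takes 1/8 of the serial time;
# at 50% efficiency that becomes (1/8) / 0.5 = 1/4 of the serial time.
ideal_fraction = 1 / NUM_WORKERS
estimated_fraction = ideal_fraction / 0.5
print(len(emitters), estimated_fraction)
```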

Another way is to pipeline the tasks: You know that all your game gizmos have to first do this, then that and then the other. You create three task threads, one that does "this", one that does "that" and one that does "the other". You feed the first gizmo to the "this" thread. When it is done, it will feed the gizmo on to the "that" thread. When the "that" thread is done, it will lastly feed the gizmo on to the "the other" thread.

But once the first thread (the "this" task) is done, it can accept a new gizmo while the "that"-thread munches on the first.

The advantage of this scheme is better memory locality (which seems like it is more important on PS3 than on, say, PC) than the divide'n'conquer approach described first. Of course, individual game gizmos may have dependencies between them, so you need a proper dependency graph to feed gizmos in the right order.
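The pipelined arrangement can be sketched with one thread per stage and a queue between neighbours. This is only an illustration: `stage`, `run_pipeline`, and the three toy step functions are hypothetical, and the dependency graph mentioned above is left out.

```python
import queue
import threading

STOP = object()   # sentinel that tells a stage to shut down


def stage(func, inbox, outbox):
    """Apply func to each item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)      # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(gizmos, funcs):
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for g in gizmos:
        queues[0].put(g)          # earlier stages start on the next gizmo
    queues[0].put(STOP)           # while later stages finish the previous one
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Hypothetical "this", "that", and "the other" steps.
this = lambda g: g + 1
that = lambda g: g * 2
the_other = lambda g: g - 3
print(run_pipeline(range(5), [this, that, the_other]))
```

Each stage only ever touches the gizmo currently in its hands, which is where the locality benefit would come from.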

It's doable, as long as you don't think 8 threads have to independently work on completely different tasks at the same time.
SPE overhead by ClamIAm · 2006-01-27 09:20 · Score: 4, Insightful

As game developers use the 7 SPE chips more, I wonder how much of the main CPU's time will be taken up by things like managing threads and packing up work for the SPEs. It's almost like an operating system, with the main CPU acting as a kernel, managing memory and allowing different threads to talk to each other.
(If the OS analogy is flawed, sorry).
Missing the point. by Anonymous Coward · 2006-01-27 10:38 · Score: 5, Informative

A lot of people seem to be approaching the concept of the Cell processor improperly. The chip itself is not designed for the "Design a game in 8 threads" approach people seem to be thinking of. It's designed based on a foreman/worker metaphor. The main chip handles the work of figuring out what comes next; the SPEs do the heavy lifting.

Don't think
Processor 1 = AI
Processor 2 = Physics
Processor 3 = ...
etc.

Instead picture the main CPU going through a normal game loop (simplified here)
Step 1: Update positions
Step 2: Check for collisions
Step 3: Perform motion calculations
Step 4: AI

At the beginning of each step the main CPU farms out the work to the SPEs. So, you have a burst of activity in the SPEs for each step, then a lull as the main core figures out what to do next.
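A toy version of that foreman/worker loop, under stated assumptions, might look like this: a thread pool plays the SPEs, and `update_position`, `check_collision`, and the bounce rule are invented purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SPES = 7   # assumed worker count, taken from the article's figure


def update_position(obj):
    x, v = obj
    return (x + v, v)


def check_collision(obj):
    x, v = obj
    # Hypothetical rule: bounce off walls at x = 0 and x = 10.
    return (x, -v) if x <= 0 or x >= 10 else (x, v)


STEPS = [update_position, check_collision]   # motion and AI steps omitted


def run_frame(pool, objects):
    for step in STEPS:
        # Burst: the workers process every object for this step.
        objects = list(pool.map(step, objects))
        # Lull: control returns here so the foreman can decide the next step.
    return objects


with ThreadPoolExecutor(max_workers=NUM_SPES) as pool:
    world = [(9, 1), (5, -1), (0, 2)]
    world = run_frame(pool, world)
print(world)
```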
At the end of day... by Anonymous Coward · 2006-01-27 12:43 · Score: 3, Funny

At the end of the day, people who say "at the end of the day" just REALLY need to stop saying "at the end of the day".
The PS3 creams the Xbox 360. (Not a troll) by ulatekh · 2006-01-28 09:40 · Score: 3, Insightful

The limiting factor on computing speed in the last several years has not been processor design or clock speed, but memory speed. Normal architectures feature two levels of fast SRAM to insulate the processor from the latencies inherent with accessing DRAM over a shared bus. That doesn't get rid of multi-cycle delays, it just tries to reduce their likelihood. Data cache misses are expensive, but instruction cache misses are even more expensive -- all the pipelining that modern processors use to handle large workloads efficiently will break down every time the processor stalls loading instructions from main memory.
The PS3's Cell processor offers a different solution to the problem -- sub-processors with fast local memory, and an explicitly programmed way to copy memory areas between processors (the "DMA" that the article mentions). The SPEs allow significant chunks of the batch-processing-style parts of a game to run on a processor that has no memory latencies, for data or instructions. Since memory-stall delays can run into the double digits, you can expect the performance increase from fast memory to be in the double digit range too. I've seen a public demo of some medical-imaging software that ran ~50x faster when rewritten for Cell. (The private demos I've seen are similarly impressive, but I can't describe those in detail. :-)
A traditional multi-processing architecture, like the three dual-threaded cores in the X360, has no such escape from the memory latencies. All coordination of memory state between processors (i.e. through the level 2 cache) is done on demand, when a processor suddenly finds it has a need for it. Prefetching is of course possible, but the minor efficiency gains to be made from prefetching (when they can be found at all) are vastly outweighed by the inherent efficiency of explicitly-programmed DMA transfers. Multi-buffering the DMA transfers allows the SPEs to run uninterrupted, without having to wait for the next batch of data to arrive -- something that isn't really possible with a traditional level-2-cache in a traditional multiprocessing system.
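To make the multi-buffering idea concrete, here is a double-buffered loop in miniature. It is a sketch, not Cell code: `fetch` is a hypothetical stand-in for an asynchronous DMA read, and a one-thread executor plays the DMA engine.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(data, index, size):
    """Stand-in for an asynchronous DMA read of one chunk."""
    return data[index * size:(index + 1) * size]


def process_double_buffered(data, size):
    num_chunks = (len(data) + size - 1) // size
    total = 0
    with ThreadPoolExecutor(max_workers=1) as dma:
        in_flight = dma.submit(fetch, data, 0, size) if num_chunks else None
        for index in range(num_chunks):
            current = in_flight.result()          # wait for this chunk
            if index + 1 < num_chunks:
                # Start the next transfer before working on the current one.
                in_flight = dma.submit(fetch, data, index + 1, size)
            total += sum(current)                 # compute overlaps the fetch
    return total


print(process_double_buffered(list(range(1000)), size=64))
```

Only two chunks are ever live at once: the one being processed and the one in flight.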
In short, the very nontraditional setup of the PS3's Cell chip is capable of vastly outpowering the traditional multiprocessor setup of the X360, mostly due to successfully eliminating memory latency.
Yes, writing code that can run like this is a major freaking pain in the ass. But so what? The biggest reason most code is hard to run on such an architecture is that the code was poorly thought out, poorly designed, and not documented. Any decently-written application can be re-factored to run like this. Besides, this is the future: Cell really does seem to solve the memory latency problem that's crippling traditional computing architectures, and the performance difference is astounding. If you can't rise to the level of code written for such a complex architecture, then your job is in danger of getting outsourced to Third World nations for $5 an hour...as it should be. So quit your whining.
(First post in ten months. Feels good!)

--
"Once we've identified and embraced our sickness, we'll have strength...and that's when we get dangerous." - John Waters
The PS3 creams the Xbox 360 (nice try Mr. Gates) by ulatekh · 2006-01-29 11:00 · Score: 2, Insightful

Allow me to state the obvious:
(1) The PS3 has not shipped yet.
(2) There is no final PS3 hardware that runs at full speed yet.

The Cell has been available for programming for a while now. I think reference platforms (i.e. other than PS3 prototypes) might even be available. Cell is being used for far more than the PS3. Also, sure the PS3 might run faster than 3.2 GHz, but you make that sound like a bad thing!

The PS3's Cell processor offers a different solution to the problem -- sub-processors with fast local memory

Err.. each sub-processor has 256k. I really don't see how that's an advantage, especially when those sub-processors are functionally crippled.

Between them, they have 2 MB of high-speed memory, which (as you say) is becoming fairly common for L2 cache sizes, plus it's got a traditional L2 cache. So I'm not sure what you mean by "crippled". There are plenty of computing problems (including video game development) that can fit into this sort of sub-processor/DMA-communication model. Anyone that's programmed a PS2 knows this (and you sound like a video game programmer). The Cell just pushes it further.

the very nontraditional setup of the PS3's Cell chip is capable of vastly outpowering the traditional multiprocessor setup of the X360, mostly due to successfully eliminating memory latency.

O RLY. Operative phrase: "is capable of". Congratulations on doing finite element analysis, non-interactive scientific computing - and rendering animations. But it'll suck for running complicated logic - particularly if that logic has to interact with the logic running on other subprocessors.

There are plenty of tasks that can be run independently with double-buffered batches of data, and not just scientific computing, but the sorts of tasks that are bound to be prevalent in next-generation video games. Physics simulation, whether for gameplay or weather/cloth/fur/etc. effects, can be made parallel & batchable after broadphase collision. Graphics transformation can be, as it is on the PS2.
"Complicated logic" can communicate between processors using ring buffers and short DMA messages. But that's only if the logic is truly complicated...this doesn't apply if the code is complicated because it's the usual not-designed, poorly-thought-out, uncommented, global/singleton-happy, spaghetti code, which is the real problem most of the time. The only thing that's going to hold up the software industry taking advantage of the Cell processor's capabilities is our own collective lameness.
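For anyone who hasn't used one, a ring buffer for short messages can be as small as the sketch below. `RingBuffer` is a hypothetical single-producer/single-consumer version; on real hardware the slots would live in memory that both processors can reach via DMA, which this plain Python class does not model.

```python
class RingBuffer:
    """Fixed-capacity single-producer/single-consumer message ring."""

    def __init__(self, capacity):
        self._slots = [None] * (capacity + 1)   # one slot always stays empty
        self._head = 0    # next slot to read (advanced only by the consumer)
        self._tail = 0    # next slot to write (advanced only by the producer)

    def push(self, message):
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False              # full: the producer must retry later
        self._slots[self._tail] = message
        self._tail = nxt
        return True

    def pop(self):
        if self._head == self._tail:
            return None               # empty (so None is not a valid message)
        message = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        return message


ring = RingBuffer(capacity=2)
print(ring.push("spawn"), ring.push("move"), ring.push("die"))
print(ring.pop(), ring.pop(), ring.pop())
```

Since each index is written by only one side, the producer and consumer never need a shared lock, which suits processors that talk through short messages.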

--
"Once we've identified and embraced our sickness, we'll have strength...and that's when we get dangerous." - John Waters