The Art of PS3 Programming
The Guardian Gamesblog has a longish piece talking with Volatile Games, developers of the title Possession for the PS3, about what it's like to make a game for Sony's next-gen console. From the article: "At the end of the day it's just a multi-processor architecture. If you can get something running on eight threads of a PC CPU, you can get it running on eight processors on a PS3 - it's not massively different. There is a small 'gotcha' in there though. The main processor can access all the machine's video memory, but each of the seven SPE chips has access only to its own 256k of onboard memory - so if you have, say, a big mesh to process, it'll be necessary to stream it through a small amount of memory - you'd have to DMA it up to your cell chip and then process a little chunk, then DMA the next chunk, so you won't be able to jump around the memory as easily, which I guess you will be able to do on the Xbox 360."
Apparently, the machine's use of Open GL as its graphics API means that anyone who's ever written games for the PC will be intimately familiar with the set-up.
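The "gotcha" in the quote (streaming a big mesh through a small local memory, one chunk at a time) can be sketched roughly as below. This is only an illustration in Python, not SPE code: `stream_process`, `invert`, and the 64 KiB chunk size are made-up names and numbers, and the slice copy merely stands in for a DMA transfer.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local memory figure quoted in the article


def stream_process(data: bytes, kernel, chunk_bytes: int = 64 * 1024) -> bytes:
    """Run `kernel` over `data` one chunk at a time, never holding more
    than `chunk_bytes` of input in the simulated local store."""
    if not 0 < chunk_bytes <= LOCAL_STORE_BYTES:
        raise ValueError("chunk must fit in the local store")
    out = bytearray()
    for start in range(0, len(data), chunk_bytes):
        local = data[start:start + chunk_bytes]  # stands in for a DMA "get"
        out += kernel(local)                     # stands in for a DMA "put" of the result
    return bytes(out)


def invert(chunk: bytes) -> bytes:
    """Toy kernel: flip every bit of each byte in the chunk."""
    return bytes(b ^ 0xFF for b in chunk)


mesh = bytes(range(256)) * 4096  # 1 MiB of fake vertex data
result = stream_process(mesh, invert)
```

The kernel only ever sees the current chunk, which is the point: it cannot jump around the whole mesh the way code with a flat address space could.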
As a programmer, I can attest to OpenGL being a godsend. Not only are programmers intimately familiar with the technology, but it was designed from the beginning with portability in mind. Direct3D, OTOH, tends to follow Microsoft's practice of hiding what's really going on behind the scenes. It's been a little while since I've bothered with Direct3D, but one of Microsoft's biggest features used to be their own scene graph, known as "Retained Mode". For some reason, Microsoft believed that everyone would want to use only their scene graph, and damn technological progress. Most programmers who were in the know immediately bypassed this ridiculousness and went straight for the "Immediate Mode" APIs, which weren't as well documented. (Thanks, Microsoft.)
Wikipedia has a comparison of Direct3D vs. OpenGL here: http://en.wikipedia.org/wiki/Direct3D_vs._OpenGL
Other than that, a computer is a computer, and game programming has always required a strong knowledge of how computers operate. So it's not too surprising that it would be "just like any other programming +/- a few gotchas".
Javascript + Nintendo DSi = DSiCade
No, but it'll run Linux.
"This is considered plagiarism."
Because it's unproven, not *actually* available now and there's no laptop/low power version even planned??
I am a viral sig. Please copy me and help me spread. Thank you.
Keep in mind that all the "extra" cores are special-purpose cores that can only execute code specifically written for them. They are not general-purpose cores on which you could run 16 applications simultaneously. Also consider that the CPUs for the new consoles are targeted at consoles and not multitasking operating systems with lots of context switching. There's also the roadmap issue. Sure, this one processor will be available, but what about speed bumps and future generations?
I'm still baffled as to how you can efficiently break up a game into 8 threads.
ok, controller input on one..
graphics on another..
physics on a third
.... whoops, problem... need critical sections for this to operate with the graphics thread.
networking on a fourth
.... whoops, problem, need critical sections for this to operate with the physics thread..
sound... ok no problems here, that's 5.
See, even dividing it up into 5 threads causes problems: you need to make sure that you are allowed to do something on one processor, and if not, you must wait for another processor to finish. Critical sections are something that can ultimately cause your code to run slower than if it was not multithreaded in the first place.
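For anyone who hasn't run into them, here is a minimal sketch of the kind of critical section being described, assuming a hypothetical `SharedWorld` object touched by a physics thread and a graphics thread. The class and method names are invented for illustration.

```python
import threading


class SharedWorld:
    """Hypothetical game state shared by the physics and graphics threads."""

    def __init__(self):
        self.lock = threading.Lock()
        self.positions = [0] * 4

    def physics_step(self):
        with self.lock:  # critical section: graphics has to wait here
            self.positions = [p + 1 for p in self.positions]

    def snapshot_for_render(self):
        with self.lock:  # critical section: physics has to wait here
            return list(self.positions)


world = SharedWorld()
frames = []


def physics():
    for _ in range(1000):
        world.physics_step()


def graphics():
    for _ in range(1000):
        frames.append(world.snapshot_for_render())


threads = [threading.Thread(target=physics), threading.Thread(target=graphics)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every `with self.lock` is a spot where one thread may sit idle waiting for the other, which is exactly the cost the post is worried about.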
More info on critical sections, and other issues involved with programming multithreaded apps can be found here
Kent Simon Multitheft Auto
There are other ways to divvy up work.
If your intention is to put independent tasks out to different processors, you will run into huge issues like the ones you describe.
Instead, consider the beginning of each logical step in the game loop as a "constriction/delegation" point: You constrict, meaning that only one thread is running right now. Then, say, it's time for particles. You now wake up your eight particle worker threads and divvy up the gargantuan 2000 particle emitter loop into 250 emitters each. You then instruct each particle thread to work through its 250 emitters and wait for them all to finish.
Naturally your real performance won't be as if you only had to process 250 emitters, but let's say you lose 50% due to internal synchronization, you've still processed all your particles in 25% of the time.
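A rough sketch of that constriction/delegation step, using a thread pool as a stand-in for the eight workers. `update_emitter` is a placeholder for real particle work, and in CPython the global interpreter lock means these threads won't actually speed up pure-Python work, so read this for the structure only.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 8
NUM_EMITTERS = 2000


def update_emitter(emitter_id: int) -> int:
    """Placeholder for real particle work; returns a fake particle count."""
    return emitter_id % 7


def update_slice(first: int, last: int) -> int:
    """One worker's share: emitters in the half-open range [first, last)."""
    return sum(update_emitter(i) for i in range(first, last))


def update_all_emitters() -> int:
    per_worker = NUM_EMITTERS // NUM_WORKERS  # 250 emitters each
    slices = [(w * per_worker, (w + 1) * per_worker) for w in range(NUM_WORKERS)]
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        # Delegation: hand one slice to each worker.
        futures = [pool.submit(update_slice, a, b) for a, b in slices]
        # Constriction: block until every worker has finished.
        return sum(f.result() for f in futures)


total = update_all_emitters()

# Back-of-envelope from the post: the ideal time is 1/8 of serial, and at
# 50% efficiency that becomes (1/8) / 0.5 = 0.25 of the serial time.
estimated_fraction = (1 / NUM_WORKERS) / 0.5
```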
Another way is to pipeline the tasks: You know that all your game gizmos have to first do this, then that and then the other. You create three task threads, one that does "this", one that does "that" and one that does "the other". You feed the first gizmo to the "this" thread. When it is done, it will feed the gizmo on to the "that"-thread. When the "that"-thread is done, it will lastly feed the gizmo on to the "the other"-thread.
But once the first thread (the "this" task) is done, it can accept a new gizmo while the "that"-thread munches on the first.
The advantage of this scheme is better memory locality (which seems to be more important on the PS3 than on, say, a PC) than the divide'n'conquer approach described first. Of course, individual game gizmos may have dependencies between them, so you need a proper dependency graph to feed gizmos through in the right order.
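The pipeline idea might look something like this, with one thread per stage and queues in between. The three stage functions are arbitrary toy arithmetic, and the dependency graph mentioned above is left out entirely, so take it as a sketch under those assumptions.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(work, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Pull gizmos from inbox, apply `work`, and pass them downstream."""
    while True:
        gizmo = inbox.get()
        if gizmo is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        outbox.put(work(gizmo))


def run_pipeline(gizmos, steps):
    queues = [queue.Queue() for _ in range(len(steps) + 1)]
    threads = [
        threading.Thread(target=stage, args=(step, queues[i], queues[i + 1]))
        for i, step in enumerate(steps)
    ]
    for t in threads:
        t.start()
    for g in gizmos:
        queues[0].put(g)  # the first stage can accept gizmo N+1 while later stages work on N
    queues[0].put(STOP)
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


def this(g):
    return g + 1


def that(g):
    return g * 2


def the_other(g):
    return g - 3


pipeline_out = run_pipeline(range(5), [this, that, the_other])
```

Because each stage is a single thread reading a FIFO queue, gizmos come out in the order they went in.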
It's doable, as long as you don't think 8 threads have to independently work on completely different tasks at the same time.
(If the OS analogy is flawed, sorry).
There is no "gotcha" because no one should have been under the impression that you could use all 8 cores just like a regular general purpose CPU core in the first place.
Anyone that did have that impression, and was supposed to be developing for the PS3, should be out of a job.
When it comes down to it, any speculation or flaming about how difficult or easy it is to write code for the PS3 is idiotic. It doesn't change the reality of it - difficult or not - and in the end, the games are the only thing that matters.
Advanced users are users too!
A lot of people seem to be approaching the concept of the Cell processor improperly. The chip itself is not designed for the "Design a game in 8 threads" approach people seem to be thinking of. It's designed around a foreman/worker metaphor. The main chip handles the work of figuring out what comes next; the SPEs do the heavy lifting.
...
Don't think
Processor 1 = AI
Processor 2 = Physics
Processor 3 =
etc.
Instead picture the main CPU going through a normal game loop (simplified here)
Step 1: Update positions
Step 2: Check for collisions
Step 3: Perform motion calculations
Step 4: AI
At the beginning of each step the main CPU farms out the work to the SPEs. So, you have a burst of activity in the SPEs for each step, then a lull as the main core figures out what to do next.
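In code, the foreman/worker loop above might be sketched like this. A thread pool of 7 plays the part of the SPEs, and the entity fields and the four step functions are invented toy logic, not anything from a real engine.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 7  # stand-in for the SPEs; pool threads are not real SPEs


def update_position(e):
    return {**e, "x": e["x"] + e["v"]}


def check_collision(e):
    return {**e, "hit": e["x"] >= 10}  # toy "wall" at x = 10


def motion(e):
    return {**e, "v": -e["v"] if e["hit"] else e["v"]}


def ai(e):
    return {**e, "mood": "flee" if e["hit"] else "idle"}


STEPS = [update_position, check_collision, motion, ai]


def run_frame(entities, pool):
    for step in STEPS:
        # Burst: the workers chew through every entity for this step.
        entities = list(pool.map(step, entities))
        # Lull: only the foreman runs here, deciding what comes next.
    return entities


entities = [{"x": x, "v": 3, "hit": False, "mood": "idle"} for x in (0, 4, 8)]
with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
    frame = run_frame(entities, pool)
```

Note that no step starts until the previous one has finished for every entity, so the steps never need locks against each other.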
At the end of the day, people who say "at the end of the day" just REALLY need to stop saying "at the end of the day".
>When it comes down to it, any speculation or flaming about how difficult or easy it is to write code for the PS3 is idiotic. It doesn't change the reality of it - difficult or not - and in the end, the games are the only thing that matters.
If you don't think it matters, try writing a game - any game - in assembler. You'll soon realize that how much time you spend dicking around with a stupid interface (API would be too much of a compliment) translates real well to how good your game will be.
Live today, because you never know what tomorrow brings
The ps3 will surely have way more bus contention issues than a PC. While the high level issues of concurrent programming will be comparable to any multi-processor architecture, once you get into the low level details, the similarities will end.
>The ps3 will surely have way more bus contention issues than a PC
You do realize that the memory in each SPE is *local* memory, right? The 7 SPEs can all run flat out without creating any bus contention whatsoever on memory access.
And as for the DMA hardware and interconnect buses, those have immense bandwidth. I really wouldn't be concerned about contention on DMA to main memory.
The more likely problem is DMA latency, since small DMA requests may get delayed by longer ones. However, even that is unlikely unless the designers have chopped up the tasking extremely finely and thus made communication time significant compared to processing time, which is never a good idea.
You seem to also be forgetting that linked structures can be prefetched by the DMA in ways that even a lock-in cache cannot. If you have a mesh stored as an array (as most are) then you run into trouble. If you are dealing with something stored in a linked tree structure or a linked list, the SPE will outperform a general purpose CPU.
Most games for Windows use D3D. Consoles are still a big business.
And most independent games are for Windows.
I agree for the most part, but just a couple points
-There's the joke that goes "Don't buy anything from Microsoft until at least the third version." Direct3D definitely fits that stigma. The early versions of DirectX were apparently garbage; I think Direct3D v3.0 was the version that Carmack blasted when he opted to use OpenGL. I've read that nowadays he is much happier with the API, and he's even working on an Xbox 360 game, which is notable considering that the PS3 uses OpenGL.
>DirectX is a non-portable skill. It ties you to Windows and the X-Box(s). OpenGL "ties" you to the Gamecube, Windows, PS2, PS3, Linux, Macintosh, etc.
-The PS2 and Gamecube have proprietary APIs (though Gamecube's GX API is very similar to OpenGL). However, your point is correct - the reason why Quake 3 was able to target Linux, Windows, and Mac was that 90% of the codebase was ANSI C and OpenGL.
So, OpenGL is still very good, but DX9 is much better than when you grew to hate it (you mentioned retained mode, which is gone as of DX8, I think), and Microsoft has ended up stealing - er, embracing - so many of the OpenGL concepts that learning one at least partially prepares you for the other.
Tell me, at what point does one send a big mesh to the processor when there is a GPU? What kind of real-time process are you running that requires so much computing power that you cannot parallelize the process? Video and Audio? No, they are highly parallelizable. AI and Physics? No, they are not computation intensive relative to multimedia. 3D graphics? Taken care of by the GPU.
Of course it _matters_.
All I'm saying is that spouting off about it on Slashdot doesn't change how hard it is, and in the end, even if it is hard - if we still get good games, what does it matter to us as end users?
Of course, if we don't get good games because it's too hard, then it does matter, but bitching about it on slashdot still won't fix it.
I'd bet that controller input won't use much more than 10% of a CPU, so you still have to find other things to do for this CPU..
I think you misspelled DRM
Did you get that thing I sent ya?
Do eastern games programmers get a voice at all or are they all mushrooms? I'm sure that Kojima is too busy writing bad spy fiction or doing interviews and junkets with the gaming press to write any code.
The limiting factor on computing speed in the last several years has not been processor design or clock speed, but memory speed. Normal architectures feature two levels of fast SRAM to insulate the processor from the latencies inherent with accessing DRAM over a shared bus. That doesn't get rid of multi-cycle delays, it just tries to reduce their likelihood. Data cache misses are expensive, but instruction cache misses are even more expensive -- all the pipelining that modern processors use to handle large workloads efficiently will break down every time the processor stalls loading instructions from main memory.
The PS3's Cell processor offers a different solution to the problem -- sub-processors with fast local memory, and an explicitly programmed way to copy memory areas between processors (the "DMA" that the article mentions). The SPEs allow significant chunks of the batch-processing-style parts of a game to run on a processor that has no memory latencies, for data or instructions. Since memory-stall delays can run into the double digits, you can expect the performance increase from fast memory to be in the double digit range too. I've seen a public demo of some medical-imaging software that ran ~50x faster when rewritten for Cell. (The private demos I've seen are similarly impressive, but I can't describe those in detail. :-)
A traditional multi-processing architecture, like the three dual-threaded cores in the X360, has no such escape from the memory latencies. All coordination of memory state between processors (i.e. through the level 2 cache) is done on demand, when a processor suddenly finds it has a need for it. Prefetching is of course possible, but the minor efficiency gains to be made from prefetching (when they can be found at all) are vastly outweighed by the inherent efficiency of explicitly-programmed DMA transfers. Multi-buffering the DMA transfers allows the SPEs to run uninterrupted, without having to wait for the next batch of data to arrive -- something that isn't really possible with a traditional level-2-cache in a traditional multiprocessing system.
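To make "multi-buffering" concrete, here is a small double-buffering sketch. A one-thread executor plays the DMA engine, `dma_get` is a made-up stand-in for an asynchronous transfer, and none of this reflects the real Cell SDK calls; it only shows the ordering of "start the next transfer, then compute on the current buffer".

```python
from concurrent.futures import ThreadPoolExecutor


def dma_get(main_memory, start, size):
    """Stand-in for an asynchronous DMA transfer into a local buffer."""
    return main_memory[start:start + size]


def process_double_buffered(main_memory, chunk, kernel):
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma:  # the pretend DMA engine
        pending = dma.submit(dma_get, main_memory, 0, chunk)
        for start in range(0, len(main_memory), chunk):
            current = pending.result()  # wait for the buffer we are about to use
            nxt = start + chunk
            if nxt < len(main_memory):
                # Kick off the transfer into the other buffer before computing.
                pending = dma.submit(dma_get, main_memory, nxt, chunk)
            results.extend(kernel(current))  # compute while the next transfer is in flight
    return results


data = list(range(10))
doubled = process_double_buffered(data, chunk=4, kernel=lambda xs: [2 * x for x in xs])
```

If the transfer finishes before the kernel does, `pending.result()` returns immediately on the next pass, which is the "run uninterrupted" case described above.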
In short, the very nontraditional setup of the PS3's Cell chip is capable of vastly outpowering the traditional multiprocessor setup of the X360, mostly due to successfully eliminating memory latency.
Yes, writing code that can run like this is a major freaking pain in the ass. But so what? The biggest reason most code is hard to run on such an architecture is that the code was poorly thought out, poorly designed, and not documented. Any decently-written application can be re-factored to run like this. Besides, this is the future: Cell really does seem to solve the memory latency problem that's crippling traditional computing architectures, and the performance difference is astounding. If you can't rise to the level of code written for such a complex architecture, then your job is in danger of getting outsourced to Third World nations for $5 an hour...as it should be. So quit your whining.
(First post in ten months. Feels good!)
"Once we've identified and embraced our sickness, we'll have strength...and that's when we get dangerous." - John Waters
actually, they have supposedly planned (at least planned, how many other companies would just say "nope, aint gonna happen") "scaled down" versions for cell phones and such, and scaled up versions for servers and supercomputers...
>The Gamecube had SRAM main memory with a latency of 1 to 2
>cycles (about 5 nanoseconds). It only had 24MB of it, but any
>speed problems you may encounter were not a result of memory
>latency. This is also why even the 1st generation Gamecube games
>ran with silky smooth framerates.
Bullshit. How can you compare cycles for GC (485 MHz) with those of Cell (3.2 GHz)? Besides, on this official spec sheet (http://www.nintendo.co.jp/ngc/specific/) the access speed of 1T-SRAM is 10 ns, not 5 ns.
Most games now are still single threaded.
At last year's Game Developers Conference both Intel and AMD were saying that games should go multithreaded, and that future CPU performance improvements were largely going to come from multiple cores, not clock rate. Intel and AMD were both demonstrating current games taking advantage of threading. I forget what the game was, but one racing game uses a second thread for optional effects. When running a single thread you get a small amount of dust, smoke, flames, etc. However, when running with multiple threads the particle system will use HT or DC (Hyper-Threading or dual core) to generate additional effects: more/better smoke, dust, etc. The game makes the decision to go single or multi after evaluating the CPU.
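The "evaluate the CPU, then decide" step could be as simple as the sketch below. The function name, the threshold, and the particle budgets are all made-up numbers for illustration; I have no idea what the actual racing game does.

```python
import os


def choose_effects_mode(cpu_count=None):
    """Pick a particle budget from the number of logical CPUs.

    The threshold and budgets are invented values, not taken from any game.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    if cpu_count >= 2:
        # A second hardware thread or core is available: run optional effects there.
        return {"threads": 2, "max_particles": 4000}
    return {"threads": 1, "max_particles": 1000}


single = choose_effects_mode(1)
multi = choose_effects_mode(4)
detected = choose_effects_mode()
```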
The Cell has been available for programming for a while now. I think reference platforms (i.e. other than PS3 prototypes) might even be available. Cell is being used for far more than the PS3. Also, sure the PS3 might run faster than 3.2 GHz, but you make that sound like a bad thing!
Between them, they have 2 MB of high-speed memory, which (as you say) is becoming fairly common for L2 cache sizes, plus it's got a traditional L2 cache. So I'm not sure what you mean by "crippled". There are plenty of computing problems (including video game development) that can fit into this sort of sub-processor/DMA-communication model. Anyone that's programmed a PS2 knows this (and you sound like a video game programmer). The Cell just pushes it further.
There are plenty of tasks that can be run independently with double-buffered batches of data, and not just scientific computing, but the sorts of tasks that are bound to be prevalent in next-generation video games. Physics simulation, whether for gameplay or weather/cloth/fur/etc. effects, can be made parallel & batchable after broadphase collision. Graphics transformation can be, as it is on the PS2.
"Complicated logic" can communicate between processors using ring buffers and short DMA messages. But that's only if the logic is truly complicated...this doesn't apply if the code is complicated because it's the usual not-designed, poorly-thought-out, uncommented, global/singleton-happy, spaghetti code, which is the real problem most of the time. The only thing that's going to hold up the software industry taking advantage of the Cell processor's capabilities is our own collective lameness.
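For reference, a bare-bones version of the kind of ring buffer meant here, written as a single-producer/single-consumer queue. The class is a hypothetical sketch: on real hardware the slots would live in memory both processors can reach and the index updates would need proper memory ordering, none of which is modelled.

```python
class RingBuffer:
    """Fixed-capacity single-producer/single-consumer message queue."""

    def __init__(self, capacity: int):
        # One slot is always left empty so that "full" and "empty" look different.
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read (advanced only by the consumer)
        self._tail = 0  # next slot to write (advanced only by the producer)

    def push(self, msg) -> bool:
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False  # full: the producer must retry later
        self._slots[self._tail] = msg
        self._tail = nxt
        return True

    def pop(self):
        if self._head == self._tail:
            return None  # empty: nothing to consume
        msg = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        return msg


ring = RingBuffer(capacity=3)
accepted = [ring.push(m) for m in ("spawn", "move", "fire", "die")]
drained = [ring.pop() for _ in range(4)]
```

Each index has exactly one writer, which is what lets the two sides talk with short messages instead of a lock around shared game state.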
You, sir, have hit the main problem square on the head with your last paragraph.
I deal with that type of software on an almost daily basis. Not necessarily with games programming, but in embedded applications I design the hardware for. It seems that no matter how powerful I design the platform, some bonehead seems to cripple it with crap code exactly like you describe.
Thank you!
-dh