The Art of PS3 Programming
The Guardian Gamesblog has a longish piece talking with Volatile Games, developers of the title Possession for the PS3, about what it's like to make a game for Sony's next-gen console. From the article: "At the end of the day it's just a multi-processor architecture. If you can get something running on eight threads of a PC CPU, you can get it running on eight processors on a PS3 - it's not massively different. There is a small 'gotcha' in there though. The main processor can access all the machine's video memory, but each of the seven SPE chips has access only to its own 256k of onboard memory - so if you have, say, a big mesh to process, it'll be necessary to stream it through a small amount of memory - you'd have to DMA it up to your cell chip and then process a little chunk, then DMA the next chunk, so you won't be able to jump around the memory as easily, which I guess you will be able to do on the Xbox 360."
Apparently, the machine's use of OpenGL as its graphics API means that anyone who's ever written games for the PC will be intimately familiar with the set-up.
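The streaming "gotcha" quoted above can be sketched roughly as follows. This is only an illustration in Python, not PS3 code: the 256 KB figure comes from the article, while the 64 KB chunk size, `process_chunk`, and `stream_process` are made-up names and numbers, and a plain slice stands in for what would be a DMA transfer on real hardware.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local memory quoted in the article
CHUNK_BYTES = 64 * 1024         # assumed chunk size, leaving room for code and other buffers


def process_chunk(chunk: bytes) -> int:
    """Hypothetical per-chunk work: here, just sum the bytes."""
    return sum(chunk)


def stream_process(mesh: bytes, chunk_bytes: int = CHUNK_BYTES) -> int:
    """Walk a large buffer one chunk at a time, holding only one chunk at once."""
    if not 0 < chunk_bytes <= LOCAL_STORE_BYTES:
        raise ValueError("chunk must fit in local store")
    total = 0
    for offset in range(0, len(mesh), chunk_bytes):
        # On real hardware this copy would be a DMA transfer into local store.
        local = mesh[offset:offset + chunk_bytes]
        total += process_chunk(local)
    return total
```

The point of the sketch is that the worker never indexes into the whole mesh, only into the chunk it currently holds, which is why "jumping around the memory" gets harder.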
As a programmer, I can attest to OpenGL being a godsend. Not only are programmers intimately familiar with the technology, but it was designed from the beginning with portability in mind. Direct3D, OTOH, tends to follow Microsoft's practice of hiding what's really going on behind the scenes. It's been a little while since I've bothered with Direct3D, but one of Microsoft's biggest features used to be their own scene graph, known as "Retained Mode". For some reason, Microsoft believed that everyone would want to use their scene graph only, and damn technological progress. Most programmers who were in the know immediately bypassed this ridiculousness and went straight for the "Immediate Mode" APIs, which weren't as well documented. (Thanks, Microsoft.)
Wikipedia has a comparison of Direct3D vs. OpenGL here: http://en.wikipedia.org/wiki/Direct3D_vs._OpenGL
Other than that, a computer is a computer, and game programming has always required a strong knowledge of how computers operate. So it's not too surprising that it would be "just like any other programming +/- a few gotchas".
No, but it'll run Linux.
"This is considered plagiarism."
Because it's unproven, not *actually* available now and there's no laptop/low power version even planned??
Keep in mind that all the "extra" cores are special-purpose cores that can only execute code specifically written for them. They are not general-purpose cores on which you can run 16 applications simultaneously. Also consider that the CPUs for the new consoles are targeted at consoles and not multitasking operating systems with lots of context switching. There's also the roadmap issue. Sure, this one processor will be available, but what about speed bumps and future generations?
I'm still baffled as to how you can efficiently break up a game into 8 threads.
ok controller input on one..
graphics on another..
physics on a third .... whoops, problem: need critical sections for this to operate with the graphics thread.
networking on a fourth .... whoops, problem: need critical sections for this to operate with the physics thread.
sound... ok no problems here, that's 5.
See, even dividing it up into 5 threads causes problems: you need to make sure that you are allowed to do something on one processor, and if not you must wait on another processor to finish. Critical sections are something that can ultimately cause your code to run slower than if it was not multithreaded in the first place.
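For anyone who hasn't run into critical sections, here is a minimal sketch of the physics/graphics case in Python. `World`, `physics_thread`, `graphics_thread`, and `run` are invented for illustration. The lock keeps the graphics thread from ever seeing a half-updated world, but while one thread holds it the other just waits, which is exactly the cost being complained about.

```python
import threading


class World:
    """Shared game state; the lock guards `positions`."""

    def __init__(self, size=4):
        self.lock = threading.Lock()
        self.positions = [0] * size
        self.frames_drawn = 0
        self.torn_frames = 0  # frames that saw a partially updated world


def physics_thread(world, steps):
    for _ in range(steps):
        with world.lock:  # critical section: update every position together
            for i in range(len(world.positions)):
                world.positions[i] += 1


def graphics_thread(world, frames):
    for _ in range(frames):
        with world.lock:  # critical section: take a consistent snapshot
            snapshot = list(world.positions)
        if len(set(snapshot)) != 1:
            world.torn_frames += 1
        world.frames_drawn += 1


def run(steps=1000, frames=1000):
    world = World()
    threads = [
        threading.Thread(target=physics_thread, args=(world, steps)),
        threading.Thread(target=graphics_thread, args=(world, frames)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return world
```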
More info on critical sections, and other issues involved with programming multithreaded apps can be found here
Kent Simon Multitheft Auto
There are other ways to divvy up work.
If your intention is to put independent tasks out to different processors, you will run into huge issues like the ones you describe.
Instead, consider the beginning of each logical step in the game loop as a "constriction/delegation" point: You constrict, meaning that only one thread is running right now. Then, say, it's time for particles. You now wake up your eight particle worker threads and divvy up the gargantuan 2000-emitter particle loop into 250 emitters each. You then instruct each particle thread to work through its 250 emitters and wait for them all to finish.
Naturally your real performance won't be as if you only had to process 250 emitters, but even if you lose 50% to internal synchronization, you've still processed all your particles in 25% of the time.
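A rough sketch of that constrict/delegate step, in Python. The 8 workers and 2000 emitters are the numbers from the post, while `update_emitter`, `split`, and the rest are hypothetical names. Note that CPython threads won't actually speed up CPU-bound work like this, so treat it as a picture of the structure, not a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

WORKERS = 8


def update_emitter(emitter_id: int) -> int:
    """Hypothetical per-emitter work; returns the number of particles spawned."""
    return emitter_id % 3 + 1


def update_chunk(emitter_ids):
    return sum(update_emitter(e) for e in emitter_ids)


def split(items, parts):
    """Cut `items` into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, (len(items) + parts - 1) // parts)
    return [items[i:i + size] for i in range(0, len(items), size)]


def update_all_emitters(emitter_ids, workers=WORKERS):
    chunks = split(emitter_ids, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Delegation: one chunk per worker. Leaving the with-block is the
        # constriction point, since it waits for every worker to finish.
        return sum(pool.map(update_chunk, chunks))


def relative_time(workers, efficiency):
    """Fraction of single-threaded time, given the parallel efficiency."""
    return 1 / (workers * efficiency)
```

With 8 workers at 50% efficiency, `relative_time(8, 0.5)` gives 0.25, which is the 25% figure above.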
Another way is to pipeline the tasks: You know that all your game gizmos have to first do this, then that and then the other. You create three task threads, one that does "this", one that does "that" and one that does "the other". You feed the first gizmo to the "this" thread. When it is done, it will feed the gizmo on towards the "that"-thread. When the "that"-thread is done, it will lastly feed the gizmo on to the "the other"-thread.
But once the first thread (the "this" task) is done, it can accept a new gizmo while the "that"-thread munches on the first.
The advantage of this scheme is better memory locality (which seems to be more important on the PS3 than on, say, a PC) than the divide'n'conquer approach described first. Of course, individual game gizmos may have dependencies between them, so you need a proper dependency graph to feed gizmos through in the right order.
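The this/that/the-other pipeline could look something like the following Python sketch. The queue-per-stage layout, `stage`, `run_pipeline`, and the sentinel shutdown are one way to do it, picked for illustration; dependency ordering between gizmos is left out entirely.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the gizmo stream


def stage(func, inbox, outbox):
    """One pipeline thread: take a gizmo, apply func, pass it downstream."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # tell the next stage to shut down too
            return
        outbox.put(func(item))


def run_pipeline(gizmos, funcs):
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for g in gizmos:
        queues[0].put(g)  # the first stage can start while we keep feeding
    queues[0].put(SENTINEL)
    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Because each stage is a single thread reading a FIFO queue, gizmos come out in the order they went in.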
It's doable, as long as you don't think 8 threads have to independently work on completely different tasks at the same time.
(If the OS analogy is flawed, sorry).
"...you'd have to DMA it up to your cell chip and then process a little chunk, then DMA the next chunk, so you won't be able to jump around the memory as easily, which I guess you will be able to do on the Xbox 360."
"I guess"? That's a HUGE deal. In essence you cripple all of the other cores. Not to mention we have seen articles posted on Slashdot before that point out you can't use all 8 cores for a game, but "I guess" that is another topic altogether.
"I guess" what I'm getting at is that this guy is obviously biased and doing a fluff piece/damage control. His choice of words is very telling, and the attempt to downplay a HUGE gotcha makes me skeptical of everything he says.
I can't speak to PS3 development (yet), because none of my clients are working with it... but I look forward to having the opportunity... but this article makes me believe that it really is as bad as others have said.
Artificial Intelligence.
You are wrong. While the SPEs are highly optimised for specific types of tasks, they are certainly capable of general-purpose computing.
A lot of people seem to be approaching the concept of the Cell processor improperly. The chip itself is not designed for the "Design a game in 8 threads" approach people seem to be thinking of. It's designed based on a foreman/worker metaphor. The main chip handles the work of figuring out what comes next; the SPEs do the heavy lifting.
...
Don't think
Processor 1 = AI
Processor 2 = Physics
Processor 3 =
etc.
Instead picture the main CPU going through a normal game loop (simplified here)
Step 1: Update positions
Step 2: Check for collisions
Step 3: Perform motion calculations
Step 4: AI
At the beginning of each step the main CPU farms out the work to the SPEs. So, you have a burst of activity in the SPEs for each step, then a lull as the main core figures out what to do next.
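As a toy version of that foreman/worker loop, here is a Python sketch. The four steps are the ones listed above, but the entity fields, `WALL_X`, and the step bodies are invented, and a thread pool is only standing in for the SPEs.

```python
from concurrent.futures import ThreadPoolExecutor

WALL_X = 10  # hypothetical wall position used by the collision step


def update_position(e):
    return {**e, "x": e["x"] + e["vx"]}


def check_collision(e):
    return {**e, "hit": e["x"] >= WALL_X}


def motion_calculation(e):
    return {**e, "vx": -e["vx"] if e["hit"] else e["vx"]}


def run_ai(e):
    return {**e, "state": "flee" if e["hit"] else "patrol"}


STEPS = [update_position, check_collision, motion_calculation, run_ai]


def run_frame(entities, pool):
    """Foreman loop: each step is farmed out, then fully collected."""
    for step in STEPS:
        # Burst: workers process every entity for this step.
        # list() blocks until all are done, which is the lull before the next step.
        entities = list(pool.map(step, entities))
    return entities
```

No worker is permanently "the AI processor"; every worker does a slice of whatever step the main loop is on.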
You can make good ripples on water, and do other geometry things. Make the trees have REAL leaves (like that great Cell demo). Make more individual blades of grass and such. Just little things that act correct so the world looks more "real" and less "here is a random bush so you don't notice there are no bushes".
Hair, clothes, weapons. Make them act more realistically when walking, etc. Why is it every game seems to have a walk cycle or some such where parts of a character's model (weapon, hair, clothes, etc) move in and out of another part (intersecting). With your free CPU time, you can test this and make sure the weapon doesn't SLICE MY CHARACTER'S THIGH as I walk.
Other little things like this. Add more rabbits, ants, birds, squirrels, whatever around the world. Make more NPCs who walk around town. Whatever. Just make things more alive.
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
At the end of the day, people who say "at the end of the day" just REALLY need to stop saying "at the end of the day".
The PS3 will surely have way more bus contention issues than a PC. While the high-level issues of concurrent programming will be comparable to any multi-processor architecture, once you get into the low-level details, the similarities will end.
The PS2 was the same way. Sony provided a decently powerful graphics subsystem but they crippled it by only providing 4MB of video RAM. But hey, they can put "eight cores" in the marketing brochures, can't they?
The PS3 will surely have way more bus contention issues than a PC.
You do realize that the memory in each SPE is *local* memory, right? The 7 SPEs can all run flat out without creating any bus contention whatsoever on memory access.
And as for the DMA hardware and interconnect buses, those have immense bandwidth. I really wouldn't be concerned about contention on DMA to main memory.
The more likely problem is DMA latency, since small DMA requests may get delayed by longer ones. However, even that is unlikely unless the designers have chopped up the tasking extremely finely and thus made communication time significant compared to processing time, which is never a good idea.
Most games for Windows use D3D. Consoles are still a big business.
And most independent games are for Windows.
I agree for the most part, but just a couple points
-There's the joke that goes "Don't buy anything from Microsoft until at least the third version." Direct3D definitely fits that stigma. The early versions of DirectX were apparently garbage; I think Direct3D v3.0 was the version that Carmack blasted when he opted to use OpenGL. I've read that nowadays he is much happier with the API, and he's even working on an Xbox 360 game - which is notable considering that the PS3 uses OpenGL.
DirectX is a non-portable skill. It ties you to Windows and the X-Box(s). OpenGL "ties" you to the Gamecube, Windows, PS2, PS3, Linux, Macintosh, etc.
-The PS2 and Gamecube have proprietary APIs (though Gamecube's GX API is very similar). However, your point is correct - the reason why Quake 3 was able to target Linux, Windows, and Mac was because 90% of the codebase was ANSI C and OpenGL.
So, OpenGL is still very good, but DX9 is much better than when you grew to hate it (you mentioned retained mode, which is gone as of DX8 I think), and Microsoft has ended up stealing-er, embracing- so many of the OpenGL concepts that learning one at least partially prepares you for the other.
... with a 256K working set.
Yeah, right.
Good going, Sony. You did a heckuva job on this CPU.
I'd bet that controller input won't use much more than 10% of a CPU, so you still have to find other things to do for this CPU..
I think you misspelled DRM
Did you get that thing I sent ya?
I don't get it. Why do you stress over it?
Cell is not god. It's not supposed to be able to do anything that's thrown at it.
Just put some more code in and more optimisation and, voila, you've got a faster-running game.
And that's not even a big deal. Programming isn't even the biggest part of the game (or program) development cycle.
I mean, you don't see Hideo Kojima whining over it, do you?
Even with the Revolution controller, while eastern developers are full of excitement, western developers are full of doubt or criticism.
Is the western development community geared towards getting jobs done, as opposed to the eastern one's creative programming?
Do eastern games programmers get a voice at all or are they all mushrooms? I'm sure that Kojima is too busy writing bad spy fiction or doing interviews and junkets with the gaming press to write any code.
20 minutes of HD-TV footage takes up around 4.7GB, so an Xbox 360 game would quickly run out of space.
That must be a bit inaccurate. A 1080i MPEG-2 copy of The Incredibles is only about 7 gig or something. Considering video on a game disc will probably be CG too, and they could use MPEG-4 without interlacing, they are going to be able to fit a lot more than 20 minutes on a dual-layer ~8-9 gig DVD.
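To put a number on the disputed claim, here is the back-of-the-envelope arithmetic in Python. The 4.7GB and 20 minutes are from the quote; the 8.5 GB disc size is an assumed value inside the "~8-9 gig" range mentioned, and decimal gigabytes are assumed throughout.

```python
def implied_bitrate_mbps(size_gb: float, minutes: float) -> float:
    """Average bitrate in Mbit/s implied by a file size and running time."""
    return size_gb * 1e9 * 8 / (minutes * 60) / 1e6


def minutes_that_fit(disc_gb: float, bitrate_mbps: float) -> float:
    """Minutes of video that fit on a disc at a given average bitrate."""
    return disc_gb * 1e9 * 8 / (bitrate_mbps * 1e6) / 60


quoted = implied_bitrate_mbps(4.7, 20)    # roughly 31.3 Mbit/s
at_quoted = minutes_that_fit(8.5, quoted)  # roughly 36 minutes on the assumed disc
```

The running time scales inversely with bitrate, so any codec that gets by on half the bitrate doubles the minutes that fit.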
The limiting factor on computing speed in the last several years has not been processor design or clock speed, but memory speed. Normal architectures feature two levels of fast SRAM to insulate the processor from the latencies inherent with accessing DRAM over a shared bus. That doesn't get rid of multi-cycle delays, it just tries to reduce their likelihood. Data cache misses are expensive, but instruction cache misses are even more expensive -- all the pipelining that modern processors use to handle large workloads efficiently will break down every time the processor stalls loading instructions from main memory.
The PS3's Cell processor offers a different solution to the problem -- sub-processors with fast local memory, and an explicitly programmed way to copy memory areas between processors (the "DMA" that the article mentions). The SPEs allow significant chunks of the batch-processing-style parts of a game to run on a processor that has no memory latencies, for data or instructions. Since memory-stall delays can run into the double digits, you can expect the performance increase from fast memory to be in the double digit range too. I've seen a public demo of some medical-imaging software that ran ~50x faster when rewritten for Cell. (The private demos I've seen are similarly impressive, but I can't describe those in detail. :-)
A traditional multi-processing architecture, like the three dual-threaded cores in the X360, has no such escape from the memory latencies. All coordination of memory state between processors (i.e. through the level 2 cache) is done on demand, when a processor suddenly finds it has a need for it. Prefetching is of course possible, but the minor efficiency gains to be made from prefetching (when they can be found at all) are vastly outweighed by the inherent efficiency of explicitly-programmed DMA transfers. Multi-buffering the DMA transfers allows the SPEs to run uninterrupted, without having to wait for the next batch of data to arrive -- something that isn't really possible with a traditional level-2-cache in a traditional multiprocessing system.
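For readers unfamiliar with multi-buffering, a minimal double-buffering sketch in Python follows. It is only an analogy: the fetcher thread plays the part of the DMA engine, two fixed `bytearray` slots play the part of local-store buffers, and `process_double_buffered` and `work` are hypothetical names. While the consumer works on one slot, the fetcher fills the other.

```python
import threading


def process_double_buffered(data: bytes, chunk_size: int, work):
    """Process `data` in chunks using exactly two reusable buffers."""
    buffers = [bytearray(chunk_size), bytearray(chunk_size)]
    lengths = [0, 0]
    free = [threading.Semaphore(1), threading.Semaphore(1)]  # slot may be overwritten
    full = [threading.Semaphore(0), threading.Semaphore(0)]  # slot holds fresh data
    n_chunks = (len(data) + chunk_size - 1) // chunk_size

    def fetcher():
        # Stand-in for the DMA engine: fill whichever slot is not in use.
        for i in range(n_chunks):
            slot = i % 2
            free[slot].acquire()
            chunk = data[i * chunk_size:(i + 1) * chunk_size]
            buffers[slot][:len(chunk)] = chunk
            lengths[slot] = len(chunk)
            full[slot].release()

    t = threading.Thread(target=fetcher)
    t.start()
    results = []
    for i in range(n_chunks):
        slot = i % 2
        full[slot].acquire()  # only waits if the fetch has not finished yet
        results.append(work(bytes(buffers[slot][:lengths[slot]])))
        free[slot].release()  # hand the slot back for the fetch after next
    t.join()
    return results
```

If fetching a chunk takes no longer than processing one, the consumer never waits after the first chunk, which is the "run uninterrupted" property described above.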
In short, the very nontraditional setup of the PS3's Cell chip is capable of vastly outpowering the traditional multiprocessor setup of the X360, mostly due to successfully eliminating memory latency.
Yes, writing code that can run like this is a major freaking pain in the ass. But so what? The biggest reason most code is hard to run on such an architecture is that the code was poorly thought out, poorly designed, and not documented. Any decently-written application can be re-factored to run like this. Besides, this is the future: Cell really does seem to solve the memory latency problem that's crippling traditional computing architectures, and the performance difference is astounding. If you can't rise to the level of code written for such a complex architecture, then your job is in danger of getting outsourced to Third World nations for $5 an hour...as it should be. So quit your whining.
(First post in ten months. Feels good!)
"Once we've identified and embraced our sickness, we'll have strength...and that's when we get dangerous." - John Waters
Actually, they have supposedly planned (at least planned; how many other companies would just say "nope, ain't gonna happen"?) "scaled down" versions for cell phones and such, and scaled-up versions for servers and supercomputers...
Allow me to state the obvious:
(1) The PS3 has not shipped yet.
(2) There is no final PS3 hardware that runs at full speed yet.
Normal architectures feature two levels of fast SRAM to insulate the processor from the latencies inherent with accessing DRAM over a shared bus.
Yeah, my current CPU has 1MB of L2 cache (a several year old hyperthreading Pentium 4). 2MB is starting to become fairly common in the new models.
The PS3's Cell processor offers a different solution to the problem -- sub-processors with fast local memory
Err.. each sub-processor has 256k. I really don't see how that's an advantage, especially when those sub-processors are functionally crippled.
the very nontraditional setup of the PS3's Cell chip is capable of vastly outpowering the traditional multiprocessor setup of the X360, mostly due to successfully eliminating memory latency.
O RLY. Operative phrase: "is capable of". Congratulations on doing finite element analysis, non-interactive scientific computing, and rendering animations. But it'll suck for running complicated logic - particularly if that logic has to interact with the logic running on other subprocessors.
Be prepared for another round of Sony games that look absolutely amazing in the cutscenes.. but then your character will only be able to walk left, walk right, shoot - and trigger another cutscene. Joy.
Cell really does seem to solve the memory latency problem that's crippling traditional computing architectures
The Gamecube had SRAM main memory with a latency of 1 to 2 cycles (about 5 nanoseconds). It only had 24MB of it, but any speed problems you may encounter were not a result of memory latency. This is also why even the 1st generation Gamecube games ran with silky smooth framerates.
>The Gamecube had SRAM main memory with a latency of 1 to 2
>cycles (about 5 nanoseconds). It only had 24MB of it, but any
>speed problems you may encounter were not a result of memory
>latency. This is also why even the 1st generation Gamecube games
>ran with silky smooth framerates.
Bullshit. How can you compare cycles for GC (485MHz) with those of Cell (3.2GHz)? Besides, on this official spec sheet (http://www.nintendo.co.jp/ngc/specific/) the access speed of 1T-SRAM is 10ns, not 5ns.
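The cycle comparison is easy to check. This Python snippet converts a latency in nanoseconds to clock cycles using the clock speeds and latencies quoted in this exchange; `latency_in_cycles` is just a helper name made up for the purpose.

```python
def latency_in_cycles(latency_ns: float, clock_hz: float) -> float:
    """How many clock cycles elapse during a given latency."""
    return latency_ns * 1e-9 * clock_hz


GC_CLOCK_HZ = 485e6    # Gamecube clock quoted above
CELL_CLOCK_HZ = 3.2e9  # Cell clock quoted above

gc_cycles_10ns = latency_in_cycles(10, GC_CLOCK_HZ)      # about 4.85 cycles
cell_cycles_10ns = latency_in_cycles(10, CELL_CLOCK_HZ)  # about 32 cycles
gc_cycles_5ns = latency_in_cycles(5, GC_CLOCK_HZ)        # about 2.4 cycles
```

So the same 10ns access costs under 5 cycles at the Gamecube's clock but about 32 at the Cell's, which is why counting latency in cycles across the two machines is misleading.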
Most games now are still single threaded.
At last year's Game Developers Conference both Intel and AMD were saying that games should go multithreaded, that future CPU performance improvements were largely going to come from multiple cores, not clock rate. Intel and AMD were both demonstrating current games taking advantage of threading. I forget what the game was, but one racing game uses a second thread for optional effects. When running a single thread you get a small amount of dust, smoke, flames, etc. However, when running with multiple threads the particle system will use HT or DC to generate additional effects, more/better smoke, dust, etc. The game makes the decision to go single or multi after evaluating the CPU.
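The "evaluate the CPU, then decide" idea might look like this in Python. The budget numbers and function names are invented, and the game in question presumably did something more involved; `os.cpu_count()` is the only real API used here, and it can return None when the count is unknown.

```python
import os


def pick_particle_budget(logical_cpus, base=500, bonus_per_extra_cpu=250):
    """Choose how many effect particles to simulate.

    One logical CPU gets the base amount; each extra one adds a bonus.
    All numbers here are made up for illustration.
    """
    extra = max(0, (logical_cpus or 1) - 1)  # treat an unknown count as 1
    return base + extra * bonus_per_extra_cpu


def detect_and_pick():
    """Query the machine once at startup and pick a budget."""
    return pick_particle_budget(os.cpu_count())
```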
The Cell has been available for programming for a while now. I think reference platforms (i.e. other than PS3 prototypes) might even be available. Cell is being used for far more than the PS3. Also, sure the PS3 might run faster than 3.2 GHz, but you make that sound like a bad thing!
Between them, they have 2 MB of high-speed memory, which (as you say) is becoming fairly common for L2 cache sizes, plus it's got a traditional L2 cache. So I'm not sure what you mean by "crippled". There are plenty of computing problems (including video game development) that can fit into this sort of sub-processor/DMA-communication model. Anyone that's programmed a PS2 knows this (and you sound like a video game programmer). The Cell just pushes it further.
There are plenty of tasks that can be run independently with double-buffered batches of data, and not just scientific computing, but the sorts of tasks that are bound to be prevalent in next-generation video games. Physics simulation, whether for gameplay or weather/cloth/fur/etc. effects, can be made parallel & batchable after broadphase collision. Graphics transformation can be, as it is on the PS2.
"Complicated logic" can communicate between processors using ring buffers and short DMA messages. But that's only if the logic is truly complicated...this doesn't apply if the code is complicated because it's the usual not-designed, poorly-thought-out, uncommented, global/singleton-happy, spaghetti code, which is the real problem most of the time. The only thing that's going to hold up the software industry taking advantage of the Cell processor's capabilities is our own collective lameness.
"Once we've identified and embraced our sickness, we'll have strength...and that's when we get dangerous." - John Waters
You, sir, have hit the main problem square on the head with your last paragraph.
I deal with that type of software on an almost daily basis. Not necessarily with games programming, but in embedded applications I design the hardware for. It seems that no matter how powerful I design the platform, some bonehead seems to cripple it with crap code exactly like you describe.
Thank you!
-dh