IEEE Spectrum On The PS3 Learning Curve
An anonymous reader writes "'The Insomniacs' is the cover article in the December issue of IEEE Spectrum, discussing developers ramping up to the PS3 hardware. The article features Insomniac Games, who developed the PS3 launch title Resistance: Fall of Man. Despite mixed reports in the press, the Insomniac folks are delighted to be working with Sony's technology, and describe the process of helping to make or break a console launch." From the article: "Despite the delays, there's something inside the PS3 that burnished Sony's reputation as a hardware company. The heart of the machine is the powerful new Cell Broadband Engine microprocessor. Developed over the last five years by Sony, IBM, and Toshiba on a reported budget of $400 million, the Cell is not just another chip: it is a giant leap beyond the current generation of computer processors into a nextgen muscle machine optimized for multimedia tasks."
"...the Cell is not just another chip: it is a giant leap beyond the current generation of computer processors into a nextgen muscle machine optimized for multimedia tasks."
Anyone else react the same way I did?
Fox News is now spinning CPU development?
See any serious problems with this story? Email our on-duty editor.
Yeah, I sure do - it's about good news for the PS3.
That can't be right.
I think we should all just wait and see what this Cell CPU is capable of. The "Emotion Engine" was laughed at too, but it did fairly well (imagine, for example, "Hitman: Blood Money" coming out when the PS2 was released :)). Also, don't expect games from EA or Ubisoft to use all of the PlayStation 3's power, as they will probably be written generically so they can be ported from the crapbox360 to the PlayStation 3 and vice versa.
The only thing I've seen for the PS3 that looks even remotely impressive is White Knight, and I won't believe for a second that it would be impossible on the 360. If the PS3 has any edge at all over the 360, it's the one-year-newer Nvidia graphics chip. The PS3 MAY have more memory speed than the 360, but the 360 has twice the capacity (512 MB vs. 256 MB), so I'd say that's a wash, especially since I'm not sure how the Rambus components in the PS3 compare latency-wise to the GDDR3 in the 360. I'm sick of the Sony Hype Machine. Of course I'm also tired of the Final Fantasy fanboys that keep Sony alive in the console market. Both Sony and Squeenix need to be taken down a notch or two.
...even the staunchest fanboy must admit, after reading that, that the PS3 has about 15 tons of potential. Can you imagine the level of AI and graphical detail you could suck out of that chip! Tough as it may be to develop for right now, I can't imagine any developer who isn't drooling over the possibilities.
The burning question is whether enough developers will take the challenge and endure the headaches to reap the rewards when two much simpler boxes are out there, albeit with less potential.
I for one hope folks step up to the plate.
Seven years ago, when we first started work on the first PS2 devkits, I remember whiteboard discussions with other engineers about where we hoped Sony would take the amazing technology we now had access to with the PS2 hardware. Cell is, in essence, exactly where we wanted to see Sony take the PS2 design philosophy.
As game developers, we spend a huge amount of our time 1) organizing data, 2) feeding that data somewhere to operate on it, and 3) sending that data back to step one to repeat the process.
Cell's design makes our lives vastly simpler. It is an absolute dream to work with.
The insanely high floating point power is what gets talked about most with the Broadband Engine, but the memory architecture is the best part of the design. The internal ring bus lets us write code that hides memory latency.
Writing for Cell is extremely straightforward. You set up each SPU to operate on three regions of internal memory: 1) static data, and 2 and 3) a double buffer of dynamic data. Data is fed into one buffer while the SPU operates on the other. With this setup, optimal Cell code has all available SPUs plowing through data with very little latency from the memory subsystem.
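A minimal Python sketch of that double-buffering control flow, assuming a one-thread pool as a stand-in for the SPU's DMA engine. `process_double_buffered`, `dma_get` and the `kernel` callback are hypothetical names, not Cell SDK APIs, and Python threads will not reproduce the real speedup; the point is only the ordering: start filling the other buffer before computing on the current one.

```python
from concurrent.futures import ThreadPoolExecutor

def process_double_buffered(main_memory, chunk_size, static_data, kernel):
    """Stream main_memory through two alternating buffers.

    While kernel() runs on the current buffer, the next chunk is already
    being copied in by the stand-in "DMA engine" (a one-worker pool).
    """
    spans = [(start, min(start + chunk_size, len(main_memory)))
             for start in range(0, len(main_memory), chunk_size)]
    results = []
    if not spans:
        return results

    def dma_get(span):
        # Hypothetical stand-in for an asynchronous transfer into local memory.
        start, end = span
        return list(main_memory[start:end])

    with ThreadPoolExecutor(max_workers=1) as dma:
        in_flight = dma.submit(dma_get, spans[0])            # prime the first buffer
        for i in range(len(spans)):
            current = in_flight.result()                     # wait for this buffer
            if i + 1 < len(spans):
                in_flight = dma.submit(dma_get, spans[i + 1])  # fill the other one
            # Compute on the current buffer, using the static data region.
            results.extend(kernel(static_data, x) for x in current)
    return results

scaled = process_double_buffered(list(range(10)), 4, 3, lambda k, x: k * x)
```

Here `static_data` plays the role of region 1, and the two in-flight lists play regions 2 and 3.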
In many ways it is very similar to writing old-style code, where you got your data into the chip's cache, operated on it, and then wrote that data back out to main memory or somewhere else. But with Cell you now have total control of how the data is loaded into your cache, thanks to the SPUs' ability to scatter DMA into local memory; you have the internal ring bus to pass data around to other SPUs instead of having to go out to slow main memory; and of course you have 6 to 8 SPUs (depending on the hardware you are using) all running in parallel.
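To make the scattered-DMA idea concrete, here is a hedged model of a DMA-list style transfer: scattered (address, size) pieces of main memory land contiguously in a fixed-size local buffer. The real hardware does this with DMA-list commands on the Memory Flow Controller, which have their own alignment and size rules; none of that is modeled, and `gather_into_local_store` is a hypothetical name.

```python
def gather_into_local_store(main_memory, dma_list, local_store_size):
    """Copy scattered (address, size) pieces of main memory into one
    contiguous local buffer, in list order.

    Returns the local buffer and the number of bytes used.
    """
    local_store = bytearray(local_store_size)
    offset = 0
    for address, size in dma_list:
        if address < 0 or address + size > len(main_memory):
            raise ValueError(f"element at {address} (+{size}) is outside main memory")
        if offset + size > local_store_size:
            raise ValueError("DMA list does not fit in the local buffer")
        local_store[offset:offset + size] = main_memory[address:address + size]
        offset += size
    return local_store, offset

main_memory = bytes(range(64))
local, used = gather_into_local_store(main_memory, [(4, 2), (40, 3), (10, 1)], 16)
# local[:used] now holds bytes 4, 5, 40, 41, 42, 10
```

Packing the pieces contiguously is what lets the SPU then stream through them without chasing pointers into main memory.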
It is wonderful that every PS3 is set up to make it easy to install Linux and get access to the Cell devkit. There is a wonderful world beyond the archaic x86 architecture just waiting for you.
SIMD is easy compared to working with the cell.
I am sure that eventually there will be some toolkits that optimize common tasks for the SPEs, but it will not be easy.
My guess is that it will be a few years before any game really uses the Cell to the max.
" article features Insomniac Games, who developed the PS3 launch title Resistance: Fall of Man."
...
Which is a game published by Sony and developed by a company owned by Sony.
What's next? "Bungie, the developers of the Xbox 360's highly anticipated shooter Halo 3, have announced that the Xbox 360 is Super Powerful and that Sony Rapes Babies!"
I want to hear from EA, Ubisoft, Activision and Sega (i.e., companies with little stake in the platform) on which console is easy or hard to develop for; so far EA has said that next-gen development is insanely expensive.
I entered:
:D
"the Cell is not just another chip: it is a giant leap beyond the current generation of computer processors into a nextgen muscle machine optimized for multimedia tasks."
and the fish sayeth:
"Cell is not just another chip, it is a big buzzword buzzword buzzword processor buzzword buzzword buzzword buzzword."
So it's not just another chip...it's a processor!
This is the real problem with the article. It has nothing to do with the Cell per se; it's that we're getting PR crap instead of engineering information. Ask someone whose salary is not tied to the success of the platform.
As game developers, we spend a huge amount of our time 1) organizing data, 2) feeding that data somewhere to operate on it, and 3) sending that data back to step one to repeat the process.
I assume that most of this "operating on it" involves floating point operations on triangles.
And I suppose that most of the results are essentially triplets of eight-bit RGB [red, green, blue] values [i.e. your results are expressed in 24-bit color], and I assume that you rarely venture much beyond screen sizes of about 1600 x 1200 pixels, or refresh rates much greater than about 60 frames per second.
Now current instantiations of the Cell processor can only perform 32-bit [single precision] floating point operations in hardware; as I understand it, 64-bit [double precision] floating point operations suffer an enormous performance penalty by contrast.
But 32-bit [single precision] floating point numbers are notoriously inaccurate; for instance, they begin to lose integer granularity as early as 16 million [2^24].
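The 2^24 claim is easy to check. This sketch emulates float32 rounding with Python's standard `struct` module; `to_f32` is a hypothetical helper, not a library function.

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# Integers up to and including 2**24 survive the round trip...
assert all(to_f32(float(n)) == n for n in range(2**24 - 4, 2**24 + 1))

# ...but 2**24 + 1 does not: it sits exactly halfway between two float32
# values and rounds (to even) down to 2**24.
first_lost = next(n for n in range(2**24 - 4, 2**24 + 4) if to_f32(float(n)) != n)
print(first_lost, to_f32(float(first_lost)))  # 16777217 16777216.0
```

Above 2^24 the spacing between adjacent float32 values is 2, then 4 above 2^25, and so on.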
So here's my question: Have you seen any instances where 32-bit [single precision] floating point number rounding error caused unacceptable inaccuracies?
Any of your triangles come out blurry, or mis-colored, or mis-placed, or mis-aligned, simply because 32-bit floating point calculations were insufficiently exact for, say, 24 bits of color, 1600 x 1200 pixels, and 60 frames per second?
as I understand it, 64-bit [double precision] floating point operations suffer an enormous performance penalty by contrast.
Well, it's not exactly an enormous penalty. The CBE cannot pipeline double-precision math instructions, so each one issues at most once every 7 cycles. Additionally, a register holds only 2 DP floats instead of 4 SP, so your double-precision performance is effectively 1/14 of single. However, many things can prevent you from reaching peak SP performance in the first place: instruction-level latency stalls due to dependencies, memory-level latency from unpredictable memory access or being DMA-bound, branch penalties, load/store latency, etc. DP is certainly slower, but if that's your biggest bottleneck then you are already "winning".
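The 1/14 figure follows directly from the comment's own numbers. A small sketch of the arithmetic, assuming (as the comment states) 4 SP lanes issued every cycle versus 2 DP lanes issued once every 7 cycles, and ignoring every other stall; `dp_to_sp_throughput_ratio` is a hypothetical helper.

```python
from fractions import Fraction

def dp_to_sp_throughput_ratio(sp_lanes=4, dp_lanes=2,
                              sp_issue_interval=1, dp_issue_interval=7):
    """Peak DP results per cycle divided by peak SP results per cycle."""
    sp_per_cycle = Fraction(sp_lanes, sp_issue_interval)  # 4 results/cycle
    dp_per_cycle = Fraction(dp_lanes, dp_issue_interval)  # 2/7 results/cycle
    return dp_per_cycle / sp_per_cycle

print(dp_to_sp_throughput_ratio())  # 1/14
```

Since both precisions have a fused multiply-add, counting FLOPs instead of results scales both sides by 2 and leaves the ratio unchanged.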
But 32-bit [single precision] floating point numbers are notoriously inaccurate; for instance, they begin to lose integer granularity as early as 16 million [2^24]. So here's my question: Have you seen any instances where 32-bit [single precision] floating point number rounding error caused unacceptable inaccuracies?
Constantly. One of the keys to using single precision is to never rely on precision. In fact, even within the domain of SP precision, some crucial instructions are approximations: reciprocal and reciprocal square root. Even a separate mul followed by an add can give a different result than a fused madd. Everyone knows that if you actually try to dot two "normalized" vectors and take the acosf of the result, you're heading to NaN in a hurry. Fortunately, there are no FP exceptions, so you can pretty much just do your math and then mask away bogus results at the end.
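Two of those pitfalls, sketched in Python with float32 emulated through `struct`. The helper names are hypothetical. `fused_madd_f32` is only an emulation: it rounds once at the end, which matches a true fused multiply-add only when `a * b + c` is exact in a double, as it is for the demo values. Note also that Python's `math.acos` raises `ValueError` on an out-of-range input where C's `acosf` returns NaN.

```python
import math
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def safe_angle_between(dot):
    """acos of a dot product of 'normalized' vectors, clamped to [-1, 1].

    Rounding can push the dot slightly past 1.0; clamping avoids NaN.
    """
    return math.acos(max(-1.0, min(1.0, dot)))

def mul_then_add_f32(a, b, c):
    """Separate multiply and add: the product is rounded before the add."""
    return to_f32(to_f32(a * b) + c)

def fused_madd_f32(a, b, c):
    """Emulated fused multiply-add: a single rounding at the end.

    Exact only when a * b + c is itself exact in a double.
    """
    return to_f32(a * b + c)

a = to_f32(1 + 2**-12)
c = -to_f32(1 + 2**-11)
print(mul_then_add_f32(a, a, c))           # 0.0
print(fused_madd_f32(a, a, c) == 2**-24)   # True
print(safe_angle_between(1.0000001))       # 0.0
```

The exact product is 1 + 2^-11 + 2^-24; rounding it to float32 first drops the 2^-24 term, so the separate mul+add cancels to zero while the fused version keeps it.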
This doesn't really apply to rendering though. Mostly collision detection and resolution are subject to the vagaries of the precision gods ("If something ever can go wrong, it will.") The other edge of the precision sword is that you don't pay for what you don't need. Particles, for instance, go through tons of sin, cos, and quaternion operations where no more than 8 or 9 bits of precision are needed... and that's all the polynomial they get.
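A hedged sketch of a cheap, deliberately low-precision sine of the kind the comment describes for particles. The comment doesn't say which polynomial is used; this swaps in a truncated Taylor series (degree 7, error under about 1.6e-4, so roughly 12 bits). Dropping the x^7 term gives roughly 7 to 8 bits. Production code would more likely use minimax coefficients, which are not reproduced here; `sin_low_precision` is a hypothetical name.

```python
import math

def sin_low_precision(x):
    """Cheap sine: range-reduce, then a degree-7 Taylor polynomial.

    Absolute error stays below about 1.6e-4 (roughly 12 bits).
    """
    # Reduce to [-pi, pi]: remainder() subtracts the nearest multiple of 2*pi.
    x = math.remainder(x, 2.0 * math.pi)
    # Fold into [-pi/2, pi/2] using sin(pi - x) == sin(x).
    if x > math.pi / 2:
        x = math.pi - x
    elif x < -math.pi / 2:
        x = -math.pi - x
    x2 = x * x
    # Horner form of x - x^3/3! + x^5/5! - x^7/7!
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)))

worst = max(abs(sin_low_precision(t) - math.sin(t))
            for t in (i * 0.001 for i in range(-10000, 10001)))
```

The design choice is the one the comment describes: pay for four multiply-adds' worth of polynomial instead of a full-precision library call, because nobody can see the difference in a spark.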
They can, but anyone making a game with maps large enough to cause such issues commonly splits the map into chunks, each with its own co-ordinate space, and re-centres the global co-ordinate space's origin on each chunk's origin as the camera moves into it.
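A one-axis sketch of that chunking scheme, assuming a hypothetical chunk size of 1024 m and hypothetical helpers `to_chunk_local` and `rebase`. The chunk index stays an exact integer, and only the small offset inside the chunk is stored as a float32.

```python
import math
import struct

CHUNK_SIZE = 1024.0  # metres per chunk along one axis (hypothetical choice)

def to_f32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def to_chunk_local(world_x):
    """Split a high-precision world coordinate into (chunk index, float32 offset)."""
    chunk = math.floor(world_x / CHUNK_SIZE)
    return chunk, to_f32(world_x - chunk * CHUNK_SIZE)

def rebase(local_x, from_chunk, to_chunk):
    """Re-express a chunk-local offset relative to another chunk's origin,
    e.g. the chunk the camera has just moved into."""
    return to_f32(local_x + (from_chunk - to_chunk) * CHUNK_SIZE)

world_x = 100_000.001                      # 100 km out, plus 1 mm
print(to_f32(world_x))                     # 100000.0: the millimetre is gone
chunk, local_x = to_chunk_local(world_x)   # chunk 97, offset about 672.001
moved = rebase(local_x, chunk, chunk + 1)  # about -351.999 from chunk 98's origin
```

Real games do this on all three axes; one axis keeps the sketch short.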
Animated objects such as characters have all of their calculations performed in their local co-ordinate space before the result is transformed into world space.
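A minimal sketch of that last step, assuming a single 2D rotation plus translation stands in for a full skeletal hierarchy (`local_to_world` is a hypothetical helper). The animation math stays in small local numbers, and only the final result picks up the object's larger position.

```python
import math

def local_to_world(local_point, heading_radians, origin):
    """Transform a 2D point from an object's local frame into world space:
    rotate by the object's heading, then translate by its position."""
    lx, ly = local_point
    c, s = math.cos(heading_radians), math.sin(heading_radians)
    ox, oy = origin
    return (ox + c * lx - s * ly, oy + s * lx + c * ly)

# Hypothetical example: a hand 0.5 m along the character's local x axis,
# character turned 90 degrees and standing at (10, 20).
hand_world = local_to_world((0.5, 0.0), math.pi / 2, (10.0, 20.0))
print(hand_world)  # approximately (10.0, 20.5)
```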
Most also use a scale of 1.0f = 1 m, so you can go a few km before precision becomes much of an issue.
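A quick check of the 1.0f = 1 m claim, as a hedged sketch: `ulp_f32` (a hypothetical helper) measures the gap between adjacent float32 values at a given positive distance from the origin, which is the smallest step a float32 position can take there.

```python
import struct

def ulp_f32(x):
    """Gap between the float32 nearest to positive x and the next float32 up."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    here = struct.unpack('<f', struct.pack('<I', bits))[0]
    next_up = struct.unpack('<f', struct.pack('<I', bits + 1))[0]
    return next_up - here

for metres in (1.0, 1_000.0, 10_000.0, 100_000.0):
    print(f"{metres:>9.0f} m: step {ulp_f32(metres) * 1000:.4f} mm")
```

The steps come out to about 0.06 mm at 1 km, about 1 mm at 10 km and about 8 mm at 100 km, which is consistent with "a few km" before it starts to matter; where exactly it matters depends on the game.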
So overall, it's hardly an issue.
I realize it's a really big [as in REALLY BIG] subject, but do you know of any books that treat this sort of thing very well?
Also, because the Cell can perform so many single-precision floating point operations in parallel, do you know of any good texts which concentrate on the theory of the parallelization of common floating point algorithms [or, better yet, on the provability of the NON-existence of parallelization of floating point algorithms]?
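One concrete reason parallelizing floating point code is delicate: float32 addition is not associative, so splitting a sum across lanes or SPUs changes the rounding and therefore the result. A small sketch with hypothetical helpers, float32 emulated via `struct`:

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def sum_sequential_f32(values):
    """Left-to-right float32 sum, as a single loop would do it."""
    total = 0.0
    for v in values:
        total = to_f32(total + v)
    return total

def sum_pairwise_f32(values):
    """Tree-shaped float32 sum, the order a parallel reduction across
    lanes or cores would typically use."""
    level = [to_f32(v) for v in values]
    if not level:
        return 0.0
    while len(level) > 1:
        paired = [to_f32(level[i] + level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # odd element carries to the next level
        level = paired
    return level[0]

values = [2.0**24, 1.0, 1.0, 1.0]     # exact sum: 16777219
print(sum_sequential_f32(values))     # 16777216.0
print(sum_pairwise_f32(values))       # 16777218.0
```

Neither answer is exact; they are just differently wrong, which is why a parallel version of a floating point algorithm is usually judged by error bounds rather than by bit-for-bit agreement with the serial one.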