Sony Says Nobody Will Ever Use All the Power of a PS3
Tighthead Prop writes "Sony executive Phil Harrison has made some brash comments about the Cell processor and the PlayStation 3. Harrison says that the current PS3 game lineup is using less than half of the machine's power, adding that 'nobody will ever use 100 percent of its capacity.' Is he right? 'The major reason Harrison wants to hype up the "unlimited" potential of the PS3's architecture is to downplay comparisons between games running on Sony's console and Microsoft's Xbox 360. The two systems are not completely dissimilar: they both contain a PowerPC core running at 3.2 GHz, both have similarly-clocked GPUs, and both come with 512 MB of RAM.'"
Good point, except this time the guy is actually on record as saying it. Bill Gates never said that infamous quote that is often attributed to him.
Nice work Anonymous Coward, two small problems. You've obviously never heard of the "dual layer DVD", something which has been in common use for a very long time. It has 8.5GB storage capacity. You've also obviously managed to avoid every single article, of the hundreds out there, which all point out one thing: the Cell does not have 8 cores. It has 1 core and 7 SPEs. The Xbox 360 on the other hand has 3 cores. I take it you're looking forward to your "real time rendered, Toy Story quality graphics" on your PS3 just like you were when the PS2 came out? Get off my internet.
Consoles are never that impressive, when compared to actual computers...Computers are general purpose tools, and their architecture reflects this.
Console systems, on the other hand, are engineered for a very tight, very specific, set of tasks. This is why a console with comparatively crappy stats can walk all over a much beefier computer, and vice versa.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
The PS2-PS3 generation was six years (Oct 2000 - Nov 2006). If you count the Dreamcast, the last-gen started in Sept 99 and ended in Nov 05 with the 360 - still six years. The NES came out in October of 85, the SNES in August of 91 - less than six years. The N64 came out Sep 96, the Game Cube in Nov. 01 - a little over five years, and five years again until the Wii. The console generations are as long as they've ever been. There's more games available for the PS and PS2 than any other console. And if you're wary about buying crappy accessories, those have always been around. ROB the Robot, Super Scope Six, The SNES mouse, the N64 and Dreamcast Microphones (at least they came with the game), the Dreamcast's fishing controller, DDR mats, Guitar Hero, etc. Nothing is different, except now with the Wii game developers will move gimmick development over to the system that has all those capabilities built in so less money is wasted on 1-game peripherals.
Yes and no. The PS3 does use a new architecture, but there is literally a PS2 emotion engine chip in every PS3 to "emulate" PS2 functionality. I'm not sure we can really call it emulation when it's the original chip just doing the same thing it did before.
Thunderclone: ONE MAN ENTERS! TWO MEN LEAVE! ONE MAN ENTERS! TWO MEN LEAVE!
How the parent got modded so high is baffling. Ubisoft has NEVER said the AI on the 360 will be more intelligent than on the PS3. Jade Raymond said that the Xbox 360 has "improved threading" during X06, but nowhere did she say what it was compared to. It was clearly FUD that Microsoft got Ubisoft to spread.
And how the false claim that the PS3 will be limited to 256MB of video RAM got modded Interesting on Slashdot is absurd. Look at the top level diagram. The RSX can access an additional 256MB of XDR through the Cell. The RSX was designed to work with the Cell, which is why it differs from the conventional console hardware setup.
It's hype all over again, for sure. Every company does it, but it looks like you are being led into believing the Microsoft FUD-hype instead.
4K of memory? Luxury! The Atari 2600 had only 128 bytes of memory! You're thinking of the 4K of ROM in the cartridge.
Javascript + Nintendo DSi = DSiCade
Probably because you are stupid. The specs have been out for nearly a year now. The 360 has the exact same IBM PowerPC core processor, just 3x as many of them as the PS3. The vector units are too brain dead for AI and have to be chained together to use their full potential, so basically you have a quick matrix transform, vs 3x as much CPU power, and a video card 1.5x as powerful.
I'm not a fanboy, I am a game graphics programmer. (but yes perhaps I am a little irritated over the difficulty level as well)
Regards,
----- 70% of all statistics are completely made up.
I think I should point out that the Atari Jaguar had at least that many titles in their "under development" list at one point. We see how many of those got released...
Just because there's a list of upcoming games doesn't mean that they're all going to be released.
"You know your god is man-made when he hates all the same people you do."
That's amazing really. I agree with your sentiment, but it's amazing how you went from using the release of a console from one manufacturer to the release of a console from a different manufacturer then compared that timeframe to the release times of consoles from the same manufacturer.
Dreamcast doesn't count here because they will never have a next generation console. Playstation 1 came out in September of 95 (in America) and the Playstation 2 came out in October of 2000 (also in America). That's only 5 years. The Playstation 3 came out in November 2006 so that's 6 years. So yeah, judging by Sony's consoles, the generations are getting longer.
But Microsoft shows another story. The Xbox was released in November of 2001 while the Xbox 360 was released in November 2005, so there's only 4 years there.
The NES came out in October of 85 (in America), the SNES in August of 91, nearly 6 years later. The N64 was released in September 96, 5 years after the SNES. The Gamecube was released 5 years later in November of 2001, and the Wii was released 5 years after that in November of 2006.
Looks to me that of the major players right now 5 years seems to be a fairly average number with Microsoft barely reaching 4 years and both Nintendo and Sony having a 6 year period in there somewhere.
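For anyone who wants to check the arithmetic, here is a rough sketch that recomputes the gaps from the North American release months quoted in this thread. The dates are taken from the comments above as-is, and the `months_between` and `gaps` helpers are made up for this example.

```python
# Release months (year, month) as quoted in the comments above (North America).
releases = {
    "Sony": [("PlayStation", (1995, 9)), ("PlayStation 2", (2000, 10)),
             ("PlayStation 3", (2006, 11))],
    "Microsoft": [("Xbox", (2001, 11)), ("Xbox 360", (2005, 11))],
    "Nintendo": [("NES", (1985, 10)), ("SNES", (1991, 8)), ("N64", (1996, 9)),
                 ("GameCube", (2001, 11)), ("Wii", (2006, 11))],
}

def months_between(a, b):
    """Whole months from (year, month) a to (year, month) b."""
    return (b[0] - a[0]) * 12 + (b[1] - a[1])

def gaps(consoles):
    """Gap in months between each console and its successor."""
    return [months_between(prev[1], nxt[1])
            for prev, nxt in zip(consoles, consoles[1:])]

all_gaps = []
for maker, consoles in releases.items():
    maker_gaps = gaps(consoles)
    all_gaps.extend(maker_gaps)
    print(maker, [round(m / 12, 1) for m in maker_gaps], "years")

average_years = sum(all_gaps) / len(all_gaps) / 12
print("average:", round(average_years, 1), "years")
```

With these dates the average lands a little over 5 years, with Microsoft at 4 and one gap each of roughly 6 years for Sony and Nintendo.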
Stop Global Warming!
Just say no to irreversible processes!
[PS3 Hardware Summary]
You can't really directly compare a racing game with, well, anything else. Racing games need zero deformable objects - everything is rigid but the eye candy - and they have a very predictable path. If the view changes rapidly you're probably spinning in circles so the frame rate doesn't matter so much; but in normal use the view is very predictable so it's a lot easier to predict what you will have to draw.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Wow, school time!
The 360 does not have a "2 core Cell"; it has a 3 core PowerPC.
The PS3 has 1 core and 7 SPEs. 1 SPE is reserved for the OS, and Sony tells developers to only use 5 of the remaining 6.
The 360 has more *usable* RAM than the PS3 and, from what I've read, also has a superior GPU.
As far as disc space, 360 games are on dual-layer DVD which is 8.5GB, not 4.7GB. And as long as games like 'Gears of War' and 'Elder Scrolls IV' are fitting on a single disc the Blu-Ray argument holds no water. And worst case scenario...2 disc game! Oh n0's!
Sony has convinced you that you *need* blu-ray..and it's just not true.
Did I mention the 360 can be between $100 and $300 cheaper than a PS3 (depending on configurations for both)? And that it has games out, like, right now? And that you can go into a store and buy one no problem right now?
- "Scientia non habet inimicum nisi ignorantem"
From a modern hardware perspective you never use ALL of a system's power at the same time, but that does not mean you can replace any one component without lowering overall performance. All systems have at least one bottleneck, but most games encounter more than one, so you may be limited by the CPU, system bus, and then GPU. Which means beefing up any one component would not be worth it without beefing up several.
Think of it this way: replacing 4 MB of L2 cache with 4 GB of L2 cache would speed up most games, but spending that money across several components would be a better use of the same cash. The PS3 is designed to be flexible, so you can use the Cell to speed up rendering or AI as needed. But that flexibility comes at the price of complexity, thus first gen games are using ~50% of the system's capabilities. However, games will probably never use more than 80-90% of the system's resources at the same time, so while the graphics will get better, they will not become twice as good.
PS: 3 games may all use 90% of the system's capabilities, but they will probably not use the same 90%.
No, most AIs are just not suitable for stream processors like the Cell. They need a general purpose processor with efficient branching. Xbox 360 has three of those, PS3 has one.
There is a good Mark Twain quote: "It is better to keep your mouth closed and let people think you are a fool than to open it and remove all doubt." The "Cell" processors are more like secondary helper processors, the main CPU in the PS3 is indeed a PowerPC. As for the 'Vector Units too brain dead for AI'?... From my very limited knowledge of the Cell processor I can tell you it is not a general purpose CPU and therefore not very useful for doing things like AI, etc. It is good at crunching numbers in a very specific domain (such as matrix calculations).
Depends on what you mean by AI. For instance a path-finding algorithm should absolutely fly on the Cell. One way to do this is to divide the area into a grid, mark the start, and at each point label the best path from already visited points. A single Cell should be able to do this orders-of-magnitude faster than even a dedicated PPC chip (I'm guessing at least 100x faster).
Generally any dynamic-programming, i.e. ground-up, algorithm should work very, very fast on the Cell. It's just a matter of somebody writing path-finding code for the Cell; once that exists, everybody starts using it and games get much faster AI.
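The grid labelling described above can be sketched roughly like this. It is plain Python, not SPE code, and it says nothing about how fast a Cell would actually run it. The `wavefront` name, the `#` wall marker and the unit step cost are assumptions for the example. Each cell gets labelled with the best step count reachable from cells that were already labelled, which is the ground-up idea.

```python
from collections import deque

def wavefront(grid, start):
    """Label every reachable cell with its step count from start.

    grid is a list of strings; '#' marks a wall (an assumption of this sketch).
    Cells that cannot be reached keep the label None.
    """
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    sr, sc = start
    dist[sr][sc] = 0
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        # Each unlabelled neighbour is one step further than the current cell.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and dist[nr][nc] is None):
                dist[nr][nc] = dist[r][c] + 1
                frontier.append((nr, nc))
    return dist

grid = [
    "..#.",
    "..#.",
    "....",
]
dist = wavefront(grid, (0, 0))
for row in dist:
    print(row)
```

To get an actual path you walk back from the goal, always stepping to a neighbour with a smaller label. Every cell on the same wavefront can be labelled independently of the others, which is presumably the part that would map well onto vector hardware.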
FYI a 4 GHz Cell is at 256 GFLOPS peak vs a PPC at 8 peak (both single precision), but cache misses never happen on the Cell and often waste cycles on a PPC (3 for L1, 9 for L2, ~40 for L3).
The PS3 also has a GPU of comparable power to the Xbox 360's, neither of which runs at 4 GHz. Only the cores for the Xbox 360 run at that speed.
For some reason, most people pit "GPU+3 Core Cpu" up against the Cell alone, when the Cell is also paired up with a GPU too!
And that you can go into a store and buy one no problem right now?
:)
The biggest unreported story of this console generation is that you can, more and more, "go into a store and buy a PS3 no problem right now".
I've been offered 3 this past week alone when asking for a Wii. Yes, I do far too much xmas shopping
Endless arguments over trivial contradictions in books written by ignorant savages to explain thunder in the dark.
Well, it seems like it'd be more a bottleneck for the PS3 and Xbox360 than for a lot of machines. I look at the CPU-speed/GPU-speed to RAM ratio on most desktops, and 512MB is just enough for the GPU, with another 1GB to 2GB sitting out there for the CPU. When compared to 3 x 3.2GHz PPC (Xbox360) or 3.2GHz PPC + 8 SPEs (PS3's Cell Broadband Engine), even a current AMD 4x4 system (4 Athlon 64s) or a Core 2 Duo system has a run for its money in processing performance. So the ratio of compute to memory is quite a bit off compared to desktop boxes. Granted, the PS3 and Xbox360 don't have all the other miscellany running in the background that a desktop has, but is it really that big of a difference?
Granted, consoles have traditionally gotten by with much less RAM than their desktop counterparts. This was especially true in the cartridge days, where the entire game image lived in ROM, but it seems like it should be less so in the era of optical-media based devices.
About the only way I can see using up all those MIPS is to enable advanced physics and simulation in the game, and enable extra rendering passes to spiffy-up the images. Now that we have a larger deployment of HD-capable displays, spending the MIPS on rendering I guess makes sense. But where are you going to put all the additional textures and data required if you don't have enough RAM? You certainly aren't going to aggressively page it from optical media.
Unless a game specifically targets a console and doesn't bother targeting a desktop in tandem, I can't see the developer getting too excited about developing advanced engines that soak the console CPUs with physics/simulation and coding a cut-down version that keeps up on the desktop. That'd make the game behave noticeably differently on the two platforms. So, we're left with graphics enhancements which only change the quality of the visible output of the game, not the gameplay itself. So, until the desktop platforms get into the same raw-compute territory as the consoles, it's very easy to imagine many of those console MIPS will be left on the table or just spent on polishing the graphics output.
Now to those of you who say "It isn't pushed to its limits unless you're always using 100% of the CPU." Pshaw. I would say a system is pushed to its limits when no one thing is the sole bottleneck all the time, the overall playability of the game doesn't suffer for it, and increasing the depth of any given element would cause the game to lag or misbehave in such a way that playability or enjoyability does suffer. The notion that you have to use every byte of RAM, fill every sector of the disc and use every issue slot on every cycle of the CPU to say you're at 100% is a silly one. It might've made sense when games were measured in kilobytes, RAM was measured in bytes and CPU was measured in kHz or MHz, but not in the modern era.
--JoeProgram Intellivision!
Actually the upcoming Blue Dragon on the Xbox 360 comes on 3 DVDs. Which IMO actually suggests the inadequacies of the DVD-DL format..
The fact that a 3 DVD game has already been released should be a suggestion of things to come.
I wonder if we'll see games spanning 5 to 8 DVDs nearer to the end of the lifespan of the 360, or whether they'll start offering the 8 DVDs or 2 HD-DVDs option later down the road.
Personally I think it's a pain in the ass to keep 3 disks in pristine condition and/or swap them in and out.
Then again, it's not like I'm planning to buy one of these "next gen" consoles in the near future.
I could be snarky, but I'll just say this: Please, be honest, or leave. As a Sony employee, you could probably offer a lot of insight in these discussions. But you aren't - you're astroturfing. Please go away.
As far as optimizing for the memory system using prefetches and streamed processing et al., that's the future of performance coding. There's no avoiding these techniques as the gap between memory speed and processor speed looks destined to only get worse. It's a space in which the compiler really can't do much to help you; your algorithm design has to take into account how much slower memory is than compute, and either be able to set up its data transfers long in advance (as in streaming computation), or have something else to do while it waits (as in context switching).
I think you're confusing bottlenecks, but it's easy to do. At the very least, I may not have been entirely clear about which bottleneck I consider most important.
First, a short primer on how the SPE works. You can find a more in-depth explanation in Al Eichenberger's paper on IBM's site.
The SPEs have flat memory and software managed paging to help hide the latency of starting a new task on an SPE. A separate DMA controller brings code to the SPE's local memory, ideally well ahead of when it is needed. I think you're confusing the SPE's prefetch instruction with a traditional cache prefetch. The SPE uses a single high speed memory port to fetch instructions and data, and I'm pretty sure each can only access its local memory store. The SPE's fetch pipeline can hold 2.5 "fetch packets" of instructions, each packet containing 32 instructions. That prefetch amounts to 80 instructions, or 40 to 80 cycles of execution capability. (The SPE vector architecture can issue 1 or 2 vector instructions per cycle, and that's it.) Also, IIRC, branches can re-hit in this buffer, allowing tight loops to execute entirely from the prefetch buffer structure. This is entirely reasonable.
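The prefetch arithmetic above works out like this. The packet count, packet size and issue width are just the figures quoted in this comment, not numbers checked against IBM's documentation.

```python
# Figures as quoted in the comment above (assumptions, not verified specs).
FETCH_PACKETS = 2.5            # packets held by the fetch pipeline
INSTRUCTIONS_PER_PACKET = 32   # instructions per fetch packet

buffered = int(FETCH_PACKETS * INSTRUCTIONS_PER_PACKET)

def cycles_covered(issue_per_cycle):
    """Cycles of work the buffered instructions cover at a given issue rate."""
    return buffered / issue_per_cycle

fastest_drain = cycles_covered(2)  # dual issue every cycle
slowest_drain = cycles_covered(1)  # single issue every cycle

print(buffered, "instructions buffered")
print("covers", fastest_drain, "to", slowest_drain, "cycles")
```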
Yes, the ratio of compute power to memory bandwidth has increased enormously, but in the meantime, the amount of work the CPU does in each byte of memory has also increased noticeably. Furthermore, most interesting workloads have either good locality, or good access predictability. If that weren't true, then we wouldn't see noticeable gains on many workloads as CPUs got faster. Instead, we'd build ever wider memory interfaces to try to keep up. Indeed, memory interfaces have grown from 8 bits and 16 bits to now 128 bit and 256 bit. (A dual Opteron system with RAM populated on each memory port has a 256-bit wide memory interface, effectively.)
For graphics workloads, the access pattern ranges from moderately to very highly predictable. Hence the prevalence of specialized DMA engines and/or data prefetch instructions in many programmable graphics engines, including the Cell Broadband Engine. The PowerPC Altivec instruction set defines a set of streaming prefetch instructions for the same purpose. So, both PS3 and Xbox360 have well defined, well understood and effective ways to hide memory latency and to make the most of the bandwidth they have.
The RAM bottleneck I was referring to does not concern bandwidth or latency (though both are certainly an issue). It has more to do with working set. As scenes get more complex, it takes larger numbers of textures, vertices and everything else. (I hesitate to say "triangles," because they're not the only primitive you might deign to render.) Keeping all that render state in addition to world state and program code now becomes the challenge. Now the PS3 has a leg up here: The Xbox360 may not have a hard-drive, whereas the PS3 always has at least some HD. Paging textures and world data from optical media is tremendously painful. At least the PS3 can use its HD to page some of its state. Sure, hard drives are much slower than main memory, but optical media is much, much, MUCH slower than that. Think 10s of milliseconds vs. 100s to 1000s of milliseconds, depending on how much seeking you end up doing.
The more you can keep in RAM, the richer the world you can build, and the less you need to hit the spinning media. That's the bottleneck I was referring to.
--JoeProgram Intellivision!
I'm missing a sentence I needed to fully make my point, without being apparently contradictory. I said: Indeed, memory interfaces have grown from 8 bits and 16 bits to now 128 bit and 256 bit. (A dual Opteron system with RAM populated on each memory port has a 256-bit wide memory interface, effectively.) Add after that: However, to keep pace with the phenomenal growth in CPU performance we've seen, they'd easily need to be 10x that width, depending on how you measure things.
This issue deserves greater exploration. (Warning: Long winded ramblings below, intended to give background to a wider audience. I'm a CPU architect by trade, and would like to educate while keeping the discussion accessible.) It gets tricky to measure available memory interface bandwidth, because caches distort the bandwidth requirements on the memory interface, and latency throttles the rate at which an unmodified program can make memory system requests.
Consider a hypothetical system at the turn of the 80s, running at about 0.3 MIPS (1MHz 6502), with a memory interface capable of 8Mbit/s. (1MHz x 8-bit bus.) This is a memory bandwidth to compute ratio of 27:1. And those are 8-bit MIPS. The 32-bit MIPS are probably 1/3rd to 1/4th that or worse. The compute engine is the bottleneck, and all requests complete with essentially no latency. CPU asserts the address, and the RAM asserts the data on the next cycle.
Now consider a hypothetical top of the line CPU of today, with 128-bit vector instructions and multiple integer units. If you could keep all the units fed, depending on the CPU, you can issue 8 to 16 32-bit operations (not instructions, mind you) per cycle. Assume the fastest case. At 3GHz, that amounts to 48,000 32-bit MIPS. Meanwhile, suppose the memory interface on that same CPU has grown to 128-bit x 1GHz. That's a total bandwidth of 128Gbit/s.
Compute performance has grown by a factor of 480,000 or more on 32-bit code. (Well, less on code that only needed 8 bits, but you could always throw in floating point for the ultimate coup de grace on the part of modern hardware.) Meanwhile, memory system bandwidth has grown by a factor of 16,000. The ratio of difference in this hypothetical situation is 30:1. Granted, I picked nice round numbers and assumed perfect workloads. The reality may be closer to 10:1 or less if you don't take the effect of latency on request rate into account. This assumes you interpret the loss in compute performance as a reduction in demand on the memory system, not a loss in available memory system bandwidth. If you consider a loss in memory system bandwidth, it makes the ratio look worse. (See how hard it is to talk about this?)
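Here is the same back-of-the-envelope calculation written out, using only the round numbers from the hypothetical above. It takes the 1/3rd end of the "1/3rd to 1/4th" estimate for 32-bit work on the 6502; the 1/4th end would make the gap larger. The variable names are made up for the sketch.

```python
# Hypothetical machine at the turn of the 80s (1 MHz 6502).
old_mips_8bit = 0.3
old_mips_32bit = old_mips_8bit / 3   # assumed: the 1/3rd end of the estimate
old_bw_mbit = 1 * 8                  # 1 MHz x 8-bit bus = 8 Mbit/s

# Hypothetical top-of-the-line CPU in the fastest case.
new_ops_per_cycle = 16
new_clock_mhz = 3000
new_mips_32bit = new_ops_per_cycle * new_clock_mhz   # 32-bit MIPS
new_bw_mbit = 128 * 1000             # 128-bit x 1 GHz, in Mbit/s

old_ratio = old_bw_mbit / old_mips_8bit          # Mbit/s per 8-bit MIPS
compute_growth = new_mips_32bit / old_mips_32bit
bandwidth_growth = new_bw_mbit / old_bw_mbit
gap = compute_growth / bandwidth_growth

print("old bandwidth:compute ratio ~", round(old_ratio), ": 1")
print("compute grew ~", round(compute_growth), "x")
print("bandwidth grew ~", round(bandwidth_growth), "x")
print("compute outgrew bandwidth by ~", round(gap), ": 1")
```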
Caches skew this tremendously. On that good ol' 6502, the memory system could service program fetches, data fetches, and still have half its bandwidth left over. Steve Wozniak used that to great effect on the Apple ][, using even cycles for the CPU and odd cycles for display refresh. Modern CPUs cache everything they can, and do so aggressively. Program fetches are serviced entirely by cache, eliminating the memory system from seeing the vast majority of program fetches. Thus, the effect of program footprint on memory bandwidth has been very sub-linear with respect to compute rate. The data side of the equation is quite a different story.
Random, scattered scalar accesses get amplified by most caches. Caches tend to operate in terms of cache lines, so a random scalar read or write gets amplified into a full cache line transaction. Wider interfaces tend to hide this effect, especially if the width of the cache line matches or is a small multiple of the memory system interface's width. More typical program sequences have strong temporal and spatial locality, meaning that caches service the accesses directly, filtering them out of the requests going to the external memory interface. This too reduces the impact on the external memory bandwidth requirement.
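A toy model makes the amplification concrete. This is only a sketch under stated assumptions: 64-byte lines, 4-byte scalar reads, a small fully associative LRU cache, reads only, and no hardware prefetching. None of those figures describe a particular CPU, and `bus_traffic` is a made-up helper.

```python
import random
from collections import OrderedDict

LINE = 64   # bytes per cache line (assumed)
WORD = 4    # bytes per scalar read (assumed)

def bus_traffic(addresses, capacity_lines=256):
    """Bytes pulled over the memory interface for a stream of scalar reads."""
    cache = OrderedDict()   # resident line numbers, least recently used first
    fetched = 0
    for addr in addresses:
        line = addr // LINE
        if line in cache:
            cache.move_to_end(line)        # hit: no external traffic
        else:
            fetched += LINE                # miss: a whole line crosses the bus
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return fetched

n = 10_000
useful = n * WORD                                   # bytes the program asked for
sequential = [i * WORD for i in range(n)]           # strong spatial locality
rng = random.Random(0)
scattered = [rng.randrange(0, 1 << 30, WORD) for _ in range(n)]  # no locality

seq_bytes = bus_traffic(sequential)
scat_bytes = bus_traffic(scattered)
print("sequential amplification:", seq_bytes / useful)
print("scattered amplification: ", scat_bytes / useful)
```

The sequential stream moves exactly the bytes it uses, while the scattered stream drags in close to a full line per 4-byte read, so it approaches LINE / WORD = 16x the useful traffic.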
But what about latency? That's wher
Program Intellivision!