Slashdot Mirror


PS3 Cell Processor 'Broken'?

D-Fly writes "Charlie Demerjian at the Inquirer got a look at some insider specs on the PS3, and says Sony screwed up big time with the Cell processor; the memory read speed on the current Devkits is something like 3 orders of magnitude slower than the write speed; and is unlikely to improve much before the ship date. The slide from Sony pictured in the article is priceless: 'Local Memory Read Speed ~16Mbps, No this isn't a Typo.' Demerjian says when the PS3 comes out a full year after the XBox360, it's still going to be inferior: 'Someone screwed up so badly it looks like it will relegate the console to second place behind the 360.'" This is the Inquirer, so take with a grain of salt. Just the same, doesn't sound too good for Sony or IBM.

10 of 417 comments

  1. PS2 Vs PS3 by eldavojohn · · Score: 5, Informative

    Microprocessor Online has an interesting analysis. Pay attention to page 8, where the PS2 "Emotion Engine" processor is compared to the PS3 Cell processor. This is an analyst report for the microprocessor industry.

    If you really want to dig into the details of the Cell processor, check out Sony's resources. You have to agree to a bunch of things to get to the pdfs but there's a lot of information in them. Another place you can find information is IBM's resource site which contains a lot of stuff including the programming handbook.

    --
    My work here is dung.
  2. Re:Go Sony, go! by adubey · · Score: 5, Informative

    On the cell processor, local memory is similar to a cache, but is not "transparently" managed by the CPU. Rather, the software must explicitly say what it wants to have in the local memory.

  3. Re:D-Fly, you piece of shit: Mbps != MB/s by Anonymous Coward · · Score: 5, Informative

    I was just about to post the exact same thing. It amazes me that:
    1) The poster had no clue
    2) Zonk (and for that matter, the whole /. editing kiddie troupe) seems to have no clue
    3) This mistake happens _constantly_ on /., and it's constantly pointed out and constantly ignored
    4) Anyone with even a basic understanding of computers wouldn't make this mistake

    Just more proof that "IT" != computer science

  4. For goodness sake... by hptux06 · · Score: 5, Informative

    Does anyone ever bother reading the *IBM* documents for this? Never mind what Sony have managed to do to the cell processor, if you turn to the IBM CBEA developers handbook (page 75), you will see:

    "Load and store operations (LS), 6 Clock cycles Latency". And that's the time it takes for the instruction to complete, not to be issued to memory.

    (3.2GHz / 6 cycles) * 16 bytes != 16MB/s

    Personally, I'm gonna bet on IBM being right, seeing how they're the ones who made the bloody thing. I don't trust the Inquirer anyway, but if those figures are true, the most likely answer is inefficiencies in their benchmarking programs (such as instruction starvation, a nasty side effect of using SPUs).
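
    The back-of-the-envelope figure above can be checked with a few lines of Python. This is only a sketch: it uses the clock speed, latency and access width quoted in this comment, and it assumes (pessimistically) that each 16-byte access has to finish before the next one starts. The function name is made up for illustration.

```python
# Figures quoted in the comment above. The "no overlap between accesses"
# model is a deliberately pessimistic assumption, not a claim about how
# the real hardware schedules loads and stores.
CLOCK_HZ = 3.2e9          # 3.2 GHz clock
LATENCY_CYCLES = 6        # load/store latency in clock cycles
BYTES_PER_ACCESS = 16     # one 128-bit quadword per load/store


def unpipelined_bandwidth(clock_hz, latency_cycles, bytes_per_access):
    """Bytes per second if every access must complete before the next begins."""
    return clock_hz / latency_cycles * bytes_per_access


bw = unpipelined_bandwidth(CLOCK_HZ, LATENCY_CYCLES, BYTES_PER_ACCESS)
print(f"{bw / 1e9:.2f} GB/s")                        # 8.53 GB/s
print(f"{bw / 16e6:.0f} times the 16 MB/s figure")
```

    Even under that assumption the result is in the GB/s range, hundreds of times the 16MB/s on the slide.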

  5. 16MB/s = CPU reading GPU memory directly by Anonymous Coward · · Score: 5, Informative

    The Inquirer article is rubbish and that slide is taken out of context. It seems to imply that the Cell can only read "Cell local memory" (whatever that is) at 16MB/s.

    Memory transfer bandwidth between each SPU and its SPU Local Memory is something more like 25GB/s (gigabyte per second); sustained actual bandwidth between all SPUs is greater than 100GB/s; peak theoretical is greater than 200GB/s (assuming all 8 SPUs present for simplicity).

    If you had access to the full version of the presentation (part of the full Sony PS3 SDK and technotes), you'd realise that that slide is part of a presentation about the RSX (the PS3's GPU). As such, when it refers to "Local Memory", it means RSX's Local Memory (eg graphics memory, video memory, VRAM or whatever you call it in fanboy/ps3/360-is-teh-suck websites). To be understood outside that context, the columns would be better labelled "Main System Memory" and "GPU Local Memory".

    The Inquirer article seems to suggest that this figure of 16MB/s (megabyte per second, by the way, what the fuck is it with journalists swapping bits for bytes? why don't they get their shift/capslock keys fixed?) is some kind of show stopper. No it isn't. It simply means that the Cell processor has 16MB/s bandwidth when reading directly from memory-mapped GPU address space. So what? Unless you're planning on calling memcpy() or some shit to bring your data back then it doesn't really matter.

    On RSX-initiated transfers you have 20GB/s bandwidth to do the same transfer (from RSX local to main system memory). Cell read bandwidth of GPU memory might as well be 0MB/s (ie no connection at all) and it wouldn't matter a bit.
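
    To put the two numbers side by side, here is a rough sketch of how long one buffer would take to move each way. The 16MB/s and 20GB/s rates are the ones quoted above; the 1280x720, 4-bytes-per-pixel buffer is a hypothetical example, not something from the slide.

```python
def transfer_seconds(num_bytes, bytes_per_second):
    """Time to move num_bytes at a sustained rate, ignoring setup overhead."""
    return num_bytes / bytes_per_second


# Hypothetical buffer: 1280x720 pixels, 4 bytes per pixel.
FRAME_BYTES = 1280 * 720 * 4

CELL_READ_RATE = 16e6   # Cell reading GPU local memory directly (from the slide)
RSX_DMA_RATE = 20e9     # RSX-initiated transfer to main memory (quoted above)

cell_s = transfer_seconds(FRAME_BYTES, CELL_READ_RATE)
rsx_s = transfer_seconds(FRAME_BYTES, RSX_DMA_RATE)

print(f"Cell reads it directly: {cell_s * 1e3:.1f} ms")
print(f"RSX pushes it instead:  {rsx_s * 1e3:.3f} ms")
print(f"Ratio: about {cell_s / rsx_s:.0f}x")
```

    The ratio is independent of the buffer size and comes out at roughly three orders of magnitude, which is the gap the article complains about. It only hurts if you pick the slow path.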

  6. Re:Go Sony, go! by Retric · · Score: 5, Informative

    "The "Local Memory" is the RSX graphics memory. The Cell shouldn't need to read this. The PS3 would still work even if the Cell couldn't read this memory at all. This memory is where you store textures and other graphics data."

    IMO it's reasonable to have asynchronous communication with the graphics subsystem. The only stupid thing going on is calling the graphics card's memory "Local Memory". It suggests that the X-Box got it right by having one big chunk of memory that is read by both the CPU and GPU, even if most developers will make the same basic split anyway.

  7. Re:Go Sony, go! by Ford+Prefect · · Score: 5, Informative

    "The "Local Memory" is the RSX graphics memory. The Cell shouldn't need to read this. The PS3 would still work even if the Cell couldn't read this memory at all. This memory is where you store textures and other graphics data.

    Presumably in the (unlikely?) event you did need the output from the RSX graphics chip for manipulation by the Cell processor gubbins, you could get it to render to main memory, let the processor do the appropriate data-diddling, then have the RSX read it back again?

    The 'local memory' is presumably the RSX's private play area, and thus the RSX gets maximum-stupendous-speed priority, and the Cell gets occasional access at weekends. Which is a bonus, and not even necessary...

    --
    Tedious Bloggy Stuff - hooray?
  8. Re:Go Sony, go! by robosmurf · · Score: 5, Informative

    Sadly, you are also wrong.

    In the slide, the "Local Memory" refers to the RSX local memory, not the SPU local memory. The article says that the next slide is Sony telling devs to use the RSX to do the transfer instead, which only makes sense if it is talking about the RSX memory.

    Your conclusion is right though, as this also is memory that the Cell doesn't need to read from.

  9. Re:main memories read speed is 25GB/s by slick_rick · · Score: 5, Informative

    That is because you have never done any work in 3D graphics. It isn't at all unusual for the video memory to have incredible write speeds and painfully slow read speeds (back to the CPU that is). The reason is that in 3D graphics the video card does the actual rendering. Therefore you simply tell it "I want a blue triangle at the coordinates X,Y,Z (x3) with T texture applied". The card renders it and applies the texture from texture memory and then displays it onto the screen. You never need to read the (texture) memory, because the data contained in it is throwaway (why would you need to read back the texture that you sent to the card?)

    So it is perfectly normal for texture memory to be nearly write-only. As long as writing to it is extremely fast (which it is in this case according to the PP slide), that isn't a problem.

    --
    apt-get install redhat please god - Me (take it easy, I love Debian)
  10. Broken benchmark, perhaps? by Mr+Z · · Score: 5, Informative

    Either that, or a broken benchmark. Each Cell processor (Synergistic Processing Element -- SPE) shares its instruction fetch port with its data memory port. The SPE can buffer up 80 instructions at a time (2.5 fetch words), plus an additional 32 from a branch target. Fetch will stall if the memory system gets saturated with loads and stores. Properly written memory-intensive code includes explicit fetches to keep these buffers full. Incorrectly written code will cause problems. Still, that doesn't explain a 3 orders of magnitude drop.

    If you look at the slides on the page I linked to above, you'll see the SPEs are not connected into the global address space. They connect to a private single ported memory, and to each other through two unidirectional rings. (The ring structure is not apparent from that diagram, but trust me, it's there.) These rings then connect to a DMA engine.

    If you wade through this paper, you'll see that the Cell compiler implements a software cache. (The same paper also explains the instruction fetch mechanism mentioned above, BTW.) That is, it emulates a cache in software, using the DMA to actually move memory around. Depending on the nature of the benchmark and how it was written, it could be that the read benchmark spends all its time allocating stuff into this cache and waiting for it to arrive. Writes would be faster because the cache can "write behind" without having to wait for the allocation to happen, if the compiler is smart enough to know that the previous data will be entirely overwritten. So, if the benchmark goofed, then the results are meaningless.
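
    The read/write asymmetry described above can be illustrated with a toy model. This is not the Cell compiler's actual software cache: the class, the line size, the line count and the counters are all hypothetical, and "DMA" is just a slice copy on a bytearray. It only shows why a read miss has to wait for data while a whole-line overwrite does not.

```python
LINE_BYTES = 128   # hypothetical cache line size
NUM_LINES = 4      # hypothetical number of lines held in local store


class SoftwareCache:
    """Toy direct-mapped software cache over a bytearray 'main memory'."""

    def __init__(self, main_memory):
        self.main = main_memory
        self.tags = [None] * NUM_LINES
        self.lines = [bytearray(LINE_BYTES) for _ in range(NUM_LINES)]
        self.dirty = [False] * NUM_LINES
        self.blocking_fetches = 0   # transfers the program must wait for
        self.writebacks = 0         # transfers that could run in the background

    def _evict(self, slot):
        # Push a modified line back to main memory before reusing its slot.
        if self.dirty[slot]:
            base = self.tags[slot] * LINE_BYTES
            self.main[base:base + LINE_BYTES] = self.lines[slot]
            self.writebacks += 1
            self.dirty[slot] = False

    def read_line(self, line_no):
        slot = line_no % NUM_LINES
        if self.tags[slot] != line_no:
            # Miss: the data must arrive before the read can return.
            self._evict(slot)
            base = line_no * LINE_BYTES
            self.lines[slot][:] = self.main[base:base + LINE_BYTES]
            self.tags[slot] = line_no
            self.blocking_fetches += 1
        return bytes(self.lines[slot])

    def write_line(self, line_no, data):
        # Whole-line overwrite: the old contents are never needed,
        # so the slot is claimed without fetching anything.
        assert len(data) == LINE_BYTES
        slot = line_no % NUM_LINES
        if self.tags[slot] != line_no:
            self._evict(slot)
            self.tags[slot] = line_no
        self.lines[slot][:] = data
        self.dirty[slot] = True

    def flush(self):
        for slot in range(NUM_LINES):
            if self.tags[slot] is not None:
                self._evict(slot)


# Streaming "benchmark": touch 16 lines once each, first reading, then writing.
reader = SoftwareCache(bytearray(16 * LINE_BYTES))
for i in range(16):
    reader.read_line(i)

writer_memory = bytearray(16 * LINE_BYTES)
writer = SoftwareCache(writer_memory)
for i in range(16):
    writer.write_line(i, bytes([i]) * LINE_BYTES)
writer.flush()

print("read test blocking fetches: ", reader.blocking_fetches)
print("write test blocking fetches:", writer.blocking_fetches)
print("write test writebacks:      ", writer.writebacks)
```

    In this model the streaming read stalls on every line while the streaming write never waits, so a naive read benchmark would mostly be measuring miss handling rather than memory bandwidth.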

    Fact of the matter is that the SPEs are capable of reading 128 bits a cycle each (128 bytes / cycle across the 8 SPEs). Other benchmarks, such as the article recently posted to Slashdot about using Cell for scientific computation confirm that this thing hauls--and these are bandwidth-intensive tasks. The quoted paper did run some numbers on real silicon and showed numbers similar to their simulation results.
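
    For scale, the per-cycle figures above can be turned into bytes per second. A small sketch, assuming a 3.2 GHz clock; that clock speed is not stated in this comment and is borrowed from another comment in the thread, so treat the absolute numbers as illustrative peaks only.

```python
BITS_PER_CYCLE_PER_SPE = 128                          # from the comment above
BYTES_PER_CYCLE_PER_SPE = BITS_PER_CYCLE_PER_SPE // 8
NUM_SPES = 8                                          # from the comment above
ASSUMED_CLOCK_HZ = 3.2e9                              # assumption, see lead-in

bytes_per_cycle_total = BYTES_PER_CYCLE_PER_SPE * NUM_SPES
per_spe = BYTES_PER_CYCLE_PER_SPE * ASSUMED_CLOCK_HZ
total = per_spe * NUM_SPES

print(f"{bytes_per_cycle_total} bytes/cycle across all SPEs")
print(f"{per_spe / 1e9:.1f} GB/s peak per SPE")
print(f"{total / 1e9:.1f} GB/s peak across {NUM_SPES} SPEs")
```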

    With all this in mind, I find it hard to believe that Cell is broken.

    --Joe