Slashdot Mirror


A Glimpse Inside the Cell Processor

XenoPhage writes "Gamasutra has an article up by Jim Turley about the design of the Cell processor, the main processor of the upcoming PlayStation 3. It gives a decent overview of the structure of the Cell processor itself, including the CBE, PPE, and SPE units." From the article: "Remember your first time? Programming a processor, that is. It must have seemed both exciting and challenging. You ain't seen nothing yet. Even garden-variety microprocessors present plenty of challenges to an experienced programmer or development team. Now imagine programming nine different processors all at once, from a single source-code stream, and making them all cooperate. When it works, it works amazingly well. But making it work is the trick."

9 of 66 comments

  1. Sega Saturn Redux? by ToxikFetus · · Score: 4, Interesting

    As TFA mentioned, this has the potential of becoming another Sega Saturn boondoggle. Will the developers learn how to fully utilize this incredibly complex architecture? Relying on the "octopiler" to efficiently map to the Cell architecture seems a bit optimistic and naive.

    1. Re:Sega Saturn Redux? by SSCGWLB · · Score: 3, Interesting

      I seriously doubt they will write efficient programs in the lifetime of this console. The level of efficiency they will achieve depends on a lot of things. I didn't see it in TFA, but I am assuming you cannot treat each SPE as an individual processor.

      First of all, their dream of a general 'octopiler' is pure fantasy. I have written massively parallel MPI and Shared Memory applications and can testify to their complexity. Mapping an arbitrary piece of code transparently to multiple processors is an extremely difficult task. If the source is carefully written, it is possible to parallelize certain sections. This requires careful forethought and detailed knowledge of how the compiler works. If I were to guess, I would say they would use some type of middleware (a la CORBA) or libraries (a la MPI) to extend a programming language. That way, the programmer could specify sections of code that can be executed in parallel. This would help the compiler immensely and make much more efficient code. It would be really cool if the SPEs had some type of identifier, allowing you to task specific SPEs! I haven't read much about the CELL, so this may or may not be possible.
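      A minimal sketch of that idea, in Python rather than anything Cell-specific: the programmer marks one function as the parallel section and hands chunks of work to workers addressed by an integer id. `WorkerPool`, `scale_chunk`, and `parallel_scale` are hypothetical names made up for illustration, and ordinary threads stand in for the SPEs; none of this is the real Cell SDK.

```python
from concurrent.futures import ThreadPoolExecutor


class WorkerPool:
    """Hypothetical pool in which every worker has a fixed integer id."""

    def __init__(self, num_workers=8):
        # One single-threaded executor per id, so a task submitted to
        # worker k always runs on worker k's own thread.
        self._workers = [ThreadPoolExecutor(max_workers=1)
                         for _ in range(num_workers)]

    def __len__(self):
        return len(self._workers)

    def submit(self, worker_id, fn, *args):
        # Task a specific worker by its id and get a future back.
        return self._workers[worker_id].submit(fn, *args)

    def shutdown(self):
        for worker in self._workers:
            worker.shutdown()


def scale_chunk(chunk, factor):
    """The explicitly marked parallel section: no shared state, no ordering."""
    return [x * factor for x in chunk]


def parallel_scale(data, factor, pool):
    """Split data into one contiguous chunk per worker id, then reassemble."""
    n = len(pool)
    size = -(-len(data) // n)  # ceiling division
    futures = [
        pool.submit(i, scale_chunk, data[i * size:(i + 1) * size], factor)
        for i in range(n)
    ]
    out = []
    for future in futures:  # futures are already in worker-id order
        out.extend(future.result())
    return out
```

      The compiler (or here, the library) never has to discover the parallelism on its own; the programmer states it, which is the MPI-style trade-off described above.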

      Overall, I bet the vast majority of the parallel code will be in carefully crafted libraries of CPU intensive tasks. These libraries will grow over time, making utilization of the SPEs more and more efficient. Until then, the main CPU and one SPE will execute the majority of the game with occasional help from the other SPEs.

      ~nate

    2. Re:Sega Saturn Redux? by jd · · Score: 3, Interesting
      Not sure it's that complex. If anything, it sounds rather limiting. Eight isolated physical coprocessors, each supporting two threads? Why not have one coprocessor that supports 16 threads that maps onto as many virtual coprocessors as desired? Basically the same circuitry, but can dynamically remap to the problem being solved, as opposed to remapping the problem to the circuits provided.


      (Having the computer model itself to the problem reduces the complexity of programming and will make optimal use of the hardware. Having the program model itself after what the computer is tuned to do is merely an ugly hack and requires ugly compilers to specifically translate between the paradigms.)


      The cell processor is designed around 1980s concepts of load-balancing while keeping to many of the rules of second-generation programming. Technology has moved on. That's not to say the cell is bad. It's a definite improvement over the 1960s concepts used in many modern CPUs. However, it is still 20 years behind the curve. C'mon, guys, this isn't the Space Shuttle, it's a microprocessor. There is no excuse for network and design technology to be so far beyond the best of the best that industrial giants are capable of doing.


      Actually, it's worse than that. Modern multi-processor systems require specially-designed chipsets and become exponentially more expensive as you build them up. Single boards don't usually go beyond 16 processors. In comparison, people built single boards with 1024 Transputers without difficulty, with costs increasing linearly. So, in multi-processor architectures, we can't even match everything that could be done in the 1980s.


      How does this affect those using the Cell? Well, that's simple. It doesn't offer enough of an added advantage and is different enough that coders will have difficulty making good use of it. That means that coders will have to be inefficient OR dedicated to that one chip, which has no guarantee of making any money for them. Coders won't bother, unless there is something out there that will make it a guaranteed success. I'm not seeing this killer demo.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    3. Re:Sega Saturn Redux? by Phil Wilkins · · Score: 2, Interesting

      I am assuming you cannot treat each SPE as an individual processor.

      Your assumption would be wrong.

    4. Re:Sega Saturn Redux? by SSCGWLB · · Score: 2, Interesting

      Thanks for the condescending and uninformative remark. What I was not sure of was if the OS treated each SPE as a separate, autonomous core (i.e. SMP). I had assumed the context of my question made that clear. As it turns out, my assumption was correct.

      "The PPE which is capable of running a conventional operating system has control over the SPEs and can start, stop, interrupt and schedule processes running on the SPEs. To this end the PPE has additional instructions relating to control of the SPEs. Despite having Turing complete architectures the SPEs are not fully autonomous and require the PPE to initiate them before they can do any useful work." Courtesy of the cell wiki

      In other words, the OS tasks the PPE which tasks the SPEs. This is an entirely different beast from 8 autonomous cores.
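      As a toy model of that control relationship (not real Cell code), the Python sketch below has a `PPE` object that must start an `SPE` before it will run anything. The class and method names are assumptions chosen to mirror the quote's "start, stop ... and schedule"; the actual hardware interface is not shown in this thread.

```python
class SPE:
    """Toy SPE: can compute, but only after the PPE has started it."""

    def __init__(self, spe_id):
        self.spe_id = spe_id
        self.running = False

    def execute(self, task, *args):
        if not self.running:
            raise RuntimeError(
                f"SPE {self.spe_id} has not been started by the PPE")
        return task(*args)


class PPE:
    """Toy PPE: the only component allowed to start, stop, or task SPEs."""

    def __init__(self, num_spes=8):
        self.spes = [SPE(i) for i in range(num_spes)]

    def start(self, spe_id):
        self.spes[spe_id].running = True

    def stop(self, spe_id):
        self.spes[spe_id].running = False

    def schedule(self, spe_id, task, *args):
        # The OS asks the PPE; the PPE starts the SPE, runs the task,
        # and stops the SPE again even if the task raises.
        self.start(spe_id)
        try:
            return self.spes[spe_id].execute(task, *args)
        finally:
            self.stop(spe_id)
```

      Calling `PPE().schedule(3, sum, [1, 2, 3])` goes through the controller and works, while calling `execute` on an SPE that was never started raises an error, which is the difference from eight autonomous SMP cores.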

      I also found an interesting article about programming the cell. Not all my assumptions survived *sigh*. Thanks!

      ~nate

  2. Not NINE processors, only EIGHT, since... by Harry Balls · · Score: 4, Interesting

    ...on the average, one of the slave processors is non-functional.
    Read more about the yield problems of the Cell chip here:
    http://theinquirer.net/default.aspx?article=32978/

    Fabrication yield is estimated at only 10% to 20%, which is very low for the industry.

    1. Re:Not NINE processors, only EIGHT, since... by Anonymous Coward · · Score: 2, Interesting

      Fabrication yield is estimated at only 10% to 20%

      That's for a completely working package, the cell plus 8 SPEs. Because of the low yield of the "perfect" processors, PS3 will be using the ones with 7 working SPEs, since there are plenty of those. The IBM discussion linked by the inquirer shows that.

      Yield is so low due not only to the complexity but also to the size. If there are an average of 10 defects on a wafer and you can only fit 10 processors on a wafer (these numbers pulled totally out of my ass), then you're basically hoping that those 10 defects won't be spread out evenly. If you can fit 1000 processors on a wafer, those 10 defects can kill at most 10 processors, and you're still doing just fine. Of course, as we go down in process size, more things that didn't matter before can become defects. When you're working under 100 nanometers, a sub-nanometer variation can be over 1% error.
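      To put rough numbers on that, here is a short Python sketch using the simple Poisson yield model (probability a die has zero defects = exp(-expected defects per die)), which assumes defects land independently and uniformly. The wafer figures are the made-up ones from this comment, and the per-SPE yield passed to `at_least_k_of_n` is a purely hypothetical input, not a measured value.

```python
import math


def poisson_yield(defects_per_wafer, dies_per_wafer):
    """Fraction of dies expected to have zero defects (Poisson model)."""
    defects_per_die = defects_per_wafer / dies_per_wafer
    return math.exp(-defects_per_die)


def at_least_k_of_n(unit_yield, k, n):
    """Probability that at least k of n independent units are good."""
    return sum(
        math.comb(n, good)
        * unit_yield ** good
        * (1.0 - unit_yield) ** (n - good)
        for good in range(k, n + 1)
    )


if __name__ == "__main__":
    # Big dies: 10 defects spread over 10 dies.
    print(f"10 dies/wafer:   {poisson_yield(10, 10):.3f}")
    # Small dies: the same 10 defects spread over 1000 dies.
    print(f"1000 dies/wafer: {poisson_yield(10, 1000):.3f}")
    # Salvage: accepting 7 good SPEs out of 8 versus demanding all 8,
    # for a hypothetical 90% per-SPE yield.
    print(f"all 8 SPEs good:  {at_least_k_of_n(0.9, 8, 8):.3f}")
    print(f"at least 7 good:  {at_least_k_of_n(0.9, 7, 8):.3f}")
```

      Under this model the 10-die wafer yields about 37% good dies and the 1000-die wafer about 99%, and accepting chips with one dead SPE always gives a higher usable fraction than requiring all eight.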

  3. MOD PARENT TROLL by Anonymous Coward · · Score: 1, Interesting

    That "news" was thoroughly debunked as anti-Sony propaganda. There is almost no reason to read from the GPU's local memory from the Cell's SPEs or PPE. If you do have a legitimate reason to do so that requires high memory bandwidth, your design is wrong. The GPU can read/write to its memory at blazing fast speeds, and talk directly to the SPEs and PPE at very high bandwidth as well. Any use of an SPE or the PPE to read directly from the GPU's local memory is a case of insane coupling between components and, as we all should know, is indicative of a bad design.

  4. Saturn was less well planned by Nazmun · · Score: 2, Interesting

    It was essentially an uber 2D platform with a 3D chip added at the last minute. The Cell, RSX, and memory type were conceived a long time ago to work together. Neither the Cell nor the graphics chip is a last-minute add-on to compete with a brand new foe (as the PSX was with its new 3D capability).

    Also, Sony is hard at work on dev kits which will make programming with the Cell much easier. How well they succeed in making these dev kits will be the primary factor in how programming for the beast goes.

    --
    Hmmm... Pie...