Slashdot Mirror


A Glimpse Inside the Cell Processor

XenoPhage writes "Gamasutra has posted an article by Jim Turley about the design of the Cell processor, the main processor of the upcoming PlayStation 3. It gives a decent overview of the structure of the Cell processor itself, including the CBE, PPE, and SPE units." From the article: "Remember your first time? Programming a processor, that is. It must have seemed both exciting and challenging. You ain't seen nothing yet. Even garden-variety microprocessors present plenty of challenges to an experienced programmer or development team. Now imagine programming nine different processors all at once, from a single source-code stream, and making them all cooperate. When it works, it works amazingly well. But making it work is the trick."

16 of 66 comments

  1. Oh yeah, I remember my first time by llamalicious · · Score: 4, Funny

    I was 17 and she was 26 and ... oh shit, wrong first time.

    1. Re:Oh yeah, I remember my first time by neonprimetime · · Score: 5, Funny

      I was 17 and she was 26 and

      Perchance, did you have a MySpace account, and your parents didn't know about your little shindig?

    2. Re:Oh yeah, I remember my first time by Rakshasa+Taisab · · Score: 4, Funny

      I'm not familiar with any 26 processors, surely you meant 286?

      --
      - These characters were randomly selected.
    3. Re:Oh yeah, I remember my first time by goodenoughnickname · · Score: 2, Funny

      He had sex with a 286-year-old?! What was she, a Wookiee?

  2. Re:eh by Zediker · · Score: 4, Insightful

    "Console gamers get consoles because they can't deal with installing video card drivers."

    Nope, console gamers buy consoles because they offer games that don't appear on the PC and/or because they don't have the money to buy a PC gaming rig. Spending $1200+ (I'm talking about building from the ground up with reliable and decent parts) just to start getting a decent computer together usually isn't as justifiable as spending ($100:GC, $130:DS, $150:PS2/Xbox, $200:PSP, $400:360) on a console of some sort.

    --
    I love to slaughter the english language.
  3. Sega Saturn Redux? by ToxikFetus · · Score: 4, Interesting

    As TFA mentioned, this has the potential of becoming another Sega Saturn boondoggle. Will the developers learn how to fully utilize this incredibly complex architecture? Relying on the "octopiler" to efficiently map to the Cell architecture seems a bit optimistic and naive.

    1. Re:Sega Saturn Redux? by SSCGWLB · · Score: 3, Interesting

      I seriously doubt they will write efficient programs in the lifetime of this console. The level of efficiency they will achieve depends on a lot of things. I didn't see it in TFA, but I am assuming you cannot treat each SPE as an individual processor.

      First of all, their dream of a general 'octopiler' is pure fantasy. I have written massively parallel MPI and shared-memory applications and can testify to their complexity. Mapping an arbitrary piece of code transparently to multiple processors is an extremely difficult task. If the source is carefully written, it is possible to parallelize certain sections. This requires careful forethought and detailed knowledge of how the compiler works. If I were to guess, I would say they would use some type of middleware (a la CORBA) or libraries (a la MPI) to extend a programming language. That way, the programmer could specify sections of code that can be executed in parallel. This would help the compiler immensely and make for much more efficient code. It would be really cool if the SPEs had some type of identifier, allowing you to task specific SPEs! I haven't read much about the Cell, so this may or may not be possible.

      Overall, I bet the vast majority of the parallel code will be in carefully crafted libraries of CPU-intensive tasks. These libraries will grow over time, making utilization of the SPEs more and more efficient. Until then, the main CPU and one SPE will execute the majority of the game with occasional help from the other SPEs.

      ~nate
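
      A minimal sketch of the library-style approach described above, written in Python rather than against any Cell toolchain. The programmer, not the compiler, decides which section runs in parallel. The names `sum_of_squares` and `parallel_sum_of_squares` are hypothetical, and ordinary threads stand in for SPEs, so this shows the structure of the code, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Kernel that the programmer has judged safe to run in parallel."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, num_workers=4):
    """Explicitly parallel section: split, farm out, then combine serially."""
    # One strided slice per worker; the slices do not overlap.
    chunks = [data[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # The combine step is serial and stays on the calling thread.
    return sum(partials)


data = list(range(1000))
assert parallel_sum_of_squares(data) == sum_of_squares(data)
```

      Everything outside `parallel_sum_of_squares` stays ordinary serial code, which is roughly the "carefully crafted libraries" scenario: callers get the parallelism without having to reason about it.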

    2. Re:Sega Saturn Redux? by jd · · Score: 3, Interesting

      Not sure it's that complex. If anything, it sounds rather limiting. Eight isolated physical coprocessors, each supporting two threads? Why not have one coprocessor that supports 16 threads that maps onto as many virtual coprocessors as desired? Basically the same circuitry, but can dynamically remap to the problem being solved, as opposed to remapping the problem to the circuits provided.

      (Having the computer model itself to the problem reduces the complexity of programming and will make optimal use of the hardware. Having the program model itself after what the computer is tuned to do is merely an ugly hack and requires ugly compilers to specifically translate between the paradigms.)

      The cell processor is designed around 1980s concepts of load-balancing while keeping to many of the rules of second-generation programming. Technology has moved on. That's not to say the cell is bad. It's a definite improvement over the 1960s concepts used in many modern CPUs. However, it is still 20 years behind the curve. C'mon, guys, this isn't the Space Shuttle, it's a microprocessor. There is no excuse for network and design technology to be so far beyond the best of the best that industrial giants are capable of doing.

      Actually, it's worse than that. Modern multi-processor systems require specially-designed chipsets and become exponentially more expensive as you build them up. Single boards don't usually go beyond 16 processors. In comparison, people built single boards with 1024 Transputers without difficulty, with costs increasing linearly. So, in multi-processor architectures, we can't even match everything that could be done in the 1980s.

      How does this affect those using the Cell? Well, that's simple. It doesn't offer enough of an added advantage and is different enough that coders will have difficulty making good use of it. That means that coders will have to be inefficient OR dedicated to that one chip, which has no guarantee of making any money for them. Coders won't bother, unless there is something out there that will make it a guaranteed success. I'm not seeing this killer demo.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    3. Re:Sega Saturn Redux? by Phil+Wilkins · · Score: 2, Interesting

      I am assuming you cannot treat each SPE as an individual processor.

      Your assumption would be wrong.

    4. Re:Sega Saturn Redux? by SSCGWLB · · Score: 2, Interesting

      Thanks for the condescending and uninformative remark. What I was not sure of was if the OS treated each SPE as a separate, autonomous core (i.e. SMP). I had assumed the context of my question made that clear. As it turns out, my assumption was correct.

      "The PPE which is capable of running a conventional operating system has control over the SPEs and can start, stop, interrupt and schedule processes running on the SPEs. To this end the PPE has additional instructions relating to control of the SPEs. Despite having Turing complete architectures the SPEs are not fully autonomous and require the PPE to initiate them before they can do any useful work." Courtesy of the cell wiki

      In other words, the OS tasks the PPE, which tasks the SPEs. This is an entirely different beast from 8 autonomous cores.

      I also found an interesting article about programming the cell. Not all my assumptions survived *sigh*. Thanks!

      ~nate
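
      The control relationship in the quoted wiki passage (a controller that starts, tasks, and stops workers that do nothing on their own) can be sketched roughly as follows. This is an analogy in plain Python threads, not Cell code: `Controller`, `Worker`, `submit`, and `square` are hypothetical names, and the per-worker inbox is an assumption about how "tasking a specific unit" might look.

```python
import queue
import threading


class Worker(threading.Thread):
    """SPE-like worker: idle until the controller starts it and feeds it."""

    def __init__(self, worker_id, results):
        super().__init__()
        self.worker_id = worker_id
        self.inbox = queue.Queue()   # private task queue for this worker
        self.results = results       # shared queue back to the controller

    def run(self):
        while True:
            task = self.inbox.get()
            if task is None:         # stop signal from the controller
                break
            func, arg = task
            self.results.put((self.worker_id, func(arg)))


class Controller:
    """PPE-like controller: the only party that starts, tasks, or stops workers."""

    def __init__(self, num_workers):
        self.results = queue.Queue()
        self.workers = [Worker(i, self.results) for i in range(num_workers)]

    def start(self):
        for worker in self.workers:
            worker.start()

    def submit(self, worker_id, func, arg):
        """Hand a task to one specific worker, chosen by its ID."""
        self.workers[worker_id].inbox.put((func, arg))

    def stop(self):
        for worker in self.workers:
            worker.inbox.put(None)
        for worker in self.workers:
            worker.join()


def square(x):
    return x * x


controller = Controller(num_workers=3)
controller.start()
controller.submit(2, square, 7)   # task worker 2 specifically
controller.submit(0, square, 3)
controller.stop()                 # workers drain their inboxes, then exit
collected = sorted(controller.results.get() for _ in range(2))
```

      After this runs, `collected` is `[(0, 9), (2, 49)]`. Worker 1 was started but never tasked, so it contributes nothing, which is the difference from a pool of autonomous cores that pick up work themselves.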

  4. Re:memory speed? by Space+cowboy · · Score: 5, Informative

    You are misinformed.

    This is the speed at which the Cell can read the RSX's local memory. Memory bandwidth for the Cell itself is ~25 GB/sec. If the Cell ever wants to access the private RAM of the RSX (why?) it *is* possible, but it's a lot more efficient to use the normal pathway through main memory...

    Simon.

    --
    Physicists get Hadrons!
  5. Not NINE processors, only EIGHT, since... by Harry+Balls · · Score: 4, Interesting

    ...on the average, one of the slave processors is non-functional.
    Read more about the yield problems of the Cell chip here:
    http://theinquirer.net/default.aspx?article=32978/

    Fabrication yield is estimated at only 10% to 20%, which is very low for the industry.

    1. Re:Not NINE processors, only EIGHT, since... by Anonymous Coward · · Score: 2, Interesting

      Fabrication yield is estimated at only 10% to 20%

      That's for a completely working package, the PPE plus all 8 SPEs. Because of the low yield of the "perfect" processors, the PS3 will be using the ones with 7 working SPEs, since there are plenty of those. The IBM discussion linked by the Inquirer shows that.

      Yield is so low due not only to the complexity but also to the size. If there are an average of 10 defects on a wafer and you can only fit 10 processors on a wafer (these numbers pulled totally out of my ass), then you're basically hoping that those 10 defects won't be spread out evenly. If you can fit 1000 processors on a wafer, those 10 defects can kill at most 10 processors, and you're still doing just fine. Of course, as we go down in process size, more things that didn't matter before can become defects. When you're working under 100 nanometers, a sub-nanometer variation can be over 1% error.
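
      The wafer arithmetic above can be made concrete with a toy model. The sketch assumes every defect lands on exactly one die, uniformly and independently of the others, and that identical blocks on a die fail independently. Both are simplifying assumptions, and the function names are made up for illustration; the 10-defect figures are the made-up ones from the comment, not real fab data.

```python
from math import comb


def defect_free_fraction(dies_per_wafer, defects_per_wafer):
    """Chance that a given die catches none of the wafer's defects.

    Assumes each defect lands on one die, uniformly and independently.
    """
    return (1 - 1 / dies_per_wafer) ** defects_per_wafer


def fraction_with_enough_units(p_unit_ok, units=8, needed=7):
    """Chance that at least `needed` of `units` identical blocks work.

    Assumes each block works independently with probability p_unit_ok.
    """
    return sum(
        comb(units, k) * p_unit_ok**k * (1 - p_unit_ok) ** (units - k)
        for k in range(needed, units + 1)
    )


big_dies = defect_free_fraction(dies_per_wafer=10, defects_per_wafer=10)
small_dies = defect_free_fraction(dies_per_wafer=1000, defects_per_wafer=10)
print(f"10 dies per wafer:   {big_dies:.1%} defect-free")
print(f"1000 dies per wafer: {small_dies:.1%} defect-free")
```

      Under these assumptions about 35% of the big dies come out clean versus about 99% of the small ones. `fraction_with_enough_units` shows why accepting 7 of 8 working SPEs helps: for any per-unit success rate below 1, requiring 7 passes more dies than requiring all 8.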

  6. Re:The article's author is huffing crack here... by argent · · Score: 2, Insightful

    Except, of course, that ray tracing is not easily parallelizable as you need a significant amount of data to each of those postage stamp size pieces

    The mesh is common to all the processors, and not that big, it can be broadcast. Textures are the big chunk, but most pieces will only need high resolution versions of the textures in their direct view... unless a processor is looking at an optically interesting surface (for reflections or refractions) it can get by with mesh-resolution approximations to the textures outside its part of the scene.

    This requires new technology, yes. You need mesh caches shared among not-too-many processors, and techniques to broadcast the mesh to the mesh cache efficiently, and a front-end to apportion the space to the processors and parcel out textures, maybe even go to a finer subdivision for "interesting" areas. But raytracing is practically the poster boy for "embarrassingly parallelizable" applications.
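
    A rough sketch of why ray tracing splits up so cleanly: every pixel depends only on the shared, read-only scene, so the image can be cut into tiles that are rendered independently and stitched back together. The scene here is a single hard-coded sphere, the tile size and all names are hypothetical, and Python threads stand in for processors, so this illustrates the decomposition, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, TILE = 16, 16, 8
# Shared, read-only scene: one sphere in front of a camera at the origin.
SPHERE_CENTER, SPHERE_RADIUS = (0.0, 0.0, 3.0), 1.0


def hits_sphere(px, py):
    """Cast one ray from the origin through pixel (px, py); True on a hit."""
    # Map the pixel centre onto an image plane at z = 1.
    dx = (px + 0.5) / WIDTH * 2 - 1
    dy = (py + 0.5) / HEIGHT * 2 - 1
    dz = 1.0
    cx, cy, cz = SPHERE_CENTER
    # Quadratic in t for |t*d - c|^2 = r^2.
    a = dx * dx + dy * dy + dz * dz
    b = -2 * (dx * cx + dy * cy + dz * cz)
    c = cx * cx + cy * cy + cz * cz - SPHERE_RADIUS**2
    # The sphere lies wholly in front of the camera, so any real root is a hit.
    return b * b - 4 * a * c >= 0


def render_tile(origin):
    """Render one TILE x TILE block; needs nothing from any other tile."""
    x0, y0 = origin
    rows = [[hits_sphere(x, y) for x in range(x0, x0 + TILE)]
            for y in range(y0, y0 + TILE)]
    return origin, rows


def render_tiled():
    """Front end: hand out tiles, then copy each result into the image."""
    image = [[False] * WIDTH for _ in range(HEIGHT)]
    origins = [(x, y) for y in range(0, HEIGHT, TILE)
               for x in range(0, WIDTH, TILE)]
    with ThreadPoolExecutor() as pool:
        for (x0, y0), rows in pool.map(render_tile, origins):
            for offset, row in enumerate(rows):
                image[y0 + offset][x0:x0 + TILE] = row
    return image


def render_serial():
    """Reference renderer: same pixels, no tiling."""
    return [[hits_sphere(x, y) for x in range(WIDTH)] for y in range(HEIGHT)]


assert render_tiled() == render_serial()
```

    The hard parts named above (getting the mesh and textures to each processor) are exactly what this toy skips by keeping the whole scene in two module-level constants.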

    Adding cache and cores is also, to some degree, the solution when you are out of ideas.

    Not to that great a degree, and we've really only scratched the surface of what we'll be doing with multi-core. DEC laid down a long-term plan for the Alpha in the early '90s, and multi-core was planned for the early '00s right from the start. Compaqtion and having Intel pull a fast one on HP wasn't in their plans, but 4 or 8 cores and enough cache to keep them fed is just the next step.

    Another thing we're going to see, particularly for laptops, is super-integrated chipsets. Freescale's e600 would have been the next step for Apple if they'd been faster getting it to market (or if Apple had been less reluctant to break the G4 bus compatibility and they'd gotten started sooner), and it seems to me that adding the GPU in as well makes a lot of sense. Expect to see Intel CPUs with GMAxxx (or their descendants) on-chip, and AMD cutting deals with nVidia and ATI.

  7. Re:The article's author is huffing crack here... by argent · · Score: 3, Insightful

    I think the article's point was that once you get more and more transistors on there it becomes very difficult to design things to not end up overheating all the time and not use up insane amounts of power, not to mention just becoming extremely complex like x86 cores today.

    I wasn't talking so much about the article as a whole, but the insane levels of hyperbole in the particular paragraph I quoted. "We're capable of putting more transistors on a chip than we can think of things to do with". That's not even vaguely true.

    More transistors == more power, all else being equal, because it's all those junctions flipping state so quickly that uses the power.
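
    The usual first-order model for switching power in CMOS logic is P = activity * C * V^2 * f, where C is the total switched capacitance. The sketch below applies it under the assumption that every transistor adds the same capacitance. The function name and every number in it are hypothetical placeholders, not figures for any real chip.

```python
def dynamic_power_watts(transistors, cap_per_transistor_f, voltage_v,
                        freq_hz, activity=0.1):
    """First-order CMOS switching power: activity * C_total * V^2 * f.

    Assumes total capacitance grows linearly with transistor count.
    """
    total_capacitance_f = transistors * cap_per_transistor_f
    return activity * total_capacitance_f * voltage_v**2 * freq_hz


# Hypothetical placeholder values, chosen only to show the scaling.
base = dynamic_power_watts(100e6, 1e-15, 1.2, 3e9)
doubled = dynamic_power_watts(200e6, 1e-15, 1.2, 3e9)
print(f"power ratio for 2x transistors: {doubled / base:.2f}")
```

    With voltage, frequency, and activity held fixed, doubling the transistor count doubles the power in this model, which is the "all else being equal" case. The model ignores leakage.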

    As for the insanity of Intel's processors... that seems to be a perversion particular to Intel. In the past three decades that I've been following the industry, Intel has only managed to produce *one* sane CPU design, the i960, and they promptly caponised it by removing the MMU and relegating it to embedded controls lest it outcompete their cash cow.

    The rest... from the 4004 through the 8080, the 8086 and its many descendants, iAPX 432, i860, and Itanium... have been consistently outperformed by chips with smaller transistor budgets built by companies with far fewer resources. They only occasionally broke past the midrange of the RISC chips, and were usually trailing back with the anemic SPARC. Where they have excelled has been marketing and in the breadth of their support... both hardware and business. IBM went with the 8088 because they could get them in quantity and they could get good cheap support chips for them: if you went with Motorola or Zilog or Western Digital or National Semiconductor you pretty much had to go back to Intel to build the rest of your computer anyway.

  8. Saturn was less well planned by Nazmun · · Score: 2, Interesting

    It was essentially an uber 2D platform with a 3D chip added at the last minute. The Cell, RSX, and memory type were conceived a long time ago to work together. Neither the Cell nor the graphics chip is a last-minute add-on to compete with a brand new foe (as the PSX was with its new 3D capability).

    Also, Sony is hard at work on dev kits which will make programming with the Cell much easier. How well they succeed in making these dev kits will be the primary factor in how programming for the beast goes.

    --
    Hmmm... Pie...