Slashdot Mirror


Cell Architecture Explained

IdiotOnMyLeft writes "OSNews features an article written by Nicholas Blachford about the new processor developed by IBM and Sony for their Playstation 3 console. The article goes deep inside the Cell architecture and describes why it is a revolutionary step forward in technology and, until now, the most serious threat to x86. '5 dual core Opterons directly connected via HyperTransport should be able to achieve a similar level of performance in stream processing - as a single Cell. The PlayStation 3 is expected to have 4 Cells.'"

76 of 570 comments

  1. Seeing is believing by Anonymous Coward · · Score: 3, Informative

    It's not like we haven't heard it before. It usually turns out to be halfish-truish for some restricted subset of operations in a theoretical setting, you know, where you discount busses, memory and latencies.

    1. Re:Seeing is believing by aphor · · Score: 4, Interesting

      No, this sort of architecture is a general trend towards parallelization. It is smart, and it is known to work, and I would expect some bright Sparc-wise people to chime in and say "u-huh" and some SGI-wise people to chime in and say "I've seen some of this before." The OS people are starting to move things in this direction, and I've heard that Darwin has had the asynchronous messaging type threading model for a while (RTFA: the article explicitly mentions Tiger's GPU leveraging techniques). If you have the head for it, try reading up on NUMA and compare that with SMP.

      The math is simple. CPUs are CPUs, and anyone can make one that is the same speed as the competition, and if they do it second they can do it cheaper. The guy that can make 20 CPUs work like one CPU that does 20 times the work in a given time will win because he can always just throw more hardware at the problem. The SMP guys have to go back to the drawing board. In this case, the only way to beat 'em is to join 'em. Maybe doing the specific "Cell" computing design isn't it, but the ol' PC is dead if these things start hitting the commodity price points.

      That's a big, fat IF. So, don't bet on it (yet), but it's even worse to ignore it.

      --
      --- Nothing clever here: move along now...
    2. Re:Seeing is believing by Raunch · · Score: 2, Interesting

      Then perhaps some of those bright people can shine some light in my direction.

      FTFA
      Caches work by storing part of the memory the processor is working on, if you are working on a 1MB piece of data it is likely only a small fraction of this (perhaps a few hundred bytes) will be present in cache, there are kinds of cache design which can store more or even all the data but these are not used as they are too expensive or too slow.

      APU local memory - no cache
      To solve the complexity associated with cache design and to increase performance the Cell designers took the radical approach of not including any. Instead they used a series of local memories, there are 8 of these, 1 in each APU.

      The APUs operate on registers which are read from or written to the local memory. This local memory can access main memory in blocks of 1024 bits but the APUs cannot act directly on main memory.

      By not using a caching mechanism the designers have removed the need for a lot of the complexity which goes along with a cache.

      This may sound like an inflexible system which will be complex to program and it most likely is but this system will deliver data to the APU registers at a phenomenal rate. If 2 registers can be moved per cycle to or from the local memory it will in its first incarnation deliver 147 Gigabytes per second. That's for a single APU, the aggregate bandwidth for all local memories will be over a Terabyte per second - no CPU in the consumer market has a cache which will even get close to that figure. The APUs need to be fed with data and by using a local memory based design the Cell designers have provided plenty of it.
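
      A quick check of the quoted figure, as a sketch only: the quote gives neither the register width nor the clock, so 128-bit registers and the 4.6 GHz clock rumored elsewhere in this discussion are assumptions, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the quoted local-memory bandwidth.
# ASSUMPTIONS (not stated in the quote): 128-bit registers, 4.6 GHz clock.
REGISTER_BITS = 128          # assumed register width
REGISTERS_PER_CYCLE = 2      # from the quote
CLOCK_HZ = 4.6e9             # assumed clock speed
APUS_PER_CELL = 8            # from the quote: one local memory per APU


def local_memory_bandwidth(register_bits=REGISTER_BITS,
                           registers_per_cycle=REGISTERS_PER_CYCLE,
                           clock_hz=CLOCK_HZ):
    """Bytes per second moved between one local memory and its registers."""
    bytes_per_cycle = registers_per_cycle * register_bits / 8
    return bytes_per_cycle * clock_hz


per_apu = local_memory_bandwidth()
aggregate = per_apu * APUS_PER_CELL

print(f"per APU:   {per_apu / 1e9:.1f} GB/s")
print(f"aggregate: {aggregate / 1e12:.2f} TB/s")
```

      Under those assumptions the per-APU number lands on the quoted 147 GB/s, and eight local memories together exceed a terabyte per second. This is bandwidth between local memory and registers, not bandwidth to main memory.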


      Ok, so regular memory is too slow, or too expensive (apparently caching is the main problem although what I remember from comp. arch. was that caching was a Good Thing). So then they abolish the cache.
      Caching: CPU > cache (1 or 2) > main memory > HD
      and then implement another system that is *completely different*
      Non-Caching: APU > local memory > main memory
      Now first off, how is it different? Secondly, how does this improve the physical memory speed? Is the author claiming that a page fault is what we are avoiding here? If that's the case, then the problem is hard drives, not solid state memory. But (again, if what I learned in comp arch is correct), because there is a tendency for programs to run the same code over and over again, then (assuming you have a good algorithm) the time saved by caching is significant.

      Anyone?

      --
      George II -- Spreading Freedom and American values, one bomb at a time.
    3. Re:Seeing is believing by Henk+Poley · · Score: 2, Interesting

      So then they abolish the cache.
      Caching: CPU > cache (1 or 2) > main memory > HD
      and then implement another system that is *completely different*
      Non-Caching: APU > local memory > main memory
      Now first off, how is it different?


      Let me take a vaguely educated guess.

      Currently the cache managers in x86 CPUs "predict" what part of the memory space is needed. This prediction isn't always that good, and efforts to make programs hint to the processor what to cache haven't worked well enough (or at least not according to the Cell CPU designers). So they force the program to operate in a 'small' memory space into which data can be read from a large RAM storage.

      I don't know if it will actually help. To me it seems a bit like going back to the 80286/80386 era with himem.sys under DOS, so programs could access higher memory regions by commanding the driver to swap memory in and out of the lower 640k.

      But then, maybe Bill Gates -or whoever put up that quote in his name- was right, 640k RAM is enough for everyone (multiplied by the number of cores on die..).

    4. Re:Seeing is believing by be-fan · · Score: 2, Interesting

      The local storage is different for the following reasons:

      1) It must be programmed differently. Instead of just accessing memory how you want, you must explicitly copy the part of memory you need at the moment. So, if your APU is acting as a vertex shader, you need to copy the shader code into the LS before you start processing. Essentially, the LS can give you the time savings of a cache, but you have to manage it yourself to get the benefits.

      2) Since the LS isn't managed by the hardware, it doesn't need a lot of management hardware. You don't need cache tags, lookup hardware, hardware to manage misses, etc. This saves a lot of transistors.

      3) A regular cache has to do some management on each access. It has to search the tags to find what cache line holds a given memory word, it has to perform write-back, etc. Since the LS doesn't need to do any of this, latency can be cut down.

      4) Since the LS is addressed directly, and isn't mapped onto memory, there is no need for cache coherency protocols. A cache-coherent multi-processor system needs to communicate with its peers to coordinate access to the cache. For example, when it writes to a memory location, it must notify all other processors caching that location that their copies are now invalid. The LS doesn't need to do any of this, and that cuts down on both management hardware and latency.
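
      To make point 1 concrete, here is a toy model of a software-managed local store. It is a sketch, not Cell code: the class and method names (LocalStore, dma_in, dma_out, process_stream) are hypothetical, the 128-byte block matches the 1024-bit transfers in the article quote, and the 128KB size is the figure used in this thread.

```python
# Toy model of a software-managed local store (LS). The program, not the
# hardware, decides what is resident. All names here are hypothetical.
LS_BYTES = 128 * 1024    # LS size as discussed in this thread
BLOCK_BYTES = 128        # 1024-bit transfer block, per the article quote


class LocalStore:
    def __init__(self, size=LS_BYTES):
        self.data = bytearray(size)

    def _check(self, memory_len, mem_off, ls_off, nbytes):
        # No tags and no miss handling: a bad request is simply an error.
        if nbytes % BLOCK_BYTES:
            raise ValueError("transfers must be whole 128-byte blocks")
        if ls_off + nbytes > len(self.data) or mem_off + nbytes > memory_len:
            raise ValueError("transfer out of range")

    def dma_in(self, main_memory, src, dst, nbytes):
        """Explicitly copy nbytes from main memory into the local store."""
        self._check(len(main_memory), src, dst, nbytes)
        self.data[dst:dst + nbytes] = main_memory[src:src + nbytes]

    def dma_out(self, main_memory, src, dst, nbytes):
        """Explicitly copy nbytes from the local store back to main memory."""
        self._check(len(main_memory), dst, src, nbytes)
        main_memory[dst:dst + nbytes] = self.data[src:src + nbytes]


def process_stream(main_memory, chunk=LS_BYTES // 2):
    """Add 1 to every byte, one LS-sized chunk at a time.

    len(main_memory) and chunk must be multiples of BLOCK_BYTES.
    """
    ls = LocalStore()
    for offset in range(0, len(main_memory), chunk):
        n = min(chunk, len(main_memory) - offset)
        ls.dma_in(main_memory, offset, 0, n)       # copy in by hand
        for i in range(n):                          # work only on the LS
            ls.data[i] = (ls.data[i] + 1) % 256
        ls.dma_out(main_memory, 0, offset, n)       # copy results back


memory = bytearray(range(256)) * 8                  # 2 KB of "main memory"
process_stream(memory, chunk=512)
```

      Note what is missing: no tag lookup on access, no write-back policy, no coherency traffic. The price is that every transfer is the programmer's job.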

      The APUs are stream processors. It is common for stream processors to not have a general memory cache. The GeForce 3's vertex processor, for example, has enough cache to hold 18 vertices. It is not an LRU cache like a Pentium's, but a FIFO (much cheaper to manage), and is only used in certain circumstances. In comparison, the Cell's 128KB LS per APU is enormous!
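
      Why a FIFO is cheaper than an LRU can be sketched in a few lines. This is an illustration of the two replacement policies only, not a model of any real GPU or CPU; the function names and the access pattern are made up.

```python
from collections import OrderedDict, deque


def fifo_hits(accesses, capacity):
    """Count hits in a FIFO cache: eviction order is insertion order."""
    cache, order, hits = set(), deque(), 0
    for key in accesses:
        if key in cache:
            hits += 1                       # a hit updates no state at all
            continue
        if len(cache) == capacity:
            cache.discard(order.popleft())  # evict the oldest insertion
        cache.add(key)
        order.append(key)
    return hits


def lru_hits(accesses, capacity):
    """Count hits in an LRU cache: every hit must update recency state."""
    cache, hits = OrderedDict(), 0
    for key in accesses:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # bookkeeping on every hit
            continue
        if len(cache) == capacity:
            cache.popitem(last=False)       # evict the least recently used
        cache[key] = True
    return hits


# Strip-like vertex index stream: each triangle reuses the previous two.
strip = [0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4, 5]
print(fifo_hits(strip, 3), lru_hits(strip, 3))
```

      On a streaming pattern like this one the two policies get the same number of hits, so the extra recency tracking buys nothing. That will not hold for every access pattern.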

      --
      A deep unwavering belief is a sure sign you're missing something...
  2. Human skin cells by MarkRose · · Score: 2, Funny

    Yeah, but can my inkjet print them?

    --
    Be relentless!
  3. This is beginning to sound more like by gotr00t · · Score: 2, Funny

    a DBZ reference: "Part 4: Cell Vs the PC"

    1. Re:This is beginning to sound more like by goodbadorugly · · Score: 4, Funny

      a DBZ reference: "Part 4: Cell Vs the PC"

      The 45 episode saga in which:

      Bill Gates becomes a cyborg and summons the forces of evil.

      A new Cell is constructed out of unsold Itaniums (not to be confused with the Cell built by Sony, which is a friendly robot that is found out to be good, until he is found out to be evil when the heroes notice he is under the control of the cyborg Bill Gates, who has been behind the charade the entire time) and challenges the world to a rematch of earth-shattering proportions

      Second string characters have meaningless conversations that take up entire episodes

      There is hilarious comic relief from common citizens in various towns as their cities crumble around them

      Krillin dies

      The dragon is summoned

      Goku gets a haircut

      ...Good lord, I should have my anime viewer's license revoked for knowing all that crap.

    2. Re:This is beginning to sound more like by Rethcir · · Score: 3, Funny
      You forgot that the first 10 minutes of each episode will recap the prior episode, and the last 10 minutes of each episode will be a preview of the next episode.

      Inuyasha? KAGOME!

  4. if it sounds too good to be true.. by gl4ss · · Score: 4, Insightful

    ..it probably is.

    was the ps2 the supercomputer it was said to be...?

    the author goes on to suggest that cell workstations would smoke x86 counterparts.. but says at the same time that there probably won't be that many of them.

    wtf? though in-between the lines you can read at the end that he also thinks a single g5-cpu workstation would 'smoke' x86's...

    --
    world was created 5 seconds before this post as it is.
    1. Re:if it sounds too good to be true.. by kai.chan · · Score: 2, Interesting

      was the ps2 the supercomputer it was said to be...?

      I don't remember Sony making any big statements about the Emotion Engine being a supercomputer. What I do remember, is that when they released the clock speed of their processor, people knew the relative power of the PS2. From what I see of the Cell architecture, I can guarantee that the Cell is much more powerful than any AMD and Intel processors.

      It seems like you didn't read much into the technical aspect of the Cell architecture presented in the long write-up. From just looking at a simple top-level diagram of the Cell architecture, it is clear that the Cell is much more powerful than any other processor currently available. A Cell contains a Processor Unit with 8 additional Processor Units, each with its own registers. The architecture is also a distributed computing network capable of splitting tasks and computations over a wide variety of home electronics. With each Cell product you buy, you increase the processing power of your household. In conclusion, yes, it would smoke an x86 counterpart.

    2. Re:if it sounds too good to be true.. by CoolGuySteve · · Score: 3, Informative

      When it was released, the Emotion Engine in the PS2 actually would have been pretty wicked for supercomputing applications if Sony had sold a version with faster interconnects and more RAM. The processors in the PS2 are designed almost entirely to crunch vector operations, which is what most scientific codes rely on. It's really an excellent computer, it just sucks at graphics. The 4MB of uncompressed video memory and lack of hardware texture support are particularly ugly.

      I suspect that the main reason there was never an Emotion Engine based cluster product was because the high performance market is tiny, especially compared to the console market, and Sony was already having trouble meeting demand with their exotic chipset when it first came out.

      Anyways, I think the guy does go overboard about this new architecture. It probably will be a lot faster than PCs at certain tasks but you can only fit so many transistors in a chip. The cell stuff is cool though, it seems to fit a lot better with what most computers spend their time processing unless you're doing a lot of compiling or database operations.

    3. Re:if it sounds too good to be true.. by Jaysyn · · Score: 2, Funny

      Stopping China from getting them? Aren't they made in China?

      Jaysyn

      --
      There is a war going on for your mind.
    4. Re:if it sounds too good to be true.. by zogger · · Score: 3, Interesting

      Because IBM is an R&D and service company mostly, or it looks like they are headed that direction eventually. They can make better profit margins by just designing then licensing out the tech. By concentrating on their core missions they can maximise ROI, and leave the headaches and drudge work of mass production and marketing of consumer level stuff to some other company, and still get paid well for it. Granted, you get a higher gross income with being the manufacturer, but you get a better net income by just licensing and developing.

      At least it looks that way to me, and it's following their past business model of selling off consumer level production, like they did with hard drive manufacture to Hitachi. Whether that will be a very long term smooth move I have no idea, but in the short term it's actually making them money. Profit margins at low end retail are small, they want no part of that, too clunky for them. Fabbing the chips is a different story, they need to be able to have a place to build what they R & D, so in that sense it's logical for them to do that, and get that aspect subsidised by licensing and direct sales (saves them research costs long run) but after that point it's just manufacturing vacuum cleaners or blenders, they don't want to, and that's all PCs are now, just another consumer appliance.

  5. What always confused me by hyu · · Score: 5, Insightful

    Something that has always confused me in gaming consoles is that, despite incredibly powerful hardware (processors, graphical chips, etc.), the system developers seemingly always neglect to put in enough RAM for most games to perform to their potential. Many PC ports often have portions compromised due to the lack of RAM, and system speeds also suffer because of this.

    Seeing how RAM is increasingly becoming cheaper, is it possible that new systems like the PlayStation3 might be able to provide RAM that actually allows games to reach their potential along with this new cell hardware?

    1. Re:What always confused me by FRAGaLOT · · Score: 2, Informative

      Actually I find the opposite to be true. Take for example an Xbox, which is basically a PC from about seven years ago. (Sub gigahertz P3, 64megs RAM, GeForce3 video)

      But it plays all the popular games of today's PC with little to no lag. Whereas you need a very high end PC to play the same game!

      This is mostly due to the fact that the architecture with the video is more direct than it is on a PC. There's no AGP bus, or any bottleneck to access video RAM. It's more direct, which is probably why an Xbox can perform as well as a current PC rig.

      But then an Xbox is only running at 800x600. LOL

      --
      -FRAGaLOT
    2. Re:What always confused me by John+Betonschaar · · Score: 2, Insightful

      But then an Xbox is only running at 800x600. LOL

      Actually, it is running at 720x576 (a PAL XBOX that is) but I don't see why this is so funny, because that's just the resolution of a PAL TV. Having a higher framebuffer resolution would probably only decrease the output quality when displayed on a normal television.

      That said, if you have an HDTV, the XBOX can output at 1920x1080i...

      Your sig is mine

    3. Re:What always confused me by Troed · · Score: 2, Interesting
      Either chip your box or use a software exploit, switch it into NTSC video mode (you can do that without having to change the game region mode - your PAL originals will still work), connect a component video cable, enter the Microsoft Dashboard, select the HDTV resolutions you want from the list under Video - Settings. Play in HDTV.

      ... yes, all this - and you can still play PAL originals in higher resolution on XboxLive - as long as the chip is off (or the software exploit not used) afterwards.

    4. Re:What always confused me by Gordonjcp · · Score: 2, Insightful
      Take for example an Xbox, which is basically a PC from about seven years ago. (Sub gigahertz P3, 64megs RAM, GeForce3 video)


      I would *love* to know how you had a P3 and a Geforce 3 in 1998. I had to make do with a brand-new top-of-the-range PII-350 with a Matrox Mystique and SLI-ed Voodoo 2s. Did you pull the Geforce through one of these parallel universe wormholes?

    5. Re:What always confused me by LarsWestergren · · Score: 2, Informative

      Take for example an Xbox, which is basically a PC from about seven years ago. (Sub gigahertz P3, 64megs RAM, GeForce3 video) But it plays all the popular games of today's PC with little to no lag. Whereas you need a very high end PC to play the same game! This is mostly due to the fact that the architecture with the video is more direct than it is on a PC. There's no AGP bus, or any bottleneck to access video RAM. It's more direct, which is probably why an Xbox can perform as well as a current PC rig.


      Well no, not exactly. The reason console games don't suffer from lag is that unlike a PC, the hardware specs are not a moving target during development. Developers can optimize textures, audio, algorithms etc with a specific platform in mind. This makes it much easier to create content that you know won't overwhelm the machine.

      Compare this with a PC developer. They have to estimate the time it takes to develop the game. Then they have to estimate the average gamer hardware and the cutting edge gaming hardware at the time the game is released. They have to take into consideration at the very least processor speeds, main memory size and speed, graphic card speed and memory size.

      If the developers overestimate, the game will be unplayable (when the first System Shock came out, for instance, I remember reviewers writing "You actually need a PENTIUM to play this game, it's insane!"). If they underestimate hardware or take too long, they will be killed by reviews complaining about "outdated graphics". Oh, and preferably there shouldn't be any problems with any special configuration.

      This is extremely difficult to achieve. Half-Life 2, for instance, was praised for the fact that it managed to scale its graphics so it was playable on low end yet good looking on high end machines. However, some people experienced stuttering of audio as levels started; this was even more noticeable in Vampire: Bloodlines, a great game that uses the HL2 engine. I think this had something to do with the hard drive loading textures or level geometry (I noticed it especially when loading the huge LA Downtown level in Vampire, sound was stuttering for 10 seconds after the level loaded). People with fast hard drives, especially those that chose or had to choose low resolution textures, didn't suffer from it as much as those with graphics cards with a lot of memory for high resolution textures and comparatively slow hard drives.

      So, the interaction between the many different hardware configurations on PCs makes it difficult to optimize, and that is what causes the lag, not the lack of any AGP bus or anything. A console developer can test on just one console and be fairly certain it will run the same on all target machines.

      --

      Being bitter is drinking poison and hoping someone else will die

    6. Re:What always confused me by DeadScreenSky · · Score: 2, Informative

      I think that a much bigger factor is that Warren Spector wasn't the lead designer anymore. A game like DE simply isn't something that can reasonably be accomplished by someone new to that position. The game has all the hallmarks of an inexperienced dev team (in particular, note the tonal and pacing issues).

      Anyway, hardware really didn't affect the game like some people pretend it did. The streamlined gameplay was because Harvey Smith and his team wanted it that way (the Xbox has certainly seen plenty of more complicated gameplay systems than Deus Ex 1!), and the vast majority of them would have occurred even if the game was PC only. I disagree with the decision as well, but the devs wanted the gameplay to be simpler and more focused.

      Most of the engine limitations were simply because they chose poor technology - the hacked-up UT2k3 engine didn't scale or perform well on the PC, either. Lots of Xbox games feature huge levels with minimal or even non-existent load times (see Riddick, Halo series, Ninja Gaiden...).

      (And I would point out that the original DE was pretty spotty when it came to tech as well. Very slow on most systems when it was released, without particularly nice graphics to compensate. I suspect they just don't have the caliber of 3D programmer required...)

      And as much as I love DE1, it really didn't leave room for a sequel. Near-future stuff works fine, but once you get close to or even past the Singularity, it is almost impossible to create a realistic or interesting setting (as any human just wouldn't be close to where the real action is). Most of the truly interesting conspiracy theories were already dealt with (seriously, you already used the Templars, the Illuminati, and Majestic 12). Most of the interesting future tech was used (nanotechnology especially, though I do think Invisible War expanded on it in interesting areas). And you couldn't reasonably expand on the cyberpunk theme too much, because the world had already been pulled back from the brink in the original. (The globalization issues that Invisible War brought up were a good attempt, but that is really hard to address in an FPS - there are no real masses, you understand?)

      --
      There is no excellent beauty that hath not some strangeness in the proportion. -- Francis Bacon
  6. I'll believe it when I see it by Anonymous Coward · · Score: 5, Insightful

    I'll believe it when I see it. Sony made outrageous claims with the PS2 in the year or so before launch, I see no reason to believe this will be any different.

    On paper an Emotion Engine was supposed to destroy everything, but achieving maximum throughput was difficult and other constraints such as I/O and memory hampered performance. Programmers had to learn a very different way of programming to make full use of the processor and its two vector units.

    A Cell might be a killer chip on paper, but real-world hardware with I/O latency and memory constraints will bring things down to a more reasonable level. Don't forget that multiprocessor programming is *hard*.

    Hopefully, developing software for Cell chips will be easier than the early days of the PS2; Sony already said as much a few months ago.

    1. Re:I'll believe it when I see it by evilviper · · Score: 2, Informative
      Yes, wasn't this the fate of the Sega Saturn? Or was it the Atari Jaguar? It used multiple processors which made it a fast and cheap system, but developers steered toward the ease of programming on the N64.

      The Sega Saturn used dual processors, and was nearly a clone of top-end Sega arcade systems. Unfortunately, it was terribly hard to program, so only in-house Sega titles were developed to utilize the full potential of the device, such as Virtua Fighter, while other titles were only using half the performance of the system.

      It was not, however, cheap.

      The Jaguar was cheap and was hard to program for (primarily because of bugs in the cheap hardware), but it could hardly be considered a powerful system. It was well ahead of NES/Genesis, but it came in late in the game, when they were both well established, and the Playstation/Sega Saturn were just around the corner.
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  7. Can this be taken seriously? by Anonymous Coward · · Score: 5, Insightful

    Quotes from article:

    "GPUs will provide the only viable competition to the Cell but even then for a number of reasons I don't think they will be able to catch the Cell."

    Did this guy forget that NVidia is designing the GPU for PS3? If Cell is so almighty, why does Sony use an NVidia GPU instead of using more Cells for graphics processing?

    "There is another reason I don't think Nvidia or ATI will be able to match the Cell's performance anytime soon."

    Of course, Cell based products won't be available anytime soon either. According to the current rumors, PS3 will be available in Japan in Spring 2006 and elsewhere in Autumn 2006. One and a half years equals a generation in the GPU world...

    I love these kinds of articles where future products are compared against current ones and declared clear winners...

    1. Re:Can this be taken seriously? by jovetoo · · Score: 2, Informative
      First of all, as another post says, the GPUs contain a video controller, DAC and so on. Second, the Cell will still be able to accelerate graphics performance by doing all kinds of vector pre-processing. Last, it will be a lot easier for software companies to build PS3 games fast if they have somewhat the same computing/graphics environment as on an x86. Reasons enough, I think.

      But what struck me most is that you seem to have missed the whole point the author seeks to make. Yes, Moore's law will double the performance of the GPU within 18 months. So? It still does not give them the raw processing power of those Cells. Nor the scalability. (Damned! These things will be in your TV, your DVD, your stereo and they all cluster...) If these Cells really become low cost chips, I seriously doubt x86 will survive.

  8. Re:What's that? Microsoft isn't supporting it? by popo · · Score: 4, Insightful

    "No Apps"? Try every single video game publisher in the world.

    And besides, this isn't about "Office" style apps. It's about games, and more importantly: it's about home media centers. I think the Windows MCE is going to have its rear-end handed to it by the PS3.

    When you consider that a cell-based PS3 could have a computational power of *several times* a 3 GHz Pentium...

    You have to ask, what's more likely: that Intel can get around IBM/Toshiba patents in time for Windows to conquer the living room with a faster box? (That's if they can even build a secure, stable OS with a decent UI). Or that Sony, now armed with the world's fastest consumer-computing platform, an enormous user base and years of TiVo experience, will own the living room media center market?

    If I had to bet on who builds a better media-center PC .. Sony or MSFT... I'd say it's absolutely no contest. Sony would crush MSFT. They have better interface design, fewer conflicting platform goals, and they'll put a PS3 in your living room for a fraction of what MSFT could.

    --
    ------ The best brain training is now totally free : )
  9. Re:What's that? Microsoft isn't supporting it? by metricmusic · · Score: 2, Funny

    listen to the wise old man:

    With great power comes loads of software

    --
    http://www.livejournal.com/users/metricmusic
  10. not a new architecture, and it's going to be tough by idlake · · Score: 4, Insightful

    This sounds like a little PVM-cluster-on-a-chip. It also sounds like it's a pain to program and will, in the short term, suffer from the same problems that Intel's Itanium suffers from: it tries to push too much work on the compiler or software developer.

    In the long term, it's nice that companies are exploring these kinds of architectures. It's not nice that they are trying to monopolize what are pretty straightforward architectural choices with patents. This may be a new CPU, but there is little that is new about having a bunch of fast processors interconnected via a reconfigurable network; these just happen to be on the same chip.

  11. Cool! by Jacco+de+Leeuw · · Score: 4, Funny
    85 Celsius operation with heat sink

    Well, perhaps "cool!" is not the correct response...

    --
    -------
    Warning: Slashdot may contain traces of nuts.
    1. Re:Cool! by Trimbo2 · · Score: 2

      +1 Funny btw :D

      On a serious note, the Cell misses the point. We don't NEED any more CPU power; what we need is existing levels of power but without the need for excessive cooling and the fan noise that goes with it (I can hardly type this over the noise of my P4 3.0)!

      Fast, silent and power efficient is what's needed next.

    2. Re:Cool! by ponos · · Score: 2, Insightful


      85 Celsius operation with heat sink

      Well, perhaps "cool!" is not the correct response...


      It says with a heat sink only. Not with a fan! The last chips that worked without a fan were the 486DX33 and 486DX40 (I'm talking mainstream desktop PC hardware, not mobile solutions). You could probably stick a fan on it and get it down to 40 degrees, while a Pentium 560 will produce liquid plasma and/or a fusion reaction if operated without a fan.

      P.

  12. Is it just me? by morriscat69 · · Score: 3, Interesting

    Or does the logical extension of this chart:

    http://www.blachford.info/computer/Cells/Cell_Distributed.gif

    Make it look a little more like a HAL than a Cell?

  13. Re:What's that? Microsoft isn't supporting it? by darthdrinker · · Score: 2, Insightful

    Indeed, sounds to me like Sony's marketing behemoth is getting into top gear promoting Cell in any way possible. Although this might not be directly connected to Sony. Wild claims and theoretical performance papers have been wrong in the past when it came to yet another product with mind-blowing specs (Crusoe, anyone?).

  14. Cells everywhere! by mrgsd · · Score: 2, Funny
    The full specifications haven't been given out yet but some details [Specs] are out there:

    * 4.6 GHz
    * 1.3v
    * 85 Celsius operation with heat sink


    In toasters.. ovens..
    --
    End Communication.
  15. Is that really that revolutionary? by drgonzo59 · · Score: 2, Informative
    The idea of having many processing units in a personal workstation is not new. They thought that Moore's law was going to fail years ago and predicted that by now we would all have massively parallel machines at home on our desks. Well, it turned out that Moore's law didn't fail and, most importantly, that many software algorithms are not easily parallelizable. So what if I can have 100 cells at home in my workstation? I could run SETI, weather or some other kind of simulation, but I couldn't really play my video games much faster or have a more responsive user interface if I ever install Longhorn. I just can't think of too many programs run on home users' machines that would benefit from a parallel architecture.

    Now if they can be made very fast and have only a few (2-8) coupled together... well, as it was said, that is what a nice Opteron machine does anyway nowadays.

  16. Compiler technology by sifi · · Score: 4, Informative

    One question which was not addressed fully in the article was how do you compile/test programs for this thing.

    The potential of parallel architectures has never been in doubt since the early days of the Cray monsters - but how to compile code to use all the features efficiently has.

    I don't believe that we will see the full advantage of these types of architecture exploited without some similar breakthrough in software tools.

    Mind you the hardware rocks...

    --
    Sig (appended to the end of comments you post, 120 chars)
    1. Re:Compiler technology by mr_jrt · · Score: 2, Interesting

      Program in a language that is referentially transparent.

      ...once you can assume that any function is able to be concurrently executed, all you have to solve is the communication between processors/storage. The latency of current networking technologies makes this impractical for general tasks, but this is less of a problem with a low-latency internal bus.

      Time to drag those Haskell textbooks out of the closet and dust them off. ;)
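
      To make the referential-transparency point concrete, here is a minimal sketch in Python rather than Haskell. The function `brightness` and the sample pixels are made up for illustration, and the thread pool only stands in for separate processing units. Because the function touches no shared state, the work can be handed out in any order and the result still matches the serial run.

```python
from concurrent.futures import ThreadPoolExecutor


def brightness(pixel):
    """Pure function: the result depends only on the argument."""
    r, g, b = pixel
    # Integer luma approximation; no global state is read or written.
    return (299 * r + 587 * g + 114 * b) // 1000


# Hypothetical sample data for the illustration.
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (128, 128, 128)]

# Serial evaluation.
serial = [brightness(p) for p in pixels]

# Concurrent evaluation: the pool decides which worker runs which call.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(brightness, pixels))

# With a pure function, both schedules must agree.
assert parallel == serial
```

      The scheduling is the easy part here; as the comment says, the open question is the cost of moving data between processors and storage.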

      --
      Boo.
  17. Serial and Parallel by Anonymous Coward · · Score: 2, Insightful

    No matter how well a processor or group of processors can run tasks concurrently, it will always come down to the fact that most tasks are serial in nature and will not scale to a concurrent processing architecture. Aside from this, developing multi-threaded software is extremely difficult and is rife with problems. Just ask any developer about the hardest problem to find/debug. It is pain incarnate, and some MT bugs can take 5+ days to find. People design serially, because a lot of tasks are essentially serial in nature, and until this design paradigm gets a major shift and we design parallel-only software [LOL], Cell has no future.
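
    The scaling limit described above is usually written as Amdahl's law. The sketch below is only an illustration: the parallel fractions are invented, and 8 workers is an assumption chosen to match the 8 vector units per Cell mentioned elsewhere in this thread.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup when only part of a task can be spread over workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical workloads: half, most, and almost all of the work parallel.
for fraction in (0.5, 0.9, 0.99):
    speedup = amdahl_speedup(fraction, 8)
    print(f"parallel fraction {fraction:.2f}: 8 workers give {speedup:.2f}x")
```

    A task that is half serial cannot even reach 2x on 8 units, which is the poster's point in numbers.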

  18. Re:What's that? Microsoft isn't supporting it? by a3atom · · Score: 2, Insightful

    Who cares? Mac OS X and Linux will provide all the applications required. Windows apps will likely be available under emulation. The Windows market will still dominate, but there will be a gradual migration when people realise there are cheaper/better realistic alternatives available at last.

  19. Re:Microsoft isn't supporting it? Who Cares? by fyngyrz · · Score: 2, Insightful
    don't give me crap about NT running on Alpha. It ran on 32bit version, and there was an early beta of W2k that ran 64bit native, but the Win32 API and everything else you use on your computer is and always has been x86-only

    It's not crap; we produced release versions of our graphics software for Windows on x86, PowerPC, MIPS and Alpha at one point. Shipped some, too. We had machines for all four architectures (still have them, in fact, though the Alpha and PowerPC's are mothballed), development tools, and working Windows OS's on all of them, and they all ran Windows NT, approximately the same version. Perfect, definitely not -- but Windows under x86 isn't perfect either. It worked well, certainly no worse than the x86 versions. We still use one of the MIPS machines as a backup file server. It refuses to die.

    Now, I'm no fan of Windows, but if you think MS couldn't port Windows to another architecture beyond x86, you're only fooling yourself. They can any time they want to; they have already, three times that I know of for certain, not counting whatever credit you want to give Windows CE ports, if any, and there you have it. For all I know there may have been ports to 68k architectures... I wouldn't be in the least bit surprised.

    You have to consider that MS has more money than anyone, and if they decide to go this route, there is no reason to think they cannot do it. I doubt there is any market force, including Sony and the largest governments in the world, that could put a serious roadblock in front of them in this arena.

    --
    I've fallen off your lawn, and I can't get up.
  20. next please by aixou · · Score: 4, Insightful

    I'm sorry, but Sony can kiss my ass.

    This is from the company that said the Playstation 2 would have Toy Story quality graphics, and be able to render FF8 quality FMVs in real time (thus making FMVs no longer required). It was essentially that bullshit hype that killed the Dreamcast... so yeah, now they're at it again.

    Maybe I'll be proven wrong, but I doubt their system will be able to do anywhere near what they say it can in practical application.

    1. Re:next please by Rinikusu · · Score: 2, Interesting

      Sony can kiss my ass, too. But I'll probably be in the fucking line to buy it when it comes out. See you there?

      --
      If you were me, you'd be good lookin'. - six string samurai
    2. Re:next please by Abcd1234 · · Score: 2, Insightful

      That, of course, would be the brilliance of Pixar. Because they do cartoon-style animation, they can get away with a reduced level of realism (and thus decreased rendering complexity). This also means people are more... err, satisfied with their work. After all, there's nothing worse than seeing a poorly rendered human (in fact, there was an article on this very topic a while back, though I can't find the link), as opposed to a cutely rendered cartoon fish. Of course, that's not to say they don't do damned impressive stuff... the hair rendering in Monsters, Inc. was really breathtaking, and their work in Finding Nemo was pretty impressive, too. But they definitely don't aim for photorealism.

    3. Re:next please by Jerf · · Score: 3, Insightful

      Maybe this time Sony will see fit to include that really high-tech mipmapping stuff, so their console isn't the King of Sparkle.

      (Stupid Sony, I've had my PS/2 for about a year now and I still notice it almost every time I play. Can't believe how unbelievably stupid they were not to include it. That one change, which by computer graphics standards is dirt cheap, would have massively improved its graphics. Anti-aliasing, on the other hand, is expensive to do right, so while I expect it on this next generation, at least while running in NTSC or PAL, I wouldn't have expected it in the PS/2 era. Though some managed, I think....)

      After that, I don't trust them any farther than I can throw them. The PS/2's graphics subsystem wasn't an Eighth Wonder of the World, it was an incompetent disgrace. Fortunately most of their fanboys are so stuck up the ass with Sony that it took them years to notice, instead of it jumping out at them in 5 seconds.

      I have it for the game selection, and I like the games, I like the controller, I like the case, etc... but the graphics are far, far worse than what they should have been. You have to reach back for years and years to find anything else that didn't do mipmapping.

      (I've also played the Dreamcast some more lately. It definitely pumps out fewer polygons, but equally definitely, they are higher quality polygons, and the fact that the Dreamcast clearly has mip-mapping is no small part of that. The PS/2 was a step forward in some ways, but a big step back from the DC in others.)

  21. How could they possibly do this cheap? by brett42 · · Score: 5, Insightful

    I'm willing to believe that a 4.6 GHz chip with 8 ALUs and high bandwidth memory would be fast, but even in bulk, there's no way they can afford to put 4 of them in a sub-$500 game console.

    I've been reading PR about the Cell for years, and nothing I've ever read has seemed even remotely plausible. Is there any objective information that even comes close to substantiating any of these claims?

    1. Re:How could they possibly do this cheap? by KirkH · · Score: 2, Informative

      Eh? MS is leaving x86 behind for the Xbox 2. They're going with some type of PowerPC based chip from IBM, rumored to be multi-core. ATI provides a custom graphics chipset that will not have a PC counterpart.

      Sony is going with Cell from IBM and an nVidia graphics chipset. So I don't see a huge difference. My guess is that both consoles will have extremely similar performance and this next generation of consoles will be the most boring ever -- lots of multi-platform games that look identical.

  22. STI by smallguy78 · · Score: 3, Funny

    i didn't understand any of the document, but damn it looks fast

    --
    Nothing costs nothing
  23. Re:What's that? Microsoft isn't supporting it? by Yoda's+Mum · · Score: 2, Insightful

    It only performs like 20 Opterons in highly parallelisable tasks, which excludes almost every task performed on the average PC, with the exception of some gaming graphics tasks (which, incidentally, are performed on specialised GPUs which vastly outperform x86 cores for their tasks anyway). Most of the time, a single Cell core will perform pretty much identically to the single Power chip that controls it.

  24. Reason why IBM sold PC unit to China? by kyonos · · Score: 2, Interesting

    Wonder if IBM looks into the future and doesn't see PCs anywhere? Intriguing possibility.

  25. multicore, stream-processing, vector-oriented BS by YE · · Score: 5, Insightful

    While I tend to agree the Cell is an impressive architecture, this article is a steaming pile of B.S.

    No cache for CPUs? A breakthrough? Hello! Both PSone and PS2 have the so-called scratchpad, which is what the Cell seems to have: a cache which has to be managed explicitly by the programmer. Breaking news: This is a royal pain in the ass. And calculating bandwidth when reading from these tiny scratchpads makes about as much sense as calculating the speed at which an x86 processor can execute MOV EAX, EBX.

    Magically "the OS solves everything", and, in an obvious attempt to automatically get OSS-crowd support (is that "slashdot-trolling" or "slashdot-baiting"?) the triumph of Linux is predicted, because it's portable. Good luck getting the Linux kernel and GCC compiled, let alone running well on a massively parallel array of tiny CPUs without cache.

  26. Re:What's that? Microsoft isn't supporting it? by arivanov · · Score: 5, Insightful

    You have not read it. It will be on a specific class of tasks. It is similar to modern GPUs. They are faster than 10 Opterons on a specific task.

    Back to the article. The guy seems to understand hardware, but he does not understand shit about software. Once he got past the first 3 parts he started babbling. Linux on Cell, so on, so forth. If he just read his previous parts he should have hit himself on the head. The only type of Linux this can run is uClinux. There is no memory protection as such. So no Linux, no Windows past 2000, no MacOS past X, so on, so forth.

    Similarly, it is all nice and well about Cell software beasties making herds by themselves and cooperating on a task. I am going to be a spoilsport and ask a nasty question: Err.. What about a security model? Memory protection? Privilege model for communications? So on, so forth...

    To continue on this, the power of a modern general purpose OS is the task switching. How long does it take to load and store the context of the vector processing units? Doing so requires moving their dedicated memory to main memory. This will take ages.

    Overall, this is a design similar to Cray 1 initial design. Cray initial design smashed the IBM, DEC (and lesser fish) monopoly on big computing iron to bits. Unfortunately the next thing the people buying the Cray asked for was "can we share this resource between two people?". The answer was provided eventually, but by the time Cray could do all the nifty time sharing and memory management tricks necessary to do this its advantage was no longer phenomenal. And all people who could use Crays for single tasks with manual scheduling actually continued to use it that way. But it did not even dent the general purpose big iron market.

    --
    Baker's Law: Misery no longer loves company. Nowadays it insists on it
    http://www.sigsegv.cx/
  27. Unfair comparison by Stripsurge · · Score: 3, Interesting

    Since the main goal of the chip is to pump through graphics, regardless of what device it's in, a GPU is better grounds for comparison.

    From TFA: "Existing GPUs can provide massive processing power when programmed properly, the difference is the Cell will be cheaper and several times faster."

    It's supposed to do 250 GFlops when? 2 years from now? Apparently the GeForce 6800 Ultra will do 40 GFlops and that's today.... Extrapolate with some doubling here and there and it seems a lot more reasonable.

    So the big thing is that it comes down to programming. It came up a few times in the article: "Doing this will make it faster but will make for one hell of a time for the programmers". It may have huge potential, but it may take a while to get everything running as efficiently as Sony would like. Reminds me of when the GF3 first came out and was beaten by the GF2U in some tests. IIRC it took a while for games to come out that took advantage of its programmability. It'll be interesting to see how well the programmers can fare between now and Cell's release.
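
    The "doubling here and there" can be put in rough numbers. This is a back-of-the-envelope sketch using only the figures quoted in this comment; the two-year horizon is the comment's own guess, and steady exponential growth is an assumption, not something the article states.

```python
import math

# Figures taken from the comment above.
gpu_today_gflops = 40.0    # GeForce 6800 Ultra, as quoted
cell_claim_gflops = 250.0  # claimed Cell throughput, as quoted
years_until_cell = 2.0     # the comment's guess at the time horizon

# Number of performance doublings needed to go from 40 to 250 GFlops.
doublings_needed = math.log2(cell_claim_gflops / gpu_today_gflops)

# Implied doubling period if GPUs were to close the gap in that time.
months_per_doubling = years_until_cell * 12 / doublings_needed

print(f"doublings needed: {doublings_needed:.2f}")
print(f"implied doubling period: {months_per_doubling:.1f} months")
```

    So the gap is between two and three doublings, which GPUs would close if they doubled roughly every nine months.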

  28. Re:What's that? Microsoft isn't supporting it? by Ash-Fox · · Score: 2, Informative

    AROS probably could run on it.

    --
    Change is certain; progress is not obligatory.
  29. It needs some serious software to work! by ponos · · Score: 3, Interesting

    There are several assumptions that lead to tremendous theoretical performance figures. The simple fact is that like the Itanium, the Cell processor depends on some rather complicated software that will solve issues like parallelism, coherency etc. The article clearly states that the Cell architecture is a combination of software and hardware (1st page). This is good because performance can always increase (via a better OS or microcode) but it is also bad because it means that initial versions may not stand up to their performance claims.

    Also, let's not forget that developers will be unable to keep up unless some highly sophisticated libraries and languages are made available. I really don't expect the majority of developers to be able to cope with massive parallelism from the beginning (not just 2x SMP or hyperthreading; this needs a totally different mindset).

    To sum this up: the hardware will deliver, but the software is a critical unknown in the equation. I have faith in IBM ;-)

    P.

  30. Locked Up by DingerX · · Score: 2, Interesting

    I read all five sections at once, intending to stream each chapter through separate phases from character recognition to criticism. Unfortunately, every time the article used "it's" in a predicative sense, everything ground to a halt.

    Fortunately, cell reading meant I hardly noticed the claim that hardware would compete with the x86 because, unlike the x86, cell computers need all their software written for the specific hardware.

    I like how "hardware-specific" becomes "OS-independent". Great I can plug my HDTV into my G/Fs "electrically powered adult novelty device", and harness the extra computing power to find out we are really alone in the world. Of course, no firmware will stand in the way.

    I'm also surprised that, in pandering to all the OS underdogs in the slashdot crowd (Great day for Apple, since they like G5s; Great day for Linux, since many obsessive-compulsive coders work on Linux projects anyway), he left out a true lightweight OS designed from the ground up for just this sort of multitasking: Amiga OS 4.0. To get something like this to actually work, you'll need more than iPod huggers, OSX preachers or Linux fans. You need genuine madwomen and madmen. You need AmigaOS.

  31. Re:What's that? Microsoft isn't supporting it? by binary42 · · Score: 2, Insightful

    There is memory protection. Read the whole thing. What I think bit you was the fact he said there was no virtual memory... well, even then his wording is confusing, as virtual memory is just swapping out pages of memory as you need more. This can be done on the Cell. What I think he is talking about is address translation. The paging hardware must not implement a full LogicalAddr==>LinearAddr==>PhysicalAddr paging/segmentation unit (I have not read the patent myself). He mentions that during runtime the address must be physical/real and that, when running on an APU, they may be given access restrictions. I must digress though and tell you that I am no expert either. The OS is in for quite a bit of work when dispatching apulets, as I can see adjusting addressing and other things being as interesting (or more) as different scheduling mechanisms are in current systems. To get a secure system out of this will require protected memory, and if I remember correctly the Cell may be capable of running multiple OSs in parallel VMs. This can be explained by considering that IBM has their own software layer that one's OS would talk to (at least the article made it seem that way). It's almost like having a microkernel (or exokernel in some ways) that then has real things attached to it, like Linux for example. Linux can already be run in user mode and even on top of the L4 microkernel. Linux has shown itself to be portable enough (along with most good modern software). I would not have any doubt in seeing this happen with IBM.

    --
    ruby -le"32.times{|y|print' '*(31-y),(0..y).map{|x|~y&x>0?' .':' A'}}"
  32. Re:What's that? Microsoft isn't supporting it? by LarsWestergren · · Score: 2, Insightful

    Maybe you (and others) haven't noticed, but the desktop PC is a deer in the headlights. Game machines will take over before you can say 'service contract'.

    Pft. People have been saying this every time a new console generation is coming. When the upcoming Playstation 2 was hyped, some people were claiming it would easily emulate a PC at many times the speed of an x86. When it came, people couldn't take full advantage of the hardware. When they could some years later, PC hardware had surpassed it. Besides, people value the flexibility of a PC. In other words, bs then, bs now.

    --

    Being bitter is drinking poison and hoping someone else will die

  33. Nicholas Blachford is an idiot. Please don't read. by Anonymous Coward · · Score: 5, Funny

    Nicholas Blachford is an idiot. Do not read any of his articles. Just to give you the best of Nicholas, read his antigravity article and visit his web site:

    http://www.blachford.info/quantum/gravity.html

    Also, look at the nose pictures of him ;)

    http://www.blachford.info/other/me.html

    Seriously, the guy has burned most of his sane braincells.

    For serious laugh, read his article series 'building the next generation' from osnews. I really got good laughs from that 4 part series.

    Also, it didn't take long to spot a totally idiotic statement from today's slashdotted article:

    > Parallel programming is usually complex but in this case the OS will look at the
    > resources it has and distribute tasks accordingly, this process does not
    > involve re-programming.

    Here Nicholas misses the core problem of parallel programming. The program algorithms _always_ have to be made parallel. The OS can't do it.

  34. Re:What's that? Microsoft isn't supporting it? by Glock27 · · Score: 4, Insightful
    Back to the article. The guy seems to understand hardware, but he does not understand shit about software.

    This part I agree with. His statements regarding abstraction are just flat out incorrect. Is this going to be programmed in assembly only? I think not...and if not there is significant abstraction involved. The thing that's closest to his point is that multiple *layers* of abstraction tend to add significant overhead. That doesn't mean that program-level abstractions do.

    Once he got past the first 3 parts he started babbling. Linux on Cell, so on, so forth. If he just read his previous parts he should have hit himself on the head. The only type of Linux this can run is uClinux. There is no memory protection as such. So no Linux, no Windows past 2000, no MacOS past X, so on, so forth.

    There is memory protection if the PU is in fact "something like a G5". IBM would have to be insane not to include an MMU, and it has already stated that it's going to build workstations based on the Cell architecture.

    All in all, interesting stuff...we'll see how it plays out. :-)

    To continue on this, the power of a modern general purpose OS is the task switching. How long does it take to load and store the context of the vector processing units? Doing so requires moving their dedicated memory to main memory. This will take ages.

    This, of course, depends on how many cells are in the box (with 8 vector units per cell) and how many tasks need vector units. The main purpose of the vector units in an interactive workstation will be multimedia processing. How many multimedia applications can you view at once? For me, the answer is one. The vector units may be useful for other things like engineering simulation and pattern matching, but once again how many different tasks using those features will be running at once? Plus if the processors are cheap enough to put 4 in a Playstation, one hopes the workstations will have 8 to 32 of them.

    Overall, this is a design similar to Cray 1 initial design. Cray initial design smashed the IBM, DEC (and lesser fish) monopoly on big computing iron to bits. Unfortunately the next thing the people buying the Cray asked for was "can we share this resource between two people?". The answer was provided eventually, but by the time Cray could do all the nifty time sharing and memory management tricks necessary to do this its advantage was no longer phenomenal. And all people who could use Crays for single tasks with manual scheduling actually continued to use it that way. But it did not even dent the general purpose big iron market.

    Two points. First, this is based on an already successful processor - the Power series. It already multitasks :-) and is used in a wide range of applications. Second, this will be a low-cost part. Crays were a super high-end system, which cost millions of dollars. Your analogy doesn't work.

    --
    Galileo: "The Earth revolves around the Sun!"
    Score: -1 100% Flamebait
  35. Re:OK, it's theoritically faster than PCs. So? by nagora · · Score: 2, Insightful
    Firstly: all your points are addressed in the article.

    Secondly: anyone that buys a PC to play games on has more money than sense and is quickly parted from the latter.

    TWW

    --
    "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
  36. Re:Microsoft isn't supporting it? Who Cares? by Anonymous Coward · · Score: 4, Informative
    Microsoft is still dicking around with porting Windows to AMD64... a platform mostly compatible with x86. (don't give me crap about NT running on Alpha. It ran on 32bit version, and there was an early beta of W2k that ran 64bit native, but the Win32 API and everything else you use on your computer is and always has been x86-only)

    There are two operating systems Microsoft have developed called Windows. DOS/Windows, the original one, was based on an x86 clone of CP/M that Microsoft bought. The first version, "Windows 1.0", was released in 1985. The last version, called "Windows Me", was released in 2000, IIRC. This OS was always x86-only, originally ran on archaic CPUs without memory protection and never supported full protected memory, symmetric multiprocessing or other (now) basic OS features.

    The second OS developed by Microsoft that's marketed as Windows is Windows NT (now just called "Windows"). It was started in 1988, and never had any relation to DOS/Windows, except insofar as it can (to some extent) emulate it for compatibility reasons (including an x86 emulator on hardware that can't natively execute x86 code). Windows NT was developed on the MIPS platform, not the x86. The original plan had been to use the Intel i860 (an LIW architecture completely different from the x86) as the development platform, but the i860 hardware never met its promise, so MIPS was chosen instead.

    The first version of Windows NT was released in 1993, and called "Windows NT 3.1" (3.1 was used for marketing reasons, since that was the latest version of DOS/Windows at the time). Like UNIX, it was mostly written in C, with assembly at the low level to handle hardware dependencies. At its release, Windows NT 3.1 ran on 32-bit MIPS (the development platform) and 32-bit x86 (the first port).

    The second version of Windows NT (3.5) was released in 1994, and planned to add 64-bit Alpha (in a semi-crippled, 32-bit mode) and 32-bit PowerPC. However, IBM and Motorola ran into problems with the hardware (in part because of ongoing disagreements with Apple, who wanted to use their own, proprietary platform), so Windows NT 3.5 only added Alpha support. In 1995, after IBM and Motorola had managed to (mostly) sort out their problems (but with Apple declining to follow the IBM/Motorola PReP standard), the PowerPC port of Windows NT was completed, and released as version 3.51. At this point, the OS ran on MIPS, x86, Alpha and PowerPC.

    In 1996, the user interface of Windows NT was upgraded to match the user interface of the popular 4.0 release of DOS/Windows (called Windows 95). Windows NT 4.0, which copied the user interface of DOS/Windows 4.0, ran on MIPS, x86, Alpha and PowerPC.

    By the late 1990s, as Microsoft continued work on version 5.0 of Windows NT, the market had lost confidence in non-x86 systems for general-purpose PCs (apart from Apple Macs, which didn't follow the PReP standard, so couldn't run OSes ported to it, like AIX and Windows NT). As a result, Microsoft and the vendors of MIPS and PowerPC workstations agreed to cease development and marketing of NT 5.0 for those platforms. Windows NT 5.0 continued to be developed for the x86 and DEC Alpha architectures, into the beta releases.

    DEC (which was taken over by Compaq) had continued to have hope for the Alpha as a general-purpose alternative to the x86, but financial difficulties led to the project being abandoned towards the end of the development cycle for Windows NT 5.0 (marketed as "Windows 2000"). As a result, Windows NT 5.0, completed at the end of 1999, was the first version of NT that only ran on one platform (the x86).

    A port of Windows NT 5.0 to the 64-bit Intel Itanium, including 64-bit versions of the Windows APIs (unlike the earlier Alpha port), was released in 2001, but only to select customers.

    Windows NT 5.1 (marketed as "Windows XP") was also released in 2001, and again only ran on the x86, apart from another 64-bit limited release for Itanium (in 2002, IIRC).

    Windows NT 5.2 (marketed as "Windows Se

  37. Compiler technology - OpenMP by S3D · · Score: 2, Informative


    One question which was not addressed fully in the article was how do you compile/test programs for this thing.

    The answer is OpenMP. OpenMP is a multithreading API which can hide parallelization from the user almost completely. It's embarrassingly easy to use: only one line of code is enough to parallelize a loop. All thread creation/synchronisation remains hidden from the user. It's extremely efficient too: I was never able to achieve the same level of performance when doing the multithreading myself.

  38. Re:OK, it's theoritically faster than PCs. So? by Siker · · Score: 2, Interesting
    So what will this tremendous power be used for? Since the GPU will handle the rendering task, what will the vector units do (the vector units is where the power of the system is)?

    Actually, the CPU speed has a lot to do with graphics speed. If you look at recent performance charts for nVidia's high end GPUs in SLI setups you will find that their performance levels off unless you run the absolutely highest resolution with top filtering and antialias settings. In fact, the high end cards are still CPU limited at the highest settings for all but the most recent games. [Tom's Hardware Guide]

    In addition, programmers will always find things to do with additional CPU power. Ray traced occlusion culling to reduce the number of polygons sent to the GPU is one idea if you have extreme amounts of processing power just sitting around. That in turn would allow you to use extremely advanced pixel shaders as overdraw is almost eliminated. It would also allow you to add a few more polygons to every scene, knowing that most polygons are correctly culled.

  39. Re:4.6 Ghz ? I don't belive it by rob_osx · · Score: 2, Interesting
    Look at this article and then believe it.

    http://www.siliconvalley.com/mld/siliconvalley/10323259.htm

    IBM has made the Cell for servers and embedded applications. I don't know much about the author of the article, but the Cell will change computing.

    Here's my analysis on why Apple will use the Cell http://www.siliconvalley.com/mld/siliconvalley/10323259.htm

  40. Re:4.6 Ghz ? I don't belive it by rob_osx · · Score: 2, Informative

    My link to the analysis of Apple's use of the Cell was wrong. http://www.tweet2.org/wordpress/index.php?p=13

  41. Low on substance... by STratoHAKster · · Score: 2, Insightful
    The author obviously has neither access to any Cell hardware nor any substantial information yet makes the claim that a SETI unit on a single Cell would take 5 minutes to process. Yet in 'references' we read;
    5 minutes for a SETI unit? This could be completely wrong... It is based on the difference between a 1.33GHz G4 (6 Hours / unit @ 10 GFlops) and a 250 GFlops Cell, this assumes the SETI client is using Altivec on the G4 at full speed and the PS3 has 4 Cells. I rounded up to 5 minutes to be conservative.
    Not a scientific paper. Move along...
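    For what it's worth, the quoted estimate can be reproduced in a few lines. This sketch uses only the numbers from the quote and assumes, as the quote does, that run time scales linearly with peak GFlops and that 250 GFlops is the figure for one Cell.

```python
# Figures from the article's 'references' note quoted above.
g4_hours_per_unit = 6.0  # 1.33GHz G4 with Altivec
g4_gflops = 10.0
cell_gflops = 250.0      # assumed to be per Cell
cells_in_ps3 = 4

g4_minutes = g4_hours_per_unit * 60

# Assumption: time shrinks in proportion to peak GFlops.
one_cell_minutes = g4_minutes * g4_gflops / cell_gflops
four_cell_minutes = one_cell_minutes / cells_in_ps3

print(f"one Cell:   {one_cell_minutes:.1f} minutes per unit")
print(f"four Cells: {four_cell_minutes:.1f} minutes per unit")
```

    Under those assumptions the 5 minute figure only appears with four Cells; a single Cell lands nearer a quarter of an hour.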
  42. fanboy article by egarland · · Score: 3, Insightful

    This was not a technology article. That was a "I for one, welcome our new cell processor overlords.." article.

    I don't see anything in the Cell architecture that would fundamentally make the same number of transistors at the same speed operate faster. I see lots of bottlenecks, IO overhead and wasted transistors. If there is some magical powerful thing that these can do SO much better than the current x86 instruction set and hardware, guess what, it'll adapt.

    x86 adapted to RISC being "wildly faster" and, in the end, became better RISC than RISC was by translating more memory-efficient x86 instructions onto a RISC backend. It adapted to SIMD (Single Instruction, Multiple Data) efficiency issues by adding MMX/MMX2/SSE/SSE2 and 3DNow. It adapted to the reality of 64 bit address space and the need for more registers with the new x64 instruction set extensions. AMD and Intel could add Cell hardware and instructions too if they offered anything special, which I highly doubt they will.

    --
    set softtabstop=4 shiftwidth=4 expandtab nocp worlddomination
  43. Re:OK, it's theoritically faster than PCs. So? by mwvdlee · · Score: 2, Interesting

    Since the Cell processors are basically arrays of vector processors, quite similar to the shader units in GPU's, I suspect NVidia will just implement the specialized low-level 3D stuff and leave all the shader work to be done by the Cell processors.

    So basically you'll have a fixed graphics core which isn't likely to change (since it hasn't for the last couple of years) and an extremely flexible and powerful array of shader units.

    --
    Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
  44. Wouldn't IBM design a JIT for Java for this thing? by buddhaseviltwin · · Score: 2, Interesting

    Considering how much IBM has invested and banked on Java, wouldn't you think they would try to design a virtual machine that would take advantage of this architecture? I wouldn't expect the JIT to be able to parallelize (???) everything, but I would think it would know how to detect and translate certain segments of code which are easy to translate to a parallel architecture.

    I don't know about you, but when I first heard about Cell processors (and the fact that IBM was behind it), I immediately began speculating how IBM would exploit this architecture in their server market. This sounds like the sort of thing that will enable them to sell 256 processor monsters running AIX, DB2 and J2EE.

    Even if designing to take advantage of this architecture is terribly difficult, just porting your webserver, database server, and transaction will solve the scalability issues for most Web/Client/Server applications.

  45. Very interesting... by It+doesn't+come+easy · · Score: 2, Insightful
    I believe this development (the CELL processor) is related to IBM's recent sale of their PC business...I will bet that they are counting on the CELL to revolutionize the state of the CPU, and that demand for the old style chips will begin to decline. The timing looks like they struck while the iron was HOT and got the best deal for old tech that they could (plus, it should also mean they can focus more resources on something they think is the future of processors and computing in general).

    Wonder if they have also been working to optimize Linux for the CELL processor? I for one will be watching this very closely...

    --
    The NSA: The only part of the US government that actually listens.
  46. x86, Apple etc Vs Cell my arse by theolein · · Score: 2, Interesting

    I'm not actually surprised that so-called journalists, especially the technical kind, get good salaries. If you look at the painful clowns running the show at ZDNet, and most technical publications for that matter, including such wonder rags as the Register, you know that the Agenda is almost the most important thing. The actual realities of the tech world be damned as long as you have someone passing you your monthly wad of cash.

    And this story is no different.

    As many have noted, Sony did exactly this kind of hyping the last time around when the PS2, with its Emotion Engine, was supposed to be the future of all things computing. As everyone knows, the PS2 was a real pain to code for, and the actual performance was not better than the PCs of the day. The Cell will undoubtedly suffer from the same problems when it comes to coding real applications. Concurrency and parallelism do not an easier coding experience make.

    I have no doubt that this thing will be good, but I absolutely doubt that it will have much or any effect on the x86 world of computing. The G4 processor, when it came out with the Altivec SIMD processor, which was apparently better than SSE at the time, didn't turn Apple into the next Microsoft overnight either, did it?

    So, I expect that the x86 world will continue to thrive and that Apple will stick some of these Cell processors, having as they do a PPC 970, aka G5, in their core, in some of their machines and will make the usual wild RDF claims about how hot it is, while in reality it will be used by only a small fraction of actual Mac developers, the Mac having to maintain backward compatibility only slightly less than the x86 world does.

    In other words, it'll be business as usual.

  47. Re:Nicholas Blachford is an idiot. Please don't re by fitten · · Score: 2, Informative

    Lots of people have been working on auto-parallelizing compilers. The idea is to take existing code that isn't parallel and during compile time (or run time) make those decisions intelligently and speed up processing. So far, there have been zero successes at it without explicit user directives to tell the compilers where good targets for parallelization are and how to do it (specifically creating threads and/or marking loops that can be parallelized).

    If you (or anyone) can solve this problem well, you'd be famous and wealthy beyond the dreams of avarice (assuming you patent it and license it out :))
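
    A minimal sketch of what "explicit user directives" amount to in practice, assuming a loop whose iterations are independent. The names `scale` and `parallel_scale` and the chunking scheme are hypothetical, chosen only for illustration: the programmer, not the compiler, states that the loop is safe to split and how to split it.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(chunk, factor):
    """Independent work on one chunk: no shared state between chunks."""
    return [x * factor for x in chunk]


def parallel_scale(data, factor, workers=4):
    # The programmer asserts the iterations are independent and picks the
    # split: one contiguous chunk per worker (hypothetical scheme).
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input chunks.
        parts = list(pool.map(lambda c: scale(c, factor), chunks))
    # Stitch the per-chunk results back into one list.
    return [x for part in parts for x in part]
```

    This only shows the structure of a hand-marked parallel loop. In CPython, threads will not speed up pure-Python arithmetic like this; the point is that the decision about what can run concurrently is made by a person.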

  48. Think another way by marcus · · Score: 2, Interesting

    The current processing bottleneck, and the reason for caches in the first place, is the bandwidth between the processor and the memory. A "normal" memory bus cannot keep up. This is why you see so many attempts to speed this particular part of the system up. There is RAMBUS, DDR, even HyperTransport.

    What these guys are trying to do is move the processor to the memory rather than the inverse. Having fast expensive caches near the processor is an attempt to get the memory closer to the proc. What has been happening of late is that lots and lots of on-chip transistors have been spent on the cache. The Cell architecture is a step in the other direction. They want to spend those transistors on processors instead of memory.

    At the limit of this idea you would see something like a super-granulated architecture with a processor on each memory chip. Imagine a PC with 32/64/whatever cell processors *and* no classic "processor socket" on the motherboard, just some DIMM-like "cell" slots. Each proc would have exclusive access to the memory on its own chip and all would communicate via some sort of bus or fabric of links. So, instead of one mega proc with tens of millions of transistors (perhaps half would be cache) at 4GHz with a 400MHz x 32, 64, 128, whatever bit width memory bus, you'd have maybe 64, 128, 256? simple ARM-like procs at 400MHz, each with something like 400MB/s or more of available memory bandwidth per proc.
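
    A back-of-the-envelope version of that comparison, as a sketch only. It assumes one transfer per clock on the shared bus and picks the 64-bit bus and 64-proc cases from the ranges above; `bus_bandwidth_mb_s` is a hypothetical helper, not a measured figure for any real machine.

```python
def bus_bandwidth_mb_s(clock_mhz, width_bits):
    """Peak bus bandwidth in MB/s, assuming one transfer per clock."""
    return clock_mhz * width_bits / 8


# One big processor behind a single 400MHz x 64-bit memory bus.
shared = bus_bandwidth_mb_s(400, 64)   # 3200.0 MB/s

# 64 small procs, each with roughly 400 MB/s to its own local memory.
per_proc = 400
aggregate = 64 * per_proc              # 25600 MB/s

# How much more total memory bandwidth the many-small-procs layout has.
ratio = aggregate / shared             # 8.0
```

    Under those assumptions the distributed layout has eight times the aggregate memory bandwidth, but only for work that stays in each proc's local memory; anything crossing the fabric between procs is not counted here.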

    Of course the extreme limit would be to have millions of 1 bit processors, but I don't think that anyone is proposing that just yet. Things do get more and more neuron-like as you approach this limit, interesting eh?

    --
    Good judgement comes from experience, and experience comes from bad judgement.
    - W. Wriston, former Citibank CEO
  49. Re:Wow by GileadGreene · · Score: 2
    Now you have to write what effectively amounts to large amounts of multithreaded code - behaving cooperatively on a system with an unknown number of nodes.

    Now you either need:
    a) A really intelligent compiler
    or
    b) A really intelligent programmer

    or
    (c) A language and corresponding underlying concurrency theory that allows you to design and analyze complex interacting multithreaded systems with ease.
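
    One way to picture option (c), as a sketch only: the comment names no particular theory, so this assumes a message-passing style in the spirit of CSP, where threads share nothing and talk only over channels. The queues stand in for channels, and `producer`, `worker` and `run_pipeline` are hypothetical names.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the stream


def producer(out_ch, items):
    # Sends each item down the channel, then signals completion.
    for item in items:
        out_ch.put(item)
    out_ch.put(DONE)


def worker(in_ch, out_ch):
    # Reads from one channel, writes squares to the next; owns no shared state.
    while True:
        item = in_ch.get()
        if item is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(item * item)


def run_pipeline(items):
    # Bounded queues make each put() wait for the reader, like a rendezvous.
    a, b = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    threads = [
        threading.Thread(target=producer, args=(a, items)),
        threading.Thread(target=worker, args=(a, b)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        item = b.get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

    Because every interaction goes through a channel, the behaviour of each stage can be reasoned about on its own, which is the kind of analysis such a theory is meant to make easy.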

  50. IBM said Sell not Cell. by Pinback · · Score: 2, Funny

    It is all a big misunderstanding between Sony and IBM.

    IBM told Sony it was going to "Sell" its PC business. Sony has been telling everyone about IBM's "Cell" PC ever since.

    Seriously though: For all we know, the PS3 may have four cells. (One CPU core, and three "APU" cells.) One APU for the boobs, one APU for animated low polygon count "hair", and one for inane dialog.

    Maybe the new splice() based pipes in Linux can be used to move data between APUs.

    Ever notice how much the Gamecube and the Mac Mini have in common?

  51. Seeing DRM in Cells? by MonkeyBoyo · · Score: 2, Insightful
    What nobody has objected to is the IBM claim that the architecture has:
    On-chip hardware in support of security system for intellectual property protection.
    Much of the current discussion has been on how to program and coordinate all the little digital signal processors (DSPs - aka DPUs). I think these questions are moot because the envisioned DRM (digital rights management) will make the "cell data" and "cell programs" uninspectable, even by the on-chip PUs (processor units - something like a power-Mac running Linux).

    In other words, media data and processing algorithms will be behind an impenetrable DRM hardware wall. "Cell programs" (the little vectorizable data manipulators) will be trade secrets. Outsiders that want to program something new will only be able to string together DRM approved cells. For example, there might be an approved MPG6 cell that will report meta-data found initially in an MPG6 stream, but Rights Management interests will never permit any cell that exports all of the MPG6 data.

    Why does the recommended single chip PE (processing element) include 8 DPUs? My guess is that a certified library of Cell Programs will not allow anything to be sent off chip that is not strongly encrypted. Thus one might have an 8 DPU chip where 3 are used to decrypt the input, 2 to do the actual processing, and 3 are used to encrypt the output. This off-chip disadvantage is a strong reason for putting multiple PUs and their 8 DPUs on one chip: if intercommunication between Cells cannot be detected externally then there is no need for the encryption/decryption stuff.