Slashdot Mirror


Prospects For the CELL Microprocessor Beyond Games

News for nerds writes "The ISSCC 2005, the "Chip Olympics", is over and David T. Wang at Real World Technologies has put up a very objective review of the CELL processor (the slides for the briefing are also available), covering all the aspects disclosed at the conference. Besides the much-touted 256 GFlops single-precision floating point performance, the CELL processor has 25-30 GFlops in double-precision, which is useful enough for scientific computation. Linus seems interested in CELL, too."
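
The 256 GFlops figure can be reproduced with back-of-the-envelope arithmetic. This is a minimal sketch, assuming 8 SPEs, 4 single-precision SIMD lanes per SPE, one multiply-add (counted as 2 flops) per lane per cycle, and a 4 GHz clock. The lane count and the multiply-add assumption are not stated in the summary above.

```python
def peak_gflops(units, simd_lanes, flops_per_lane_per_cycle, clock_ghz):
    """Theoretical peak = units x lanes x flops per lane per cycle x clock (GHz)."""
    return units * simd_lanes * flops_per_lane_per_cycle * clock_ghz

# Assumed parameters (not from the summary): 4-wide SIMD, multiply-add = 2 flops.
sp_peak = peak_gflops(units=8, simd_lanes=4, flops_per_lane_per_cycle=2, clock_ghz=4.0)
print(f"single-precision peak: {sp_peak:.0f} GFlops")  # 256
```

A peak number like this only says what the hardware could do if every lane were busy on every cycle; sustained throughput depends on keeping the units fed.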

246 comments

  1. I'll believe it when I see it by chris09876 · · Score: 5, Insightful

    This is a very positive review for the cell processor. It does seem like a really exciting new piece of technology. It promises a lot, and if it will do everything people say it will do, it really has the possibility to give the entire industry a big leap forward.

    That being said, I think it's important not to get too excited about it... it's hard to say if it will live up to everything that people have written about it. I'm a bit skeptical. Until I see some production units doing amazing things, I'm cautiously optimistic.

    1. Re:I'll believe it when I see it by Neil+Watson · · Score: 2, Insightful

      I, too, am skeptical, especially when I see Rambus mentioned. I keep looking around expecting to see a school of their lawyers circling, biding their time before a patent lawsuit frenzy.

    2. Re:I'll believe it when I see it by BobPaul · · Score: 5, Insightful

      That being said, I think it's important not to get too excited about it... it's hard to say if it will live up to everything that people have written about it. I'm a bit skeptical. Until I see some production units doing amazing things, I'm cautiously optimistic

      I'm a little bit concerned about the PowerPC Element. The article states that it's not simply a Power5 derivative, but a core designed for high MHz at the cost of per-stage logic depth. To quote the author: "The result is a processing core that operates at a high frequency with relatively low power consumption, and perhaps relatively poorer scalar performance compared to the beefy POWER5 processor core."

      This means the PPE in the CELL @ 4GHz will not perform as well as a Power5 would, could it reach 4GHz (but since the CELL has 8 SPEs, I would hope it performs better as a whole than a POWER5 at the same frequency). It would be interesting to know at what frequency the two are similar, but since the PPE is integrated into an extended system, this isn't something that can ever really be benchmarked.

    3. Re:I'll believe it when I see it by sjf · · Score: 3, Informative

      They licensed technology from Rambus.

    4. Re:I'll believe it when I see it by Anonymous Coward · · Score: 1, Insightful

      >I would hope it performs better as a whole than a POWER5 at the same frequency

      Better vector performance, but not so good for Excel.

    5. Re:I'll believe it when I see it by BobPaul · · Score: 4, Interesting

      I keep looking around expecting to see a school of their lawyers circling, biding their time before a patent lawsuit frenzy.

      I'd be more worried about that if they DIDN'T use Rambus's technology. Rambus can't sue someone who's licensing their tech... they can only sue someone they THINK is using tech too similar to theirs without licensing it. If Cell used some sort of DDR or maybe an in-house memory tech instead, maybe then Rambus would try to sue.

    6. Re:I'll believe it when I see it by Anonymous Coward · · Score: 0

      The thing is, it is much simpler, but that has good effects as well (lower latency, etc.), it is dual-headed, and so on. If your IPC drops by 25%, but your clock and memory increase by 25%, you're way ahead. If they can keep it fed, this thing will eat up Excel and other things.
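
      To put rough numbers on that trade-off, here is a minimal sketch that assumes throughput scales simply as IPC x clock. That model is an assumption of this example and ignores memory effects entirely, so any gain from the faster memory would come on top of what it shows.

```python
def relative_throughput(ipc_change, clock_change):
    """Relative instruction throughput under the simple model: IPC x clock."""
    return (1.0 + ipc_change) * (1.0 + clock_change)

# IPC down 25%, clock up 25% (the figures from the comment above).
ratio = relative_throughput(-0.25, +0.25)
print(f"relative throughput: {ratio:.4f}")  # 0.9375
```

      On compute alone the two changes roughly cancel, so under this model the "way ahead" part rests on the memory side and on keeping the simpler core fed.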

    7. Re:I'll believe it when I see it by adam31 · · Score: 4, Interesting
      The fact is that this will be a much more difficult processor to program efficiently for. This is the same situation that faced developers when the PS2 came out. It's taken game developers 4 years to finally tame the beast, and this chip is everything that made PS2 programming difficult, times 8.

      But look at the graphics in PS2 games now compared to 1st gen titles. The improvement is incredible! The hardware hasn't changed: it's still just a 300MHz CPU with 4MB graphics and no pixel shading. I think we'll see the same maturation process with Cell/PS3, where the 1st gen games don't live up to the hype but more and more of the Cell's enormous potential is realized with successive generations.

      The question is whether Sony decides that part of the slow evolution in efficient PS2 programming was because of the small, exclusive development community. I would love to see Sony push a Linux PS3 similar to the version of Linux PS2 they released.

    8. Re:I'll believe it when I see it by Frobozz0 · · Score: 1
      This means the PPE in the CELL @ 4GHz will not perform as well as a Power5 would, could it reach 4GHz (but since the CELL has 8 SPEs

      No, it means it might not. The author suggested his opinion was up for debate. However, it's important to note the different design goals of a Power5, a 970 (G5), and a Cell. They have different needs, and for general purpose computing I think Cell will hold up just fine.

      --
      "Politicians find new names for institutions which under old names have become odious to the people."
    9. Re:I'll believe it when I see it by Anonymous Coward · · Score: 0

      Hasn't Playstation always been a good reference customer for Rambus? I bet the license terms are pretty attractive for the high-profile, loyal customer that Sony is.

    10. Re:I'll believe it when I see it by xero314 · · Score: 4, Interesting

      The PS3 should not have nearly the problems that the PS2 had with regard to its difficulty of development (a.k.a. lazy developers). Because Cell is a joint project by IBM, Toshiba and Sony, it will have a much larger install base. Rather than being a specialized chip for a specialized system, it is to be a general chip usable in many systems. This means more people will be programming for it, not just game developers, who are notorious for their lack of desire to change (hence why the 68000, 6502 and Z80 were so popular for so long). Cell chips should end up making it into systems designed for scientific computing, where developers (a.k.a. computer scientists) will be willing to take more chances and dig deeper into the architecture.

      We will see some of the typical ramp-up time in Cell programs, but since the Cell, if you believe what you read, is so far above and beyond other modern processors (and since lazy developers for the PS3 can always let the NVIDIA GPU carry the load in a more traditional fashion), we should see leaps and bounds in program performance fairly quickly.

    11. Re:I'll believe it when I see it by fm6 · · Score: 1
      That being said, I think it's important not to get too excited about it... it's hard to say if it will live up to everything that people have written about it.
      That's a reasonable attitude towards any new technology. There's always a difference between how something will perform on paper and how it will perform in the real world. And that's assuming that we have a serious innovation, like this one, rather than the vague hype that's much more common.

      Still, we can hope. In computing, change and innovation seem to be the only constants. Predicting which innovations will catch fire is the hard part.

    12. Re:I'll believe it when I see it by CTho9305 · · Score: 2, Interesting

      It's worth noting that various research papers have done analysis to determine the optimum level of pipelining, and found about 6 to 8 FO-4 gate delays* per stage is optimal - Intel's cancelled Tejas processor was apparently around there and would likely have run at similar clock speeds to the Cell processor. Note that in the real world, you hit other limitations earlier - right now, the main issue is power: chips that fast just run too hot.

      *a FO-4 gate delay is a "fan-out of 4 gate delay" - it's the amount of delay from one inverter (NOT gate) which drives 4 identical inverters as load.

    13. Re:I'll believe it when I see it by Anonymous Coward · · Score: 0

      Paper 1, Paper 2

      Replying AC because it's poor form to reply to yourself ;)

    14. Re:I'll believe it when I see it by jdb8167 · · Score: 1
      According to the paper, the per-stage circuit delay is 11 FO4 throughout the entire design.

      Figure 2 - Per stage circuit delay depth of 11 FO4 often left only 5~8 FO4 for logic flow

      The author of the article seems to think 11 FO4 is pretty aggressive.
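
      For a feel of what an 11 FO4 stage means in clock terms, here is a rough sketch. The FO4 delay of 22 ps is a placeholder assumption for illustration, not a figure from the paper or the article; the real value depends on the process and operating conditions.

```python
def clock_ghz(fo4_per_stage, fo4_delay_ps):
    """Clock frequency implied by a pipeline stage of N FO4 delays."""
    cycle_ps = fo4_per_stage * fo4_delay_ps   # cycle time in picoseconds
    return 1000.0 / cycle_ps                  # 1000 ps per ns, so this is GHz

ASSUMED_FO4_PS = 22.0  # hypothetical FO4 delay, chosen only for illustration

for depth in (8, 11, 20):
    print(f"{depth:2d} FO4/stage -> {clock_ghz(depth, ASSUMED_FO4_PS):.2f} GHz")
```

      Fewer FO4 per stage means a shorter cycle and a higher clock, but latch and clock-skew overhead take a fixed bite out of every stage, which is why only 5~8 FO4 are left for actual logic.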

    15. Re:I'll believe it when I see it by jericho4.0 · · Score: 1
      "game developers, who are notorious for their lack of desire to change"

      Tell me about it. All the game developers I know are always "640k polygons a second should be enough for anyone!", and "pixels smaller than your thumb detract from gameplay!" or "why would anyone want stereo!?". Losers. Developing financial software is so much more bleeding edge. Why, some of our kids don't even know FORTRAN! They don't even realize that it was the demand for bigger and bigger spreadsheets that delivered those fancy video card GPUs!

      --
      "A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
    16. Re:I'll believe it when I see it by cpeterso · · Score: 1


      It would be interesting to know at what frequency the two are similar.

      0 MHz?

    17. Re:I'll believe it when I see it by taniwha · · Score: 1
      looks like they've licensed the latest RAC from Rambus ... they won't be being charged that much for it - I bet Rambus expects to make money on the DRAM side if they end up using it.

      For all their (business) faults Rambus makes cool technology - in particular stuff that allows parallelism in the CPU to be exposed to the memory hierarchy (or vice-versa) - but their hardware hasn't worked well with existing CPUs (x86, for example) because of the bottleneck that the FSB in traditional designs presents. To use the real power these sorts of memory subsystems can give, you need to be able to attach the memory system to the CPU directly so you can retire transactions out of order.

    18. Re:I'll believe it when I see it by master_p · · Score: 1

      But PS2 games' graphics are almost horrible compared to a top end PC's graphics right now. If it took 4 years for PS2, how long will it take for PS3 to reach its graphics height? And what will PCs be able to do at that time?

    19. Re:I'll believe it when I see it by Mycroft_VIII · · Score: 1

      I was under the impression the Rambus people were just an IP company: they don't make anything at all and derive their income from licenses on patents they hold and litigation where they believe (or at least want believed) their patents have been infringed.

      Mycroft

      --
      https://signup.leagueoflegends.com/?ref=4c3ed6600b6ea
    20. Re:I'll believe it when I see it by taniwha · · Score: 1

      well yes and no ... everything's IP these days .... while they do license their patents they also design and implement (and then license) the cells that chips need to talk to RamBuses with - they're difficult, somewhat evil analog/multi clock domain designs that live in a die's pad ring and tend to run on the hairy edges of the current processes - something you'd rather have someone else design for you (I've used a few over the years). [disclaimer - don't work for Rambus, don't own any stock, I'm a mostly happy past customer]

  2. Transmeta by Anonymous Coward · · Score: 3, Funny

    Why should Linus be interested in the cell when he has the Transmeta Crusoe?

    1. Re:Transmeta by Anonymous Coward · · Score: 3, Informative

      Transmeta isn't doing the low heat processors anymore. Quoted from http://arstechnica.com/news.ars/post/20050105-4501.html.

      CPU manufacturer Transmeta, known for their low-power processors, is evaluating an exit from the CPU market. Instead of manufacturing chips themselves, their business focus would shift towards buzzwords: licensing their intellectual property and the formation of strategic alliances to utilize their processor design as well as their research and development skills.

    2. Re:Transmeta by mirko · · Score: 3, Insightful

      These are not buzzwords : ARM have been doing this for years and are a very profitable R&D company.

      --
      Trolling using another account since 2005.
    3. Re:Transmeta by Anonymous Coward · · Score: 0

      Rambus too! And look who's on the CELL!

    4. Re:Transmeta by BobPaul · · Score: 2, Interesting

      Transmeta isn't doing the low heat processors anymore. Quoted from http://arstechnica.com/news.ars/post/20050105-4501.html.

      Just because they aren't manufacturing anymore doesn't mean they're exiting the business entirely. There just might not be a "Transmetta" anymore. Instead there will be something like an "Intel Pentium 5 using lowerpower Transmetta Technology" (well probably not, but you get the idea.)

      Transmetta will be doing R&D for low power processors for years to come, I'm quite sure.

    5. Re:Transmeta by shurdeek · · Score: 1

      Linus quit Transmeta over 1.5 years ago dude.

    6. Re:Transmeta by Anonymous Coward · · Score: 0

      One "t", my man, one "t".

      Seriously though, I'm not the only (or first) one to think their potential strength isn't in low power -- which even Intel learnt to do very well eventually, thanks to the brilliant Israel team behind the Banias -- but in the "code morphing". (Which idea they borrowed from the Russian "Elbrus" architecture.)

      They should put out a system (processor + chipset) where you can switch CPU architectures upon reboot or heck, on the fly even. Nice for developers working on ports, nice for admins handling a switch to a new architecture (e.g. from SPARC to x86), possibly nice for a consolidated server running the company's whole happy combo codebase enchilada in a single server with dynamic partitioning and single-point administration...

      Okay, I'm daydreaming.

  3. Jeez. by game+kid · · Score: 1
    (TFA) the CELL processor has 25-30 GFlops in double-precision

    That's just sick (I think). Even cooler for Mac users who'll like the "dual threaded" PowerPC core of it, no? Can't wait for that PS3...

    --
    You can hold down the "B" button for continuous firing.
    1. Re:Jeez. by Anonymous Coward · · Score: 0


      A new NVIDIA videocard is rated at '76GFLOPS'

      That stuff doesn't mean jack.

      More overhyping from Sony.

    2. Re:Jeez. by Anonymous Coward · · Score: 0

      Can't wait for that PS3...
      Yeah, but will it run Linux...OSX?

    3. Re:Jeez. by KingPunk · · Score: 0

      cant wait for that PS3 eh?

      i cant wait to mod the hell out of that PS3 ;)
      har har har..

    4. Re:Jeez. by witte · · Score: 1

      Great ! Imagine all the things we can do with that much horsepower !
      Now we can *really* write bloated code !
      Anybody want a Java VM coded in VBA inside a MSWord document ? (Running in Wine.)

    5. Re:Jeez. by Anonymous Coward · · Score: 0

      Except for the fact that it DOES mean something with video cards. Video card computation is very vectorizable/parallelizable, which means you should be able to frequently hit the burst FLOPS rate on the card.

  4. Comment removed by account_deleted · · Score: 3, Insightful

    Comment removed based on user account deletion

  5. I bet this Cell processor will kick ass... by Anonymous Coward · · Score: 2, Funny

    ...playing The Game of Life.

    1. Re:I bet this Cell processor will kick ass... by ceeam · · Score: 4, Funny

      You mean.. If one part of this chip is surrounded by more than three other parts actually doing anything useful then it will die from overheating? : )

  6. Deja Vu by DrSkwid · · Score: 4, Interesting

    Sony so badly wants its next-generation game console to offer a super-realistic "virtual reality" experience, the company will design and build its own advanced 128-bit processor to realize this goal.

    Processors inside game consoles usually toil away in anonymity, derided as poor cousins to desktop chips such as Intel's Pentium line. But with Sony Computer Entertainment's ambitious plan, its chips could outclass the offerings of the world's largest chipmaker--if all goes well.

    ...

    The system is so advanced, MicroDesign Resources analyst Keith Diefendorff wrote in a report that the system "has the potential to swipe a chunk of the low-end market from under the noses of PC vendors." He wrote that the platform may "signal the company's intention to move upscale from current game consoles, cutting a wider swath through the living room," with its abilities to function like a stand-alone DVD player and Internet set-top box.

    Sony puts on game face with new chip
    Published: May 5, 1999, 1:25 PM PDT
    By Jim Davis
    Staff Writer, CNET News.com

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
    1. Re:Deja Vu by nutshell42 · · Score: 4, Informative
      He wrote that the platform may "signal the company's intention to move upscale from current game consoles, cutting a wider swath through the living room," with its abilities to function like a stand-alone DVD player and Internet set-top box.

      Well, one reason the PS2 sold like hot cakes was that it was one of the cheapest DVD players at that time (at least in Japan). There is media player software available and it's quite popular; the reason it isn't an Internet set-top box is that no one wants Internet set-top boxes. They died a painful death. Now, there's no EE desktop PC because it's too slow, but the differences between Cell and PS2 in this regard are:

      (a) Cell was co-designed by IBM, which has an interest in selling workstations etc. with that chip; Sony didn't, it's not their business
      (b) Cell is designed for multiprocessor environments, so if it becomes too slow for a task you can simply throw more processors at it
      (c) In 2000 clock speeds still doubled every 18 months; that has stopped. x86 is going the way of multiple cores too, so programmers will have to get used to parallel design anyway

      That doesn't mean it will replace x86 or even make a dent, but it means that, unlike the EE, it's designed for such stuff, and one of the companies behind it sells specialized workstations, so it's at least a possibility.

      And this time you can find more credible sources than CNET (CNET's part of the yellow press of computer news sites. Almost as bad as yahoo news) who'll tell you that.

      --
      Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
    2. Re:Deja Vu by Anonymous Coward · · Score: 0

      It's worth noting, in the middle of all the gushing, that the cell architecture is 100% DRM ENABLED -- as you'd expect from Sony. In other words... it's a crippled processor that actively works against the owner of the machine.

    3. Re:Deja Vu by Johnny+Mnemonic · · Score: 1


      (a) Cell was co-designed by IBM, which has an interest in selling workstations etc. with that chip; Sony didn't, it's not their business

      There are a lot of VAIO developers who will be unhappy to hear that.

      Sure, IBM and Sony both like the Cell CPU a lot. However, IBM likes the PPC chip that Apple uses, and yet it still hasn't a) taken over the world, or even b) been put into use by IBM themselves. Why doesn't IBM use Apple workstations across the enterprise? After all, they make the CPU, and for awhile even made the hard drives. Answer: 'cause it doesn't run the apps they need, and they don't control enough apps themselves to make the switch. The Cell will suffer the same fate outside of its dedicated use in the PS3.

      Would you buy a new Cell workstation for anything besides PS3 development? What would you run on it if you did? Yellow Dog Linux, maybe? If you're enamored of the PPC, are you more likely to develop for the G5, already with a marketshare, or for the nonexistent marketshare of the Cell? Maybe--and this is a big maybe--if you needed a CPU that needed high visualization components. But then I guess you'd go with SGI.

      Really, in the mature PC economy of today, I don't see how any new CPU architecture can get a foothold; it's a chicken and egg thing with developers and consumers to support the developers. Even Apple, with a legion of crazy fans (of whom I am one) can just barely sustain itself, not insignificantly due to inertia. If Apple has trouble getting developers to code for their CPU, I just don't see who would develop for a VAIO (or ThinkPad) Cell workstation or laptop, until 1 million of such units are sold; but who buys them until the developers are there? Gamers, as a PS3--and after Sony sells 100 million, releases a browser and office suite?

      --
      $tar -xvf .sig.tar
    4. Re:Deja Vu by MrResistor · · Score: 3, Interesting

      Maybe--and this is a big maybe--if you needed a CPU that needed high visualization components. But then I guess you'd go with SGI.

      And why wouldn't IBM be going after SGIs market? I think your points hold in the consumer space, but in a specialized market like that I think it becomes a lot easier to gain a foothold simply based on technical merit.

      Heck, better yet, and in what seems to be more in line with IBM's current direction, why wouldn't they try to get SGI to switch to Cell?

      IBM likes the PPC chip that Apple uses, and yet it still hasn't a) taken over the world, or even b) been put into use by IBM themselves. Why doesn't IBM use Apple workstations across the enterprise? After all, they make the CPU, and for awhile even made the hard drives.

      Are you sure that isn't one of their long-term goals? IBM is a big company, and it hasn't been that long since they've decided to change how they do things. Just because you can't see any evidence that they're making that switch doesn't mean they aren't working on it. I mean, they aren't even out of the Wintel PC business yet, and won't be, at least in name, for another 5 years. Given how much MS loves it when their resellers start offering competitive products, that seems like a very important first step in any such plan.

      When you walk into an IBM facility, what brand of computers are sitting on the desks? I honestly don't know, but I would hope they eat their own dogfood. I very much doubt you'd see a Dell on every desk.

      If Apple has trouble getting developers to code for their CPU, I just don't see who would develop for a VAIO (or ThinkPad) Cell workstation or laptop

      Porting Linux takes care of a large portion of that. Yeah, I know Linux is pretty much in the same boat as Apple, but it's a real easy way to significantly boost their development community, and provides a huge amount of instant functionality.

      --
      Under capitalism man exploits man. Under communism it's the other way around.
    5. Re:Deja Vu by HiredMan · · Score: 1
      IBM likes the PPC chip that Apple uses [] even b) been put into use by IBM themselves.

      Really? That might surprise IBM. Guess they better stop selling them then...
      And if by likes you mean designed and fabbed the 970 for Apple at their request then yes they likes it fine. And while you think it hasn't taken over the world, the core design is going to be used (to varying degrees) in all 3 next gen gaming systems. Since IBM is simply acting as chip fabricator that ain't bad for them at all. (How many millions of units is that in year 1?)

      IBM has its Power server line (current performance champ), which is where their real interest lies, and they currently have chip fab market mindshare, bringing the vast majority of chip fab innovations (copper, SOI, etc.) to market first and best. They're willing to work with AMD, Sony, Toshiba or whoever to bring their PowerPC to market in whatever form you might want it in.

      It's not taking over the world but they'll settle for getting PPC into as many non-desktop spaces as possible. And IF Apple's OSX work and their Linux efforts happen to make inroads into the desktop space - especially as they push from the top down with Power server pressure - then that's all gravy for them.

      =tkk

    6. Re:Deja Vu by Dan+Ost · · Score: 1

      Don't pass judgement on this chip yet. You haven't even seen how they intend on
      positioning it yet. We know it will be made in relatively vast quantities for
      a new chip since it is being used in the PS3. The fact that it will experience
      some economy of scale at the beginning is a little unusual for a new chip, so
      the normal expectations shouldn't apply.

      Problem is, I don't know what expectations to set. I'm not a fan of x86, but
      I don't see how it could be replaced yet.

      --

      *sigh* back to work...
  7. Linux on Cell by satsuke · · Score: 1

    Why would Linux be ported to a gaming platform or scientific platform? (The current PS2 runs Linux.)

    Because they can.

    Depending on Sony's marketing, think of the DBZ tie ins .. Imagine playing Cell Games on a cell (based) game.

    1. Re:Linux on Cell by Anonymous Coward · · Score: 0

      because windows is 100% useless as a scientific platform.

      UNIX has dominated scientific computing since day 1, and only posers and wannabes use Windows.

      What moron would run his particle accelerator on windows?

    2. Re:Linux on Cell by zootm · · Score: 1

      To be fair, most scientific apps are nowadays written in Java, and are hence multiplatform...

    3. Re:Linux on Cell by LiquidCoooled · · Score: 2, Funny

      I use a particle accelerator with Windows all the time.

      I can't stand LCD monitors, CRT all the way ;)

      --
      liqbase :: faster than paper
    4. Re:Linux on Cell by satsuke · · Score: 1

      Who said anything about it running windows ?

      You might remember the platform the EFF (and others) used to crack some of the RSA encryption last year. It was dedicated silicon, designed for the purpose. If you wanted to run a GUI on the mass number of processors, I suppose you could .. but that's not what it's designed for.

      These CELL processors are more general purpose than this example, and they will (on the PS3) have a way to address a display of some sort .. so a GUI (Windows, X, Aqua) isn't out of the question .. but that's hardly the point.

    5. Re:Linux on Cell by Anonymous Coward · · Score: 0

      Java is not really used for apps that need number crunching, it is just too slow. Also, OO languages are not that well suited for the type of quick and dirty programming most scientists do.

      Also, the original post seemed to imply that Linux is not used in science, which is definitely wrong. Linux is probably the dominant science platform now, at least for serious computation.

    6. Re:Linux on Cell by ponos · · Score: 1
      To be fair, most scientific apps are nowadays written in Java, and are hence multiplatform...

      I find this rather hard to believe! Scientific applications are supposed to be extremely demanding and are the driving force behind expensive workstations and huge clusters. Traditionally, I would expect scientific applications to be coded in C/C++ and Fortran, or maybe some special languages. In my experience with Java applications, they are usually much slower than their native counterparts.

      P.

    7. Re:Linux on Cell by zootm · · Score: 1

      Most university simulation and science courses I've seen teach Java. I'm not going to speculate why, other than that perhaps the actual processing time is less significant than the development time, and using Java over other options (particularly when programming is not the developer's first skill) probably cuts this proportionately.

    8. Re:Linux on Cell by Physics+Nobody · · Score: 1

      Bullshit! I write number crunching scientific apps for a living and I wouldn't touch Java with a 10-foot pole. I don't know a single person who would. Generally it's C++ or Fortran (Fortran is alive and well in scientific computing and the language of choice for a lot of apps).

      Of course, most number crunching apps are multiplatform anyway because if you don't care about a GUI it really isn't that hard to write multiplatform code.

      --

      Physics is good

    9. Re:Linux on Cell by zootm · · Score: 1

      That's a good point. I mentioned Java simply because I know that it's what the physics department here (and it's not a small department) and several others around here teach. As I said in another part of this thread, in general the time taken to develop physics apps is more important than the time to execute, and a simpler programming language aids this.

      Perhaps "number crunching" physics apps were not the type to which I referred...

  8. Re:Cool, as a co-proc by Anonymous Coward · · Score: 0

    The marketing got you.

    What the Cell has is a "PowerPC Processing Element" which is a stripped down version of the Power5. It runs at a high frequency, but each instruction takes longer to process (more RISCy than the standard Power5).

  9. Re:Cool, as a co-proc by Anonymous Coward · · Score: 1, Informative

    It isn't a POWER5. It is more like a 64-bit variant of the 750VX with SMT, a chip that never appeared but otherwise looks rather similar to what has been described as the PowerPC portion of Cell.

  10. Re:Cool, as a co-proc by The_Mr_Flibble · · Score: 2, Insightful

    But isn't the point of the Cell processor a distributed model?

    From the reviews I've seen, they are touting it as if the Cell communicates with other Cells to handle all the processor-intensive stuff, so where one Cell would not be as powerful as an x86 CPU, two Cells would be. And the way they have designed the things is as a separate computer on a chip, so you can basically upgrade your ?? just the same way you upgrade your memory.

    Or have I gotten the wrong end of the stick and they are designing these things for pointless fun.

  11. Re:hell ya, cheep awsome computers! by Anonymous Coward · · Score: 0

    Huhuhuh! Or, like, put it on a SPEEDBOAT or something...

    Man, that would be SWEET!

  12. Isn't linux running already ? by Eternally+optimistic · · Score: 1

    I thought I read, within the last 2 days, that linux is the only OS they admitted to have running on these things. But I didn't find the source of that, does anyone know?

    --
    What keeps me going is my inertia.
    1. Re:Isn't linux running already ? by LiquidCoooled · · Score: 2, Informative

      Here you go, I found the source for you :)

      --
      liqbase :: faster than paper
    2. Re:Isn't linux running already ? by Anonymous Coward · · Score: 0

      Latest karma arbitrage:

      Post any Linux related link as a joke - watch the mods +1 informative your karma up the wazoo.

  13. Reminds me of Chuck Moore's 25x multicomputer chip by LourensV · · Score: 5, Interesting

    Some time ago Chuck Moore proposed the 25x, a single chip holding a 5x5 array of simple processors. That's what this reminded me of when I first read about it. As Mr. Moore said in that Slashdot interview, "[...] the 25x is a solution looking for a problem." Cell theoretically has a lot of performance, and we're talking FLOPS not MIPS. It will certainly be useful or even revolutionary in televisions and game computers, as well as for scientific calculations. I don't see it making your desktop or server much faster though. Those don't need more FLOPS, they need more I/O bandwidth and faster peripherals, and perhaps more MIPS. I can see Cell workstations, but in the same way as you have SPARC workstations and laptops now: as development tools for the "real" hardware.

  14. Clock speed by 91degrees · · Score: 1

    I'm a little concerned about the clock rate. How are they getting this sort of speed? There are ways to do this, but most of them would reduce the efficiency or increase size. Hardly seems to make sense to do this when it's a lot easier to simply add more processing cores.

    1. Re:Clock speed by Anonymous Coward · · Score: 0

      5+ GHz isn't too tremendous. Pentium 4s are at 3.6 GHz, and they probably have test samples and experimental chips that run much faster than that.

      I wouldn't expect it to be ready for production any time soon.

    2. Re:Clock speed by 91degrees · · Score: 1

      The P4 has a 20 stage pipeline. 20 stage pipelines are very inefficient, giving a big penalty for a branch misprediction, and requiring a lot of logic to handle the complexity of waiting that long for a result.
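
      A rough way to put numbers on that penalty is a toy cycles-per-instruction model. This is only a sketch: the branch fraction, the misprediction rate, and the idea that a flush costs about one pipeline's worth of cycles are all assumed values for illustration, not measurements of any real P4 or Cell part.

```python
def effective_cpi(pipeline_stages, branch_fraction=0.2, mispredict_rate=0.05, base_cpi=1.0):
    """Toy model: every mispredicted branch flushes roughly the whole pipeline.

    branch_fraction and mispredict_rate are made-up illustrative values.
    """
    flush_penalty = pipeline_stages  # assumption: penalty ~ pipeline depth, in cycles
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Compare a shallow pipeline with the 20- and 31-stage depths mentioned in this thread.
for stages in (10, 20, 31):
    print(stages, "stages ->", effective_cpi(stages), "cycles per instruction")
```

      With the same assumed branch behaviour, the deeper pipeline always pays more per instruction, which is why it needs better prediction or a higher clock to break even.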

    3. Re:Clock speed by beardz · · Score: 1

      Actually, doesn't the Prescott P4 core have a 31 stage pipeline?

    4. Re:Clock speed by thpr · · Score: 2, Interesting
      So look at what is IN that Intel architecture pipeline that Cell shouldn't need (based on what we know):

      (1) fetching and prefetching (multiple P4 stages) because the extra processors on Cell can directly address their local 256KB of memory.
      (2) decoding x86 instructions into microops - since the extra processors are running code directly rather than running kludgy x86 code on a non-x86 microcore
      (3) branch prediction (since the load penalty is a lot lower due to local 256KB of memory and shallower pipeline, these stages are unnecessary)
      (4) scheduling the microops isn't necessary as Cell will require that to be done in software during compilation (à la VLIW)
      (5) retirement (since Cell isn't doing out-of-order execution, no reordering and retranslation from the microop to the x86 world is necessary)

      So given that potentially half of the 20 P4 stages (later P4s have 31) are unnecessary, that saves a lot of logic and allows the same clock speed with fewer stages. There has (apparently) been a lot of architecture work here to think through what adds the extra hardware and how to avoid that... the result is the ability to use higher clock speeds without having the same types of penalties the IA-32 processors encounter.

    5. Re:Clock speed by InvalidError · · Score: 1

      There are not many modern processor architectures today that do not convert OPs to micro-OPs... and even micro-OPs are further expanded into micro-code on most architectures. Micro-OPs is the actual "recipe" telling each pipeline stage exactly what to do as a particular micro-OP is moving through it.

      Much of the "unnecessary" logic you enumerated is critical to maximizing the single-threaded performance that dominates desktop apps. If desktop apps were optimally massively multithreaded, Intel et al. could hypothetically drop branch prediction, out-of-order execution and a few other such features with little to no impact on overall throughput.

      There is also logic associated with backwards compatibility. Supporting legacy modes, historic kludges (bugs turned features such as HMA) and other such must "waste" quite a few transistors. To some extent, the micro-OPs decoder could be considered as part of this overhead. Cell's relative virginity probably helps quite a bit here.

    6. Re:Clock speed by be-fan · · Score: 1

      They get that fast by being really simple. They don't do speculative execution, instruction reordering, etc. They don't have TLBs, caches, or a number of other things that make circuits more complex.

      --
      A deep unwavering belief is a sure sign you're missing something...
  15. More Cell reviews? by Anonymous Coward · · Score: 3, Insightful

    Sheesh, /. might as well make a Cell image & category, they post so many articles about it!

    1. Re:More Cell reviews? by adam31 · · Score: 4, Informative
      Well you can't say it isn't news for nerds. And this article has enough added information in it that I thought it to be worth posting. Most Cell news stories are dumbed down for the nonnerds, whose most pressing question is "Does it run Windows?" This article is the best source I've seen of all the info we know about Cell, without a painful amount of editorializing.

      It seemed there was a lot of misinformation/confusion going around because some people heard it supported DP floats and some people heard it used Altivec (which doesn't support DP). So half the people extrapolated that IBM had ditched Altivec (i.e. VMX), and the other half assumed there was no DP support... both of which angered people. The truth (according to this article) is that it uses BOTH: A version of VMX that supports DP. whew!

      The article also points out that the SP floats aren't truly 754-compliant, as they round-toward-zero on cast to int. This makes it compatible with that horrible C/C++ truncation cast (If anyone knows why C opts to round-toward-zero, please let me know!). However, rest assured, DPs are 854-compliant.

      Also, the article suggests that there is a memory limit (at least initially) of 256MB:

      The maximum of 4 DRAM devices means that the CELL processor is limited to 256 MB of memory, given that the highest capacity XDR DRAM device is currently 512 Mbits. Fortunately, XDR DRAM devices could in theory be reconfigured in such a way so that more than 36 XDR devices can be connected to the same 36 bit wide channel and provide 1 bit wide data bus each to the 36 bit wide point-to-point interconnect. In such a configuration, a two channel XDR memory can support upwards of 16 GB of ECC protected memory with 256 Mbit DRAM devices or 32 GB of ECC protected memory with 512 Mbit DRAM devices.
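
      The 256 MB figure in that quote is just device count times device capacity. A quick sketch of the arithmetic, using only the numbers quoted above (4 devices, 512 Mbit or 256 Mbit each); the function name is made up for illustration, and the larger 16 GB / 32 GB configurations are not modelled here because the quote doesn't say how many devices they use:

```python
BITS_PER_BYTE = 8


def capacity_mbytes(devices, mbits_per_device):
    """Total capacity in megabytes for a number of DRAM devices of a given size in megabits."""
    return devices * mbits_per_device // BITS_PER_BYTE


print(capacity_mbytes(4, 512), "MB with four 512 Mbit devices")
print(capacity_mbytes(4, 256), "MB with four 256 Mbit devices")
```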

    2. Re:More Cell reviews? by NanoGator · · Score: 0

      "Sheesh, /. might as well make a Cell image & category, they post so many articles about it!"

      You're right! This space could have been used to pick on Bill Gates, flip the bird at SCO, or give Mozilla a reach-around!

      --
      "Derp de derp."
    3. Re:More Cell reviews? by Anonymous Coward · · Score: 0

      These so-called "reviews" are nothing but previews, or rather hyped-views. How about we stop speculating until we see a real chip run and official data is available?

      There is lots of hype about performance, but little about the on-chip DRM. The chip has tera FLOPS capability, but will only run "trusted software".

      Big Deal!

    4. Re:More Cell reviews? by Anonymous Coward · · Score: 0

      C rounds towards zero because I think it is faster than truncation - or vice versa :) There is an issue where function A takes 1 instruction and function B takes many instructions.

    5. Re:More Cell reviews? by Anonymous Coward · · Score: 0
      "Round toward zero" means the same thing as "truncate (toward zero)". The reality is that on x86 machines (which have several rounding modes, including the 754-compliant round-to-nearest-even and the C-compliant truncate-toward-zero), casting float to int is "slow" because the current rounding mode must be saved, the rounding mode set to truncate-toward-zero, the rounding actually performed, and then the previous rounding mode restored, all of which form a dependency chain. The cast itself is not slow.

      The article seems to imply that there aren't multiple rounding modes for each core, only truncate-toward-zero to be compliant with C-style cast (and presumably cut down on core logic circuit complexity). So this rounding should be fast, just not 754-compliant. This is also consistent with the DP floats being 854-compliant, where speed is not quite as important (at a 10:1 disadvantage to SP according to the article), and consistency is paramount.

      Anyway, I highly doubt that the C standard would be dictated by what is most efficient for some particular processor implementation (though stranger things have happened, true...)
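
      For anyone who wants to see the two rounding behaviours side by side, here is a small sketch in Python rather than C. It relies on two facts about Python 3: int() truncates toward zero, like the C cast for in-range values, and the built-in round() uses round-half-to-even. The helper names are just labels for this example.

```python
import math


def truncate_toward_zero(x):
    """Same direction as a C-style (int) cast: drop the fractional part."""
    return int(x)


def round_to_nearest_even(x):
    """Python 3's built-in round() resolves exact halves to the even neighbour."""
    return round(x)


for x in (2.5, 3.5, -2.5, 2.7, -2.7):
    print(x, truncate_toward_zero(x), math.trunc(x), round_to_nearest_even(x))
```

      The two only disagree when the fractional part is at least one half in magnitude, which is exactly where a truncating cast silently loses up to a whole unit.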

  16. Re:Cool, as a co-proc by ezavada · · Score: 2, Informative

    From what I've seen, it will be rather low horsepower compared to the current G5s, since it will be lacking deep pipelines, caches and other bits that give the G5 much of its speed. That's not to say that it's not really a G5; it sounds like it will support the full G5 instruction set (including Altivec) and be a true 64-bit processor core, just not a particularly fast one.

    The role of the G5 cores seems to be to handle higher order logic that prepares and parses out tasks to the very fast vector units (SPEs).

    So it probably does make more sense to have it as a coprocessor in a Mac, at least until compilers and software writers routinely target the cell's SPEs -- if that day ever comes. More likely specialized code will need to be written, and particular subtasks pulled out.

    I suspect things like physics libraries, sound & video processing libraries, plus apps like SETI@home would be quickly written to use the SPEs, but most other software wouldn't be.

  17. Obviously a TROLL by gorim · · Score: 1, Insightful

    And a good one. Someone actually modded this person as Interesting. :)

    Having said that, if the original poster of this thread truly does think it's underpowered, they should provide a bit more elaboration besides a trollish reference to the IBM/Sony marketing machine.

    1. Re:Obviously a TROLL by Anonymous Coward · · Score: 0, Offtopic

      and this post was modded as insightful?! i preferred the one he called a troll.

  18. Lost acronyms in the article.. by acomj · · Score: 2, Funny

    I like the fact that the presenters didn't remember/know what all the acronyms were in the cell diagram. I like the interview technique too. Get em drunk and watch em talk.

    I was wondering why the article was so in depth.

    Quoth
    "
    After some discussion (and more wine), it was determined that the ATO unit is most likely the Atomic (memory) unit responsible for coherency observation/interaction with dataflow on the EIB. Then, after the injection of more liquid refreshments (CH3CH2OH), it was theorized that the RTB most likely stood for some sort of Register Translation Block whose precise functionality was unknown to those outside of the SPE. However, this theory would turn out to be incorrect.
    "

    1. Re:Lost acronyms in the article.. by MrP-(at+work) · · Score: 0

      Mmm I love CH3CH2OH!

      But it's not as good as (CH3)2CHOH.. mmm

      --
      [an error occurred while processing this directive]
  19. Re:Cool, as a co-proc by Anonymous Coward · · Score: 0

    Thank you.

    I've been saying something like this to my friends until I'm blue in the face, but it's amazing how well products benchmark on paper to those who have limited technical knowledge.

    My prediction for the PS3 is that the games will look graphically gorgeous - pushing the bar in terms of animation complexity and polygon count - but that's going to be just about it. New look, same old games that play exactly the same as the ones from the previous generation.

  20. Perhaps, but FOUR of them... by game+kid · · Score: 2, Interesting

    ...might be used to run the PS3 (assuming this is true). Outside of a weighty OS (assuming you use Windows, Mac or a Linux GUI with that nVidia) they should do better.

    Besides, 256 GFlops in single-prec. can't be too bad either...can it?

    --
    You can hold down the "B" button for continuous firing.
    1. Re:Perhaps, but FOUR of them... by Paladin128 · · Score: 1

      Actually, you'll need a pretty advanced OS. Each SPE, rather than just having cache, has its own local memory that is managed by software. Scheduling on this will be a unique challenge, and most game developers will be unwilling to write and tune their own OS schedulers. They're going to want to use high-level libraries (hopefully OpenGL 2 will be available...) so an OS would be needed.

      --
      Lex orandi, lex credendi.
  21. Re:x86 compatibility? by 91degrees · · Score: 1

    Is it compatible with x86 in anyway?

    Only in the same way that a G5 is. Through emulation.

    What good is a new chip, no matter how fast it is, if you can't run anything on it?

    No use at all. What's your point? ;) But seriously, we can expect to see software written for this. It has a lot of potential applications, and most serious number crunching hardware has a custom OS.

    How fast will this chip be at general purpose stuff? Who cares if it can do 100GFLOPS on a couple operations.

    That's a good question. Vector units are optimised for a certain class of operations: those where exactly the same set of operations is run on a large number of items. For a graphical application with procedural textures we can expect very good performance, but this will fall off considerably for general purpose desktop application type stuff. This probably doesn't matter too much. Those applications don't actually need the sort of performance modern chips can offer.

  22. Thanks! by Eternally+optimistic · · Score: 1, Funny

    Yes, that is helpful indeed. Now, can you get me a few of these processors, so I can go port my own version?

    --
    What keeps me going is my inertia.
  23. Re:x86 compatibility? by Decaff · · Score: 4, Informative

    What good is a new chip, no matter how fast it is, if you can't run anything on it?

    There is this really neat group of operating systems called Unix/Linuxes. They have a major advantage in that you only need a small amount of assembler to get going on a new chip, then the rest can be ported over in C/C++. This has been the situation for decades - Unix (and now Linux) has been the initial OS for almost all new chips.

    How fast will this chip be at general purpose stuff? Who cares if it can do 100GFLOPS on a couple operations.

    Reasonable point, but FLOPs are a good general measure of the speed, as they are pretty complex operations. We all used to measure speed in MIPS (Million Instructions Per Second), but as chips got so diverse, one chip's instruction could not be easily compared with another's (particularly if RISC chips were involved, where the instructions could be very minimal). FLOPs are a better measure, as a divide is a divide and a multiply is a multiply no matter what chip architecture you use.

  24. Re:x86 compatibility? by Anonymous Coward · · Score: 3, Informative

    It would be compatible with PowerPC software.

    Which means that the vast majority of the software I use every day would work just fine on it.

    Although it would be slow... Cell isn't optimized for general-purpose work, and the extra SPEs add another 128 registers to the PowerPC and VMX ISAs, which wouldn't get used by normally compiled PowerPC code.

    You would have to have GCC worked over to produce 'vectorized' code that uses as many of these SPEs as possible for single-threaded applications, and even then you wouldn't get much more performance out of it than a normal G4-class PowerPC processor.

    Then you have memory management problems to work out, probably through an extensive firmware-based controller, which would add to execution time and slow things down a little bit more.

    The advantage would be that if I were doing extensive multimedia or 3D work or special types of scientific research, I could use a familiar environment (Linux) as a platform to run special applications that would themselves benefit from the tremendous performance capabilities of a few of these Cells.

    It would make a great chip for an embedded multimedia player (at lower clock rates) and would be great for something like a non-linear video editor, but a Wintel killer it definitely wouldn't be.

    It would probably be somewhat useful for normal desktop usage as more and more applications are multimedia in nature, but it's not going to be substantially faster than an Intel or AMD processor to the end user.

  25. Re:Cool, as a co-proc by stilwebm · · Score: 3, Interesting

    But what it can do is provide backup horsepower as a math co-processor.

    I see great potential for the STI Cell Processor as a SETI@Home accelerator.

    Seriously though, there may be good scientific uses for these exactly as you envisioned - in a coprocessor role. From folding proteins and weather simulations to cryptanalysis, these could provide a great entry for distributed scientific computing.

  26. Re:Cool, as a co-proc by Mattsson · · Score: 1

    Just like games on PCs, Macs, Xbox, etc...
    That has to do with poor imagination on the part of the game producers and nothing to do with the performance of the Cell CPU.

    --
    /.Mattsson - My native language is not English, so please don't whine over linguistic errors. (That's lame anyway...)
  27. Coral Cache for The Slides by Anonymous Coward · · Score: 0

    Here is a Coral Cache of the images page.

  28. Re:Cool, as a co-proc by R.Caley · · Score: 1
    It's way underpowered for anything resembling a primary CPU.

    Then again, today's CPUs are way overpowered for the jobs they are actually doing. Most of the power is used for sometimes important, sometimes pointless stuff around the edges such as antialiasing fonts and making icons bounce up and down.

    A chip designed to be able to cooperate with others should have an advantage in that kind of environment. If the CPU can concentrate on actually running the word processor, and efficiently coordinate with others doing the peripheral activities for it, that should be a big win.

    --
    _O_
    .|<
    The named which can be named is not the true named
  29. Re:x86 compatibility? by Anonymous Coward · · Score: 0

    Also keep in mind that the SPEs, the eight smaller secondary cores, are called 'vector' units, but they are more general purpose than that.

    They aren't really vector processors; they just act like it. They are actually general-purpose cores, just very, very 'RISC'.

    For instance, they can run integer operations too. They are SIMD and 128 bits wide, similar to (but not compatible with) VMX/Altivec.

    On one core you can run 4 single-precision floating-point operations at once, OR 4 32-bit integer operations, OR 8 word-sized operations, OR 16 byte-sized operations...

    So this chip can process on the SPEs a total of something like 256 32-bit operations in a single clock... if your application were that parallel, which is unlikely.
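
    The lane counts fall straight out of the 128-bit register width, and they are easy to check. A small sketch: the 8 SPEs and 128-bit width come from this thread, while the 4 GHz clock and counting a multiply-add as two floating-point operations are my assumptions, chosen because they happen to reproduce the 256 GFlops headline number.

```python
REGISTER_BITS = 128  # SIMD register width mentioned above
NUM_SPES = 8         # number of SPEs mentioned above


def lanes(element_bits):
    """How many elements of a given size fit in one 128-bit register."""
    return REGISTER_BITS // element_bits


def lane_ops_per_clock(element_bits, spes=NUM_SPES):
    """One SIMD instruction per SPE per clock, counted per lane (a simplification)."""
    return spes * lanes(element_bits)


def peak_gflops(clock_ghz=4.0, flops_per_lane=2):
    """Assumed: 4 GHz clock, and a fused multiply-add counted as 2 flops per lane."""
    return NUM_SPES * lanes(32) * flops_per_lane * clock_ghz


print(lanes(32), lanes(16), lanes(8))
print(lane_ops_per_clock(32), "32-bit lane operations per clock across all SPEs")
print(peak_gflops(), "single-precision GFlops under the stated assumptions")
```

    By this simple count the per-clock figure for 32-bit lanes comes out at 32 rather than 256; the 256 only appears once clock rate and multiply-add counting are folded in.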

  30. Re:Cool, as a co-proc by Zphbeeblbrox · · Score: 1

    Gameplay is not a function of hardware. It's a function of the game designer. So basically you're saying the CELL will do exactly what they are saying it will? You just don't think the game designers are going to be designing games you want to play. This is cleverly offtopic.

    --
    If you see spelling or grammatical errors don't blame me. I tried to preview but IE here at work borked the CSS
  31. What software will it run by akc · · Score: 5, Interesting

    I've been reading about the Cell processor for a few weeks now, and there is never any discussion about the operating system architecture necessary to get this thing to perform.

    As I see it, it's a PowerPC of OK quality with 8 subsidiary processors optimised for running a relatively simple task on a relatively small amount of memory.

    So, port Linux to it? But how? It would be relatively easy to make use of the main processor, but what sort of subsystem do you build so that the subsidiary processors get used to their full potential? Perhaps part of X could be configured to run on these processors, but that would be a very manual tweak to make use of the architecture. And with the best will in the world, these processors would then sit around unused for most of the time.

    What you need is a more general concept, probably at the programming language level, in which algorithms can be expressed in such a way that the operating system can detect that they can be loaded into these subsidiary processors to be executed.

    But there doesn't seem to be anything about that in the news out there. Presumably Sony are going to do something for the PS3, but what? And is it going to be general purpose, since much of the benefit for their purposes will be a super motion graphics processor for games?

    Until we understand what the software infrastructure to make use of the architecture of this new chip will be, I can't see how we can make predictions of its success in the more general processor market. Before then it's just marketing hype.

    1. Re:What software will it run by Anonymous Coward · · Score: 2, Insightful
      Well, it seems to be ccNuma. The coprocessors can access shared memory but copy to local memory to do the processing. The ppc control processor is there to set up stuff for the special processors since they're not equipped to communicate with the outside world themselves.

      The interesting thing which most commentators seem to have missed is the virtualization technology. If you're going to have cell based devices job out stuff to execute on any nearby cell processors on the network, you're going to need a really good sandbox. One that's better than Java's which isn't that good. IBM's virtualization technology is more secure than anything else I've seen out there.

    2. Re:What software will it run by fitten · · Score: 1

      If you're going to have cell based devices job out stuff to execute on any nearby cell processors on the network, you're going to need a really good sandbox. One that's better than Java's which isn't that good. IBM's virtualization technology is more secure than anything else I've seen out there.


      Heh... before you make the sandboxes you had better write a distributed OS for the things... and make sure that all devices with Cell in it are running that distributed OS.

      Oh... and a distributed OS is a pretty hard thing to write, and the feats that have been attributed to the yet-to-be-seen distributed OS that runs on all these Cell machines make an ordinary distributed OS look like "hello world".

    3. Re:What software will it run by Anonymous Coward · · Score: 0
      1. Until we understand what the software infrastructure to make use of the architecture of this new chip will be, then I can't see how we can make predictions of its success in the more general processor market. Before then its just marketing hype.

      The OS won't care or use the cells much if at all...just as they don't use the FPU in current CPUs much.

      System libraries will be optimized to just use the cells if available and to handle background cell networks. Individual apps will either use the system libs or use other optimized libs. In rare situations the apps will specifically target the cells, though I don't see this as practical in the long term.

      None of these require core changes to the operating systems. It's possible that a non PPC could have these libraries and instruct the cells over a network to do work, though that is not as likely. (It's possible that the hardware is lacking on the non-PPC side and would require a costly (in CPU power) emulator. It's also possible that the transport mechanism (network connection or physical layer) would be awkward or nearly impossible to implement on a non-PPC system.)

    4. Re:What software will it run by cow-orker · · Score: 2, Insightful

      I don't think the operating system could make much use of the APUs. The best that can be hoped for is an OS that somehow allocates apulets to the APUs, but since the APUs will work best if used as stream processors this allocation is... well... non-trivial.

      However, given a way to allocate these units to userspace programs, there are lots of programs that could benefit. X and mplayer come to mind, provided someone implements the critical code for APUs, which may well mean coding in assembly.

      What you need is a more general concept, probably at the programming language level, in which algorithms can be expressed in such a way that the operating system can detect that they can be loaded into these subsidiary processors to be executed.

      This will remain a dream for "general purpose languages" like C. However, I could imagine Parallel Haskell or something similar for the Cell. That would be way cool and could even work.

      Anyway, the architecture without adequate software is quite useless. I'm still very much interested.

    5. Re:What software will it run by ReelOddeeo · · Score: 3, Insightful

      What software will it run? Software "cells".

      A software cell runs on one of the APU's (or SPU's, or whatever we're currently calling them). It is sandboxed. When the main processor sends a software cell to one of the sub processors, it specifies exactly what memory that the hardware will allow that processor to access.

      You can run a software cell from an untrusted source. The software cell is a combination of code/data. The processor performs some function on it. While running, the sub processor has access only to the memory that the main processor designated.

      Applications like X Window system, Xine, MPlayer, mpg123, LAME, XMMS, etc., ad infinitum, can be designed with their own software cells. In fact, entire libraries of software cells can be constructed and re-used. Libraries of multiplexors, demultiplexors, encoders, decoders, compositing, FFTs, transcoders, renderers, shaders, GIMP filters (blur, effects, etc.), etc.

      If you're building an application, such as SETI at Home, then you organize your program as software cells. You can farm out as many software cells as you have hardware cell processors to handle.

      Cells can be safely shuffled from device to device. Spare cell capacity in your TV or PS3 can run your SETI at Home, or your Xine cells.

      The Cell processor isn't very helpful for, say OpenOffice.org spreadsheets or drawings, or spellchecking. But word processing isn't the function that usually needs super fire-breathing processor power.

      It is not inconceivable that things like spreadsheet calculations can be effectively improved using software cells. But this is not as obvious (at least to me) as the former applications that I mentioned.

      So if you had a 2 GHz main processor and one or more Cell co-processors (a variable, expandable number) you would have a tremendous amount of computing power. The applications that demand extraordinary power would have it -- even with just one cell coprocessor. And this was quite a list of applications I mentioned above. Just about anything audio-visual or doing massive parallel operations on pixels, or 3d.

      --

      Those who would give up liberty in exchange for security and DRM should switch to Microsoft Palladium!
    6. Re:What software will it run by 21mhz · · Score: 1

      But how is all this going to be multitasked?
      When my process is being switched out in the main CPU, should the running SPUs be also suspended somehow and their context saved along with the main context? Since their local memory isn't protected in any way, that would be quite a massive context, wouldn't it? If this is not to be done, access to the SPUs should be policed by the OS. Say, while some process has a device opened that controls access to an SPU, no other process can open the same device.
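
      To put a rough size on that context: a back-of-the-envelope sketch using the figures mentioned elsewhere in this thread (256 KB of local store, 128 registers of 128 bits each per SPU). The bandwidth passed to the timing helper is a purely hypothetical number for illustration, not a Cell specification.

```python
LOCAL_STORE_BYTES = 256 * 1024  # local memory per SPU, as quoted in this thread
REGISTERS = 128                 # registers per SPU, as quoted in this thread
REGISTER_BITS = 128             # register width, as quoted in this thread


def spu_context_bytes():
    """Local store plus the full register file; ignores any other hidden state."""
    return LOCAL_STORE_BYTES + REGISTERS * REGISTER_BITS // 8


def save_time_us(bandwidth_gb_per_s):
    """Microseconds to copy one SPU context out at a given (hypothetical) bandwidth."""
    return spu_context_bytes() / (bandwidth_gb_per_s * 1e9) * 1e6


print(spu_context_bytes(), "bytes per SPU context")
print(8 * spu_context_bytes(), "bytes if all eight SPUs are swapped")
print(save_time_us(25.0), "microseconds per SPU at an assumed 25 GB/s")
```

      Nearly all of it is the local store, so the cost of a switch depends almost entirely on whether the OS has to move that 256 KB or can leave it in place.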

      --
      My exception safety is -fno-exceptions.
    7. Re:What software will it run by Anonymous Coward · · Score: 0

      How about the Taos Intent/Elate virtual machine? I hear that's slightly better than Java's. If these SPEs were just vector units I wouldn't see how that would be manageable, but it looks like they're more than just an Altivec-style vector unit and more programmable. But you still need a PPC control processor to handle all those "coprocessors".

    8. Re:What software will it run by ReelOddeeo · · Score: 1

      When Apache and Postfix have the main CPU, you don't want your mp3 decoding to stop do you?

      A single encode/decode task would ideally be coded as a single software cell. Perhaps even multiple functions in a single software cell. I.e. decode mp3, and add reverb as a single software cell that uses up a single SPU.

      I run The GIMP and do a massive filter, and it realizes that there are seven SPU's available, so it issues five hundred software cell problems (non serial) that are consumed and processed by the seven SPU's. By non-serial, I mean like a single SETI work unit. Blurring this 64x64 pixel area is a single software cell problem. I divide my gazillion x jillion pixel image into a bunch of cells, and dispatch them to the pool of SPU's that I have.
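
      As a rough picture of that dispatch pattern, here is a sketch that cuts an image into 64x64 tiles and hands them to a pool of seven workers. Everything in it is a stand-in: ordinary threads play the role of SPUs, the function names are invented, and the "filter" is a simple pixel inversion because a real blur would also need the neighbouring pixels across tile borders.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 64     # tile edge in pixels, as in the example above
WORKERS = 7   # stand-in for seven available SPUs


def tiles(width, height, tile=TILE):
    """Yield (x, y, w, h) rectangles covering the image, clipped at the edges."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield (x, y, min(tile, width - x), min(tile, height - y))


def process_tile(image, rect):
    """One independent work unit: invert the 8-bit pixels inside rect."""
    x, y, w, h = rect
    block = [[255 - image[y + j][x + i] for i in range(w)] for j in range(h)]
    return rect, block


def filter_image(image):
    """Dispatch every tile to the worker pool and stitch the results together."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        results = pool.map(lambda r: process_tile(image, r), tiles(width, height))
        for (x, y, w, h), block in results:
            for j in range(h):
                out[y + j][x:x + w] = block[j]
    return out


image = [[(x + y) % 256 for x in range(200)] for y in range(130)]
result = filter_image(image)
print(len(list(tiles(200, 130))), "tiles dispatched to", WORKERS, "workers")
```

      The point of the pattern is that no tile depends on another, so the pool can shrink or grow without the application caring.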

      Now I start Xine (yes, while still listening to mp3), and Xine issues (let's suppose) two software cells that continuously take up two SPUs. Now my GIMP blur runs slower.

      It is possible that, with some rethinking, even non-AV problems can exploit the cells. If you can look at the dependency graph of a spreadsheet and isolate cells that can be calculated in parallel, you can potentially send "streams" of spreadsheet-cells to a stream processing "software cell" (in an SPU) to process each formula. Now if I have five cells available, then I can potentially be calculating up to five non-interdependent spreadsheet-cells at once.
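
      Finding which spreadsheet-cells can be calculated together is a topological sort done in batches. A minimal sketch, assuming Python 3.9+ for the standard graphlib module; the tiny sheet and the function name are made up for the example.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies):
    """Group cells into batches; every cell in a batch can be computed at once.

    dependencies maps each cell name to the set of cells its formula reads.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # cells whose inputs are all done
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical sheet: B1 and B2 read from column A, C1 reads both B cells.
sheet = {
    "A1": set(), "A2": set(), "A3": set(),
    "B1": {"A1", "A2"},
    "B2": {"A2", "A3"},
    "C1": {"B1", "B2"},
}

for batch in parallel_batches(sheet):
    print(batch)
```

      Each batch could be handed to however many SPUs are free; the number of batches, not the number of cells, sets the lower bound on recalculation time.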

      Next year, I add an additional two cell processors (16 more SPU's, for theoretical max of 2 additional Teraflops). Now my X Window system can use several additional cells to have many new kinds of eye-candy going on in real time, while I'm still using The GIMP, watching Pr0n, and listening to mp3's.

      --

      Those who would give up liberty in exchange for security and DRM should switch to Microsoft Palladium!
    9. Re:What software will it run by akc · · Score: 1

      What I reckon is needed is something akin to the memory management subsystem in a traditional OS, i.e. something that allocates SPUs to requesting tasks (possibly on a priority basis) and puts the "stalled" tasks on backing store.

      Without doing any sums, it may be that some tasks are sped up so much that the SPU can be multiplexed between lots of tasks per second, so that they are effectively shared by several tasks at the same time, much like the CPU is today.

      The other thing to perhaps consider then is that the SPUs are very much like little "object" processors, and that something like KDE (which is of course full of objects, having been programmed in C++) could be loaded into these SPUs (each one multiplexing several objects).

    10. Re:What software will it run by jdb8167 · · Score: 1

      Go to the article and read about the streaming in the SPE. You can overlap incoming with outgoing. So thread 1 is DMAing its data in while thread 0 is DMAing its data out.

    11. Re:What software will it run by 21mhz · · Score: 1

      Yeah, yeah, just imagine a Beowulf cluster of... nevermind :)
      But what prevents all these programs from stepping on each others' toes when they submit tasklets to SPUs? Will the arbitration be performed benevolently by a mutual convention or enforced by the OS?

      --
      My exception safety is -fno-exceptions.
    12. Re:What software will it run by 21mhz · · Score: 1

      These "threads" from the pretty picture are not what I'm talking about. They are software controlled, which means there must be some code that takes care of them at once.

      --
      My exception safety is -fno-exceptions.
    13. Re:What software will it run by forkazoo · · Score: 1

      Something like pixar's prman could theoretically do quite well on a system like this. During shading, the renderer breaks all the geometry into tiny fragments, and runs a shader program on each fragment. Basically, the cell should be able to work on 8 fragments simultaneously. Memory access can be reasonably small, as you would only need to fetch a few pixels from your texture maps, and the actual code for the shader, and so the bandwidth limitations probably wouldn't kill you... Could make a fun fun platform for rendering.

    14. Re:What software will it run by Anonymous Coward · · Score: 0

      Wouldn't something like the OpenMP be perfect?

      Either that or a mini MP OS that scheduled work loads amongst 8 processors and provided a means to chain them?

  32. IBM by Anonymous Coward · · Score: 1, Insightful

    don't forget that this time ibm is part of the whole show. they aren't going to risk their reputation with cheap tricks, that's their main business after all

  33. Yes but... by GejTOO · · Score: 1, Funny

    does it run NetBSD? had to do it... sorry.

    1. Re:Yes but... by Anonymous Coward · · Score: 0

      Of course it runs NetBSD!

    2. Re:Yes but... by Anonymous Coward · · Score: 0

      In Korea, only old chips run NetBSD.


      Wait... That's true for everywhere... nevermind

    3. Re:Yes but... by Anonymous Coward · · Score: 0

      After seeing the NetBSD performance for MySQL, do you really want to run NetBSD on this?

    4. Re:Yes but... by Kehvarl · · Score: 1

      does it run NetBSD?

      It will once they build a toaster around it.

  34. What's the point? by jeif1k · · Score: 4, Insightful

    Unless you are computing digital orreries, whether it has 256GFlops or 256TFlops makes little difference if the memory bandwidth isn't substantially increased, and people don't increase the memory bandwidth because that has expensive consequences all over the system.

    On the whole, my impression is that current mainstream CPUs have a pretty reasonable balance between CPU power and all the other system components. Changing just the CPU without making substantial (and expensive) changes to the rest of the system will not magically give you more performance.

    1. Re:What's the point? by dfj225 · · Score: 4, Informative

      It seems like Cell will have more memory bandwidth than the processors commonly used today. From this article:

      " The memory and processor bus interfaces designed by Rambus account for 90% of the Cell processor signal pins, providing an unprecedented aggregate processor I/O bandwidth of approximately 100 gigabytes-per-second. "

      --
      SIGFAULT
    2. Re:What's the point? by jdb8167 · · Score: 3, Informative

      Why do you think they licensed the XDR interface from RAMBUS?

      There are 2 dual-channel XDR interfaces, so 4 channels in total. Each channel runs at 6.4 GB/s, so 4*6.4 = 25.6 GBytes/sec.

      So the CELL memory design is at least 4 times faster than current DDR2 memory systems.
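
      A back-of-the-envelope sketch of what those numbers imply for the parent's worry, assuming the 256 GFlops single-precision peak from the summary and the 4 x 6.4 GB/s figure above. The constant and function names are just for illustration:

```python
# Figures quoted in this discussion; the names are illustrative only.
PEAK_GFLOPS_SINGLE = 256.0    # claimed single-precision peak
XDR_CHANNELS = 4              # "2 dual XDR interfaces"
GB_PER_S_PER_CHANNEL = 6.4    # per-channel rate quoted above

def memory_bandwidth_gb_per_s():
    """Aggregate main-memory bandwidth in GB/s."""
    return XDR_CHANNELS * GB_PER_S_PER_CHANNEL

def flops_per_byte():
    """Operations needed per byte fetched to keep the chip at peak."""
    return PEAK_GFLOPS_SINGLE / memory_bandwidth_gb_per_s()

def flops_per_float32():
    """Same ratio expressed per 4-byte single-precision value."""
    return flops_per_byte() * 4
```

      On these figures a kernel would need roughly ten operations per byte pulled from main memory to stay at peak, which is why so much depends on keeping data in the local stores.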

    3. Re:What's the point? by Anonymous Coward · · Score: 0
      Others have mentioned the high-speed interfaces to memories.


      Also note that each of the cores has its own significant local memory; so apart from the DMA to bring in code and data to operate on, most of the processing will be done directly on these local memories that'll be as fast as an L1 cache.
      Only the final results of complex calculations ever need to be written out to the main PC's memory.

    4. Re:What's the point? by sheck · · Score: 1

      So it seems memory won't be a bottleneck.

      What about disk access? Serial ATA runs at 1.5 GBytes/s. Would that be fast enough to keep a CELL system busy?

    5. Re:What's the point? by Anonymous Coward · · Score: 0

      What about disk access? Serial ATA runs at 1.5 GBytes/s. Would that be fast enough to keep a CELL system busy?

      Last I checked, you filled the memory with the program from disk and then the program would be accessed from memory (loops and etc.), most times involving reading data already in memory since disk load. You load from disk once, you read from memory until you don't need the data/program any longer. Don't forget disk caches.

      But looks like some people are eager to find extreme bottlenecks to prove their 1337n355. I guess next we will move to networking bottlenecks... Can it download my pr0n faster, dude!?

    6. Re:What's the point? by CosmicDreams · · Score: 1

      SATA I currently has a theoretical maximum bandwidth of 150MBytes/s. SATA II, which is hitting the market, has a theoretical maximum bandwidth of 300 MBytes/s. And SATA III, which is about 3 years away, will have a theoretical maximum bandwidth of 600 MBytes/s.

      So 1.5 GBytes/s it is not. You may have confused bits with bytes.

      For more information look here.
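
      The bits-versus-bytes conversion, as a small sketch. It assumes the 8b/10b line encoding that SATA uses (10 bits on the wire per payload byte); the function name is made up:

```python
def sata_payload_mb_per_s(line_rate_gbit_per_s):
    """Convert a SATA line rate in Gbit/s to payload bandwidth in MBytes/s.

    Assumes 8b/10b encoding: every 8 payload bits cost 10 bits on the wire.
    """
    payload_bits_per_s = line_rate_gbit_per_s * 1e9 * 8 / 10
    return payload_bits_per_s / 8 / 1e6
```

      So `sata_payload_mb_per_s(1.5)` gives the 150 MBytes/s figure, and 3.0 and 6.0 Gbit/s give 300 and 600.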

      --
      Go Gusties
    7. Re:What's the point? by jeif1k · · Score: 1

      Well, then it's not the cell architecture that speeds it up, but the increased memory bandwidth. You can expect that a more traditional (perhaps multi-core) Intel or AMD chip will likely come out soon with similar memory buses and similar performance.

  35. Re:Cool, as a co-proc by sjf · · Score: 2, Interesting

    it will be lacking deep pipelines, caches and other bits

    And that is the whole point of this processor. The G5 NEEDS those pipelines and caches in order to feed the multiple execution units, reorder instructions and avoid reading slow host memory.

    The CELL on the other hand will have the instruction ordering done in software. All those 'bits' you describe are replaced with software: a much smarter compiler.

    Yes this processor will perform poorly with today's code. With appropriately written code it will scream.

    This chip is not going to compete with other general purpose CPUs. It's going to compete with custom ASICs and FPGAs.

    -S

  36. these are max figures by Anonymous Coward · · Score: 2, Insightful

    folks need to keep in mind these are max figures assuming software is perfectly written to take care of parallelization (does that word exist?). this means that most computer programs will hit nowhere near these rates, but super optimized versions of things like SETI@Home and an mpeg encoder/decoder could take advantage of it.

    just remember how many developers complained about the Emotion Engine from the PS2 and how it was such a bitch to program for, this will be worse. it's first gonna require a special compiler or at least a tool to fill the code to all the independent mini-procs and reorder all the instructions to take advantage of its little quirks. they seem to be a bit different from pipelines, but some of the same concepts with regards to stalls will apply. so if you're working heavily on one set of data, it's quite possible only one of these mini procs will be used, and the rest will stand there and do nothing.

    i think this is something that'll work much better on a video card and maybe a soundcard than as a main processor, except in the cases where mostly only media processing is required. settop boxes, game consoles, tvs, stereo systems, etc.

    1. Re:these are max figures by Anonymous Coward · · Score: 0

      It isn't hard to write for VMX, that's used by many in MacOS X. I believe Cells will be easier to write for, I suspect IBM will have compilers to help, and the point is the things it is good for, will give you incredibly high returns for that investment. Sure if you just writing some simple C program to copy files from one place to another, it's a waste of time. But doing number crunching, stream processing, etc., it isn't that hard....

    2. Re:these are max figures by Anonymous Coward · · Score: 0

      that's making the assumption that they work similar to SIMD instructions. i've yet to see a very clear description on how it all works, but it's most likely not going to be SIMD. so it's going to be more like programming for multiple very very limited cpus and there will be tons of synchronization / stall issues depending on the exact implementation.

    3. Re:these are max figures by adam31 · · Score: 2, Informative
      these are max figures assuming software is perfectly written to take care of parallelization ... this means that most computer programs will hit nowhere near these rates, but super optimized versions could take advantage of it...just remember how many developers complained about the Emotion Engine from the PS2 and how it was such a bitch to program for, this will be worse

      This is essentially what happened with the PS2. 1st gen game teams thought the compiler would handle more of the task of keeping the vector units and GS busy. Didn't happen. However, PS2 teams have learned a lot of valuable lessons in the past 5 years to prepare for this jump. PC developers are going to have a horrible time trying to get performance out of the Cell.

      Most notably, devs learned that the PS2 is bus-bound. With only 16kb caches, memory-layout is paramount to avoid requisitioning the bus every 100 cycles to refill both caches just for a vtable look-up and jump.

      So the Cell forces programmers to think in this paradigm. No caches, just 256k local storage for threads. So your performance will only suffer if you fail to learn the new principles... no more cache-agnostic coding. No more memory accesses to any random place in memory.

      Then programmers figured out that the VU0 was basically sitting dormant the entire time. That's a third of the total processing power wasted. Little by little, they started moving tasks to the VU0-- skeletal animation, particle dynamics. The problem was that the VU0 only had 4kb of local memory, so between loading a microprogram, double-buffering memory for DMA in/out, and running the damn thing, the EE couldn't do anything useful besides babysit.

      The VU1, OTOH, was a totally different beast. It took over responsibility for the T&L stage in rendering. It had 16kb of memory and could consume a chained DMA stream of microprograms to run and the associated memory (basically a series of display lists). Once you wrote your display-list chain to a buffer and began DMA, it required no babysitting at all. Without a doubt, you can see where the design decisions behind the Cell are coming from. The PS2 was basically just a prototype Cell to see what worked.

      And here are the results. They ditched the VU0, multiplied the number of VU1s by 8, and gave each one 16x the memory, jacked up memory bandwidth. The EE (PPC) is now officially the arbiter of threads... Except now the SPEs are capable of generating execution chains-- i.e. one produces, another consumes, so the PPC doesn't need to have all the brains.

      Another interesting thing is that while certain large portions of the render loop need to be executed serially (Game Logic, Animation, Collision/Dynamics, then Render), many operations within those categories can be parallelized. For instance, devs have resorted to huge hacks to make AIs look like they're running in threads when they're really not. It's simply been the case that games were definitely only running on one processor, and context switches are expensive. It's actually much more convenient to have multiple cores and turn these hacks into actual LWPs. The real question is how is Sony/IBM going to handle concurrency? Are they going to write a special pthreads for the PPC threadmaster? Are they going to use chainable microprograms, where the PPC is just a glorified VIF? That is the big question eating at me right now.

      But if you look at the latest games-- Jak3, GT4... they're hitting pretty much near the theoretical limit of the PS2, no matter how unlikely that seemed just 5 years ago. The first half of the Cell learning curve has already been traversed by those brave PS2 freaks, it's up to the rest of us to learn from where they've been.

    4. Re:these are max figures by be-fan · · Score: 1

      All the indications I've seen suggest that the SPEs will be programmed via a "job" model, not a thread model. So you have jobs ("cells") that have some code and some data, you ask the OS to ship it off to an SPE, and then go do something else while you wait for the results.
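
      A minimal sketch of what such a job model could look like from the caller's side, using an ordinary thread pool as a stand-in for the SPEs. Nothing here is a real Cell API; `NUM_SPES`, `scale` and `run_jobs` are hypothetical names:

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SPES = 8  # assumption: one worker per SPE

def scale(data, factor):
    # Example job body: some code that only touches the data shipped with it.
    return [x * factor for x in data]

def run_jobs(jobs):
    """Ship (code, args) pairs off to the workers and collect results in order."""
    with ThreadPoolExecutor(max_workers=NUM_SPES) as pool:
        futures = [pool.submit(code, *args) for code, args in jobs]
        # The caller is free to do unrelated work here while the jobs run.
        return [f.result() for f in futures]
```

      For example, `run_jobs([(scale, ([1, 2], 3)), (sum, ([1, 2, 3],))])` bundles two independent jobs and only blocks when the results are actually needed.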

      --
      A deep unwavering belief is a sure sign you're missing something...
  37. Re:x86 compatibility? by 91degrees · · Score: 1

    On one core you can run 4 single precision floating point operations at once, OR 4 32bit-sized integers OR 8 word-sized operations, OR 16 byte-sized operations...

    All well and good, but they must be non-dependent. If operation 2 depends on the result of operation 1, or we have a lot of branching, then you're dividing that performance by 4 or 8 or 16. This sort of result is not all that common for most applications that need this sort of performance, but it does happen.
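
    A toy model of that point, counting "issue steps" for a 4-lane unit. It is a deliberately naive sketch: the lane count matches the 4 x 32-bit case above, the function names are invented, and a dependent chain like the running sum can sometimes be restructured with a parallel scan, which this ignores:

```python
LANES = 4  # four 32-bit lanes per operation, as in the single-precision case

def add_vectors(a, b):
    """Independent work: each output depends only on its own pair of inputs."""
    out, steps = [], 0
    for i in range(0, len(a), LANES):
        out.extend(x + y for x, y in zip(a[i:i + LANES], b[i:i + LANES]))
        steps += 1            # one step handles up to LANES elements
    return out, steps

def running_sum(a):
    """Dependent chain: element i needs the finished result for element i-1."""
    out, total, steps = [], 0, 0
    for x in a:
        total += x
        out.append(total)
        steps += 1            # one step per element, three lanes sit idle
    return out, steps
```

    With 8 elements the independent add takes 2 steps and the naive running sum takes 8, which is the "divide by 4" penalty described above.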

  38. Re:Cool, as a co-proc by gabebear · · Score: 1

    It looks like the Cell will be remarkably crappy for general purpose calculations (regular integer math, etc), but does that matter? Anything that can be vectorized and/or parallelized will run really well on the Cell.

    My school's 2000Mhz machine running XP feels slower to me than my old 200Mhz machine running 98. IF they write/pick the OS/software for the Cell appliances correctly I could see it making some headway as a desktop replacement. If most monitors/TVs are shipping with a good office suite and web browser then how many people are going to spring for a regular computer? I doubt it will happen but it's not out of the realm of possibility.

  39. Analysis... by Anonymous Coward · · Score: 0

    I did my own analysis: I think this is going to be a big deal. Get it from one of the following:

    http://homepage.mac.com/dke/.cv/dke/Public/CellProcessor.pdf
    http://www.mymac.com/fileupload/CellProcessor.pdf
    http://www.igeek.com/CellProcessor.pdf

    1. Re:Analysis... by JawzX · · Score: 1

      Nicely written paper. Easy to read and not TOO technical. +Interesting, +Informative. Maybe a little Pro-Biased, but well done. (you need to not trust the spell check so much though.)

    2. Re:Analysis... by Anonymous Coward · · Score: 0

      Thanks... The grammar mistakes are my own. (Its It's, their there, dropping words, it's my style ;-)

  40. Massively Parallel Promises by Doc+Ruby · · Score: 3, Interesting

    The real promise of these Cells is Internet MPP. IBM (and Sony) claim that Cell PCs will be able to cluster "natively" across Internet-latency TCP/IP networks, like broadband. If they deliver on that, then performance questions will revolve around interoperable network apps, not just the raw CPU HW.

    Intel's Pentium architecture was built to accommodate 6-way direct CPU interconnects. The idea was to build "cubic" structures for MPP computers. It took until the P4 to really deliver any of those, almost 10 years after the architecture was released. And the software is still bleeding-edge, and hand-rolled for each install. MPP SW techniques have evolved a lot since then, so perhaps the Cell will actually deliver on these "distributed supercomputer" promises.

    --
    make install -not war

    1. Re:Massively Parallel Promises by Anonymous Coward · · Score: 0

      The real promise of these Cells is Internet MPP. IBM (and Sony) claim that Cell PCs will be able to cluster "natively" across Internet-latency TCP/IP networks, like broadband. If they deliver on that, then performance questions will revolve around interoperable network apps, not just the raw CPU HW.

      Yes, it will be just like existing HPC clusters except the links will be high latency and low bandwidth.

      Hang on a minute! How well would an HPC cluster work if it had high latency, low bandwidth links between all of the members?

      I believe that if you actually think this through, instead of parroting Sony's buzzword-bingo hype machine, you will come to the staggeringly obvious conclusion that it won't.

    2. Re:Massively Parallel Promises by Doc+Ruby · · Score: 1

      SETI@Home has a MPP model that tolerates long latency. There are several message-passing algorithms that handle those constraints. The real question is whether the Cell will run SW that is interoperable enough to bring MPP to the masses.

      --
      make install -not war

    3. Re:Massively Parallel Promises by Anonymous Coward · · Score: 0

      The real question is whether the Cell will run SW that is interoperable enough to bring MPP to the masses.

      The real question is whether the masses want MPP, no?

      As an experiment why don't you pick a random member of the public and try explaining to them what MPP is and why they would want it?

      I think you will find that most people don't know what they want. Although if sufficiently prompted, they might like fries with that.

      Also, I find your claim that SETI@Home is an MPP application ridiculous. Or do you think that there is direct communication between the nodes (meaning end-users PCs in this case)? SETI is just a client/server setup - you don't need a massively over-hyped new processor architecture to take advantage of it.

    4. Re:Massively Parallel Promises by Doc+Ruby · · Score: 1

      People don't want "MPP", we want a killer app that depends on distributed MPP. We want a LAN game that gets all the networked CPUs to generate a single realistic 3D scene that we all share. We want our multimedia terminals to gang up on noise processing on the single stream we're sharing. We want our office, home and mobile CPUs to cooperate at whichever task we ask the nearest to initiate. We want all kinds of new apps that benefit from total utilization of all the horsepower. The experiment that matters is the release of an app like that, and its popularity. MPP is a means to an end for the masses, an end in itself only to some geeks like us.

      The point of SETI@Home is that it handles the latency. There's no reason it can't be direct P2P. And it is clearly MPP, though apparently not the one *you* want. But I don't hear how the kind that meets your narrower definition (whatever that is, exactly) would work.

      --
      make install -not war

    5. Re:Massively Parallel Promises by Anonymous Coward · · Score: 0

      Bah and double bah. I don't give a rat's ass about hardware - there is no software that exists that provides the necessary abstraction for the set of problems that plague internet cluster computing. Parallelization, fault tolerance, locality and distribution, result aggregation, resource ownership... the problems go on and on and on and a new CPU solves none of them.

  41. Re:Reminds me of Chuck Moore's 25x multicomputer c by Anonymous Coward · · Score: 0

    If use of this processor became common, it could change the way we approach common problems, problems whose current solutions we take for granted. Instead of doing the same stuff faster, we might be doing the same stuff differently. Fractal based compression might come into normal everyday use, for example.

  42. Re:x86 compatibility? by PureCreditor · · Score: 1

    >> What good is a new chip, no matter how fast it is, if you can't run anything on it?

    This is the good ol' anti-new-architectural speak. A new architecture is not necessarily a bad thing, provided :

    1) it's massively scalable to its targeted size and hopefully beyond (either large or small)

    2) it's easily portable

    3) its architecture doesn't have a super bottleneck (namely, the x87 floating point stack)

    Apple managed to embrace MacOS from 68K to PowerPC.

    HP wrote HP-UX for Itanium (non-emulation mode).

    Digital went from VAX to Alpha.

    also, just because an architecture can run everything on it doesn't mean it's successful. say, Transmeta. They're 100% capable of x86 execution, and promised support of multiple architectures through virtual emulation onto the native 256-bit Crusoe system. end result? a plain ol' x86 architecture with emulation fat padded on.

    Apple has good reason to embrace Cell, primarily because they wanted their machine to be a multimedia hub, and the Cell processor is perfect for that goal. Different cells will process different items of the system, and share idle resources. This doesn't mean Apple *needs* to switch MacOS totally over to Cell. Keep a generic PowerPC as the general purpose processor, but distribute multimedia code to different cells, thus freeing up the main PowerPC for non-vectorizable tasks.

  43. Single Precision Rounding error by BobPaul · · Score: 2, Interesting

    Besides, 256 GFlops in single-prec. [realworldtech.com] can't be too bad either...can it?

    Unfortunately the single precision operations ignore certain rounding conventions in order to boost the speed. You'll get super fast single precision results, but they won't be as accurate as on other systems. Probably won't matter for physics rendering in a video game (Sony's Emotion Engine did the same thing) but it could make a big difference when applied to general purpose situations.
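
    To make the concern concrete, here is a sketch that compares the usual round-to-nearest conversion to single precision with a truncating (round-toward-zero) one. That truncation is the convention being dropped is an assumption for illustration, not something confirmed here, and the helper names are invented:

```python
import struct

def to_f32_nearest(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_f32_truncate(x):
    """Round toward zero: never return a value larger in magnitude than x."""
    f = to_f32_nearest(x)
    if abs(f) > abs(x):
        # Step one unit in the last place toward zero by decrementing the bits.
        bits = struct.unpack("<I", struct.pack("<f", f))[0]
        f = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return f

def accumulate(rounder, x, n):
    """Add x to a running total n times, rounding after every operation."""
    step = rounder(x)
    total = 0.0
    for _ in range(n):
        total = rounder(total + step)
    return total
```

    Comparing `accumulate(to_f32_truncate, 0.1, 1000)` with `accumulate(to_f32_nearest, 0.1, 1000)` shows the effect: with truncation every error pushes the same way, so the drift builds up instead of partly cancelling.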

  44. Sure thing by Reanimated · · Score: 2, Interesting

    5 years ago the "Emotion Engine" from Sony was supposed to "steal a chunk" of the PC processing market. Didn't happen. Won't happen.

  45. Is there another link? by skeptictank · · Score: 1

    All I can get from the links in the headers is a page in chinese.

  46. Re:Cool, as a co-proc by fitten · · Score: 1

    The CELL on the other hand will have the instruction ordering done in software. All those 'bits' you describe are replaced with software: a much smarter compiler.

    Yes this processor will perform poorly with today's code. With appropriately written code it will scream.


    Hmmm... seems like I've heard this before... oh yeah... Intel's Itanium.

  47. Re:Cool, as a co-proc by BobPaul · · Score: 2, Informative

    IF they write/pick the OS/software for the Cell appliances correctly I could see it making some headway as a desktop replacement.

    Which is the key, exactly. As Linus wrote in one of his linked forum posts (from the blurb) it's gonna be a pain to program general purpose for those vector units (SPEs).

    However, judging from the main review, it doesn't look like the PowerPC Element was castrated too much. It looks like it'll suffer from Pentium4 syndrome (boosting the frequency doesn't do as much as it used to) so it might not be as good as an equally clocked Power5 based processor, but I think you're looking too much at the SPEs when considering whether or not it'll compete with the x86 and Power5.

    Right now, there aren't x86 and Power5 chips at 4+Ghz, and looking at Intel and AMD's roadmaps, there probably won't be for quite a while. Even if this thing is horribly inefficient for general tasks, it'll be great for Graphical/Video work, great for Physics/Scientific work, and probably at least as fast for everything else as a single core P4 3.8Ghz (which does a better job melting candles than it does holding them, most of the time).

  48. Programming secondary processors by Anonymous Coward · · Score: 0

    Writing code for the secondary processors will most likely be writing microcode that will be downloaded to the processors along with the data to process. It will be completely different from writing your typical application code. I'm sure Apple, Adobe, and the large 3d software companies will have the ability to make use of it, but the only way most of us will make use of it will be through libraries providing very specific functionality. That is, of course, if they ever release enough tech details about the processors to allow for us "norms" to develop on it.

    I think initially it will be libraries like the SIMD library mentioned here the other day (http://www.pixelglow.com/macstl/) that might make use of it. However, unlike AltiVec or Intel's SIMD functions, I don't think it will be possible for GCC to automatically make use of the extra processors. We could probably write an amazingly fast MP3 encoder, but if it's only single precision floating point, then maybe we won't.

    Anyhow, don't get your hopes up that this magic CPU will make all your compiles go faster.

    1. Re:Programming secondary processors by 21mhz · · Score: 1

      We could probably write an amazingly fast MP3 encoder, but if it's only single precision floating point, then maybe we won't.

      Why? Single precision floating point can accommodate up to 23 bits of precision, and full 24 if you consider the sign (all sound applications should use zero-centered FP samples because floating point becomes more precise towards zero). Sure many modern digital sound systems are exactly 24 bit so there is no margin for errors, but the lowest bits are for marketing and bit-padding purposes anyway.
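
      A quick way to check the storage side of that claim, as a sketch: round-trip integer sample values through a 32-bit float and see whether they come back unchanged. The helper names are invented, and this says nothing about error accumulated during the encoding arithmetic itself:

```python
import struct

def f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def survives_float32(sample):
    """True if an integer sample is represented exactly in single precision."""
    return int(f32(float(sample))) == sample

def all_survive(samples):
    return all(survive for survive in map(survives_float32, samples))
```

      The extremes of a signed 24-bit sample, `2**23 - 1` and `-2**23`, both survive; the first integer that does not is `2**24 + 1`.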

      --
      My exception safety is -fno-exceptions.
  49. Re:Reminds me of Chuck Moore's 25x multicomputer c by fitten · · Score: 1

    Yeah, that 25x reminds me of a CM-2 (ConnectionMachines), the main difference being that I (and others) actually wrote and ran code on the CM-2.

  50. Here's a more accurate review by YU+Nicks+NE+Way · · Score: 3, Informative

    You may not like Michael Kanellos usually, but I think he's hit the nail on the head here.

    This is a bigger, hotter, less stable chip with an exotic and hard to write-for architecture. That's fine for a gaming system with a dedicated revenue stream and no competition. It's not gonna make it outside that domain.

    1. Re:Here's a more accurate review by bnenning · · Score: 1

      This is a bigger, hotter, less stable chip with an exotic and hard to write-for architecture.

      We should reserve judgment on the "hard to write-for" until we actually have details. This alleged sample code doesn't look too bad.

      --
      How to solve most of our problems: 1.Lots of nuclear plants. 2.Cure aging.
  51. Re:What's the point? - What are your assumptions? by thpr · · Score: 3, Informative
    Changing just the CPU without making substantial (and expensive) changes to the rest of the system will not magically give you more performance.

    Substantial changes, maybe. Expensive? Perhaps not. This all depends on the base assumptions from which you operate. One of the fundamental assumptions in today's existing systems is that any and all work should be done to maximize the utilization of the CPU. However, when considering how to design other types of systems, such may not be true (it may make sense to minimize the memory footprint, for example).

    If you've ever done some detailed algorithm work, you will quickly realize that there are many algorithms where you can make tradeoffs between memory and CPU time. The 'simplest' of these are the algorithms that are breadth first vs. depth first, which can trade off exponential in memory vs. exponential in time. [For a 'trivial' example, try forming the list of all operational assignments containing 6 variables and which use %, +, -, *, /, ^, &, ~, and ()... less than 50 lines of perl and you'll quickly blow through the 32-bit memory limit if written depth first, or take overnight to run breadth first]
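
    A much smaller instance of the same memory-versus-time trade, swapped in here because it fits in a few lines: counting set bits either with a loop (no extra memory, more work per call) or with a precomputed 256-entry table (extra memory, less work per call). This is not the expression-enumeration example above, and the names are illustrative:

```python
# Memory side of the trade: 256 precomputed answers, one per byte value.
BYTE_POPCOUNT = [bin(i).count("1") for i in range(256)]

def popcount_loop(n):
    """No table: clear the lowest set bit until nothing is left."""
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count

def popcount_table(n):
    """Table lookup: one step per byte instead of one per set bit."""
    count = 0
    while n:
        count += BYTE_POPCOUNT[n & 0xFF]
        n >>= 8
    return count
```

    Both functions assume non-negative integers. Which one wins depends on whether the table stays in fast memory, which is exactly the kind of assumption a 256k local store changes.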

    The significant question which has been brought up - and which remains unanswered - is what software development tools will be made available. Once this is better answered, we will all be in a better position to determine what fundamental assumptions have been changed, and therefore how we can follow the new assumptions through to conclusions about the net performance of the processor and machine in which it is contained.

  52. ColdFusion Server by zardo · · Score: 1

    The ColdFusion server with linus's comments is down. Is this any surprise? What did he have to say?

  53. No more Moores Law? by hachete · · Score: 0

    what it says. Surely this is an implicit admission that Moore's Law has finally been laid to rest.

    It's just not economically viable spending time trying to squeeze more power out of the current methodologies.

    --
    Patriotism is a virtue of the vicious
    1. Re:No more Moores Law? by Anonymous Coward · · Score: 0

      that depends upon how you want to really define Moore's Law. this is basically throwing a ton of limited processors onto one chip and letting them handle simpler tasks. similar to hyperthreading, but i believe even more limited in terms of memory access. (hyperthreading is limited to performing integer math i believe) if moore's law is applied in terms of how many processing units you can throw on one chip, then maybe.

      however, as pointed out repeatedly here in this thread, don't expect to be buying these as main processors for a while, as they're meant to process parallelized information. things like encoders and decoders as normal applications will stall most of the many processing units, so they basically just sit there and go to waste. sort of like using a dual processor machine with win95/98...

    2. Re:No more Moores Law? by rlp · · Score: 1

      Great point! As long as processor speed kept increasing, developers could be sloppy with code - and faster processors would result in large monolithic single threaded apps running faster and faster. With new chips getting more parallel rather than raw higher clock speed - the advantage shifts to more sophisticated multi-threaded apps. So Web servers will get faster, but (most) conventional office apps will not. Companies may even have to invest in tuning their code.

      --
      [Insert pithy quote here]
    3. Re:No more Moores Law? by Anonymous Coward · · Score: 0
      No, it's a confirmation of Moore's law.

      Both Sony's Nagasaki fab and IBM's East Fishkill fab are leading edge fabs capable of the smallest geometries. Currently, both very expensive fabs are running 65nm processes and I don't think we know yet how far that'll be improved upon for higher volumes over the next couple years.

    4. Re:No more Moores Law? by ReelOddeeo · · Score: 1

      So Web servers will get faster, but (most) conventional office apps will not. Companies may even have to invest in tuning their code.

      NOOOOOOOOOOOOOOO!!

      We're addicted to the upgrade treadmill.

      Wouldn't it be preferable to just keep trying to push up the clock speeds, even artificially high clock speeds even if it meant lower actual performance, while at the same time building ever more bloated software applications?

      Think of our poor corporations! What will happen to the economy if they are forced to start tuning their applications? What if people aren't forced to keep buying bigger/faster new PC's, and then forced to buy new software? Having to innovate to use more parallel processing might put our corporations at a major competitive disadvantage! It might affect their business models of stamping out CD's and raking in profits!

      --

      Those who would give up liberty in exchange for security and DRM should switch to Microsoft Palladium!
    5. Re:No more Moores Law? by madprof · · Score: 1

      Not so sure - isn't it close to the doubling of transistors?

  54. If you think Windows sucks... by Nunsexmonkrock · · Score: 1

    Beware the coming SonyOS PC. If you've ever used any of Sony's PC software (bundled with Vaios, Camcorders, and such), you'll know exactly what I mean.

    1. Re:If you think Windows sucks... by DigiShaman · · Score: 1

      And you can be damned sure that it will be DRM certified. Hell, it will be DRM at the kernel level.

      --
      Life is not for the lazy.
  55. Re:Cool, as a co-proc by Anarke_Incarnate · · Score: 1

    Same story I heard about Itanium. (It will be really super turbo ultra fast if we rewrite everything and toss out everything we own.) Niche detected.

  56. OSX Tiger & Longhorn? by Viewsonic · · Score: 1
    Since both of these OS's are pretty much going to be running over 3D acceleration with glossy effects on every window, would this benefit both of these OS's? People mention this processor wont speed up your normal desktop stuff, only big 3D apps and whatnot. Well, all the next gen desktops ARE big 3D apps for the most part now.

    No?

    1. Re:OSX Tiger & Longhorn? by Anonymous Coward · · Score: 0

      Not longhorn. Probably never. Microsoft's OS is tied with the fate of x86 for the forseeable future.

      It's true that NT would run on PowerPC, but that was long ago and porting to PowerPC would mean that Windows would have a OS with no applications. Either that or the transfer would take many years.

      OS X could run on it potentially, but again you would (as a general user) probably experience a substantial drop in performance over the current dual G5 setup. If it was ever to run on Cell, it would not be any time soon.

    2. Re:OSX Tiger & Longhorn? by Anonymous Coward · · Score: 0

      It's true that NT would run on PowerPC, but that was long ago and porting to PowerPC would mean that Windows would have an OS with no applications. Either that or the transfer would take many years.

      But the current Xbox development is done on PMG5 running NT. Win NT PPC may have been updated already, or may not be that far behind a commercial release (if MS decides to do one). I think it's a question of support, a lack of software and head-to-head competition with OS X Server.

  57. Re:x86 compatibility? by MrMickS · · Score: 0, Offtopic

    Please don't feed the trolls.

    --
    You may think me a tired, old, cynic. I'd have to disagree about the tired bit.
  58. Maybe... by gUmbi · · Score: 3, Funny

    Since IBM is now involved, should it be called the PS/3 instead of the PS3?

  59. It's actually 2 different kinds of processors by songbo · · Score: 2, Interesting

    My view of the Cell chip is that it's actually 2 different kinds of chips put together. It has a general processor (the POWER5 core), and essentially co-processors that are optimized for a totally different class of programs. The POWER5 chip would let it run your normal office applications, but the SPEs allow the chip to do things like graphics processing, audio processing, simulations, etc.: all those problems that lend themselves naturally to a vectorized solution. Together, the 2 kinds of cores on a single chip have the potential to do a lot. But there have to be tools to allow developers to make use of the potential. Especially as vectorized programs are not easy to write and optimize, that makes the quality of the development tools very important in deciding the success of the chip.

    --
    There are 10 kinds of people in the world - those that know binary, and those that don't.
    1. Re:It's actually 2 different kinds of processors by bnenning · · Score: 1

      It has a general processor (the POWER5 core)

      Essentially correct, but it's not a Power5 derivative.

      Together, the 2 kinds of cores on a single chip have the potential to do a lot. But there have to be tools to allow developers to make use of the potential. Especially as vectorized programs are not easy to write and optimize, that makes the quality of the development tools very important in deciding the success of the chip.

      Right. And it's interesting that the CoreImage and CoreVideo APIs in the next version of OS X are designed to take code fragments and run them either on the CPU or GPU depending on the specific machine's capabilities. Doesn't sound like that would be hard to extend to Cell...

      --
      How to solve most of our problems: 1.Lots of nuclear plants. 2.Cure aging.
    2. Re:It's actually 2 different kinds of processors by herc_mk2 · · Score: 1
      My view of the Cell chip is that it's actually 2 different kinds of chips put together.

      ... which isn't that different, architecturally, from the Emotion Engine -- a main processor, and the Vector Units.

      As has been pointed out elsewhere, this led to the hype (before the PS2 launch) about how the EE was going to be the fastest mainstream processor (at least in terms of MFlops), but it took a long time for the developers to be able to produce code to take full advantage of it. Coupled with the somewhat-lackluster 300MHz MIPS core, it was eventually outpaced by 2GHz P4s (with a separate GPU on the video card).

      As a result, the EE makes for a fine gaming platform, but not too useful for general purpose use these days. As anyone with the PS2 Linux kit will attest, gcc and X11 suffer terribly from the low clock speed and poor integer performance.

      Of course, that's not to say that Sony et al. haven't learned from this -- and having IBM in their corner can't hurt. They're investing a lot of money into this which would be hard to recover by just selling under-$300 gaming consoles...

      Time will tell.

  60. Sign of the Future, but not revolutionary by skeptictank · · Score: 1

    SoCs with a general purpose core attached to special purpose logic are the future of computing. More and more companies are licensing ARM and PPC IP to put in FPGA fabric to control their own custom I/O, DSP logic, etc. Several things don't bode well for the Cell. It appears to be made to work only with Rambus memory - not a good sign. Size, heat and power consumption are the dominant factors when it comes to choosing a processor for embedded apps and the Cell looks like it's gonna have plenty of all three. Finally, there doesn't seem to have been any parallel work on compiler technology to support this chip - just the standard "ohhh, we'll fix it in software, later" mindset.

  61. The most important is... by Anonymous Coward · · Score: 0

    Is it the first BOGOTIPS (1,000,000 BOGOMIPS) chip ever?
    Being the chip that breaks the record for the amount of nothing done per second surely is sweet!

  62. 250 Gigaflops? by CTho9305 · · Score: 4, Insightful

    People seem to think this is leaps and bounds above everything else, but they're missing the details. In order to obtain that much performance, you'll need a task which parallelizes well so it can be broken up into chunks for the 8 SPEs. Graphics rendering falls into this set of tasks, but a lot of general applications just don't gain that much from parallel processors. Even when you have a task that does parallelize, writing parallel code is quite a bit harder than writing code for just a single thread of execution.

    I've seen a lot of hype about having the Cell in your laptop talk to the Cells in your desktop, microwave, and TiVo, but you have to consider real-world limitations. When you set up a network like that (presumably wireless), you're going to be limited to around 100Mbps. In computer clusters and supercomputers, one of the main limitations of performance is the communication bandwidth available between processors, and the latency of the network. To build a "home supercomputer", you not only need a task that parallelizes well, but one that doesn't require so much inter-node communication that it's held back by a slow network. You can't work around this problem with hardware magic - if the task you're working on requires lots of communication bandwidth, you're going to be held back.
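
    To put a rough number on the network argument above, here is a back-of-the-envelope sketch. It uses only the two figures quoted in this comment (about 250 GFLOPS peak and about 100Mbps of network); the function name is made up for illustration, and protocol overhead and latency are ignored, so treat the result as an optimistic bound.

```python
def flops_needed_per_byte(peak_flops, link_bits_per_second):
    """Floating point operations a node must perform per byte received
    to stay busy, assuming the link is the only source of data."""
    bytes_per_second = link_bits_per_second / 8
    return peak_flops / bytes_per_second


PEAK_FLOPS = 250e9      # figure quoted in the comment (peak, single precision)
LINK_BPS = 100e6        # "around 100Mbps", as assumed in the comment

ratio = flops_needed_per_byte(PEAK_FLOPS, LINK_BPS)
print(f"about {ratio:,.0f} floating point operations per byte transferred")
```

    Under these assumptions a remote Cell only helps if the job does on the order of twenty thousand operations for every byte it ships over the network, which is why only tasks with very little inter-node traffic would fit.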

    So how much beyond a modern PC is 250GFLOPS anyway? Not much! A GeForce FX at 500MHz does 200 gigaflops. An AMD Athlon's peak performance is 2.4 GFLOPS at 600 MHz... if we scale this up to 2.2 GHz (high-end Athlon), that's 8.8GFLOPS (note: As we're talking about theoretical performance, nonlinear factors like bus speeds can be ignored). Basically, if the Cell dedicates most of its power to graphics rendering, you'll have computation power in the same range as a fast PC of today. Given that we're not going to see any products based on the Cell for a while, this isn't going to be the end of the world for Intel and nVidia (let alone the fact that Cell isn't x86).
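
    The Athlon scaling above can be checked with a few lines. This is only a sketch of the comment's own arithmetic (peak throughput assumed to scale linearly with clock); the helper names are invented for illustration and the input figures are the ones quoted in the comment, not measurements.

```python
def flops_per_cycle(gflops, mhz):
    """Peak floating point operations per clock cycle."""
    return (gflops * 1e9) / (mhz * 1e6)


def scale_gflops(gflops, mhz, target_mhz):
    """Scale a peak GFLOPS figure linearly with clock frequency."""
    return gflops * target_mhz / mhz


# Figures taken from the comment above.
athlon_per_cycle = flops_per_cycle(2.4, 600)   # about 4 per cycle
athlon_2200 = scale_gflops(2.4, 600, 2200)     # about 8.8 GFLOPS
cell_vs_athlon = 250 / athlon_2200             # ratio of the quoted peaks

print(f"Athlon: ~{athlon_per_cycle:.1f} flops/cycle, ~{athlon_2200:.1f} GFLOPS at 2.2 GHz")
print(f"Quoted Cell peak is ~{cell_vs_athlon:.0f}x the scaled Athlon peak")
```

    The CPU-only gap is large; the comment's point is that the comparison changes once the PC's graphics card is counted on the same side of the ledger.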

    Consoles using the Cell will have the advantage of only having to render for TV resolutions - at most 1080 lines, while PCs will be rendering at up to 1600x1200, but if you look at recent history, you can compare the xbox to a then-good PC with a GeForce3 (which came out at around the same time) - the xbox looked better, but PCs did catch up and surpass its performance and it didn't take all that long. Consoles have to be very high-end when they're released, because the platform doesn't change for 2-3 years, and they still need to be "good enough" after a couple years, before the next generation is released.

    1. Re:250 Gigaflops? by be-fan · · Score: 1

      The GeForce FX's 200 gigaflops aren't all general-purpose though. A lot of them come from fixed-purpose circuits that you can't use for your own calculations. For a general purpose program, you've got about 80 gigaflops, of which you can extract 50-60 gigaflops in real-world programs.

      --
      A deep unwavering belief is a sure sign you're missing something...
  63. Sure it did... by Anonymous Coward · · Score: 0

    ...in the form of PC games.

    Anyone else notice a bit of a drop in the number of titles being produced for the PC in the last few years (or at least being severely delayed to bring out on PS2 first)?

  64. Visual feedback and quality by Anonymous Coward · · Score: 0

    Visual feedback and font quality may be unimportant to you, but to a large number of people not running their applications from text-only terminals, they are very important elements. Along with things like alpha channels and drop shadows. People want these things.

  65. Time for a Linux-only desktop! by Anonymous Coward · · Score: 0

    Imagine if you will something like the proposed PS3, but built with only peripherals for which there is excellent Linux support. Now imagine the cost savings of building millions of them, perhaps with Walmart backing the project. If this chip lives up to the promise, then we could very well have our killer commodity desktop computer.

  66. Remember by temojen · · Score: 2, Informative

    POWER5 is not the same as PowerPC 970 (G5). POWER5 is a really really expensive high performance mainframe chip. G5 is a server/desktop chip.

    1. Re:Remember by Ohreally_factor · · Score: 2, Informative

      Also the PPC 970 (G5) is based on the POWER4 cpu.

      have you ever seen a picture of the POWER5? It's slightly smaller than a Mac mini.

      --
      It's not offtopic, dumbass. It's orthogonal.
    2. Re:Remember by UranusReallyHertz · · Score: 2, Informative

      Those photos actually show a ceramic multi-chip module containing 4 power5s and 4 cache chips. Up to 8 of em can go into a single chassis. Truly geek porn. Also I've read that the CELL would be about the same size as the Emotion Engine at 25nm. So SONY has already shown that they aren't afraid of using a big chip.

      --
      Smoking is an expensive, slow, and unreliable method of suicide.
  67. You kid, but have a point by DumbSwede · · Score: 1
    Some people may not be familiar with John Conway's Game of Life, though they have probably seen screen savers that demonstrate it. It is not really a game but the unfolding of a Cellular Automata simulation where each grid point state depends on the state of its 8 neighbors by a set of simple rules.

    I actually thought immediately of Cellular Automata when I read some of the specs on the new Cell, and the name may just be a coincidence, but maybe not. It would be interesting to see a Cell architecture where there are 27 Cell sub-processors, because my Life is more than two dimensional.
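
    For anyone who has not seen the rules written down, here is a minimal sketch of one generation of Conway's Life. The 8-neighbour grid is as described above; the concrete birth/survival numbers (born on 3 neighbours, survive on 2 or 3) are the standard Life rules, which the comment does not spell out. The set-of-coordinates representation and the function name are just one convenient choice.

```python
from collections import Counter


def life_step(live):
    """Return the next generation for a set of live (x, y) cells."""
    # Count, for every cell, how many of its 8 neighbours are alive.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next turn with exactly 3 live neighbours,
    # or with 2 live neighbours if it is already alive.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


blinker = {(0, 1), (1, 1), (2, 1)}   # horizontal bar of three cells
print(sorted(life_step(blinker)))    # the bar flips to vertical
```

    In three dimensions the same neighbourhood becomes a 3x3x3 cube, that is 26 neighbours plus the centre cell, which is presumably where the 27 comes from.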

    1. Re:You kid, but have a point by jericho4.0 · · Score: 1
      The life algo is all integer math, and won't improve much by use of the Cell....

      --
      "A language that doesn't affect the way you think about programming, is not worth knowing" - Alan Perlis
    2. Re:You kid, but have a point by DumbSwede · · Score: 1

      True, true, and any modern CPU is way overkill for "The Game of Life" anyway, but this doesn't mean Cellular Automata have to be written as Integer only. What new routines could you write that communicate between 27 processors to simulate 3D processes in a Cellular Automata way? Some new Protein Folding algorithms perhaps.

    3. Re:You kid, but have a point by daver969 · · Score: 1

      SIMD units can handle integers. If it only has to count up to 8, that's basically 4 bits, so a 128 bit wide unit might could do 32 ops a pop!
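
      A rough way to see the 32-lanes claim without any SIMD hardware is to emulate it: pack 32 four-bit counters into one 128-bit integer and add them all at once. This is a software sketch of the idea (often called SWAR), not Cell code; the helper names are invented, and Python's big integers stand in for a 128-bit register.

```python
LANES = 32                      # 128 bits / 4 bits per counter
LANE_BITS = 4
MASK = (1 << (LANES * LANE_BITS)) - 1
HIGH = int("8" * LANES, 16)     # top bit of every 4-bit lane
LOW = MASK ^ HIGH               # low three bits of every lane


def pack(values):
    """Pack 32 small integers (0..15) into one 128-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xF) << (LANE_BITS * i)
    return word


def unpack(word):
    """Split a 128-bit word back into its 32 four-bit lanes."""
    return [(word >> (LANE_BITS * i)) & 0xF for i in range(LANES)]


def lane_add(a, b):
    """Add all 32 lanes at once, modulo 16, with no carry between lanes."""
    return ((a & LOW) + (b & LOW)) ^ ((a ^ b) & HIGH)


# Eight neighbour bitmaps, each holding a 0 or 1 per lane, summed lane by lane.
neighbours = [pack([1] * LANES) for _ in range(8)]
total = 0
for word in neighbours:
    total = lane_add(total, word)
print(unpack(total)[:4])        # each lane has counted its 8 neighbours
```

      Eight additions update all 32 neighbour counts, which is the "32 ops a pop" in question; a count of 8 is the largest value needed and it just fits in 4 bits.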

  68. Re:econ0meter: beyond the bs, sell by Anonymous Coward · · Score: 0

    I am dumping my MSFT shares this afternoon before Bill dumps even more of his shares. I have friends in Redmond who told me two weeks ago that the rumor mill has it that STI (Sony, Toshiba, IBM) has declined Microsoft's request to collaborate on porting Windows to the Cell platform. The Cell has the potential to become the big new thing, and even if Bill does not want to admit it, he will try to get some of his money out in time.

  69. Not only but also... by stephend · · Score: 1

    This week's I, Cringely also touts cell processors as the NextBigThing.

  70. OpenMP by Anonymous Coward · · Score: 0

    http://www.openmp.org

  71. What about AI? by Anonymous Coward · · Score: 0

    The CELL SPUs might be wonderful for all of the graphics processing, but what about the other stuff? Are they at all suitable for collision detection, AI, pathfinding? If all of those tasks need to remain on the central processor, then games will just become limited by the central processor in AI code, instead of polygon rasterizing.

    1. Re:What about AI? by mabinogi · · Score: 2, Interesting

      Well, considering that there's going to be a dedicated graphics chip from nVidia in the PS3 too, I'd imagine that the SPUs are designed specifically with all that stuff in mind...

      --
      Advanced users are users too!
  72. Re:I'll believe it when I see it -- IBM SW patents by Anonymous Coward · · Score: 0

    I agree that the potential is exciting and have a further thought. Could the development of the Cell processor have anything to do with IBM Opens Their Patent Portfolio to Open Source? This would seem to foster porting to the new arch...

  73. Re:hell ya, cheep awsome computers! by king-manic · · Score: 1

    How much will a PS3 cost to manufacture?
    If I was a computer company, I could buy them without the game-specific stuff, load on linux, and sell them as cheep alternative computers.. but that's just me. (assuming linux and friends are compiled for CELL in the next few months of course).


    The problem with that is a PS3 won't be anywhere close to a GP machine. It's going to require a lot of driver tweaks, a load of hardware reconfiguration, and defeating the DRM. By the time someone figures out how to do that cheaply, computers will already be more powerful at a similar cost, so there's no incentive except nerd prestige.

    --
    "There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy."
  74. FP rounding mode by 21mhz · · Score: 1

    The article also points out that the SP floats aren't truly 754-compliant, as they round-toward-zero on cast to int.

    As far as I remember from implementing the spec years ago, the rounding mode can be varied. Indeed there are C runtime functions on many platforms that set this and other properties for floating point operations.

    --
    My exception safety is -fno-exceptions.
    1. Re:FP rounding mode by nothings · · Score: 1

      Not just round to int. FP rounding mode refers to how all operations round the bottommost bit of precision--IEEE 754 specifies that operations must (effectively) be computed with sufficient extra precision at the bottom to then round in the various directions.
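
      As an illustration of "rounding mode applies to every operation", here is a small sketch using Python's decimal module, which lets you pick the rounding direction per context. It works in decimal digits rather than binary floating point, so it is an analogy for the IEEE 754 modes and says nothing about the SPE hardware itself; the helper name is made up.

```python
from decimal import (Context, Decimal, ROUND_CEILING, ROUND_DOWN,
                     ROUND_FLOOR, ROUND_HALF_EVEN)


def divide_with_mode(x, y, mode, digits=3):
    """Divide x by y keeping `digits` significant digits, rounding the
    last kept digit according to `mode`."""
    ctx = Context(prec=digits, rounding=mode)
    return ctx.divide(Decimal(x), Decimal(y))


for name, mode in [("toward zero", ROUND_DOWN),
                   ("to nearest even", ROUND_HALF_EVEN),
                   ("toward +infinity", ROUND_CEILING),
                   ("toward -infinity", ROUND_FLOOR)]:
    pos = divide_with_mode(2, 3, mode)
    neg = divide_with_mode(-2, 3, mode)
    print(f"{name:>17}: 2/3 = {pos}   -2/3 = {neg}")
```

      The same division gives a different last digit depending on the mode, and round-toward-zero is the only one of the four that always produces the smaller magnitude for both signs.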

  75. To the putz who submitted this news post: by Hannibal_Ars · · Score: 5, Informative

    If you're going to rip the links out of one of my Ars news posts and submit them to slashdot (in the same order in which I linked them, no less), then at least credit your source.

    --
    Senior CPU Editor | Ars Technica | http://arstechnica.com/
    1. Re:To the putz who submitted this news post: by rawgod0122 · · Score: 1

      OK. For those of you who are not familiar with Jon "Hannibal" Stokes, he knows his stuff about CPU architecture and provides a great service to hobbyists and programmers (such as myself, as I do scientific computing).

      OK so if you want to understand pipelines and why longer ones are not always better?
      http://arstechnica.com/articles/paedia/cpu/pipelining-1.ars

      No? Well how about caching?
      http://arstechnica.com/articles/paedia/cpu/caching.ars

      And then there is the Opteron/Athlon64 goodness
      http://arstechnica.com/articles/paedia/cpu/amd-hammer-1.ars

      Historical view of the Pentium Architecture?
      http://arstechnica.com/articles/paedia/cpu/pentium-1.ars

      For more great articles from him (and others)
      http://arstechnica.com/paedia/

      Also don't forget about MIT open courseware to fill in the gaps...
      http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-823Computer-System-ArchitectureSpring2002/CourseHome/index.htm

      And how can you not have love for
      http://slashdot.org/comments.pl?sid=138810&cid=11617250

      Keep up the good work!

    2. Re:To the putz who submitted this news post: by Hannibal_Ars · · Score: 1

      Ok, this appears to be a hasty mistake on my part. The author has emailed me and shown that these two links were posted in the same thread in the Beyond3D message board, which he frequents. He also claims to have put other links in the post, which were then edited out by the /. guys. I'll give him the benefit of the doubt, and I apologize for jumping to conclusions and calling him a putz.

      (P.S. I know this was sort of unprofessional behavior on my part, but this kind of thing happens all the time and every now and then it gets to you and you fly off the handle a bit. Sometimes, we embed little joke codes in the URL, if the linked site's CMS permits it. So if you see a link with NO_YUO at the end or something, then you know where it came from.)

      --
      Senior CPU Editor | Ars Technica | http://arstechnica.com/
  76. Re:x86 compatibility? by gravygraphics · · Score: 1
    FLOPs are a better measure, as a divide is a divide and a multiply is a multiply no matter what chip architecture you use.

    A FLOP is a FLOP, except that the SPE's don't quite round in IEEE approved ways in order to decrease logic complexity... so it is a *little* different.

  77. Cell Hype by Anonymous Coward · · Score: 0

    Although this looks awesome on paper now, we all have to remember that this won't be out for some time. I have no doubt this will be fast, but it all will be decided by its scalability and price in the end.

    By the time the cell processor is on shelves and has an operating system that can somehow thread off or pipeline everything effectively through it, it'll be 2 years. In 2 years, there will be dual core, quad core, and probably 8 core chips, and there will still be SMP. So we could very well have anything between 2 and 16 cores in the future from AMD or Intel.

    I personally like the idea of having lots of chips that are full of cores that can handle complex tasks. The competitors feel the heat, and what cost billions to develop years ago now costs millions. Sony made the step, but AMD and Intel have never felt bad about copying each other or IBM in the past, and trust me, by the time cell is a commercial threat, there will be plenty of reasonable competition.

    Where are the /. stories saying dual CPU/core AMDs are available soon? Better than any more of this Sony hype.

    1. Re:Cell Hype by Anonymous Coward · · Score: 0
      I remember when back around 1993 Motorola announced the PPC 601 which was going to revolutionize desktop computing. Cheap, fast, cool, and low cost.

      Here we are 12 years later, and I'm still waiting. From experience, these announcements never amount to much in the long run. Other than some specialty items and "boutique" computers, this stuff has no commodity general purpose application.

      In its realm, it will do OK. It will probably be hot for FFT and such. Might make a good computer for signal processing. I imagine the NSA will be buying a few Sony game consoles.

  78. Context switching by ekc · · Score: 1

    This might not be an issue for a game console, but for a workstation, wouldn't the Cell's context-switching overhead be rather huge? From what I have read so far, it seems each SPE has 256 KB of local storage, plus another 2 KB for the main register file (128 16-byte registers) and whatever other state information it needs. Since there are 8 of these things, we're talking over 2 MB of state to swap in and out. Would that still be considered a drop in the bucket for most scheduling schemes?
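
    The "over 2 MB" figure above can be reproduced directly from the numbers in this comment. A small sketch of that arithmetic follows; the constant names are made up, and any extra per-SPE state beyond local store and registers is ignored, as it is in the comment.

```python
SPE_COUNT = 8
LOCAL_STORE_KB = 256            # local storage per SPE
REGISTERS_PER_SPE = 128
BYTES_PER_REGISTER = 16         # 128-bit registers

register_file_kb = REGISTERS_PER_SPE * BYTES_PER_REGISTER // 1024
state_per_spe_kb = LOCAL_STORE_KB + register_file_kb
total_state_kb = SPE_COUNT * state_per_spe_kb
total_state_mb = total_state_kb / 1024

print(f"register file: {register_file_kb} KB per SPE")
print(f"total state:   {total_state_kb} KB (~{total_state_mb:.2f} MB)")
```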

    1. Re:Context switching by be-fan · · Score: 1

      The SPE doesn't context switch. The OS does regular scheduling on the PPE, which can execute 2 threads at the same time. The SPEs, however, are scheduled in batches. Each runs autonomously (in its own thread), and runs the same task until it's completed. So when the PPE context switches, the SPEs can still run whatever they are running.
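
      A loose software analogy for the batch model described above: a fixed pool of eight workers, each taking a task and running it to completion before picking up the next, with no preemption by the dispatcher. This is only a sketch using ordinary Python threads; the task body and all names are placeholders, and it does not model local store, DMA, or anything else specific to the real hardware.

```python
from concurrent.futures import ThreadPoolExecutor

SPE_COUNT = 8   # number of workers standing in for SPEs


def run_to_completion(task_id):
    """Placeholder task: runs start to finish without being swapped out
    by the dispatcher."""
    checksum = sum(i * i for i in range(1000))
    return task_id, checksum


def run_batch(task_ids):
    """Hand a batch of tasks to a fixed pool of workers and collect the
    results in submission order."""
    with ThreadPoolExecutor(max_workers=SPE_COUNT) as pool:
        return list(pool.map(run_to_completion, task_ids))


results = run_batch(range(20))
print(len(results), "tasks finished on", SPE_COUNT, "workers")
```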

      --
      A deep unwavering belief is a sure sign you're missing something...
    2. Re:Context switching by king-manic · · Score: 1

      This might not be an issue for a game console, but for a workstation, wouldn't the Cell's context-switching overhead be rather huge? From what I have read so far, it seems each SPE has 256 KB of local storage, plus another 2 KB for the main register file (128 16-byte registers) and whatever other state information it needs. Since there are 8 of these things, we're talking over 2 MB of state to swap in and out. Would that still be considered a drop in the bucket for most scheduling schemes?

      Do you really need context switching when you have 1 real processor and 8 mini processors on die? It provides a semi-hard limit to multitasking, but context switching is only required when you assign tasks that cannot be divided up among the 8 mini-CPUs and the main CPU on die. So only when you have a crazy amount of active processes that require constant attention. But you can imagine a system where the OS is on the main CPU, and the 5 most important processes run on the first 5 sub chips while the last 3 handle anything that is context switched, like daemons, etc.

      --
      "There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy."
    3. Re:Context switching by shieldforyoureyes · · Score: 1

      Does this critter actually have 9 MMUs?

  79. ATTN: Karma Whores by Anonymous Coward · · Score: 0

    Can you please post a non borked, slashdotted link to this please? Christ, the top fucking like, 5+ stories are all non-mirrored and screwed up, just my luck when there is actually some shit that's worth reading on this little DoS operation disguised as a news for nerds portal.

  80. Stream processors are not general-purpose by Anonymous Coward · · Score: 0

    Stanford may be in a tizzy over their "stream processor" concept but people will have to learn to grasp the reality that not everything will benefit from that kind of architecture.

    How many programs, even parallel ones, have small algorithms that work on large streams of data? Even supercomputers need large amounts of RAM per CPU.

    This cell is more reminiscent of older DSPs. Great at what they do, but you aren't going to accelerate M$ Word and destroy Intel with it.

    1. Re:Stream processors are not general-purpose by Anonymous Coward · · Score: 0

      People feel all tingly inside with theoretical performance, so it's quite difficult to tell them anything else.

      Imagine a 256Jiga flops Cell running GNU/Linux...

  81. Linus? by Anonymous Coward · · Score: 0
    Linus seems interested in CELL, too.

    Linus is interested in giving anal to Miko Lee, too, but that doesn't mean anything is going to happen because of it.

  82. It WILL kick ass... by Anonymous Coward · · Score: 0

    ...if it does work with iLife.

  83. think to the future by coult · · Score: 2, Interesting

    Most of you are thinking of today's applications...but what about things like eye/head tracking, voice recognition, face recognition, telepresence, real-time cinema-quality CGI, etc...those are tasks requiring large-scale numerical computation, and they all might appear on your desktop in the not-too-distant future thanks to chips like CELL and its future descendants.

    --

    All is Number -Pythagoras.

  84. Re:Cool, as a co-proc by randyest · · Score: 1

    This chip is not going to compete with other general purpose CPUs. It's going to compete with custom ASICs and FPGAs.

    No, it won't. Who uses an ASIC or FPGA to make just a processor? No one, that's who. Processors are often embedded in ASICs (and sometimes even FPGAs) along with lots of other goodies. If you just need a processor, you buy an off-the-shelf ARM, VR, or any of the dozens available. You don't spend the bucks to make an ASIC. This may compete with off-the-shelf processors and some ASSP (App Specific Standard Product) but not ASIC.

    --
    everything in moderation
  85. IBM by simpl3x · · Score: 1

    "Cell was co-designed by IBM which has an interest in selling workstations etc with that chip..."

    That's conjecture... IBM makes money designing, and fabbing chips more than in PCs, as the selling of the division attests. But, could Sony be one of the PC outfits interested in licensing a compatible version of OS X for the living room? Network workstations running the beast might be of interest to IBM however. Does your cash register really need Windows?

  86. How hot? by aldeng · · Score: 1

    How hot will this thing be? From TFA:
    One unconfirmed report claims that at the extreme end of the frequency/voltage/power spectrum, one sample CELL processor was observed to operate at 5.6 GHz with 1.4 V Vdd and consumed 180 W of power.
    I'm not sure about the die dimensions or what kind of cooling system will be used, but isn't that a lot of heat to dissipate? I may have to trade my P4/toaster oven in for a Cell if they're hot enough.

  87. Hypercomputer? by Anonymous Coward · · Score: 0

    I've yet to see an article on how this "supercomputer on a chip" could play a role in the supercomputer market (extra-supercomputer? hypercomputer?). If you can have 25-30 GFLOP/chip, what would a System X-like cluster look like? The raw number is 25 GFLOP/chip * 2 chips/server * 1000 servers = 50 TFLOP. Assuming 80% efficiency, that is still 40 TFLOP for a low-price supercomputer with low power requirements. What if IBM actually builds supercomputers (not clusters) using these chips?

    Any comments from supercomputer experts? Does Cell make good supercomputing processors?
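
    The cluster estimate above, written out so the assumptions can be varied. All inputs are the ones guessed at in this comment (25 double-precision GFLOP per chip, 2 chips per server, 1000 servers, 80% efficiency); the function name is invented for illustration.

```python
def cluster_tflops(gflops_per_chip, chips_per_server, servers, efficiency=1.0):
    """Aggregate cluster throughput in TFLOP, scaled by an efficiency factor."""
    total_gflops = gflops_per_chip * chips_per_server * servers * efficiency
    return total_gflops / 1000


raw = cluster_tflops(25, 2, 1000)
sustained = cluster_tflops(25, 2, 1000, efficiency=0.8)
print(f"raw: ~{raw:.0f} TFLOP, at 80% efficiency: ~{sustained:.0f} TFLOP")
```

    Plugging in 30 GFLOP per chip, the upper end of the quoted range, raises the raw figure to roughly 60 TFLOP under the same assumptions.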

  88. 256MB / 512MB memory limit? by loose_cannon_gamer · · Score: 1

    I'm not going to claim I completely understood the XDR memory controller section, but limiting the whole chip (SPEs + PPE) to 256MB seems to effectively rule out any high-end workstation in the immediate future.

    Additionally, although there is much speculation about what the processor can run, it is pretty obvious it will not run x86 code. So in order to compete (or take over) in the PC business, it will have to do what every other new architecture in its position has failed at -- overcome the x86 existing compatibility requirement. How could that happen... Hmmm...

    1) have a better instruction set -- uh, no, other processors have essentially the same instruction set. And while the relative beauty of an instruction set is often a matter of preference, few people find x86 beautiful.

    2) have a higher clock rate? Well, maybe, but not really. Due to a nearly complete (albeit intentional) lack of branch predictive hardware with an 18 cycle flush penalty, it seems clear that a P4 4.0Ghz will smash an individual SPE at 4.0Ghz.

    3) be cheaper and cooler? Here, we have a cell processor at 4Ghz and about half the power consumption. A definite potential win. Die size seems comparable, so production costs will be, too, in all likelihood. Packaging might be more complex for the cell, I'm not sure.

    4) be more parallelizable? this is the only area in which cell can stomp every other chip. Sure, an 8-way opteron system will beat a single cell processor, but who builds 8-way opterons? And who can do it for the price of one cell system?

    I don't think anyone doubts the potential power of these processors. But I think there is a long software bootstrap process to be undertaken before we see mainstream cell desktops. Businesses won't write cell code for consumers unless consumers have cell machines, and consumers aren't going to buy cell machines unless the machines run their apps. The catch-22 that has doomed every new and superior instruction set since the x86 original (backwards compatibility with current software) will likely hold back the cell as well. However, the cell brings something new to the table -- the promise of more raw power.

    I think that this bootstrapping process will therefore most likely be driven by the big vendors who need the processing power. Alice and Bob can instant message just fine without one, but if you're building a renderfarm, the power and saved time is maybe going to justify the costs of porting your key apps.

    As a side note, if it really does run linux soon after release, one must wonder if the killer app some have been looking for for years to move people from windows to linux may in fact be a killer processor. If you can beat the performance of any x86 windows box by an order of magnitude with a cell linux box, that's an argument that really hasn't been made before.

    But right now, it looks like the only real software development market for the cell is going to be the high end workstation / performance chasers, and they need more memory than cell can deliver. I'm not holding my breath for the moment.

    Either way, it'll be interesting. :)

    --
    In Soviet Russia, us are belong to all your base.
  89. PS3 operating system? by Anonymous Coward · · Score: 0

    It sounds like an ideal candidate for a proprietary hobby operating system with real time multitasking, preferably coded in asm.

    No more time slicing the CPU, let 1 SPE do the network, 1 SPE for sound and so on, all in parallel and real time. Leave out 1 SPE to be time sliced between all other non-important background programs.

    8 simultaneous processes is much for a Personal computer.

  90. Sorry about that by Anonymous Coward · · Score: 1, Interesting

    My apologies,

    I am the editor of Real World Tech, and I tried to warn the folks at our hosting company, but apparently they got caught with their pants down : )

    A good slashdotting never gave us any trouble before...but with our new hosts, something gave out...

    Check it out, it's a damn good read.

    David Kanter
    Editor
    Real World Technologies

  91. Gates by glrotate · · Score: 1

    I believe he is selling so that AIDS patients in Africa can live longer and spread the disease even further, allowing the virus to mutate and becoming more virulent in the process.

    He's very clever that Bill Gates.

  92. Would be a nice addition to the Opteron by Anonymous Coward · · Score: 0

    The Opteron has stunning performance, except that its floating point unit is lackluster. Instead of having a dual core Opteron, you could have a single core Opteron and replace the second core with a cell processor, giving a processor with outstanding I/O performance, 8 way SMP (without glue chips), and an integrated Northbridge chip (memory address controller). With the addition of being able to gang processor registers together (much like IBM's VIVA) so as to provide a Virtual Vector processor (like IBM's), you could get outstanding 512 bit Vector performance too (instead of relying on SIMD or streaming SIMD extensions (SSE)).

  93. Is there a Linus watch anywhere? by totierne · · Score: 1

    Is there a Linus watch somewhere so mere mortals can try to learn from the master, by following his web contributions to forums and presentations and email lists?

    The closest I know is:
    http://marc.theaimsgroup.com/?l=linux-kernel&w=2&r=1&s=Linus+Torvalds&q=a

    A Richard Stallman watch would be good too.

    An rss feed, or maybe just a microphone pinned to him, though the keyboard clicks would get annoying, maybe a video feed of his screen....

    No joking matter, it is only a matter of time, even if it will be a distraction from the business of coding!

    1. Re:Is there a Linus watch anywhere? by Anonymous Coward · · Score: 0


      Search for Torvalds, Andy Kleen,...

      http://www.realworldtech.com/forums/index.cfm?action=search

  94. Noobs by Anonymous Coward · · Score: 0

    The Cell is going to mean the death of x86. The Cell will also, IMO, be the chip where the Mythical Convergence actually happens. I'm not trolling. Everything starts out small in the beginning. If you read the Cell specs, you will see that the chip can run MULTIPLE OS's at the SAME TIME. Also, Microsoft owns VirtualPC which is an x86 emulator for the PPC Architecture. Microsoft is also using IBM iron in their new XBox 2, which is incidentally developed for on a G5. Chew on that for a while, then go back to the IBM literature on the Architecture.

  95. Re:x86 compatibility? by Anonymous Coward · · Score: 0

    Actually, Forth can be ported faster than C.

  96. Thanks by totierne · · Score: 1

    Thanks, are there any other forums, mailing lists and the like that Linus Torvalds or Richard Stallman contribute to?

    Besides:
    http://www.realworldtech.com
    and
    http://marc.theaimsgroup.com/?a=105701892400001&r=1&w=2 [linux kernel list]

    Maybe I am just too lazy to trawl through google or slashdot search but I have not seen this information pop up on my standard web graze...

    Thank you for your time.

  97. Bandwidth is more expensive than processing power by Anonymous Coward · · Score: 0

    It actually makes economic sense to put vastly more processing power in a chip than it has bandwidth to supply that power. Even if most of the time that processing power isn't being used, the fact that it will be used a small percentage of the time still makes you come out ahead.

    Processing power and bandwidth should be balanced, but their relative costs need to be taken into account when determining that balance; just balancing for the average workload is a very naive approach. See the Merrimac design paper, it makes the point better than I can.
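
    One rough way to put numbers on that balance is a roofline-style bound, a standard model that is not taken from the comment itself: attainable throughput is the lower of the compute ceiling and bandwidth times the work done per byte moved. The sketch below is illustrative only. The 256 GFLOPS peak is the single-precision figure quoted in the story summary; the 25.6 GB/s bandwidth and the per-kernel intensities are assumed values chosen for the example.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline-style bound: the lower of the compute and memory ceilings."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


PEAK = 256.0      # single-precision GFLOPS figure quoted in the story summary
BANDWIDTH = 25.6  # GB/s; assumed value, for illustration only

# FLOPs performed per byte fetched; these kernels are hypothetical.
for intensity in (0.25, 1.0, 10.0, 100.0):
    print(intensity, attainable_gflops(PEAK, BANDWIDTH, intensity))
```

    Under these assumed numbers, only kernels that do a lot of arithmetic per byte come near the peak, which is the situation the comment describes: the extra processing power sits idle on most workloads and pays off on a few.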

  98. What about Nvidia's graphic chipset for CELL? by adsl · · Score: 1

    So we have this neat modular CPU/CELL thing, which is rather faster than anything available for game hardware right now.
    But I understand that CELL partners with the latest and greatest Nvidia graphics chipset, so the real video performance we will get will reside as much in Nvidia's technological capability as in the new base CELL.
    So why is nobody talking about Nvidia's role in this hardware and how this will translate in real life performance?

  99. Re:What's the point? - What are your assumptions? by jeif1k · · Score: 1

    If you've ever done some detailed algorithm work, you will quickly realize that there are many algorithms where you can make tradeoffs between memory and CPU time.

    No, sorry, it doesn't work that way. It isn't the total amount of memory that the algorithm uses that matters, it's locality of reference. And people already maximize locality of reference in their performance-critical implementations because it already pays back handsomely on current processors.

  100. Re:What's the point? - What are your assumptions? by thpr · · Score: 1
    It isn't the total amount of memory that the algorithm uses that matters, it's locality of reference

    I see where you're coming from, as there are reasons why locality helps (cache lines fetched in their entirety, bursts of memory from DRAM to caches, charging of different areas on a DRAM) but I disagree that there are not tradeoffs that can COMPLETELY trade memory access for CPU time, regardless of the value of locality. In addition, locality becomes vitally important on Cell, where the additional processors primarily address their 256K of local memory. It will also become more and more important elsewhere as the penalty of going to main memory increases (and you can't achieve a 100% hit rate in an L1 cache).

    Let me expand my original example to show you two (potentially extreme and thus stupid, but nonetheless illustrative) implementations:
    Let's assume we want every assignment a = b _ c _ d _ e where each _ is either +, -, * or /. I will ignore everything else, even though this is not exhaustive (since without parentheses, it is subject to a language's order of operations). [Note this is actually a 'real' example, in the sense that I was part of a project where we were exhaustively testing a compiler for these simple assignments.]

    One algorithm might calculate all possibilities of x _ y (there are 4) and then substitute b for x, c for y and d for x and e for y for each of the permutations (4 x 4 = 16 total permutations). The storage of each of the "sub equations" is done in memory, and while this fits in an L1 cache for this trivial example, with more operators and variables, it doesn't (especially if you go to 8 variables and are therefore caching permutations of w _ x _ y _ z)

    Another algorithm will treat this like counting. let + = 00, - = 01, * = 10 and / = 11. Start at 000000 and increment up to 111111, at each number using the first pair of numbers for the sign between b and c, the second pair for c _ d, the third pair for d _ e.
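
    A minimal sketch of that counting approach, assuming the bit encoding given above (+ = 00, - = 01, * = 10, / = 11). The function names and the variable letters are made up for illustration and are not from the original project. With three operator slots the counter runs through 4^3 = 64 values, 000000 to 111111, and nothing is stored except the counter itself.

```python
OPS = ["+", "-", "*", "/"]  # indexed by the two-bit codes 00, 01, 10, 11


def expression_for(counter, slots=3):
    """Decode a counter into an expression such as 'b + c - d * e'.

    The highest two bits pick the operator between b and c, the next two
    the operator between c and d, and so on down to the lowest two bits.
    """
    names = "bcdefghi"[: slots + 1]  # illustrative variable names
    parts = [names[0]]
    for slot in range(slots):
        shift = 2 * (slots - 1 - slot)
        op = OPS[(counter >> shift) & 0b11]
        parts += [op, names[slot + 1]]
    return " ".join(parts)


def all_expressions(slots=3):
    """Yield every expression in counter order, holding only the counter."""
    for counter in range(4 ** slots):
        yield expression_for(counter, slots)
```

    For example, expression_for(0b000110) decodes the pairs 00, 01, 10 and gives "b + c - d * e". Each expression is rebuilt from scratch, which is the repeated work this approach accepts in exchange for the tiny memory footprint.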

    If you expand these examples to a much larger example (do 8 variables instead of 4; N symbols + parentheses instead of 4), you will quickly realize that storing all of the permutations by doing the precalculation (in the first example) not only blows out the memory, but also (just due to size and the need to merge the permutations) destroys any ability to do the calculation locally in memory. However, the second example, at the expense of recalculating the same thing (specifically, the patterns x + y, x - y, etc.) COMPLETELY eliminates any duplicate reference to memory (since there is one access to 'store the answer' and what this does use - the counter and the code - would all fit in an L1 cache), but this is done at the expense of run time (it will take MUCH longer to run... in the real example I cite above, it took about 100X the runtime, but about 1% of the memory, since we wrote the final results directly to disk in both cases). Yes, I realize there are middle cases that are probably 'more ideal', but that requires one think about the target architecture, programming language, and lots of details not valuable here.

    This - as pointed out in my first post - is certainly a trivial example. However, there are many other places where this type of tradeoff is valid. Geometric applications and work on graphs [graph theory] (which is where I spend my programming time now) can make a lot of effort to cache values to avoid lots of calls to sin, cos, and tan (geometry) or paths through the graph (graph theory). However, with a surfeit of computing power, should that be done, or should the memory footprint be minimized (to hopefully get better locality of reference)? Those tradeoffs are actively being made with what I am working on. After all, what's the value of code profiling tools if you don't use them :)
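
    The trig case can be sketched the same way. This is only an illustration of the two ends of the tradeoff, not code from the project mentioned above; the one-entry-per-degree table resolution and the function names are assumptions.

```python
import math

TABLE_SIZE = 360  # one entry per whole degree; the resolution is an assumption

# Memory-heavy variant: pay once to fill the table, then index into it.
SIN_TABLE = [math.sin(math.radians(d)) for d in range(TABLE_SIZE)]


def sin_cached(degrees):
    """Look up sin for a whole number of degrees from the precomputed table."""
    return SIN_TABLE[degrees % TABLE_SIZE]


def sin_recomputed(degrees):
    """CPU-heavy variant: no table, recompute on every call."""
    return math.sin(math.radians(degrees))
```

    Which one wins depends on whether the table stays in cache (or, on Cell, in local memory) next to everything else the inner loop touches, which is exactly what a profiler is for.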

  101. PS3 will use Nvidia for GPU!!! by taweili · · Score: 1

    There is a lot of confusion in the discussion. First, Cell won't be used for graphics performance. Nvidia is developing the next generation GPU for PS3. Second, the 250 GFLOPS achieved by the 'magic' Cell at 4.7 GHz can be achieved by overclocking the current GeForce FX at 500 MHz a bit higher. GeForce FX at 500 MHz can already reach 200 GFLOPS. The excitement about Cell is probably the high communication bandwidth it will bring with Rambus technologies. All of these technologies are available today. High-bandwidth switched buses have been used in high performance workstations and servers for years. The real contribution of Cell is probably to bring these wonderful technologies to a mass market level with Sony's PS3 launch.

  102. Re:Cool, as a co-proc by gabebear · · Score: 1

    so it might not be as good as an equally clocked Power5 based processor

    Man, you do realize the Power5 is F#$%ing FAST. The "PowerPC Processing Element" looks like it won't have much in common with the Power5 at all. I doubt you will get 1/100th the speed of the current Power5s for generic PowerPC programs on the Cell.

  103. Metamod Notification by Anonymous Coward · · Score: 0

    Cell processor does not yet exist, therefore the post was not Offtopic. You should have marked it Flamebait or Troll.

  104. Re:What's the point? - What are your assumptions? by Anonymous Coward · · Score: 0

    This - as pointed out in my first post - is certainly a trivial example. However, there are many other places where this type of tradeoff is valid

    The bread-and-butter problems of high-performance computing usually deal with datasets that are much larger than the cache, and they need to access them repeatedly. That's true for dense matrices, sparse matrices, graphs, speech signals, images, etc. Problems whose runtime is dominated by calls to time consuming single-variable functions that are not themselves dependent on a lot of data are quite rare.

    These relationships are not a lucky coincidence; rather, chip designers look at existing codes and they do simulations on them. Based on that, they decide how much chip real-estate to devote to multipliers, the evaluation of trig functions, cache memory, etc.

  105. To the paranoid named Hannibal of arstechnica.com by Anonymous Coward · · Score: 0

    You want copyright on hyperlinks...

    Take your medicine, seriously.