Slashdot Mirror


Octopiler to Ease Use of Cell Processor

Sean0michael writes "Ars Technica is running a piece about The Octopiler from IBM. The Octopiler is supposed to be a compiler designed to handle the Cell processor (the one inside Sony's PS3). From the article: 'Cell's greatest strength is that there's a lot of hardware on that chip. And Cell's greatest weakness is that there's a lot of hardware on that chip. So Cell has immense performance potential, but if you want to make it programmable by mere mortals then you need a compiler that can ingest code written in a high-level language and produce optimized binaries that fit not just a programming model or a microarchitecture, but an entire multiprocessor system.' The article also has several links to some technical information released by IBM."

423 comments

  1. 'Octopiler' by Anonymous Coward · · Score: 0, Funny

    Wasn't that a James Bond film?

    1. Re:'Octopiler' by SeeMyNuts! · · Score: 1

      Octopussy is the name of the Cell optimized sockets library. It provides hardware-assisted handshaking routines for each Octopiler data stream.

    2. Re:'Octopiler' by Impy+the+Impiuos+Imp · · Score: 1

      "I'm Octopiler. And you are?"

      "Nerd. James Nerd."

      (sitting in a tub full of foam) "Sir, you have me at a disadvantage."

      "Madame, the disadvantage is all mine."

      "Could you hand me something to put on?"

      (Nerd reaches down and picks up a pair of 5" Cell processors and wordlessly holds them out to Octopiler.)

      "The quiet type, eh? Don't you have something seductive to say, James?"

      "D'joo see 'Matrix'? When Mouse runs into the lunchroom and says 'Morpheus is fightin' Neo!', then they all scramble out to go look and Neo is moving so fast onscreen, we see ripping and tearing on the screen ala Quake cranked way up in the framerate with V-sync turned off! Wow! Wowowoeee!"

      "Oh, James!"

      --
      (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
    3. Re:'Octopiler' by apoc06 · · Score: 1

      mods? please tell me im not the only idiot that found this funny...

      thats just sad, huh?

    4. Re:'Octopiler' by Impy+the+Impiuos+Imp · · Score: 1

      Totally, some of my best stuff. I have theories that later comments roll onto later pages, and that nobody ever reads anything but page one. I didn't even realize there was more than one page of comments for well-commented stories for many months.

      --
      (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
  2. So don't hire mere mortals by ScrewMaster · · Score: 4, Funny

    Hire "Real Programmers". You know, the ones that only code in Assembler, and if they can't do it in Assembler then it isn't worth doing.

    --
    The higher the technology, the sharper that two-edged sword.
    1. Re:So don't hire mere mortals by stedo · · Score: 2, Funny

      Hire "Real Programmers". You know, the ones that only code in Assembler, and if they can't do it in Assembler then it isn't worth doing.
      Hmph. "Real Programmers" needing a bleedin' assembler to tell them what their bleedin' instructions mean? Why, back in my day we had to write our programs in machine language. We saved our work by means of a small bar magnet held a short distance above a hard disk platter. And we had to pay for our own bytes.

    2. Re:So don't hire mere mortals by SkyFire360 · · Score: 3, Funny

      So don't hire mere mortals, Hire "Real Programmers"

      Zeus was booked, Apollo was out of town, Hermes is still learning, Poseidon just signed a 500-year agreement with Apple and Ares was killed off in God of War, so most of the good non-mortal programmers were out of the question. Hades claims to be a writer instead of a programmer, but most of the plot lines he comes up with end up with everyone dead.

    3. Re:So don't hire mere mortals by Crafack · · Score: 1
      I wonder why nobody has posted this yet...

      The story of Mel, the Real Programmer: http://www.pbm.com/~lindahl/mel.html

      /Crafack

      --
      ... Elecance is left to the implementors.
    4. Re:So don't hire mere mortals by Kadin2048 · · Score: 3, Funny

      Oh, come on. Everyone knows that Hades isn't a programmer any more, not since he got promoted to Management and got that whole division to run down there.

      --
      "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
    5. Re:So don't hire mere mortals by StikyPad · · Score: 1

      And Jesus has already agreed to be my co-pilot.

    6. Re:So don't hire mere mortals by RyuuzakiTetsuya · · Score: 1

      that reminds me of one of the most intimidating things a girlfriend's mom EVER told me.

      "We always coded in assembler. We never let the compiler do all the work for us."

      I crapped myself right then and there.

      --
      Non impediti ratione cogitationus.
    7. Re:So don't hire mere mortals by __aaclcg7560 · · Score: 1

      Hades claims to be a writer instead of a programmer, but most of the plot lines he comes up with end up with everyone dead.

      Hamlet wasn't that bad. Besides, on some programming projects, having everyone dead is a blessing. It's the ghost of previous projects that continues to haunt the living.

    8. Re:So don't hire mere mortals by __aaclcg7560 · · Score: 1

      That's what you get for dating the daughter of a code monkey. :P

    9. Re:So don't hire mere mortals by Anonymous Coward · · Score: 0

      ...and any impugning of Allah will get Slashdot burnt down.

    10. Re:So don't hire mere mortals by Tim+Browse · · Score: 1

      It's a fine attitude. And certainly why I bang nails in with my forehead.

    11. Re:So don't hire mere mortals by MobileTatsu-NJG · · Score: 1

      "Zeus was booked, Apollo was out of town, Hermes is still learning, Poseidon just signed a 500-year agreement with Apple and Ares was killed off in God of War, so most of the good non-mortal programmers were out of the question. Hades claims to be a writer instead of a programmer, but most of the plot lines he comes up with end up with everyone dead."

      That still leaves Boomer and Starbuck!

      --

      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

    12. Re:So don't hire mere mortals by dcapel · · Score: 1

      Pity Hephaestus is more of a hardware person. Companies have considered Aphrodite, but she seems to always screw things up, and Dionysus is entirely too busy working on WINE. Athena has a bit of trouble working with male programmers, so she is out of the question. Artemis is really good at hunting down bugs, but she never seems to return calls.

      Meh, immortal programmers are so hard to come by these days.

      --
      DYWYPI?
    13. Re:So don't hire mere mortals by LostCluster · · Score: 3, Funny

      Help Wanted: Game Programmers

      Must have 5 years experience coding in Assembly for the IBM Cell processor

    14. Re:So don't hire mere mortals by Kagura · · Score: 1

      Companies have considered Aphrodite, but she seems to always screw things

      Couldn't have put it better myself... :)

    15. Re:So don't hire mere mortals by MrLizardo · · Score: 1

      I know exactly what you mean! Using a hammer just feels like ... cheating.

      --
      ^I'm with stupid.^
    16. Re:So don't hire mere mortals by mnmn · · Score: 1

      ...A 10-year experience of Windows XP will help.

      --
      "Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
    17. Re:So don't hire mere mortals by rjmars97 · · Score: 1

      as an electrical engineering student with experience in processor design and many programming languages (including assembly), I think programming in assembly is far superior to any higher level language. only with assembly can the code directly interact with the hardware and take advantage of every design feature of the hardware. when designing a chip, the designer is THINKING in assembly, thus it is only logical for the programmer to also be thinking in assembly. as chips become more and more complex, such as the Cell processor, there will need to be more and more specialized code, not generic code produced by compilers. yes, programming in assembly is much more difficult, however it will run with fewer problems, and much faster than compiler-produced assembly code. high level languages may be fine for some programs, however, i feel that assembly is far more vital than what it has been made out to be... the mysterious language that only computer architects need to know and implement.

      --
      Heuristically programmed ALgorithmic computer
    18. Re:So don't hire mere mortals by miyako · · Score: 1

      By the time the PS3 is released (and anyone can afford to buy one) that may very well be one of the more reasonable requirements...

      --
      Famous Last Words: "hmm...wikipedia says it's edible"
    19. Re:So don't hire mere mortals by Travoltus · · Score: 1

      That and coding in assembly will ensure that all games take as long as Duke Nukem to produce, and will cost more than two Hollywood blockbusters to produce... even if your coders come from a Malaysian sweatshop...

      --
      --- Grow a pair, liberals... stop letting the Republicans bully you!
    20. Re:So don't hire mere mortals by delong · · Score: 1

      Help Wanted: Game Programmers

      Must have 5 years experience coding in Assembly for the IBM Cell processor


      That's the funniest (and most tragically true) post I've read on Slashdot in months. Bravo!

    21. Re:So don't hire mere mortals by Kaptain+Kruton · · Score: 1
      ...and if they can't do it in Assembler then it isn't worth doing.
      You mean they only do Hello World's?
    22. Re:So don't hire mere mortals by Zone-MR · · Score: 1

      While I agree that good knowledge of low-level programming is essential for anyone who wants to write decent code, good coders should also be able to select the right tools for the job - and assembler is not always a good choice.

      ASM is great for those fragments of code which are performance critical, need direct access to hardware, or those few cases where the ASM code is simpler and neater than the equivalent code in high-level languages. It's great when used inline to optimize fragments of code. However it is not a good language for writing an entire complex application in.

      In all but the most trivial projects, it is often beneficial to trade a little bit of run-time execution speed for more maintainable/readable (hence less bug-prone) code. For example, modern compilers are pretty damn good at producing efficient ASM code from C++ code. You can have unrolled loops, inline functions, etc without making your source files look horribly nasty (with pages of 'cut+paste code reuse').

      For things like computer games, where physical objects need to be modelled, object-oriented programming is something that's difficult to live without. The compiler translates things like inherited object traits, virtual functions, calling object destructors at the right times, etc, into relatively efficient assembly code which would have been very difficult to write by hand without producing code which is impossible to understand at a glance, and requires thorough studying to maintain or debug.

      An average coder will know how to write in both low-level code and high-level code. A good coder will have a good familiarity with their compiler, will have studied the assembly code listings generated from their high-level source files, and will have a good feel for the strengths and weaknesses of their compiler. An excellent coder will know how to apply this knowledge. As they type code in a high-level language, they will instinctively know how this will be translated into ASM code, and will be able to use this knowledge to identify potentially inefficient statements and re-write them in inline ASM.

      Lastly, don't forget portability. If you choose to write your entire project in ASM, you might be proud of that 8% performance gain, but when the time comes to port the code to a new platform, you are thoroughly screwed - while your competitors will simply recompile their code with a different compiler, you need to re-write it all from scratch!

    23. Re:So don't hire mere mortals by Bazzalisk · · Score: 1

      Thoth and Enki seem keen, but they insist on writing in Objective n-Funge.

      --
      James P. Barrett
    24. Re:So don't hire mere mortals by CastrTroy · · Score: 1

      I took a course in university that taught us OpenGL. They also taught us how graphics processing works on a very mathematical level. Anyway, it's amazing how well Object Orientation maps to 3D modelling and game design. Inheritance, polymorphism, and all that other stuff works quite well. If you draw the hand from the end of the arm, and you rotate the arm about the shoulder, it would be some complex math to figure out where the hand went if you didn't use object oriented code so that the arm object would draw the hand object, and you wouldn't have to reposition it manually every time you moved another part of the arm or body. Robotics was a much harder course because we had to do all those matrix operations to solve how to move the joints properly depending on how the other joints were positioned.
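
      A minimal sketch of that arm-draws-the-hand idea in C++, cut down to 2D. The Segment class and every name in it are made up for illustration and come from no course or library. Each segment stores only its angle relative to its parent, so rotating the shoulder moves the hand without anyone repositioning it by hand.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

// Hypothetical 2D point type, for illustration only.
struct Point { double x, y; };

// A limb segment that owns its children (an arm owns a hand, and so on).
// Each segment stores its angle relative to its parent, not a world angle.
class Segment {
public:
    Segment(double length, double angleRadians)
        : length_(length), angle_(angleRadians) {
        assert(length >= 0.0);
    }

    // Attach a child at the far end of this segment and return it.
    Segment& addChild(double length, double angleRadians) {
        children_.push_back(std::make_unique<Segment>(length, angleRadians));
        return *children_.back();
    }

    void setAngle(double angleRadians) { angle_ = angleRadians; }

    // Walk the hierarchy, accumulating the parent's end point and rotation,
    // and record where each segment ends in world space (parent first).
    void collectEndpoints(Point origin, double parentAngle,
                          std::vector<Point>& out) const {
        const double worldAngle = parentAngle + angle_;
        const Point end{origin.x + length_ * std::cos(worldAngle),
                        origin.y + length_ * std::sin(worldAngle)};
        out.push_back(end);
        for (const auto& child : children_) {
            child->collectEndpoints(end, worldAngle, out);
        }
    }

private:
    double length_;
    double angle_;
    std::vector<std::unique_ptr<Segment>> children_;
};
```

      Calling setAngle on the arm and then collectEndpoints again gives the new hand position. The parent's transform is simply passed down the recursion, which is the same bookkeeping the robotics course had to do by hand with matrices.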

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    25. Re:So don't hire mere mortals by daliman · · Score: 2, Funny

      I was stuck trying to decide whether you should be modded insightful or funny... So I decided to post instead. Sorry.

    26. Re:So don't hire mere mortals by Anonymous Coward · · Score: 0

      Way to congratulate yourself on getting the joke. When you're done patting yourself on the back, give yourself a pat for me, too. Congratulation!!! You didn't fail it!!!. You can read!!! And get jokes!!!! And repeat them!!!

    27. Re:So don't hire mere mortals by SeeMyNuts! · · Score: 1

      Not only that, every one of his projects is a Death March, and the employment contracts have that damned "Eternity" clause!

    28. Re:So don't hire mere mortals by bombadier_beetle · · Score: 1

      as an electrical engineering student with experience in processor design and many programming languages (including assembly), I think programming in assembly is far superior to any higher level language.

      And this, my friend, is why you're an electrical engineer, not a software developer.

      --

      If you mod me down, I shall become more powerful than you can possibly imagine.
    29. Re:So don't hire mere mortals by superflyguy · · Score: 1

      I honestly think it would be more practical for the designer of the chip to think in C++ than for the programmer to think in assembly. That way it's a 15 year wait for the architecture and a year for the first program, instead of a 5 year wait for the chip and a 30 year wait for the first useful program. Or... We could compile the C++ into assembly and use an assembly architecture and spend the minimum amount of time from the start of the processor design to the first program! I have prior art on compilers! I can make a fortu...

      oh... that's what people already do... and here I thought I had a breakthrough concept...

    30. Re:So don't hire mere mortals by Anonymous Coward · · Score: 0
      Yes, naive object orientation maps well.

      Now try optimizing that same code for a vector processor. The naive per-object decomposition fails immediately since modern system performance is based on streaming throughput, pretty much orthogonal to the object layout caused by the standard OO model. I don't know of any production compilers that restructure the objects in a class hierarchy to improve performance of streamed application to multiple objects. It's a hard problem of great import on new hardware, and until compilers can do it, "real" programmers will continue to swear at the crummy naive OO decompositions people insist on using.
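
      To make the layout problem concrete, here is a rough C++ sketch of the two decompositions. All type and function names are hypothetical. The first is the per-object layout; the second is the hand-restructured, stream-friendly one that (per the above) the programmer has to write because the compiler won't derive it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Object-per-entity layout ("array of structures"): natural for OO code,
// but each field of one particle sits next to unrelated fields.
struct Particle {
    float x, y, z;
    float vx, vy, vz;
    int health;  // unrelated data pulled along with every position update
};

void stepAoS(std::vector<Particle>& ps, float dt) {
    for (Particle& p : ps) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}

// Stream-friendly layout ("structure of arrays"): each field is one
// contiguous array, so a vector unit can load several x values at once.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
};

void stepSoA(ParticlesSoA& ps, float dt) {
    const std::size_t n = ps.x.size();
    assert(ps.y.size() == n && ps.z.size() == n);
    assert(ps.vx.size() == n && ps.vy.size() == n && ps.vz.size() == n);
    for (std::size_t i = 0; i < n; ++i) ps.x[i] += ps.vx[i] * dt;
    for (std::size_t i = 0; i < n; ++i) ps.y[i] += ps.vy[i] * dt;
    for (std::size_t i = 0; i < n; ++i) ps.z[i] += ps.vz[i] * dt;
}
```

      Both functions compute the same update. Whether the second is actually faster depends on the hardware and the compiler, so treat it as an illustration of the restructuring rather than a benchmark.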

    31. Re:So don't hire mere mortals by Anonymous Coward · · Score: 0

      Lamest. Reply. Ever.

    32. Re:So don't hire mere mortals by Jeremi · · Score: 1
      Help Wanted: Game Programmers. Must have 5 years experience coding in Assembly for the IBM Cell processor


      You joke, but Real Programmers start writing code well before the hardware becomes available...

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
    33. Re:So don't hire mere mortals by Anonymous Coward · · Score: 0

      And then 10,000 resumes came from India saying that they are qualified.

    34. Re:So don't hire mere mortals by William+Baric · · Score: 1

      You say that a good knowledge of low-level programming is essential but you obviously don't know much about assembly. Writing a small program takes a lot more time in assembly than in a high-level language. But for complex programs, where you spend more time thinking about algorithms and data structures than typing, the difference is not that significant. For someone who knows what he's doing, it's not more difficult to write in assembly. It takes more time to type the code, but it's not more difficult.

      As for compilers efficiency... disassemble a program and you will immediately see the difference between human made code and compiler code. Compilers still do a lot of stupid things. That 8% performance gain you mentioned is if you compare a shitty assembly programmer with an extremely good C++ programmer. In reality, the difference is a lot higher than 8%.

      Having said that, it's true that assembly is a bad choice for most projects. Only good programmers know assembly... And a good programmer costs a lot more than a guy who just got his diploma. From a business perspective, it's easier to hire two or three shitty programmers and tell one's client to get better hardware, than to hire a good programmer.

      Also... Object programming has nothing to do with physical objects that need to be modelled. So not only do you not know assembly, but you don't understand what object programming is about. Please, tell me you're not a programmer.

      (Sorry if I'm a little bit aggressive, but I'm in a bad mood today because of a moron and I'm beginning to be fed up with people who don't know squat but who pretend to know)

    35. Re:So don't hire mere mortals by deKernel · · Score: 1

      More than likely, I have more years experience with each language than you do in total so I am pretty sure I am qualified enough to state that you should not type when you are mad.

      The quality, maintainability and ability to add features to code written in higher level languages (C, C++ and such) is light-years ahead of ASM. If you have a "good developer" and a good compiler, they can get so darn close to the performance of ASM it is scary. I don't care how "crappy" the produced ASM code looks, it is performance that counts. Maintaining is not an issue since the ASM code is not what is supported, it is the higher level language code that is supported.

    36. Re:So don't hire mere mortals by ScrewMaster · · Score: 1

      Yes, but they run really really fast.

      --
      The higher the technology, the sharper that two-edged sword.
    37. Re:So don't hire mere mortals by Anonymous Coward · · Score: 0

      Real Programmers can write anything in Assembler - That is what defines them. Code Monkeys are script kiddies with a salary. You may choose to do the project in a higher level language, but when you know exactly why the compiler spat out that brain dead stream from perfectly good code, you tend to reuse your own libraries and - believe it or not - do it in assembler. OOP is not only possible in Assembler, it's easy - IFF you know what you are doing. The GP's gf's mother is a Programmer.

    38. Re:So don't hire mere mortals by Schraegstrichpunkt · · Score: 1

      Yes, but the assembler has to be loaded into RAM using DIP switches, and there is no hard disk.

    39. Re:So don't hire mere mortals by corvair2k1 · · Score: 1

      Actually, it's not that hard to do assembly linkages in C using OpenGL with motion at joints... First, the math isn't complex at all, simply multiplying 4x4 matrices that are well defined (write a function to do it). Second, you can use the OpenGL matrix stack (which is quite fast, and commonly done in hardware on the graphics card) to manage this for you, calling rotate and translation functions for you.
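
      Roughly what that looks like if you write the 4x4 multiply and the stack yourself. This is a sketch in plain C++ that only mimics the behaviour of the legacy glPushMatrix / glPopMatrix / glRotatef / glTranslatef calls; Mat4, MatrixStack and the helper functions are hypothetical names, and no OpenGL is involved.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Mat4 = std::array<double, 16>;  // row-major 4x4 (illustrative type)
struct Vec3 { double x, y, z; };

Mat4 identity() {
    Mat4 m{};  // all zeros
    m[0] = m[5] = m[10] = m[15] = 1.0;
    return m;
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i * 4 + j] += a[i * 4 + k] * b[k * 4 + j];
    return r;
}

Mat4 translation(double x, double y, double z) {
    Mat4 m = identity();
    m[3] = x; m[7] = y; m[11] = z;
    return m;
}

Mat4 rotationZ(double radians) {
    Mat4 m = identity();
    const double c = std::cos(radians), s = std::sin(radians);
    m[0] = c; m[1] = -s;
    m[4] = s; m[5] = c;
    return m;
}

// Transform a point (implicit w = 1) by m, column-vector convention.
Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    return {m[0] * p.x + m[1] * p.y + m[2] * p.z + m[3],
            m[4] * p.x + m[5] * p.y + m[6] * p.z + m[7],
            m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11]};
}

// Software stand-in for a push/pop matrix stack.
class MatrixStack {
public:
    MatrixStack() { stack_.push_back(identity()); }
    void push() { stack_.push_back(stack_.back()); }
    void pop() { assert(stack_.size() > 1); stack_.pop_back(); }
    // Post-multiply, so later transforms act in the child's local frame.
    void apply(const Mat4& m) { stack_.back() = multiply(stack_.back(), m); }
    const Mat4& top() const { return stack_.back(); }
private:
    std::vector<Mat4> stack_;
};
```

      For a two-joint arm: apply the shoulder rotation, apply a translation along the upper arm, push, apply the hand's offset and draw it, then pop to get back to the elbow frame.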

    40. Re:So don't hire mere mortals by bit01 · · Score: 1

      Does this mean they're only looking for people with multiple personality disorders?

      ---

      Keep your options open!

    41. Re:So don't hire mere mortals by LilGuy · · Score: 1

      |||
      oo
          )
      \-/

        ^
        |
      Mohammed ^_^ kekeke

      --

      You're nothing; like me.
    42. Re:So don't hire mere mortals by William+Baric · · Score: 1

      You are right when you say that I should not type when I'm mad (and tired btw). But, hey! This is slashdot! It's not like it matters! (Yes, I'm doing a big wink right now).

      The quality of a program does not depend on the language. I saw high quality, well written code in assembly and really crappy stuff in Ada. I agree that some languages are more beautiful than others, but let's be honest, this sense of beauty is purely subjective.

      As for maintainability and ability to add features, once again it depends more on the programmer than on the language used. A few times I was asked if I could take over the maintenance of a project (once in Pascal, others in C or C++). Each time I asked to see the source code, and each time I refused because there was not enough documentation. A well documented assembly program is as easy to maintain as any other well documented program written in a high level language... and a program without documentation is unmaintainable, no matter what language is used. I would agree that a program without documentation written in a high level language is far more readable than the same thing in assembly, but does it really matter? If the thing is more than a few thousand lines it's probably unmaintainable anyway.

      Having said that, I will be honest and admit that most of the time I program in Ada. I still use assembly sometimes, but it's more for fun than for real work. But the reason I use Ada is not because of maintainability or the ability to add features. It's because most of the work I do now is the fire and forget kind of work. I have a problem, I need a program to do the job, and once the job is done, I forget everything about it. I don't expect to maintain it so my goal is to use a tool that allows me to write the fastest way possible. I don't care about quality or documentation. That's why I use Ada. The compiler gets most of the errors and I don't have to be careful. And this is where high level languages really shine. Not in doing quality code, but rather in doing crappy code.

      There is also another reason I don't write much assembly anymore. You need a lot of knowledge to code efficiently in assembly. And unfortunately, since I'm not a full time coder anymore, I can't afford the time to learn new architectures. To give you an idea, when I code for fun in assembly, most of the time it's for my C64 or Amiga emulator (yes it's nostalgia). I did learn the AMD64 architecture, but I think it sucks and I have no fun using this thing.

      There are good reasons we don't use assembly anymore. But "quality, maintainability and ability to add features" are not part of those. From my experience, the real reason is it allows us to be lazy, do a sloppy job and rely on a compiler to produce something that kind of works instead of quality code.

    43. Re:So don't hire mere mortals by Anonymous Coward · · Score: 0

      Real engineers that design real CPUs actually take into consideration executables generated by high-level language compilers. Companies like Intel, AMD, IBM and others have compiler experts that work on their CPU teams. These companies also take in and digest suggestions from their most important software developers like Microsoft.

    44. Re:So don't hire mere mortals by Anonymous Coward · · Score: 0

      Was there a closing down sale at International House of Punctuation or something?

    45. Re:So don't hire mere mortals by ConceptJunkie · · Score: 1

      Don't forget 10 years of C# experience.

      --
      You are in a maze of twisty little passages, all alike.
    46. Re:So don't hire mere mortals by Zone-MR · · Score: 1

      [snip insults]...Writing a small program takes a lot more time in assembly than in a high-level language. But for complex programs, where you spend more time thinking about algorithms and data structures than typing, the difference is not that significant. For someone who knows what he's doing, it's not more difficult to write in assembly. It takes more time to type the code, but it's not more difficult.

      Wrong. For simple projects, there isn't much difference. For complex projects, you *need* to write modular code. This usually means defining objects and their behaviour, and then putting the building blocks together by defining how the objects interact.

      Object programming has nothing to do with physical objects that need to be modelled. ...[snip insults]

      If your OO code doesn't model real objects, perhaps you shouldn't use object-oriented programming for the application. However OO programming has a LOT to do with modelling physical objects, and that's what it's ideally suited to do.

      Take for example a simple game. You define some code for Vectors, Complex Numbers. Then you build on top of this - you define a class for handling Quaternions and Matrix Operations. You also define operators for simple operations (Vector Multiplication, Transforms etc). Once the initial work is done, you implement the behaviour in what becomes simple and elegant code:

      Vector objVelocity;
      Vector objPosition;
      // while (blah) ...
      objPosition = objPosition + objVelocity;
      etc...

      A simple glance at a fragment of the code is enough for someone to tell what it does. Maintaining it is easy. You don't repeat code for things which need to be done frequently - you define them as inline routines to produce efficient executables.

      With high-level languages you are freed to spend your time and effort thinking about high-level problems. Not messing around with typing out 2 pages of code every time you want to do floating point multiplication of a vector by a matrix.
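
      For anyone wondering what the "initial work" behind that fragment might look like, here is a minimal sketch. It assumes a three-component float Vector and a 3x3 Matrix3; both types and all the operators are hypothetical and not from any particular engine. The operators are small inline functions, so the compiler is free to expand them in place.

```cpp
#include <cassert>

// Hypothetical three-component vector, matching the naming in the post.
struct Vector {
    float x, y, z;
};

inline Vector operator+(const Vector& a, const Vector& b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z};
}

inline Vector operator*(const Vector& v, float s) {
    return {v.x * s, v.y * s, v.z * s};
}

inline Vector operator/(const Vector& v, float s) {
    assert(s != 0.0f);  // dividing by zero is a caller bug in this sketch
    return {v.x / s, v.y / s, v.z / s};
}

// Hypothetical row-major 3x3 matrix.
struct Matrix3 {
    float m[3][3];
};

// Matrix times column vector: written once, reused everywhere.
inline Vector operator*(const Matrix3& a, const Vector& v) {
    return {a.m[0][0] * v.x + a.m[0][1] * v.y + a.m[0][2] * v.z,
            a.m[1][0] * v.x + a.m[1][1] * v.y + a.m[1][2] * v.z,
            a.m[2][0] * v.x + a.m[2][1] * v.y + a.m[2][2] * v.z};
}
```

      With those in place, `objPosition = objPosition + objVelocity;` compiles as written, and a transform is just `rotation * objPosition`.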

    47. Re:So don't hire mere mortals by arodland · · Score: 1

      Oh come on, you don't believe anything you read on Slashdot, do you? It'll be released this year, and it'll be sold at a gigantic loss to Sony. Which won't matter because they'll still have control of all of the games that are actually worth buying, and recoup their losses in no time.

    48. Re:So don't hire mere mortals by Anonymous Coward · · Score: 0

      So maybe it's just because I'm from a younger, lazier generation, but if I have 8 processors at my disposal, why wouldn't I write in a high-level language and take the performance hit? I mean it's easier, and you have resources running out of your ass... Hell, I might just leave some infinite loops in there for the hell of it... I mean you have 8 PROCESSORS! Screw it, I'll program one of them to do my taxes while the game is running... I'm waiting for the first game developer to build a Folding@home client into one of their games...

    49. Re:So don't hire mere mortals by somersault · · Score: 1

      Ah yus, Hephaestus must work for Intel

      --
      which is totally what she said
    50. Re:So don't hire mere mortals by mrjimorg · · Score: 1

      I'm an electrical engineer, and I certainly don't think like this. Quite the opposite. I know that because of pipelines and other processor features, rearranging your code can make a huge difference in performance. Compilers are written to optimize for the chipset that you're writing for, to take advantage of such things as 'free' jump instructions, whereas it would be amazingly difficult for an assembly language writer to do so. For instance, by using the result of one operation in the next operation you can cause a pipeline stall. If you take 2 streams of operations and weave them together you can improve performance, but you would make the code very unreadable. Also, the compiler can keep track of which ALUs or other chip resources are being used by which instructions and can rearrange them accordingly to improve performance.
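
      A small C++ illustration of the "weave two streams together" point. The function names are made up. In the first version every add depends on the previous one; in the second, two independent running sums are interleaved so the hardware has something to do while one add is still in flight.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One dependency chain: each add needs the result of the one before it.
std::uint64_t sumSingleChain(const std::vector<std::uint32_t>& v) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Two interleaved chains: 'even' and 'odd' never depend on each other,
// so their adds can overlap in the pipeline.
std::uint64_t sumTwoChains(const std::vector<std::uint32_t>& v) {
    std::uint64_t even = 0;
    std::uint64_t odd = 0;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2) {
        even += v[i];
        odd += v[i + 1];
    }
    // At most one element is left over when the size is odd.
    assert(i == v.size() || i + 1 == v.size());
    if (i < v.size()) {
        even += v[i];
    }
    return even + odd;
}
```

      Integer addition is used so both versions give identical results. An optimizing compiler may well do this transformation on its own, and any speedup depends on the target, which is rather the point being made above.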

    51. Re:So don't hire mere mortals by Derkec · · Score: 1

      As for compilers efficiency... disassemble a program and you will immediately see the difference between human made code and compiler code. Compilers still do a lot of stupid things. That 8% performance gain you mentioned is if you compare a shitty assembly programmer with a good C++ programmer. In reality, the difference is a lot higher than 8%.

      8%? 15%? Who cares? Unless you're writing an extremely intensive section of code that is quite small, fiddling at this level is a waste of time. Most performance problems are caused not by inefficient asm code, but by really dumb algorithms, dumb IO (often network or database) and the like.

      Fixing this stuff is done either while editing your C++ or at a whiteboard. This is where you're going to see the 50% performance improvements. It's not nearly as nerd sexy as hacking ASM, but it's going to make a bigger difference than ensuring that you've got your row major or column major array traversing straight or that your asm code is tight.

      Sure, in extremely performance critical apps, where all the other issues are already resolved, tweaking the asm might be needed. But frankly, that's a small percentage of the apps in the world. Most really great programmers can get by never hacking in asm because they are making the 50% performance improvements in different kinds of apps.
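
      For readers who haven't met the row-major/column-major issue mentioned above, a short C++ sketch. The function names are hypothetical, and the matrix is assumed to be stored row by row in one flat buffer. Both loops compute the same sum; only the order in which memory is visited differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Visits elements in the order they are laid out in memory.
long long sumRowMajor(const std::vector<int>& a,
                      std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += a[r * cols + c];
    return total;
}

// Same arithmetic, but each step jumps 'cols' elements ahead, which
// tends to be less cache-friendly on large matrices.
long long sumColumnMajor(const std::vector<int>& a,
                         std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += a[r * cols + c];
    return total;
}
```

      This is the kind of tuning that is worth doing only after the algorithm and IO problems are sorted out.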

    52. Re:So don't hire mere mortals by Anonymous Coward · · Score: 0

      His wife is in a coma.

    53. Re:So don't hire mere mortals by apoc06 · · Score: 1

      ssshhhhh!!!!!!!!!!!!!

      be careful, youre letting sony's secret out of the bag. all of MS paid employees that troll the sony threads might hear you.

    54. Re:So don't hire mere mortals by gstoddart · · Score: 1
      Oh, come on. Everyone knows that Hades isn't a programmer any more, not since he got promoted to Management and got that whole division to run down there.

      Yeah, but just to prove he's got it, Hades will whip together something just to show those young punks that he's still lord of the underworld and they better not forget it.

      =)
      --
      Lost at C:>. Found at C.
    55. Re:So don't hire mere mortals by Intocabile · · Score: 1

      He'll happily outsource the programming to his hordes of captured souls. Doesn't really support the idea of Cell being easy to program for though.

  3. Makes you wonder by Egregius · · Score: 5, Insightful

    It makes you wonder what the release-titles of the PS3 will be like, if they didn't have a decent compiler until now. And 'the PS3 is due out in 2006.'

    1. Re:Makes you wonder by general_re · · Score: 1, Interesting
      ...they didn't have a decent compiler until now.

      Actually, it sounds like they still don't have one, just some ideas on how to make one someday.

      --
      ABSURDITY, n.: A statement or belief manifestly inconsistent with one's own opinion.
    2. Re:Makes you wonder by bfizzle · · Score: 1

      If the Cell processor is as badass as everyone makes it out to be, Sony has nothing to worry about. All the PS3 has to be when it is launched is as good as the Xbox 360. Game developers need to be able to port games to either platform easily, and if the PS3 has tools to make it easy to utilize just enough processing power to match the 360 then they will be successful. The Cell processor will only really come into play after the developer tools mature and games are able to take full advantage of it.

    3. Re:Makes you wonder by Anonymous Coward · · Score: 0

      So beautifully true, if only I could mod. Only recently have some of the finest looking video games come out for the PS2; it's incredible what they can do. I would almost give a couple of toes to have the RAM in the PS2 be more than 32 megs, even if it wouldn't have been usable all at once.

    4. Re:Makes you wonder by ClamIAm · · Score: 1
      Programmers have likely had specs on the chip and the architecture for some time now, and also have probably had docs on what kind of compiler they'd be dealing with. This way, if they're any good at all, they should already have code that only needs to be tweaked to work on this thing.

      Also, they could just program some bits in assembler. Game programming is one of the places where assembly is still used.

    5. Re:Makes you wonder by Anonymous Coward · · Score: 1, Interesting

      The Cell is overhyped.

      There's another octuple processor that is better. The Sun UltraSparc T1. It blows Cell's doors off.

      The Cell only has 1 General CPU core and 7 flops.

      Why is everyone even talking about the Cell? The Sun T1 has 8 Cores with 32 threads each. These are 8 true cores (not 1 CPU and 7 half baked Floating Point cores like the Cell). The Sun T1 also has 4 DDR memory controllers and 72 Watts of power consumption. The Sun T1 also has a Sparc V9 architecture. Each of the Cores also has a floating point unit. The Sun T1 blows the doors off a Cell Chip.

      If it weren't for the 7 Synergistic Processors the Cell would be a Flop.

    6. Re:Makes you wonder by MikeFM · · Score: 1

      All it really exists for is to allow current crops of code and programmers to move to the new system more easily. It doesn't mean it's THAT hard to program the Cell now. Writing a compiler that correctly optimizes code is much trickier than writing code, and I don't doubt the same goes for a compiler that correctly reworks code to be more parallel.

      Where this will really pay off is up the food chain as it'll allow a lot of programmers to worry less about how things get done. It's like the difference between hard coding for a GPU or using OpenGL. You can write for the GPU but it's a lot more low level and there is more rope to hang yourself with.

      --
      At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
    7. Re:Makes you wonder by FatherOfONe · · Score: 1

      The other way to look at this is:

      If you see the first generation games and they are better than anything on the XBOX 360, then imagine how much better they will be when this type of compiler becomes standard. The difference in games will only get larger, thus it would make more sense to purchase a Playstation 3.

      Or to be more specific, imagine 30 cars in GT5, all with different computer AI, against say 10 cars on a similar XBOX 360 game.

      --
      The more I learn about science, the more my faith in God increases.
    8. Re:Makes you wonder by Lehk228 · · Score: 1

      all those cars drawn as unlit wireframes like in the PS2 "benchmarks"

      --
      Snowden and Manning are heroes.
    9. Re:Makes you wonder by TinyManCan · · Score: 1
      Everyone is bitching about writing games that use three threads (Xbox 360) or 8 threads (Cell). You want to go and jump right to 32 threads or more. Let's give the programmers some time to adjust to threads > 1, and then let's slowly crank it up.

      I'm a believer in the ability of games to use multiple cores, but I think 32 would be taking it too far. Leave the T1 to what it is good at: web serving and processing thousands of simultaneous requests.

    10. Re:Makes you wonder by Anonymous Coward · · Score: 0

      Scott McNealy, is that you?

    11. Re:Makes you wonder by Anonymous Coward · · Score: 0
      Each of the Cores also has a floating point unit.

      No they don't, T1 has just one shared FP unit. The FP performance is just abysmal vs. the Cell, but the target market is totally different so it doesn't matter.

    12. Re:Makes you wonder by KDR_11k · · Score: 1

      FLOPS means floating point operations per second. Seven FLOPS would be rather pathetic.

      --
      Justice is the sheep getting arrested while an impartial judge declares the vote void.
    13. Re:Makes you wonder by FatherOfONe · · Score: 1

      We will see. I think you would agree though, ANYONE planning on buying just one system would be insane not to wait and see what the PS3 and Revolution can do. Also, on paper the PS3 looks more powerful than the 360. So it will be very interesting to see what happens. By the way, have you seen the demos that the vendors say were NOT pre-rendered for the PS3? If they are accurate then it would explain why Microsoft rushed the 360 to market. They would understand that their only hope of winning the next gen console war would be to get it out before the PS3.

      Lastly, have you seen first gen PS2 games compared to what is out now? Wow what a difference!

      It is a great time to be a consumer of video game systems and I look forward to comparing them all later this year. I "might" buy a 360, but to be honest the lack of an HD-DVD (Blu-ray) drive will probably be a show stopper for me. I, like a ton of people, have an HDTV that has a digital HD input, and I look forward to seeing Lord of the Rings in HD :-)

      --
      The more I learn about science, the more my faith in God increases.
    14. Re:Makes you wonder by Lehk228 · · Score: 1

      those clips were admitted to be fully scripted. they were rendered in real time but done using pre-programmed movements and actions rather than being actual game engine output.

      --
      Snowden and Manning are heroes.
  4. Hello, Itanium... by general_re · · Score: 5, Insightful

    Sound familiar? "All we need to make it work as advertised is a really slick compiler that doesn't actually exist yet..."

    --
    ABSURDITY, n.: A statement or belief manifestly inconsistent with one's own opinion.
    1. Re:Hello, Itanium... by Ceriel+Nosforit · · Score: 2, Insightful

      Sound familiar? "All we need to make it work as advertised is a really slick compiler that doesn't actually exist yet..."

      From TFA:
      "I say "intended to become," because judging from the paper the guys at IBM are still in the early stages of taming this many-headed beast. This is by no means meant to disparage all the IBM researchers who have done yeoman's work in their practically single-handed attempts to move the entire field of computer science forward by a quantum leap. No, the Octopiler paper is full of innovative ideas to be fleshed out at a further date, results that are "promising," avenues to be explored, and overarching approaches that seem likely to bear fruit eventually."

      Too early to say for sure, of course, but I'd rather take this guy's word for it than study the papers myself. - Would I invest/bet money on it? Yes, I would.

      --
      All rites reversed 2010
    2. Re:Hello, Itanium... by Brain_Recall · · Score: 3, Informative
      More familiar than you may think. Some of the first Itanium compilers were spitting out nearly 40% NOPs, which are simply do-nothings. Because the IA-64 is explicitly parallel, instructions are generated and bundled together to be executed in parallel. The problem is branches, which destroy parallelism since they can change the code direction. On average, there are about 6 instructions between branches, so such a design is very costly since the memory controller will be stuck fetching instructions that are empty. Of course, speculation and branch prediction are generally a good way to increase performance, but like many things on the IA-64, that's left to the compiler to figure out. These are some of the exact same problems with the Cell, although I wish I knew what the instruction set was like. If it's more like Itanium, then they've got all of the problems of the Itanium. If it's more of a direct approach, they may be able to pull it off because of the work in multi-processor systems that is done today. But they simply can't expect the "super-computer" numbers Sony keeps flashing around. It may be good on certain tightly coded scientific calculations, but when it comes down to real-world code, it's stuck with the stripped-down Power4 that is coordinating the Cells.


      They didn't call it the Itanic for nothing...

    3. Re:Hello, Itanium... by timeOday · · Score: 3, Insightful
      Everybody prefers a simpler programming model, there's no doubt about that. But with the recent lack of progress in unicore speeds, something has to give, and apparently that "something" is programming complexity. While the PC world moves from 1 to 2 cores, the PS3 is jumping straight to 8. But going from 1 to 2 threads is a bigger conceptual jump than from 2 to 8 anyways.

      Fortunately for IBM and Sony, games are one place where hand-optimizing certain algorithms is still practical. I doubt they will place all their eggs in the octopiler basket. I can't imagine a compiler will find that much parallelism in code that isn't explicitly written to be parallel. Personally, I think they should instead focus on explicitly parallel libraries for common game algorithms like collision detection.

    4. Re:Hello, Itanium... by Anonymous Coward · · Score: 1, Insightful

      Fortunately for IBM and Sony, games are one place where hand-optimizing certain algorithms is still practical

      That's the key right there. Cell will only run brand-new software, while Itanium is expected to run a bunch of 30 year old C code originally written for the VAX.

    5. Re:Hello, Itanium... by Saint+Stephen · · Score: 1

      Actually that's not really true. People assume that if you can scale to 2 processors, you can scale to 2 or 4 or 8 or 128. It doesn't work like that. Some code can scale up to 4 processors and then actually break on 8 processors. It's because there are different "bridges" in the hardware -- everything's not equally accessible at the same speeds.

    6. Re:Hello, Itanium... by John+Whitley · · Score: 1

      So what? That's the story of every modern processor, because the true engineering problem is to create a synthesis of processor and compiler that produces a powerful platform. Put another way, modern general purpose processors aren't targeted at assembly programmers. Itanium, however, seems to have been plagued by outright hardware design and engineering issues above and beyond any hardware/compiler synthesis issues.

    7. Re:Hello, Itanium... by mikelang · · Score: 1

      Citanium doesn't sound right, maybe rather cellar performance?

    8. Re:Hello, Itanium... by Raenex · · Score: 1
      Too early to say for sure, of course, but I'd rather take this guy's word for it than study the papers myself. - Would I invest/bet money on it? Yes, I would.

      Now there's a prudent investor. Too lazy to understand the issues, and some guy in an article says after a "generation's worth of doctoral research" it "seems likely" they'll have something useful. Of course, this problem is already decades old...

    9. Re:Hello, Itanium... by fitten · · Score: 1

      There's also the issue of complexity, as the Cell is not SMP. The PC (in the general sense) multi-core designs are SMP, which is a fairly simple and understood programming model (and we still can't get good auto-parallelizing compilers for it). The Cell isn't even remotely SMP in so many ways. First, the CPUs aren't the same. One is a basic PPC type but then you have all these others that are more like DSPs. The next thing is that the different CPUs have different access to memory. While the PPC has a 'normal' memory hierarchy (caches and such) and can access main memory 'trivially', the SPEs have static memory blocks and use DMA to push/pull data into that static memory.

      So the compiler has to not only deal with instruction streams, it has to deal with using that memory effectively... tiling, multi-banking, interleaving and such... which is typically algorithm based... so the compilers now have to figure out your algorithm, determine efficient memory allocation and windowing sizes, and so much more auto-magically, than a 'normal' compiler has to deal with. Not only that, but it also has to schedule in such a way for the SPE tasks to rendezvous with the PPC code in 'good' ways so as to not waste cycles on either the PPC or the SPE cores... and it has to do all that for every SPE... (X8).

      The Itanium compilers have a hard time finding parallelism in instruction streams and get heat because they aren't the best at that... now compare that to a compiler that has to figure out what parts of the instruction stream are useful to be translated to an SPE instead of the PPC (X8) and deal with all the DMA and memory blocking required for it to be efficient... basically figuring out your algorithm and making it better. And... it isn't useful to have one instruction run on an SPE... you have to have reasonable streams to make the setup/teardown of the processes on the SPEs worthwhile.

      I believe the best way to program such systems (for a while now and probably for a while to come) is to have libraries written to do specific things and programmers code against it. Perhaps some groups will make engines that run on a specified number of SPEs and those engines are used by a number of groups, similar to what we see in the game world today. Some company may make a physics engine that runs on, say, four SPEs and sell that to a number of groups, for example. Some other group may make a 3D sound engine that runs on one SPE. Another make a video engine that runs on two SPEs. A game company can take all three, glue them together and produce content and release a game. Even more cookie cutter games for us because the complexity of writing those engines will be high and different companies will be forced to use off-the-shelf engines because of time and money concerns.

      The Cell is not new... TI has had similar architectures for years (typically ARM cores attached to a number of DSPs all on a single die). I've worked on similar systems. I wouldn't hold my breath waiting on this 'Octopiler' to be good.
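
      For anyone who hasn't done this kind of tiling by hand, here is a rough Python sketch of the idea: the working set is cut into blocks that fit a small fixed memory, and each block is processed on its own. The 256 KB budget, the names, and the list slice standing in for a DMA transfer are all assumptions made for illustration; nothing here comes from an IBM toolchain.

```python
# Illustrative only: models an SPE-style "local store" as a fixed byte budget.
LOCAL_STORE_BYTES = 256 * 1024   # assumed budget, not taken from the thread
BYTES_PER_FLOAT = 4              # single-precision element size


def tiles(data, tile_len):
    """Yield successive slices of `data`, each at most `tile_len` elements."""
    for start in range(0, len(data), tile_len):
        yield data[start:start + tile_len]


def sum_of_squares_tiled(data, buffers=2):
    """Process `data` one tile at a time, as if each tile were DMA'd in.

    `buffers` reserves room for double buffering: only 1/buffers of the
    local store is available to any single tile.
    """
    tile_len = LOCAL_STORE_BYTES // (BYTES_PER_FLOAT * buffers)
    total = 0.0
    n_tiles = 0
    for tile in tiles(data, tile_len):      # stand-in for a DMA "get"
        total += sum(x * x for x in tile)   # work touches only this tile
        n_tiles += 1
    return total, n_tiles


if __name__ == "__main__":
    values = [float(i % 7) for i in range(100_000)]
    print(sum_of_squares_tiled(values))
```

      The hard part described above is that a compiler would have to pick the tile size and the split automatically for arbitrary code, instead of having a programmer choose them per algorithm as this sketch does.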

    10. Re:Hello, Itanium... by colman77 · · Score: 1

      That sounds like something I've been hearing a little too often lately... "Don't worry, we'll fix it in software!" I hate that phrase...

    11. Re:Hello, Itanium... by Ceriel+Nosforit · · Score: 1

      Well, no. I base that on over a decade of experience with the IT field, the simple understanding that the Cell is a supply to meet a growing demand, and that it's backed by Big Blue.

      --
      All rites reversed 2010
  5. Octoplier? by Anonymous Coward · · Score: 0

    what you say?

  6. Sadly, not a lotta FPU hardware. by mosel-saar-ruwer · · Score: 4, Insightful

    'Cell's greatest strength is that there's a lot of hardware on that chip. And Cell's greatest weakness is that there's a lot of hardware on that chip.

    Sadly, there's almost no FPU hardware to speak of: 32-bit single precision floats in hardware; 64-bit double precision floats are [somehow?] implemented in software and bring the chip to its knees.

    Why can't someone invent a chip for math geeks? With 128-bit hardware doubles? Are we really that tiny a proportion of the world's population?

    1. Re:Sadly, not a lotta FPU hardware. by sedyn · · Score: 1

      Math geeks that would need 128-bit double precision are a subset of all math geeks...

      Therefore an even smaller portion of an already small population.

      --
      Am I open minded towards open source, or closed minded towards closed source?
    2. Re:Sadly, not a lotta FPU hardware. by Anonymous Coward · · Score: 0

      Because 128 bit would be a quadruple, not a double. You silly math geek, you. :)

      QED, e pluribus unum, et cetera, et cetera...

    3. Re:Sadly, not a lotta FPU hardware. by rodac · · Score: 1

      What benefit does increasing the precision of floats to 128 bits bring?
      64 bits are more than enough for 99.9999% of cases, and the remaining ones can be handled in software emulation.

      You still cannot solve (without massive growth of the error terms) an equation system described by a Hilbert matrix using Gaussian elimination, no matter how many bits you make the mantissa.
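
      A small Python sketch of the effect, for the curious. It builds Hilbert systems whose true solution is all ones and solves them with plain Gaussian elimination (partial pivoting) in ordinary 64-bit doubles. The function names and the sizes tried are my own choices for illustration. The point it is meant to show is only that the error grows very quickly with matrix size for a fixed mantissa width.

```python
def hilbert(n):
    """n x n Hilbert matrix, H[i][j] = 1 / (i + j + 1), in 64-bit doubles."""
    return [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]


def gauss_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    a = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                        # back substitution
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x


def worst_error(n):
    """Largest deviation from the true solution (all ones) for size n."""
    h = hilbert(n)
    b = [sum(row) for row in h]    # right-hand side for x = (1, ..., 1)
    return max(abs(xi - 1.0) for xi in gauss_solve(h, b))


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, worst_error(n))
```

      Each extra batch of mantissa bits buys a few more rows before the answer turns to noise, which is why throwing hardware precision at an ill-conditioned problem is a poor substitute for a better-suited algorithm.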

    4. Re:Sadly, not a lotta FPU hardware. by stedo · · Score: 3, Insightful

      The basic purpose of the Cell is to make the PS3 work. The basic purpose of the PS3 is to play games. Games, as a rule, don't give a damn about 64-bit floating point. Games can get away with 32-bit because they don't need to be incredibly accurate, they just need to be fast. No gamer will care whether or not the trajectory of the bullet was out by 0.000000000023~ as long as it moves fluidly. So, in making a chip for gaming, you are far better off making 32-bit really fast than spending time and die space on perfecting useless 64-bit.

    5. Re:Sadly, not a lotta FPU hardware. by Anonymous Coward · · Score: 0

      No gamer will care whether or not the trajectory of the bullet was out by 0.000000000023

      you don't spend much time around hardcore gamers do you? "WTF?!? Missed F&$K you!"

      of course usually they missed because of something they did, not any error in the trajectory of the math...

    6. Re:Sadly, not a lotta FPU hardware. by soldack · · Score: 1

      Doesn't the Itanium do pretty well on floating point?
      -Ack

      --
      -- soldack
    7. Re:Sadly, not a lotta FPU hardware. by JFMulder · · Score: 1

      Considering that the movie industry is slowly adopting HDR (that's 32 bits per float component, 4 components per pixel) as the preferred depth for image processing, I don't see why games should use more. At least in graphics. Plus, using 128-bit floats would cut the number of whatever you want to process each second by 4, since you would need to move 4 times the data for the same work. No, we don't need 128-bit floats for games just yet, or shall I say, 32-bit floats should be enough for everyone. ;)

    8. Re:Sadly, not a lotta FPU hardware. by ScriptedReplay · · Score: 1

      Math geeks that would need 128-bit double precision are a subset of all math geeks...

      Perhaps you meant long double precision. Math geeks that can live with 32-bit floating point precision are also a small subset - most of those who do heavy math (not pixel processing) pretty much require 64-bit double precision. And that is not available in hardware from Cell (come to think of it, not from AltiVec, either)

    9. Re:Sadly, not a lotta FPU hardware. by Animats · · Score: 3, Interesting
      Games, as a rule, don't give a damn about 64-bit floating point.

      You wish. In a big 32-bit game world, effort has to be made to re-origin the data as you move. Suppose you want vertices to be positioned to within 1cm (worse than that and you'll see it), and you're 10km from the origin. The low order bit of a 32-bit floating point number is now more than 1cm.

      It's even worse for physics engines, but that's another story.

      If the XBox 360 had simply been a dual- or quad-core IA-32, life would have been much simpler for the game industry.

    10. Re:Sadly, not a lotta FPU hardware. by Frumious+Wombat · · Score: 2, Informative

      They have, although outside of certain implementations of double-complex, 64-bit double-precision (REAL*8 to Real Programmers) is enough.

      Those machines are Cray Vector Processors, MIPS R8K and later, DEC Alpha, HP/Intel Itanium, IBM Power 4/5/n, IBM Vector Facility for the 3090, etc.

      Notice how many of those you see every day, and how many fewer of those you can still buy.

      Yes, unfortunately, you are that tiny a proportion of the world pop. I had hoped by this point that we'd have Cray Vector Processors on a chip, or integrated into the base chipset (like the old Proc/Math-CoProc combos), or be running EV10 Alphas on our desktops. Unfortunately, double-precision floating point benefits so few people that it's not worth it from a design standpoint to optimize the processors around it. The R8000 was a good example of this; incredible FP for the time, but terrible integer (early Itanium-2 falls into this category as well). So, it crushes numbers like mad in the background, but your word processor, etc, are no faster and possibly slower than the previous generation, less expensive processor.

      Just a couple of years ago my boss commented that we had problems in quantum chemistry which were still more time-effective to solve on mid-90s Crays than modern MPPs, because the algorithms vectorized easily but didn't parallelize. Some of them have been fixed by now, and alternatives found for others, but there are a lot of problems (by the standard of scientists) that would benefit from having a processor optimized for double-precision ops. Unfortunately, by the standards of the cell-phone-camera wielding email junkies, those problems are an invisible subset of the things you do with a computer. Ergo, good enough for home entertainment and PowerPoint, less than ideal for scientific use.

      Thankfully Power5 and Itanium will be around for a few more years.

      --
      the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
    11. Re:Sadly, not a lotta FPU hardware. by OldManAndTheC++ · · Score: 3, Funny
      Are we really that tiny a proportion of the world's population?

      You math geeks need to multiply. :)

      --
      Soylent Green is peoplicious!
    12. Re:Sadly, not a lotta FPU hardware. by Tim+Browse · · Score: 1
      Why can't someone invent a chip for math geeks? With 128-bit hardware doubles?

      Because the math geeks won't pay for the fab plants.

      Are we really that tiny a proportion of the world's population?

      Yes. You're the math geek - you do the math.

    13. Re:Sadly, not a lotta FPU hardware. by stedo · · Score: 2, Interesting
      True

      Actually, what I can't figure out is why you want floating point at all. Floating-point data stores a certain number of bits of actual data, and a certain number of bits as a scaling factor. To use your example, this would mean that while items near the origin would be picture-perfect, the object 10km away would be out by well more than a cm.

      Back when integer arithmetic was so much faster than floating point that it was worth the effort, game coders used to use fixed-point arithmetic. This kept a uniform level of accuracy around the entire world, unlike floating point, which makes data near the origin more accurate. It was also very fast, and easy to implement. Why hasn't anyone implemented fast fixed-point arithmetic in hardware? You could afford to go 64-bit if it was fixed-point since it is so much easier to compute (think integer arithmetic versus floating point), and 64-bit is accurate enough for very small detail in a very large world.
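
      To make the fixed-point idea concrete, here is a minimal Python sketch. It assumes a 16.16 layout (16 fractional bits), which is only one common choice and not something specified above, and the helper names are made up for the example. Values are stored as plain integers scaled by 2^16, so the step between neighbouring values is the same everywhere in the world.

```python
FRAC_BITS = 16            # assumed 16.16-style split; a hypothetical choice
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to the nearest fixed-point integer."""
    return int(round(x * ONE))


def to_float(f):
    """Convert a fixed-point integer back to a float for display."""
    return f / ONE


def fx_add(a, b):
    """Addition is ordinary integer addition."""
    return a + b


def fx_mul(a, b):
    """Multiply, then shift away the extra scale factor.

    Python's >> rounds toward negative infinity for negative products.
    """
    return (a * b) >> FRAC_BITS


if __name__ == "__main__":
    print(to_float(fx_mul(to_fixed(1.5), to_fixed(2.25))))
    # The step size is identical near the origin and 10 km away.
    print(to_float(to_fixed(0.0) + 1) - 0.0)
    print(to_float(to_fixed(10000.0) + 1) - 10000.0)
```

      The trade-off is range: with a fixed split, the largest representable value is set by the integer bits, which is the argument for going to 64 bits.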

    14. Re:Sadly, not a lotta FPU hardware. by Anonymous Coward · · Score: 0

      Real math geeks use only integers.

    15. Re:Sadly, not a lotta FPU hardware. by octopus72 · · Score: 1

      You can move 64-bit floating point data around, and as long as you don't do the double precision FPU math on Cell, it is as fast as if the processor were capable of doing it in hardware. The GPU takes care of transformations on current generation consoles anyway. No need to use slow FPU emulation.

      Physics is a problem as you say, but I don't think precision is that important for games; 32-bit is often enough.

    16. Re:Sadly, not a lotta FPU hardware. by Watson+Ladd · · Score: 1

      MMIX! 256 general purpose registers, 32 special purpose, simple calling convention, and bitwise matrix multiplies. All we need is some real silicon.

      --
      Inventions have long since reached their limit, and I see no hope for further development.-- Frontinus, 1st cent. AD
    17. Re:Sadly, not a lotta FPU hardware. by Anonymous Coward · · Score: 0

      A lot of math and physics problems/algorithms need higher precision when scaled up. Even Conjugate Gradient, which is more suitable than Gaussian elimination, is one example. Sometimes you can limit high precision to the aggregation functions and emulate them in software, as you suggest.

    18. Re:Sadly, not a lotta FPU hardware. by twiddlingbits · · Score: 1

      You are forgetting Grid Computing, where you can have 1000 or more CPUs working on the problem components in parallel. I've seen some pretty hairy physics problems solved on these. Also, a fair amount of the scientific community seems to be buying the Sun SPARC IV+ architecture. Power 5 is going to be around a while, but when they start cranking the chip speeds past about 3.5GHz they will need liquid cooling. Itanium is hanging by a thread. I wouldn't invest in that. Best new things I see on the horizon are some advanced AMD chips or the Sun "Rock" chip that will be hardware multithreaded like the Sparc T1 but have 8 FPUs to support 8 cores. I think in the next few years you'll have more than the choices you mention.

    19. Re:Sadly, not a lotta FPU hardware. by Anonymous Coward · · Score: 0

      So true! I have a Math degree (BS), and I don't think we ever did anything with floating point. I did take a numerical analysis class as an elective. That was about it for floating point.

    20. Re:Sadly, not a lotta FPU hardware. by Frumious+Wombat · · Score: 1

      I would like to think so, and some nostalgic part of me is rooting for Sun. I actually have great hope for the AMD HyperTransport bus, as that might get us back to where we'd have been if SGI's ccNUMA (used in the Origins) had caught on with desktop systems. The Rocks, plus Sun's engineers, will be an interesting team to beat. After that, better algorithms and better compilers.

      Still, the most impressive chip I've seen in the last 5 years is the Itanium-2, and as you say, it's on life-support. (it's also amazingly hot and power hungry, which still limits its use) Four 64-bit floating-point operations per clock cycle. On our codes, only the IBM Power 4/5 are competitive with it, though the Alpha probably would have been, had development continued.

      --
      the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
    21. Re:Sadly, not a lotta FPU hardware. by Anonymous Coward · · Score: 0

      Of course.. but we're talking about vertex/normals/texture coordinates here, not color depth...

    22. Re:Sadly, not a lotta FPU hardware. by ameline · · Score: 1

      Not quite as bad as you make it out to be --

      Each SPU can do 2 DP FMACs (in one vector) in 6 cycles -- not pipelined. and at 3.2 GHz. Then you can add the single pipelined DP FMAC unit in the PPE.

      Sure, it's an order of magnitude less than SP, but it's not that anemic. And if I weren't still under NDA, I could speculate about what IBM/SONY might be doing about that situation. But I won't.

      Oh and back on topic, I used to work at IBM on compilers, and I recognized some of the names on the list of authors of the Octopiler paper -- and those are some *seriously* smart dudes who really know what they're doing when it comes to compilers. If anyone can pull it off, they can.

      --
      Ian Ameline
    23. Re:Sadly, not a lotta FPU hardware. by Anonymous Coward · · Score: 0
      Wrong, the Cell has double precision instructions (no division though) and they are fully IEEE compliant except for precision exception. But they are slower than single precision since they are not pipelined.

      There are rumors that next Cell version will have pipelined double precision FPU.

    24. Re:Sadly, not a lotta FPU hardware. by mnmn · · Score: 1

      Huh? I thought the cores were all 64-bit which means 64-bit ALU, which should mean 64-bit integers like other 64-bit cores. Am I missing something?

      --
      "Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
    25. Re:Sadly, not a lotta FPU hardware. by ivan256 · · Score: 1

      Why hasn't anyone implemented fast fixed-point arithmetic in hardware?

      Everyone implements fast integer math in hardware. Why do you think people use SPECInt to make their hardware look good?

    26. Re:Sadly, not a lotta FPU hardware. by rodac · · Score: 1

      I know that there are specific niche applications that may benefit from higher precision.
      However, I believe that is very niche.

      Most, virtually all, people asking for higher precision floats do so because they do not understand floating point and they misguidedly believe it will improve the accuracy of their calculations. As in my example, it doesn't matter how many bits of precision you use; you just CANNOT solve certain problems without an understanding of floats and numerical analysis.

      As for Gaussian elimination in particular, you just cannot invert certain types of matrices using that algorithm, no matter how many bits you extend your floats to. I believe you agree with that.

    27. Re:Sadly, not a lotta FPU hardware. by Rufus211 · · Score: 1

      It's good that you have no idea what you're talking about. IEEE Single-Precision floating point numbers have a 23-bit mantissa (fraction). Because of the implied 1 that is not stored you actually have a 24 bit number. 2^24 = 16,777,216 so you can precisely represent integers between 1 and 16 million with no error.

      (10 kilometers) / (1 centimeter) = 1,000,000. 1 million is much less than 16 million. Please play again.

      Anyway, if all you want is 1 centimeter precision why not use standard integer arithmetic? It'll be orders of magnitude faster than floating point arithmetic and (assuming signed numbers) you get a range of (2^31) * 1 centimeter = 21,474.8365 kilometers (for reference Earth's diameter is 12,756.3 km).

      With all that said you actually want far greater than centimeter precision and there are good reasons to use floating point numbers, but you wouldn't know any of them. And the parent is correct, if your game depends on double precision, you're doing something wrong.
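
      The figures above are easy to check. A short Python sketch follows, using the standard `struct` module to round values to IEEE single precision; the helper name `as_float32` and the constant names are made up for the example.

```python
import struct


def as_float32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


EXACT_INT_LIMIT = 2 ** 24                 # 23 stored bits plus the implied 1
CM_IN_10_KM = 10 * 1000 * 100             # 10 km expressed in centimeters
INT32_RANGE_KM = (2 ** 31) / 100 / 1000   # signed 32-bit centimeter count, in km

if __name__ == "__main__":
    print(EXACT_INT_LIMIT, CM_IN_10_KM, INT32_RANGE_KM)
    print(as_float32(float(CM_IN_10_KM)))           # a million cm is still exact
    print(as_float32(float(EXACT_INT_LIMIT + 1)))   # one past the limit is not
```

      One centimeter steps out to 10 km need a million distinct values, which sits comfortably under the 2^24 integers a single-precision float can hold exactly.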

    28. Re:Sadly, not a lotta FPU hardware. by adam31 · · Score: 1
      64-bit double precision floats are [somehow?] implemented in software and bring the chip to its knees.

      "implemented in software" (you mean microcoded) : False.

      Doubles are processed at 25-30 GFlops, vs 256 for SP (a magnitude of difference). However, I've also read that the current incarnation of the Cell is geared toward the PS3, and later versions meant for scientific computing will have much improved double support.

      But Flops aren't so crucial. What is crucial is memory bandwidth and scalability, 2 factors that have dictated the design of Cell from the ground up. The processing speed of a 1,000 node computer will be memory-bound for most real problems.

    29. Re:Sadly, not a lotta FPU hardware. by Rufus211 · · Score: 2, Interesting

      > And fixed-point isn't integer, bozo.
      Yes it is, as long as you're willing to put a few seconds of thought into it (or just google for the answer).

    30. Re:Sadly, not a lotta FPU hardware. by localman · · Score: 1

      Really? You can see 1cm discrepancies at a distance of 10km? I can't do that with my naked eyes, and probably not even with a basic set of binoculars, let alone the low resolution images that the PS3 (or nearly any computer for that matter) can put out.

      Re-origining makes sense, but it's probably cheaper to do that every few seconds than to run everything in 64 bit all the time.

      Cheers.

    31. Re:Sadly, not a lotta FPU hardware. by gnasher719 · · Score: 1

      '' Sadly, there's almost no FPU hardware to speak of: 32-bit single precision floats in hardware; 64-bit double precision floats are [somehow?] implemented in software and bring the chip to its knees. ''

      Don't believe everything you read on Wikipedia.

      "Almost no FPU hardware to speak of": Each cell processor can issue four fused multiply-adds each cycle, and there are seven cells per chip, that is 28 madds or 56 flops per cycle. Not bad.

      Double precision is not implemented in software. Maximum two fused multiply-adds (one instruction working on two vector elements) can be issued every 7 cycles, with 14 cycles latency. Doesn't sound much, but that means it is very easy to get absolutely close to peak performance, no matter how many other operations you have to do between floating point operations; you only need to have two independent operations to cover latency, and you can issue six non-fp instructions between any two fp instructions to do the legwork. Multiply by seven, take the higher clockrate, add in the PowerPC processor that can do some more work, and you beat a G5 easily on most problems.
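As a rough check of the per-cycle arithmetic above, here is a minimal Python sketch. The function name and the 3.2 GHz clock are assumptions made for illustration; the unit count, lane counts and issue intervals are the ones quoted in this comment.

```python
def peak_flops_per_cycle(units, fma_lanes, issue_interval_cycles=1):
    """Peak flops per cycle, counting each fused multiply-add as two flops."""
    return units * fma_lanes * 2 / issue_interval_cycles

# Single precision: 7 units, 4 FMA lanes each, one issue per cycle.
sp = peak_flops_per_cycle(units=7, fma_lanes=4)
# Double precision: 7 units, 2 FMA lanes each, one issue every 7 cycles.
dp = peak_flops_per_cycle(units=7, fma_lanes=2, issue_interval_cycles=7)

ASSUMED_CLOCK_GHZ = 3.2  # assumption, not stated in the comment
print(sp, dp, sp * ASSUMED_CLOCK_GHZ, dp * ASSUMED_CLOCK_GHZ)
```

These are peak figures only; the comment's point is that the double-precision peak is low but comparatively easy to reach.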

    32. Re:Sadly, not a lotta FPU hardware. by ralmin · · Score: 1
      You wish. In a big 32-bit game world, effort has to be made to re-origin the data as you move. Suppose you want vertices to be positioned to within 1cm (worse than that and you'll see it), and you're 10km from the origin. The low order bit of a 32-bit floating point number is now more than 1cm.

      Really? You can see 1cm discrepancies at a distance of 10km? I can't do that with my naked eyes, and probably not even with a basic set of binoculars, let alone the low resolution images that the PS3 (or nearly any computer for that matter) can put out.

      No, that's not what he meant. I made the same mistake on first reading. Imagine that positions within the game world are represented by 32-bit floating point values. There is some point within the game world that is at (0.0f, 0.0f). Now imagine you have moved ten kilometres away from that point. You're not looking at the origin, but just at the features around you. However, the granularity of the position values for each object around you is now worse than if you were standing near the zero point of the game world. Assuming a 23-bit mantissa, the granularity is around 1.2 mm -- not too bad, really. And I don't know too many single game maps that would extend to 100 km, 1000 km where the granularity would become a real problem.
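The granularity estimate can be checked with a short Python sketch. It assumes positions are stored in metres (the unit is not given in the thread) and that values are normal, nonzero IEEE 754 single-precision floats; `float32_spacing` is a name invented for this example.

```python
import math

def float32_spacing(x):
    """Gap between adjacent float32 values near x (24-bit significand)."""
    _, exponent = math.frexp(abs(x))   # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return math.ldexp(1.0, exponent - 24)

for km in (1, 10, 100, 1000):
    metres = km * 1000.0
    step_mm = float32_spacing(metres) * 1000
    print(f"{km:>5} km from origin: step = {step_mm:.3f} mm")
```

Dividing 10 km by 2^23 gives the roughly 1.2 mm quoted above; the actual step at exactly 10,000 m is 2^-10 m, a little under 1 mm, because the spacing changes only at powers of two.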

    33. Re:Sadly, not a lotta FPU hardware. by ObsessiveMathsFreak · · Score: 1

      Why can't someone invent a chip for math geeks? With 128-bit hardware doubles? Are we really that tiny a proportion of the world's population?

      Yes. I know one of the top men in the field of numerical analysis, where number crunching is a big deal, and as near as I can tell, his programs are written in matlab on a 32 bit windows machine. I could be wrong here, but as far as I can tell the maths department here has no dedicated machines of any kind.

      If your numerical code has errors of order h^4 anyway, then what's really the point of going to 128bit machines? You'll lose all the precision amid the noise from the numerical method.
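To illustrate the point about method error dominating, here is a small Python sketch using composite Simpson's rule, whose error is of order h^4. The choice of integrand (sin on [0, pi]) and of step counts is an assumption made purely for the example.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals; error is O(h^4)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

exact = 2.0  # integral of sin over [0, pi]
err_16 = abs(simpson(math.sin, 0.0, math.pi, 16) - exact)
err_32 = abs(simpson(math.sin, 0.0, math.pi, 32) - exact)
print(err_16, err_32, err_16 / err_32)
```

At these step sizes the truncation error is many orders of magnitude above 64-bit rounding error, so wider floats would not change the answer; halving h is what helps.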

      --
      May the Maths Be with you!
    34. Re:Sadly, not a lotta FPU hardware. by Anonymous Coward · · Score: 0

      Apparently some IBM machines implement special fixed point hardware so that the rounding is handled correctly.

    35. Re:Sadly, not a lotta FPU hardware. by localman · · Score: 1

      Ah, I see. Thanks for the explanation. I should have read closer.

      I guess there were some other flaws with the parent post description anyways, but even with the limits described, it still might be better to reorigin periodically? I'll just take the current crop of 32 bit machines as evidence that such is probably "good enough" for most gaming applications.

      Cheers.

    36. Re:Sadly, not a lotta FPU hardware. by salad_fingers · · Score: 1

      The only component of the Cell that would remotely require an FPU is the PPC core which handles the physics and AI. However it does the job fine without one due primarily to compiler optimization. The compiler is key for Cell because all of its cores are "in-order", meaning that instructions must be executed in the order they are written and received. An out-of-order core requires much more complex circuitry along with surface area and power consumption, and in turn heat dissipation. With an optimized compiler, one can make an in-order core behave similar to an out-of-order one. Branch prediction also improves. Keep in mind that these cores are working with no cache and we can see why the compiler is crucial.

    37. Re:Sadly, not a lotta FPU hardware. by TheRaven64 · · Score: 1
      If you are doing the kind of calculations where 32-bit floats are not enough, and 64-bit might not be, then you probably shouldn't be using the floating point unit at all. Things like matlab use their own numerical representations which don't lose precision. For integers, they will use an array of integers, each overflowing into the next[1]. For rational numbers they will, perhaps, store a denominator and a numerator as big integers. For irrational numbers they will usually store the function used to generate the number (almost always a root) and only turn it into an actual number for output.

      Floating point is for when an approximation is good enough, not for when the real answer is important.

      [1] Yes, I am oversimplifying here. The storage is as I describe, but the operations are somewhat complicated.
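The difference between approximate and exact representations can be shown with Python's standard library. This is only an illustration of the idea (big integers, and rationals stored as numerator/denominator pairs); it says nothing about how any particular maths package is implemented.

```python
from fractions import Fraction

# Binary floating point cannot represent 0.1 exactly, so the sum is approximate.
float_ok = (0.1 + 0.2 == 0.3)

# Rationals kept as (numerator, denominator) pairs of big integers stay exact.
exact_ok = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

# Python integers are arbitrary precision: no overflow, every digit is kept.
big = 2 ** 200 + 1
print(float_ok, exact_ok, big - 2 ** 200)
```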

      --
      I am TheRaven on Soylent News
    38. Re:Sadly, not a lotta FPU hardware. by Jeremi · · Score: 1
      and there are seven cells per chip


      Do I understand correctly that there are actually eight cells per chip, but they want to be able to handle production errors without throwing too many chips away, so they assume that one of the cells will be defective and therefore disabled?


      If so, what happens when they get lucky and all eight cells function properly? Do they still disable one, just for the sake of uniformity, or do you get a slightly-more-powerful-than-normal cell chip in that case?

      --


      I don't care if it's 90,000 hectares. That lake was not my doing.
    39. Re:Sadly, not a lotta FPU hardware. by MikeFM · · Score: 1

      The Cell is a new CPU and needs time to evolve a little. If you try to hit every possible goal for the first generation then the stupid thing will never launch or will cost so much it'll never be used. I think they made a good decision to create this kind of CPU but it won't start getting really impressive until, or unless, there is a second generation. Of course I've noticed in most tech things third time really is the charm so it could be the same with the Cell too. Two sets of revisions and upgrades usually hits a technological sweet spot.

      --
      At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
    40. Re:Sadly, not a lotta FPU hardware. by Animats · · Score: 1

      Actually, language support for fixed-point is useful. Ada has it. When the compiler understands fixed point, some useful optimizations are possible. And you avoid errors due to incorrect shifting by programmers.

    41. Re:Sadly, not a lotta FPU hardware. by twiddlingbits · · Score: 1

      I'd REALLY like to see Sun buy SGI and put NUMA into SPARC boxes. That combined with some other things I know are going on would be killer. Disclaimer..I work for Sun ;) We spend way too much money buying software firms that don't make a big difference for our customers. Some really kick-ass new hardware (such as T1) would be nice.

    42. Re:Sadly, not a lotta FPU hardware. by Jeff+DeMaagd · · Score: 1

      I thought that two of the other major objectives for Cell were workstations and scientific computing. If it was just games, I doubt that IBM or Toshiba would have gotten involved. I think it would be really cool if the open source low level movers and shakers could be given these things in the form of a Linux workstation, though I would hope that speculation or promises of the PS3 being a Linux devel platform do pan out.

    43. Re:Sadly, not a lotta FPU hardware. by tabrisnet · · Score: 1

      Since it's all handled by the software, the 8th SPE will never be used (unless some guy decides to do that with the Linux port... but each application has to be able to know if it has an 8th, and then use it).
      So not disabling it wouldn't really mean enabling it, if you get my drift.

    44. Re:Sadly, not a lotta FPU hardware. by Anonymous Coward · · Score: 0

      I worked on a title that had this issue. It was an MMRPG with a seamless world, and from the origin to the furthest extents of things was 42 kilometers. Once you got to about the 24k mark you'd see a lot of artifacts -- verts shimmying back and forth between frames and so on. We eventually had to rework things to render relative to the sector you were in (everything was divided into a quarter-kilometer grid) to fix the problem.

      Saying "Why don't you store it in an integer" is missing the point (it was, actually). Regardless of what you store the data in, eventually you have to transform and render it to the screen, which is unavoidably floating point, and that means keeping your origin point closer to the camera than tens of millions.
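A minimal Python sketch of the sector-relative idea follows. It is not the code from that title: the helper names, the use of metres, and the sample coordinate are assumptions, and `struct` is used only to round a value to single precision.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def split_by_sector(x, sector_size=250.0):
    """Return (sector index, offset inside that sector), computed in doubles."""
    index = math.floor(x / sector_size)
    return index, x - index * sector_size

world_x = 24000.0123                  # hypothetical position, metres from origin
sector, local_x = split_by_sector(world_x)

err_world = abs(to_float32(world_x) - world_x)  # float32 error far from origin
err_local = abs(to_float32(local_x) - local_x)  # float32 error near sector origin
print(sector, err_world, err_local)
```

Keeping the large part of the coordinate in an integer sector index (or a double) and handing only the small offset to the single-precision pipeline is what removes the vertex jitter.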

    45. Re:Sadly, not a lotta FPU hardware. by rodac · · Score: 2, Interesting

      No, that is not a good example of something that needs/benefits from better accuracy.
      The problem is that code cutters today have zero understanding of what they do or of the theory, and then they blame lack of precision for the "error" terms.

      No matter how high you make the precision, there are lots of numerical calculations that just cannot be done accurately without a proper education in computer science or numerical analysis.


      Question 1: Using Gaussian elimination, I want to invert a Hilbert matrix with 100 rows and 100 columns. How many bits of mantissa do I need in the float representation if I want the residual error to be less than 1%?
      1, don't know. don't even know how to estimate ==> you should not write numerical software since you lack the tools and understanding required.
      2, make them really big. ==> see above.
      3, 128 bits. ==> see above.
      4, could estimate it but it is pointless since that algorithm is not numerically stable. ==> almost there.
      5, 4 + the Gershgorin (spelling?) circles show that we have to do partial pivoting (English name?) to stabilize the calculations. ==> congratulations, right answer.
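For readers who want to see the effect, here is a small Python experiment on a 10 x 10 Hilbert system (much smaller than the 100 x 100 case in the question, and chosen as an assumption to keep it quick). It uses Gaussian elimination with partial pivoting and compares 64-bit floats with exact rationals; all function names are invented for this sketch.

```python
from fractions import Fraction

def hilbert(n, one):
    """n x n Hilbert matrix H[i][j] = 1/(i+j+1), built from the given 'one'."""
    return [[one / (i + j + 1) for j in range(n)] for i in range(n)]

def solve(a, b):
    """Gaussian elimination with partial pivoting; floats or Fractions."""
    n = len(b)
    a = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0] * n
    for r in reversed(range(n)):                     # back substitution
        s = a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / a[r][r]
    return x

def max_error(n, one):
    """Solve H x = H * [1, ..., 1] and report the worst deviation from 1."""
    h = hilbert(n, one)
    b = [sum(row) for row in h]
    return max(abs(xi - 1) for xi in solve(h, b))

print(max_error(10, 1.0))          # 64-bit floats
print(max_error(10, Fraction(1)))  # exact rationals
```

The float run loses most of its significant digits even at this size, while the exact run recovers the solution with no error, which is the kind of estimate the question is asking people to be able to make before choosing a precision.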


      I am in the unfortunate position to have to work with the 1-3 answer people. Today's CS degrees are just paper and don't even cover the most basic 101 skillsets. I bet they can hack together an example inventory database app in VB really quickly though.

    46. Re:Sadly, not a lotta FPU hardware. by Anonymous Coward · · Score: 1, Informative

      A 32-bit Windows machine actually has 80-bit floats. The 32 bits are the address size, nothing to do with FP.

    47. Re:Sadly, not a lotta FPU hardware. by MORB · · Score: 1

      What about avoiding having such large objects in the first place? Slice it down into smaller objects (which is a good idea for culling anyway), translate the vertices so that they are relative to some point located within the object's bounding box (for instance, its center), set up the transformation matrix for that object to translate it back where it belongs, problem solved. Now perhaps for the position of an object relative to the world you may need to use double if you have a large world and need the precision, but for the object space to homogeneous space matrix used to transform the individual vertices, floats should be enough (as it only needs enough precision to represent the position of vertices relative to the camera).

    48. Re:Sadly, not a lotta FPU hardware. by mgblst · · Score: 1

      A difficult task, once you see the math geek women. Though to be honest, degrees better than the computer geek women.

    49. Re:Sadly, not a lotta FPU hardware. by Frumious+Wombat · · Score: 1

      Presuming you're being truthful, then:

      Thank you. Thank you (collective) for actually worrying about shipping a Fortran with your development tools for Solaris x86. For pushing Opterons, which, while not Itaniums, are still strong floating-point processors, and easy to cluster. I'd say nice things about the Sun Ultra60 I used to have as a desktop, but that's a different era.

      Now, get together with the Rocks guys http://www.rocksclusters.org/, and get a Solaris/Rocks installation out there!

      --
      the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
    50. Re:Sadly, not a lotta FPU hardware. by strikethree · · Score: 1

      Why can't someone invent a chip for math geeks?

      um, because math geeks are dangerous...

      --
      "Someone needs to talk to the tree of liberty about its ghoulish drinking problem." by ohnocitizen
    51. Re:Sadly, not a lotta FPU hardware. by stedo · · Score: 1

      That's interesting. I was thinking more along the lines of CPU support, so that it would ideally be possible to, e.g. do fixed-point multiplications in one instruction, instead of the multiply-shift sequence of any CPU I know of.
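The multiply-shift sequence mentioned here looks like this in a minimal Python sketch. The Q16.16 layout and the helper names are assumptions chosen for the example; the point is only that fixed-point values are ordinary integers with an implied scale.

```python
FRACTION_BITS = 16            # Q16.16 layout: an assumption for this example
SCALE = 1 << FRACTION_BITS

def to_fixed(x):
    """Encode a real number as a scaled integer."""
    return round(x * SCALE)

def to_real(f):
    """Decode a scaled integer back to a float."""
    return f / SCALE

def fixed_add(a, b):
    return a + b                       # plain integer add

def fixed_mul(a, b):
    return (a * b) >> FRACTION_BITS    # multiply, then shift the extra scale away

product = fixed_mul(to_fixed(1.5), to_fixed(2.25))
print(to_real(product))
```

Addition needs no correction at all; multiplication needs the extra shift because the product carries the scale factor twice, which is the step a single fixed-point multiply instruction would fold in.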

    52. Re:Sadly, not a lotta FPU hardware. by default+luser · · Score: 1

      The Cell is a new CPU and needs time to evolve a little.

      No, the Cell is a games CPU that IBM is trying to hawk as a general-purpose DSP. The only reason IBM is hyping it is because they already designed it for Sony. Any extra sales beyond Sony are pure profit.

      And I do say this: Cell is a gaming processor. Like most gaming processors, it doesn't put much value on 64-bit floating-point, and instead it is very fast at 32-bit floating-point. For example, Gekko (Gamecube) can execute two 32-bit FP instructions or 1 64-bit FP instruction each cycle using the same hardware. The PS2 (Emotion Engine) doesn't support 64-bit FP. The VMX in Xenon (Xbox 360) is also limited to 32-bit precision. That's just the way the gaming world works, for 99% of the time, 32 bits is enough.

      --

      Man is the animal that laughs.
      And occasionally whores for Karma.

    53. Re:Sadly, not a lotta FPU hardware. by MikeFM · · Score: 1

      In general purpose computing how often do I use 64bit floating point math? I've been programming for 15 years and I can probably count that need on one hand with room to spare. I can see how with certain applications it could speed things up quite a bit but I don't see those things as being very general computing tasks.

      --
      At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
    54. Re:Sadly, not a lotta FPU hardware. by twiddlingbits · · Score: 1

      Very truthful, I'm an Architect in Pre-Sales for a large market in the Southwest. While I certainly give feedback on things to the product development teams they seem to have their own ideas about what the customer wants. Even though we are the "point of the spear" we often go unheard. The BAD thing about the Opterons is (IMHO) they are going to start stealing UltraSparc customers, they are just as powerful or more than the USIV+ and a LOT cheaper. Watch for some very exciting Opteron architectures coming soon. Opterons also do good with Oracle RAC, using InfiniBand HBAs to hook up to 8 boxes in a RAC cluster for a very HA database.

    55. Re:Sadly, not a lotta FPU hardware. by Frumious+Wombat · · Score: 1

      Well, you can tell them that we (i.e. technical computing end-users) want an easy to install, seamless, HPC cluster, built around Opterons. I mention Rocks because they're a great bunch of guys, but also because their clustering solution is clean, elegant, and works. It also configures for Sun Grid Engine ( :~) right out of the box.

      Politely, while the UltraSparc hardware is great for reliability, management, and end-user support, it's been doomed for several years, from a performance standpoint. We retired our US-III V880s to purchase a large cluster of Opterons, and did better than double our throughput.

      Some of us would like to see a V880 using Opterons, as those hooked together using Infiniband to cluster would solve some serious problems.

      Take all of this with a grain of salt. I'm a chemist who's worked in HPC, and I know we're not the bulk of the market for anyone's systems.

      --
      the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
    56. Re:Sadly, not a lotta FPU hardware. by twiddlingbits · · Score: 1

      A V880 using Opterons..well all I can say is be patient. Some good things in the queue, if you contact someone at Sun and are willing to sign an NDA we can certainly talk more about the specifics of upcoming Opteron (and SPARC) products. We are always looking for early adopters and/or beta testers for new equipment in many different industries.

    57. Re:Sadly, not a lotta FPU hardware. by Anonymous Coward · · Score: 0

      Seriously, if that's a major problem for you then you should be looking for a change in career, or maybe asking for a demotion to QA.

  7. Octointerpreter by yerdaddie · · Score: 2, Interesting

    Reading this is making me nostalgic for LISP machines and interpreter environments that let programmers really play with the machine instead of abstracting it away. What I'd really like to see is someone who takes all the potential for reconfiguration and parallelism and doesn't hide it away but makes it available.

    1. Re:Octointerpreter by idlake · · Score: 1

      Lisp machines were dog slow and hugely expensive, even compared to the workstations around at the time. If you want that kind of environment, you can emulate it by running any of the interpreted languages so popular on Linux, Windows, and Macintosh (although you'll be hard pressed to be as slow as the Lisp machine).

      As for not hiding reconfigurability: you can buy anything you desire as an add-on board, like an FPGA board or an array processor. People don't use them a lot because they are a pain to program and they don't give you good bang for the buck.

    2. Re:Octointerpreter by Tyler+Eaves · · Score: 1

      Hardly. While they weren't super computers, they weren't overly slow for their day (late 80s). They were used for tasks such as 3D rendering (The liquid metal T2 was done on a LISPM). There isn't any environment that comes even close to giving the kind of access they did. You could change kernel code WHILE THE SYSTEM WAS RUNNING, if you wanted to, or any other code for that matter.

      --
      TODO: Something witty here...
    3. Re:Octointerpreter by JohnnyLocust · · Score: 1

      Reading this is making me nostalgic for LISP machines [comcast.net]

      I bet you're real fun in the sack

    4. Re:Octointerpreter by idlake · · Score: 1

      There isn't any environment that comes even close to giving the kind of access they did. You could change kernel code WHILE THE SYSTEM WAS RUNNING, if you wanted to, or any other code for that matter.

      You can do the same with UNIX or Linux.

    5. Re:Octointerpreter by mav[LAG] · · Score: 1

      Really? Pray tell me how I can change say, the Linux kernel scheduler on the fly without having to recompile from source, install and reboot with the new image. This is what the GP was getting at - changing the code of the running kernel of the Lisp Machine while it's running.

      --
      --- Hot Shot City is particularly good.
    6. Re:Octointerpreter by Anne+Thwacks · · Score: 1
      Maybe they need to support Algol68 - a high level language which allows the programmer to say what can be done in parallel, and what can't, and lets the compiler/hardware decide what should actually be done in parallel. Not only that, it's easy to learn: "Algol68 without tears" was only 64 pages or so, and mathematically correct. It took 8 years of development by the best mathematicians in Europe, but they got there in the end.

      Unfortunately, it was developed at a time when IBM was like MS is now, and Algol68 was NOT an IBM product, so it was doomed. IBM produced PL/1 instead, which was doomed because it was crap, so we were left with C as the only surviving language from the minicomputer era! (And C is really PDP11 assembler, and the PDP11 was designed to be a "hardware Fortran machine".)

      --
      Sent from my ASR33 using ASCII
    7. Re:Octointerpreter by idlake · · Score: 1

      Pray tell me how I can change say, the Linux kernel scheduler on the fly without having to recompile from source, install and reboot with the new image.

      There are lots of ways. You load a new dynamic module and, if necessary, dispatch to it. In some cases, the dispatching happens automatically, in others, you replace the first instruction of an old subroutine with a jump to the new subroutine. GDB supports that, as do many rootkits.

      For kernel development, most people seem to have stopped doing that because well-defined version control, automated builds, automated testing, and remote debugging seem to work better. Keep in mind that if you set up your kernel development environment well, you can reboot faster than you can type M-x eval-buffer.

    8. Re:Octointerpreter by mav[LAG] · · Score: 1

      Doh - modules. Of course. Thanks for the reminder :)

      --
      --- Hot Shot City is particularly good.
    9. Re:Octointerpreter by Anonymous Coward · · Score: 0

      Note that even before modules, people would use adb on kernel memory to patch or change things. You really can change everything and anything you like at runtime.

  8. Am I ignorant or . . . by Nomihn0 · · Score: 1

    isn't this a bit of a pipe dream? A compiler that optimizes a program for multiple processors is a nice idea, but how can you foresee worst-case-scenarios that only emerge with human use? Take driving as a very abstract example. You "write" a car. You want it to both accelerate and brake on a dime while still being fuel efficient. Without knowing the driving conditions, city or country, how can you optimize your driving for efficiency?

    1. Re:Am I ignorant or . . . by Tinned_Tuna · · Score: 1

      They could write a compiler that only works on the Cell processor in the PS3, taking away a lot of the hardware variables. I think that would make the process of creating a compiler easier (not easy, easier)

    2. Re:Am I ignorant or . . . by slavemowgli · · Score: 1

      But you do know the driving conditions: they're the specs of the target architecture. It's still not an easy problem, of course, but it's not like you are supposed to write a compiler that emits perfect code for any target architecture - that would indeed be a rather hard problem.

      --
      quidquid latine dictum sit altum videtur.
    3. Re:Am I ignorant or . . . by JesseT · · Score: 1

      Ever heard of profile-guided optimizing compilers? Many runs of a program built by such a compiler in "profiling" mode produce metrics about how the code is actually executed. Next, the program is recompiled using this information, producing a much more optimized binary.

      Many C/C++ compilers today support this feature.

      I don't know if IBM's compiler for their Cell architecture supports profile guided optimizations, but if they ever want to take full advantage of the architecture, I foresee they will build it in to their compiler.

    4. Re:Am I ignorant or . . . by Kaptain+Kruton · · Score: 1
      Yes, optimization is very tricky when writing a compiler. However, when it is only made to run on one single architecture, or one single type of car, then some optimizations can be done. You make the optimizations based on chip. To put it in terms of your analogy, you 'write the driving' to be optimized for the car and not so much the driving conditions. A minivan is not made for racing, so you would not write code for a minivan that accelerates and turns on a dime. Similarly, a stockcar would not be written to drive off road. These vehicles would be 'written' to optimize and take advantage of what they were designed to do.

      -my $0.02

    5. Re:Am I ignorant or . . . by Nomihn0 · · Score: 1

      I disagree. I see the driving conditions as being analogous to the data a user provides a program. The car is the architecture. This car must function under the strain of user input. The program's job is to regulate the car in such a way that mileage remains good, regardless of the user's behavior.

  9. Is it just me or... by Kawahee · · Score: 1

    Is it just me or is it a bad idea to make something that completely breaks most programming paradigms, and requires a special compiler to compile it properly, and *then* use it in a next gen console, due out this year?

    Surely it was screaming at them that this isn't something that's meant to be released so soon. I mean, the compiler has 4 tiers of 'optimisation', which are meant for the programmers to set so the compiler doesn't make a mess of their memory-management code if they manage memory correctly, or something like that. What this shows to me is that if IBM can't even get the code behind the compiler to make sense of the Cell's architecture, what chance do we have of programming it?

    --
    I'll subscribe to Slashdot when I see a month without a dupe, a typo, or an article the "editors" didn't read.
    1. Re:Is it just me or... by Tx · · Score: 1

      Is it just me or is it a bad idea to make something that completely breaks most programming paradigms, and requires a special compiler to compile it properly, and *then* use it in a next gen console, due out this year?

      Not really, it's future proofing. It can be used as pretty much a still pretty powerful single core machine for the initial release titles, and as the programmers get to grips with how to get the most out of the cell architecture, and better tools come out, the titles will keep getting better over several years. Actually it's pretty much ideal, given the desired life of the console.

      --
      Oh no... it's the future.
    2. Re:Is it just me or... by TheRaven64 · · Score: 1
      I would imagine that a lot of the problem is trying to generate SPU code from a language such as C. I would have thought that the solution would be to design a language more like Erlang[1] that is designed for parallelism, and allow your programmers to express their algorithms in this, rather than getting them to program for the PDP-11 and then trying to turn this into optimal code for something like the Cell.

      [1] Much as I like Erlang, it would not actually be quite suitable for the Cell.

      --
      I am TheRaven on Soylent News
    3. Re:Is it just me or... by Anonymous Coward · · Score: 0

      Is it just me or is it a bad idea to make something that completely breaks most programming paradigms, and requires a special compiler to compile it properly, and *then* use it in a next gen console?

      The guys at IBM wanted to make sure that when Dead or Alive comes out for that console, they can get maximum framerates and set 'Bouncing Boobies' to Hyperealistic.

      These *ARE* math geeks, you know.

    4. Re:Is it just me or... by Samurai+Crow · · Score: 1

      The old 8086-based PCs were hard to program too but their memory expansion used DMA transfers very similar to the way the SPEs handle their memory stashing and fetching. The instruction set of the SPEs is not very unique except for the "branch hints" for software controlled branch prediction. Apart from that it's pretty straightforward.

  10. A new era in performance breakthroughs? by PornMaster · · Score: 1

    Microsoft's Todd Proebsting claims that compiler optimization only adds 4% performance per year, based on some back-of-the-envelope calculations on x86 hardware.

    This radical of a change in architecture should at least provide an accelerated growth from introduction through the next several years, which I'm sure will provide added incentive for those involved in compiler optimization -- finally, some real enhancements.

    1. Re:A new era in performance breakthroughs? by stedo · · Score: 1, Interesting
      Microsoft's Todd Proebsting claims that compiler optimization only adds 4% performance per year, based on some back-of-the-envelope calculations on x86 hardware.

      Then Microsoft's Todd Proebsting is wrong. Ask some Gentoo users. Personally, I recently wrote a bit of fairly simple mathematical code (computing difference sets). The total runtime on my 3 gig P4 was 22 seconds. I shaved off 2 seconds by optimizing the algorithm myself. By using gcc -O3, I shaved off a further 10 seconds, halving the runtime.

      Anyway, this compiler isn't so much optimization as taking code intended for one paradigm (simple single-threaded code) and converting it to another (code with 8 cores of execution).

    2. Re:A new era in performance breakthroughs? by hunterx11 · · Score: 2, Funny

      Your post reminds me of the old adage, "Any sufficiently advanced fanboyism is indistinguishable from trolling."

      --
      English is easier said than done.
    3. Re:A new era in performance breakthroughs? by PhrostyMcByte · · Score: 1
      There is no magic silver bullet to vectorizing code. Compilers need to guarantee that your app will run how you meant it to run and that is no small task when it needs to infer from a language without explicit parallelism support. If the PS3 uses standard C++, I doubt this compiler will do much to help measurably.

      At the last PDC, Microsoft announced some very exciting ideas it is looking at to propose for the next C++ standard that will give language support for parallelism, essentially letting you do things like:
      vector<int> vec;

      future<int> i = active { return 1+1; };

      // do stuff while we wait for i to complete
      for(int val : vec) active {
          // process each in parallel
      }

      usei(i.wait());
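The `active` blocks above are proposed syntax, not something an existing C++ compiler accepts. For comparison, the same shape (start a task, do other work, then wait for the result) can be sketched with Python's standard `concurrent.futures` module; the squaring step is a hypothetical stand-in for "process each".

```python
from concurrent.futures import ThreadPoolExecutor

def run_demo(values):
    """Start one background task, process a list in parallel, then wait."""
    with ThreadPoolExecutor() as pool:
        i = pool.submit(lambda: 1 + 1)                       # like: future<int> i = active { ... }
        processed = list(pool.map(lambda v: v * v, values))  # like: for (...) active { ... }
        return i.result(), processed                         # like: i.wait()

print(run_demo([1, 2, 3]))
```

Threads are used here only to show the structure; in CPython they would not speed up CPU-bound work, so this is an analogy for the programming model rather than a performance example.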
    4. Re:A new era in performance breakthroughs? by Anonymous Coward · · Score: 0

      I like how you stopped reading and started mouth-foaming before you got to the "per year" part.

    5. Re:A new era in performance breakthroughs? by Anonymous Coward · · Score: 0

      Todd Proebsting said that compiler optimization adds 4% performance per year.

      You optimized an inefficient bit of code. You didn't optimize the compiler.

    6. Re:A new era in performance breakthroughs? by Anonymous Coward · · Score: 0

      Yet more proof that Gentoo rots the brain. That's assuming you have one to start with.

    7. Re:A new era in performance breakthroughs? by jadavis · · Score: 1

      There is no magic silver bullet to vectorizing code.

      It's even harder when there's no memory protection. One might imagine (within reason) that a Java compiler could separate independent tasks by tracking what variables are used in what sections of code, and inferring that one section must be independent of another until you reach line X (at which point you may need to synchronize access to a variable the two pieces have in common, or join the threads). That could (perhaps) achieve decent multithreaded performance even for apps written in sequential code.

      However, if there's no memory protection, then any pointer in the code can affect any other part of the running code (except for, e.g., a memory page in the text segment which may be hardware protected), because it's all running in the same virtual address space. So it makes me wonder how they even begin to attack that problem from the compiler. Of course adding support to the language like you suggest makes it easier for the programmer to do it, but the article is talking about the compiler doing it for the programmer.
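The variable-tracking idea can be sketched as a check of read and write sets, usually known as Bernstein's conditions. This is an illustration only, with made-up section contents, and not a description of what IBM's compiler does.

```python
def can_run_in_parallel(reads_a, writes_a, reads_b, writes_b):
    """Bernstein's conditions: neither section writes what the other reads or writes."""
    return (not (writes_a & reads_b)
            and not (reads_a & writes_b)
            and not (writes_a & writes_b))

# Section A: x = a + b     Section B: y = c * d     Section C: a = y + 1
section_a = ({"a", "b"}, {"x"})   # (reads, writes)
section_b = ({"c", "d"}, {"y"})
section_c = ({"y"}, {"a"})

print(can_run_in_parallel(*section_a, *section_b))  # disjoint variables
print(can_run_in_parallel(*section_b, *section_c))  # C reads what B writes
```

The check is trivial once the sets are known. With unrestricted pointers the compiler often cannot prove what a section reads or writes, so it has to assume the sets overlap, which is exactly the difficulty described above.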

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    8. Re:A new era in performance breakthroughs? by Anonymous Coward · · Score: 0

      That's an interesting and yet completely irrelevant anecdote.

      The point is not to compare 'gcc foo.c' to 'gcc -O3 foo.c', it's to compare 'gcc -O3 foo.c' where GCC is today's version to 'gcc -O3 foo.c' where GCC is a version from a year ago.

    9. Re:A new era in performance breakthroughs? by Anonymous Coward · · Score: 0

      I use -O999! My Gentoy box boots before I turn it on!

    10. Re:A new era in performance breakthroughs? by dmoore0100 · · Score: 1

      >> Proebsting's Law: Compiler Advances Double Computing Power Every 18 Years
      >>
      >> I claim the following simple experiment supports this depressing claim. Run your favorite set of benchmarks with your favorite state-of-the-art optimizing compiler. Run the benchmarks both with and without optimizations enabled. The ratio of those numbers represents the entirety of the contribution of compiler optimizations to speeding up those benchmarks. Let's assume that this ratio is about 4X for typical real-world applications, and let's further assume that compiler optimization work has been going on for about 36 years. These assumptions lead to the conclusion that compiler optimization advances double computing power every 18 years. QED.
      >>
      >> This means that while hardware computing horsepower increases at roughly 60%/year, compiler optimizations contribute only 4%. Basically, compiler optimization work makes only marginal contributions.

      ??? So what kind of optimisation are we talking about?

      1. Microoptimisations - well, modern GHz micros are pretty fast at everything - they'll even run bad code fast, ultimately reducing the ratio difference.
      2. Modern out-of-order, superscalar optimisations - well, these usually do make a more noticeable impact. The DEC GEM compilers ran code 40% faster - but the compiler is designed to utilise such architectures on purpose, and as such does not have an "only run this code utilising half the hardware resources of the micro" type mode, so a naive user running a benchmark cannot see the difference.
      3. Autoparallelisation detection for the Cell's 8 vector units - a special case of (2).

      -------
      members.lycos.co.uk/dmoore0100.
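
      The arithmetic in the quoted Proebsting's Law passage can be checked in a few lines. This is only a sketch of the quote's own assumptions (a 4X ratio accumulated over 36 years, compounding yearly); the variable names are made up for illustration:

```python
import math

speedup = 4.0   # assumed optimised/unoptimised ratio, taken from the quote
years = 36.0    # assumed years of compiler optimisation work, from the quote

# Compound yearly gain that multiplies out to `speedup` after `years`.
annual_rate = speedup ** (1.0 / years) - 1.0

# Years needed for one doubling at that rate.
doubling_years = years * math.log(2.0) / math.log(speedup)

print(f"annual gain: {annual_rate:.1%}")
print(f"doubling time: {doubling_years:.0f} years")
```

      Under those assumptions the yearly gain comes out just under 4%, which is where the "4% per year" figure comes from.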

  11. A summary of the idea here... by edashofy · · Score: 1

    Posit: Parallel processing can solve certain types of problems much faster than serial processing.
    Posit: The Cell architecture is highly parallel.
    Posit: Most programmers today are good at writing serial, not parallel, code.

    Hypothesis: A compiler can be developed that takes serially written programs and auto-transforms them into parallel programs to exploit the benefits of parallelism.

    Now comes the research to attempt to validate that hypothesis. Will it succeed? We'll find out in several years. There are likely to be some surprising results, and maybe even a paradigm-shattering breakthrough. Or, this line of research may just peter out. It happens.

    1. Re:A summary of the idea here... by irexe · · Score: 4, Insightful
      Hypothesis: A compiler can be developed that takes serially written programs and auto-transforms them into parallel programs to exploit the benefits of parallelism.

      Parallel programming and automated parallelization have already been researched exhaustively throughout the last thirty years of the 20th century. The outcome of all this research is that it is not feasible/tractable to create a compiler that is capable of recognising parallelism, as you suggest. Compilers that can do this are sometimes called 'heroic' compilers, for the reason that the required transformations are so incredibly difficult, and heroic compilers that actually work (well) simply don't exist.

    2. Re:A summary of the idea here... by stedo · · Score: 1
      Personally, I think we are going to need new languages to cope with parallel execution models. Consider this example: the for loop. A for loop in C or any of its offspring (C++, Java, C#, etc) relies on side-effects inside the loop to advance the code, and eventually one such side-effect will cause the loop to exit. This design implies serial execution, and converting it to parallel code would be extremely difficult.

      Now consider one of the common uses of a for loop: to perform the same operation on an array or matrix of data. This is a conceptually parallel operation, which the programmer has had to force into the language's serial operation structure. To write a special compiler to convert it back out into parallel code is an unnecessary waste of time and effort. Instead, a new language should be written, which allows programmers to directly write code in parallel units. Imagine a language where, for example, all function calls were asynchronous.

      AFAIK, the APL language was data-parallel which meant that you could perform operations, at least conceptually, on large sets of data at the same time. However, this language was last popular in the 60s. Anyone know of a modern language that can exploit parallelism?
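
      To make the contrast above concrete, here is a small Python sketch of the same element-wise operation written first as a side-effect-driven loop and then as a data-parallel map. The `scale` function and the data are invented for illustration, and in CPython the thread pool shows the programming model, not a real speed-up for CPU-bound work:

```python
from concurrent.futures import ThreadPoolExecutor


def scale(x: float) -> float:
    """Pure per-element operation: no shared state is read or written."""
    return 2.0 * x + 1.0


data = [float(i) for i in range(8)]

# Serial form: progress depends on the loop index and on appending in order.
serial = []
for i in range(len(data)):
    serial.append(scale(data[i]))

# Data-parallel form: says only "apply scale to every element"; the runtime
# is free to choose the execution order and the number of workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(scale, data))

print(serial == parallel)
```

      Because `scale` touches nothing but its argument, no analysis is needed to prove the iterations independent; the programmer has stated it directly.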

    3. Re:A summary of the idea here... by Space+cowboy · · Score: 1

      See my reply above (vcl v2) and look on the linux for PS2 website for VCL.

      VCL takes sequential code and splits it up into parallel code based on the constraints of the vector-units (each VU is dual-issue, with some restrictions). It'll re-order code, insert wait states, etc. Certainly it's a good start at auto-parallelisation of the code. It's supposed to do as well as a skilled engineer...

      Simon

      --
      Physicists get Hadrons!
    4. Re:A summary of the idea here... by SlayerDave · · Score: 2, Interesting
      Well, there's already been one parallel processing success story - the GPU. Granted, the GPU provides a more restrictive programming environment and memory model than the Cell, but with the right training and the right tools, it is possible to write code that effectively exploits parallelism.

      Let's also not lose sight of the big picture with regard to the Cell: the 8 parallel vector processors are coupled with a single CPU core derived from the PowerPC chip. So the overarching structure of the Cell isn't all that different conceptually from a typical CPU-GPU setup in most PCs today.

    5. Re:A summary of the idea here... by Anonymous Coward · · Score: 0

      The only project I'm personally familiar with on those lines is BrookGPU.

    6. Re:A summary of the idea here... by Anonymous Coward · · Score: 0

      I would appreciate any direct info - link to ACM, or other. I'm indirectly involved in a project where some folks think an auto parallel compiler can be written. A direct source as to why not would make my job as the messenger of bad news a bit easier.
      Posted AC to protect the employed!

    7. Re:A summary of the idea here... by middlemen · · Score: 1

      You might be able to write an auto-parallel compiler, but how will you get performance out of it? How do you know whether the parallelization that the compiler is doing is efficient or not? There are so many algorithms that need to be considered. What about the mesh algorithms which need migration of points and data? The methodology and efficiency of parallelizing each algorithm will vary with the algorithm and might even vary with each data set given to the algorithm. I work on parallel processing of Monte Carlo algorithms, and sometimes the parallelization is not worth the effort because the input data is so skewed that it actually slows down the work. An auto-parallel compiler can exist for specific cases only and cannot be an all-case-applicable universal parallelizer.

    8. Re:A summary of the idea here... by Anonymous Coward · · Score: 0

      Verilog and VHDL both support (and are based on) explicit constructs to indicate parallel activities. However, these languages are designed to describe logical circuits, not algorithms. They force you to take care of EVERYTHING. If you had an algorithm that could be done in parallel, it would only be done that way if the designer coded it in that manner.

    9. Re:A summary of the idea here... by Kupek · · Score: 1

      It's not that it can't be done, it's more that it hasn't been done. I'm not aware of any particular proof of why it's impossible, but I am aware of many of the reasons why it's extremely difficult. The most I've seen a compiler do on its own is recognize that a simple loop could be vectorized (Intel's compiler does this). But other than that, if you want parallel code, you need to parallelize it yourself.

      Auto-parallelizing compilers have been a holy-grail type problem for a while. On the other hand, there are facilities such as OpenMP which can buy you a lot of power with little programmer cost. Doing complex tasks in OpenMP is complex just like any other programming model, but simple parallelization remains simple. There are many, many parallel languages, but I think OpenMP might become the standard shared-memory programming model because it fits on top of a language (C and Fortran) instead of supplanting it.

  12. Anyone having flashbacks? by SmallFurryCreature · · Score: 4, Insightful
    I seem to remember that the PS2 was a bitch to code for as well and that many of the early titles did not make full use of its capabilities. So?

    All this meant that as the PS2 aged it could 'keep up' because the coders kept getting better and better.

    Mere mortals do not write the latest graphics engines. I think there are a lot more tier1 people running around than /. seems to think. They are just too busy to comment here.

    All that really matters is whether the launch titles will be 'good' enough. Then the full power of the system can be unleashed over its lifespan.

    If you're a game company and you're faced with the choice of either making just another engine OR spending some money on the kind of people that code for supercomputers and getting an engine that will blow the competition out of the water, then it will be a simple choice.

    Just because some guy on website finds it hard doesn't mean nobody can do it.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

    1. Re:Anyone having flashbacks? by buffer-overflowed · · Score: 1

      Or you could put a quarter of those resources into a platform that's far easier to develop for and wind up with the same result. You know, like the XBox or Gamecube versus the PS2.

      The only thing Sony has going for it is inertia, and everyone knows this.

      --
      The key to the enjoyment of pop music is to replace any instance of "love" with "C.H.U.D."
    2. Re:Anyone having flashbacks? by moosesocks · · Score: 1

      The Sega Saturn had the same problem, except that it never achieved the critical mass necessary to produce games taking full advantage of the hardware, and Sega pulled the plug.

      In a lot of ways, the PS3 is looking VERY similar to the Saturn. Complex hardware with several individual processing units. Lots of potential, but also very risky.

      Of course, with the momentum from the success of the PS2, and the backing of Sony, I think that the PS3 will perform better than Xbox360. As I've been predicting all along, however, I still think that Nintendo's going to dominate this round, and we're going to see a lot of incredible and unique games that will appeal to a huge range of audiences.

      --
      -- If you try to fail and succeed, which have you done? - Uli's moose
    3. Re:Anyone having flashbacks? by Tweekster · · Score: 1

      just like 5 years ago with the xbox.... what makes you think it will happen different this time? what you said vs the gp

      --
      The phrase "more better" is acceptable English. suck it grammar Nazis
    4. Re:Anyone having flashbacks? by Kupek · · Score: 1

      Just because some guy on website finds it hard doesn't mean nobody can do it.

      It's not just some guy on a website. The main problem he's talking about - taking high level code without explicit parallelization and discovering parallelism during compilation - is a difficult research problem.

    5. Re:Anyone having flashbacks? by Mr2001 · · Score: 1

      In a lot of ways, the PS3 is looking VERY similar to the Saturn. Complex hardware with several individual processing units. Lots of potential, but also very risky.

      Of course, you could say the same about the PS2. It has several individual processing units, none of which are all that spectacular on their own, but they're connected with massive pipes. Taking advantage of all the PS2's hardware means using all of those units to their fullest and moving data through them as fast as you can.

      Programming for the PS3 will likely be harder than programming for the PS2, but my understanding is that the separate units of the PS3 will be more similar to each other--like writing a program for a cluster of Linux boxes, rather than the PS2's cluster where one machine runs Linux, another runs Windows, another runs BeOS, etc.

      --
      Visual IRC: Fast. Powerful. Free.
  13. compilers ... by dioscaido · · Score: 4, Insightful

    ... can get you only so far. You need to have parallelism in mind when you write the high-level code, otherwise it may end up with needless dependence on serial execution that a compiler may not be able to break, reducing the benefits of such an architecture. It will be interesting to see how well games are suited for concurrent execution. Logically there are lots of computations that can be performed independently (AI, physics) but all of it has inherent interaction with a central data source (the game world).

    1. Re:compilers ... by CarpetShark · · Score: 1

      There are compiler extensions that allow for multi-threaded code etc., specifically designed with parallelism in mind. However, yes, your point is good. I think the PlayStation will need some well thought-out high-level engine APIs, even if the compiler is good, before many games with optimal performance are released. However, I'll be surprised if the Cell becomes cheap and has good raw performance, but isn't readily adopted and adapted to by the high-performance computing crowd.

    2. Re:compilers ... by JFMulder · · Score: 1

      Why should you separate each task into different threads? Instead, say you need to multiply this skinning matrix by a million vertices. Why not multiply half of them on one core and the other half on a second one, and then synchronize so that you move to the next step only when both are done? Most work in a game engine right now is pretty much linear, so it seems to me that the easiest way to use the Cell right now would be to split one task at a time across cores rather than splitting tasks across cores.
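
      The split-then-synchronize idea above could be sketched like this in Python. The matrix, the vertex data and the `transform` helper are all hypothetical, and a thread pool stands in for the two cores (CPython threads will not actually speed up this arithmetic):

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical 3x3 "skinning" matrix (a uniform scale by 2) and vertex list.
MATRIX = [[2.0, 0.0, 0.0],
          [0.0, 2.0, 0.0],
          [0.0, 0.0, 2.0]]
vertices = [(float(i), float(i + 1), float(i + 2)) for i in range(1000)]


def transform(chunk):
    """Multiply every vertex in the chunk by MATRIX."""
    return [tuple(sum(row[k] * v[k] for k in range(3)) for row in MATRIX)
            for v in chunk]


half = len(vertices) // 2
with ThreadPoolExecutor(max_workers=2) as pool:
    first = pool.submit(transform, vertices[:half])    # first half on worker 1
    second = pool.submit(transform, vertices[half:])   # second half on worker 2
    wait([first, second])                              # both halves must finish
    skinned = first.result() + second.result()

# The next step of the frame only starts once both halves are done.
print(len(skinned), skinned[1])
```

      The same task runs on both workers and only the data is divided, so there is a single synchronization point per step.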

  14. Why hasn't this been done before ? by zymano · · Score: 1

    Always wondered why there is no cooperation between chip makers and even video card companies to make a compiler like this.

    1. Re:Why hasn't this been done before ? by idlake · · Score: 1

      There has. Itanium is the most recent example. Most of those efforts fail because, in the real world, getting good performance only with a single compiler from a single vendor, and then usually only if the stars align right, isn't good enough.

    2. Re:Why hasn't this been done before ? by Anonymous Coward · · Score: 0

      Because it's not a hardware problem - it's a software problem. Many people have already attempted to develop compilers that automagically parallelize serial programs. The problem is, you can't easily parallelize a program written in a language that has global side-effects, and virtually all useful programming languages have global side-effects.

      The Octopiler as a general solution will fail if IBM can't come up with appropriate programming language abstractions.

    3. Re:Why hasn't this been done before ? by zymano · · Score: 1

      That's too bad. Could be a big breakthrough for performance if it can be done.

  15. Quad precision by pkhuong · · Score: 1

    SPARCv8(?) and up have quad precision.

    I've also implemented a simple double-double arithmetic in CL (it represents numbers as an unevaluated sum of two non-overlapping doubles). It was ~25% as fast as doubles (mostly branchless, each op expands into ~2-8 double precision ops). That gives an upper bound on the slowdown ratio for the emulation of doubles with singles.
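
    For readers who have not met the technique, here is a rough Python sketch of double-double addition built on Knuth's branchless TwoSum. It is not the CL code mentioned above; the function names are made up, and `dd_add` is the simple variant, not the most accurate one:

```python
from fractions import Fraction


def two_sum(a: float, b: float):
    """Knuth's branchless TwoSum: returns (s, e) where s is the rounded
    sum a + b and e is the rounding error, so s + e equals a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two double-doubles, each a (hi, lo) pair of floats."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)   # renormalise so hi and lo do not overlap


one = (1.0, 0.0)
tiny = (2.0 ** -80, 0.0)   # far below double precision relative to 1.0

hi, lo = dd_add(one, tiny)
print(1.0 + 2.0 ** -80 == 1.0)                                # plain doubles drop tiny
print(Fraction(hi) + Fraction(lo) == 1 + Fraction(2) ** -80)  # the pair keeps it
```

    Each double-double operation expands into a handful of ordinary floating-point operations with no branches.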

    --
    Try Corewar @ www.koth.org - rec.games.corewar
    1. Re:Quad precision by lordholm · · Score: 1

      Yes, the V8 has support for quads, but I can't think of a single implementation that does not force the OS to emulate it.

      --
      "Civis Europaeus sum!"
  16. You had machine language? by Flying+pig · · Score: 2

    You were lucky. We had to write our own microinstructions using a 12 bit ALU with no barrel shifter, and then burn them into ROM using a magnifying glass to vaporise the aluminium interconnect. And you had hard disks? We had to hand code on paper tape using a leather punch to make the holes. And we thought we were lucky. Next door, the guys in Alan Turing's department were having to stick together infinite paper tapes for some machine he made in the 30s.

    --
    Pining for the fjords
    1. Re:You had machine language? by Mike+Savior · · Score: 1

      >Next door, the guys in Alan Turing's department were having to stick together infinite paper tapes for some machine he made in the 30s.

      And it was uphill, both ways!

      --
      space is pretty cool.
    2. Re:You had machine language? by Anonymous Coward · · Score: 0

      You had a universe and matter? First i had to create myself, then build a universe to exist in....

    3. Re:You had machine language? by Ginger+Unicorn · · Score: 1

      I write C code in my head, and compile it by reading the source code of GCC out loud while counting on my fingers. Then i settle down with nice cup of earl grey and stare blankly out of the window as i hand optimise bottlenecks in the compiled binary. This involves imagining a virtual machine layer that allows me to run a virtual 386, PowerPC, etc in my brain.

      --
      (1.21 gigawatts) / (88 miles per hour) = 30 757 874 newtons
    4. Re:You had machine language? by avronius · · Score: 1

      Luxury! We used to hang children on the clothesline by their feet - moving them back and forth like a giant abacus. Loud devices - always complaining about which sector they were in. You couldn't put two 'bits' together, or they'd byte each other. As soon as they were released from the line (for a nibble, etc.)- well, I need not tell you about the results of a head crash from that height.

      Booting them up was fun...

      (no children were injured in the creation of this e-mail message)

  17. Far too complex? by hptux06 · · Score: 2, Insightful

    Cell's big programming problem goes right down to each SPE, whose assembler commands cannot actually address main memory! Every time information is read into / out of the 256K "local storage" on each SPE, a DMA command must be issued. Now, while this is Cell's greatest asset (execution continues while seriously slow memory movement occurs), it is also difficult to work with.

    Your average C programmer doesn't take architecture into account, and so there's no user indication of whether a variable can be paged to main memory, whether code needs to be fetched, and crucially, how far in advance data can be pre-loaded into the local storage to avoid the SPE hanging on a memory operation.

    I'd guess that this new compiler will try to address these issues, which is suggested by the article.
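
    The pre-loading problem described above is usually handled with double buffering: start the transfer for the next chunk, then compute on the current one while it is in flight. The Python below is only a simulation of that pattern; `dma_get`, `process` and `CHUNK` are invented stand-ins, not the Cell SDK:

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # hypothetical number of elements that fit in one local buffer
main_memory = list(range(16))


def dma_get(offset: int):
    """Stand-in for an asynchronous DMA transfer into local storage."""
    return main_memory[offset:offset + CHUNK]


def process(buffer):
    """Stand-in for the compute kernel, which touches local data only."""
    return sum(x * x for x in buffer)


total = 0
with ThreadPoolExecutor(max_workers=1) as dma:
    pending = dma.submit(dma_get, 0)                  # prime the first buffer
    for offset in range(0, len(main_memory), CHUNK):
        current = pending.result()                    # wait for the transfer
        nxt = offset + CHUNK
        if nxt < len(main_memory):
            pending = dma.submit(dma_get, nxt)        # start the next transfer
        total += process(current)                     # compute while it runs

print(total)
```

    The hard part for a compiler is deciding, without hints from the programmer, how large each chunk should be and how early each transfer can safely be issued.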

    1. Re:Far too complex? by stedo · · Score: 3, Insightful

      Your average C programmer will not be developing the core code. Most likely, a group of very good coders will create a game engine, and the average C programmers can use the API that the highly-skilled, highly-paid engine coders created to hide unnecessary implementation details.

    2. Re:Far too complex? by CarpetShark · · Score: 1
      Your average C programmer doesn't take architecture into account,


      That's because, to the average C (or C++) programmer, speed doesn't matter -- ease of coding and debugging and maintenance does. However, that's not the case with games developers (or, more correctly, games engine developers these days), or high-performance computing people (ie, scientists who write weather prediction programs and such). To them, it matters, and they'll code for it. But, they also have tools like MPI and PVM, which are designed to handle parallelism, and will do most of the work for them in a way they're used to, with little or no API-level changes.
    3. Re:Far too complex? by ooze · · Score: 1

      And what does that mean? Basically that we have been stuck with C for over 30 years already, and that reliance and dependence on that language, despite all the benefits it gave us as the reasonably fast language available everywhere and reasonably well known by many, also severely limits further progress. There has been virtually no new programming technique or paradigm, or even any really new syntactical construct, since the times of C. And even worse, C has only a fraction of all this available.
      So I'd like to say, it's about time there comes some movement in the programming language front. Of all the "big new" programming languages that came up in the last 15 years, from Java, to C# to Ruby to Python, to OCAML, none had anything particularly new, that pushed the limits of expressiveness. People need to learn new ways of programming, making use of all the power the modern hardware gives us. But that certainly cannot be done by dumbing the interface between programmer and hardware down.

      --
      Just because I can imagine doing a hippopotamus, doesn't mean I'd like to do it.
  18. No, it's there alright by Daath · · Score: 4, Informative

    Nah, it's there. Download it, if you want ;)

    --
    Any technology distinguishable from magic, is insufficiently advanced.
  19. OK, Great a compiler, but ... by dnamaners · · Score: 1

    Hmm, that FA was totally devoid of any real details. As it seems to me, and granted I do not develop on Cell processors, and I am not a stickler for the "next big thing", these things may be interesting. Unfortunately, if they want me to use them I need to know it works for me. I want my existing code to compile with minimal changes so I can test the new platform in the raw. I have the resources to test a few "maybe good, maybe not" systems a year. What I want to know, in short, is whether it "could" work well. This means I need to use my existing code base in part (their tier IV). I am happy to optimize in my spare time, if need be, once I know it "could be the thing". If the platform passes that test I'll buy a few more units and make a real go of it. I don't think that the Cell processor is at that point yet: too little hardware on sale right now and no software, and therein lies the problem. Open source compiler support would be a big plus, but if the platform is "just that good" I can make an exception.

    my $0.02

  20. here's the real article... by advocate_one · · Score: 4, Informative
    --
    Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
    1. Re:here's the real article... by CarpetShark · · Score: 1

      Hahhah :)

      On the other hand though, some boffins had to code this, and there were probably a few junior programmers involved somewhere too, who can now claim to have been part of it all ;)

  21. Think About This for a Moment by Quantam · · Score: 1

    Somebody had to code this monstrosity of a compiler, and it wasn't you. Isn't that enough of a reason to believe there's a god?

    --
    You have tried to support your argument with faulty reasoning! Go directly to jail; do not pass Go, do not collect $200!
    1. Re:Think About This for a Moment by elknco1 · · Score: 1

      god works for IBM???

      OTOH, if i WAS that guy, should that be enough of a reason for me to believe there is NO god?

  22. And when it fails... by errxn · · Score: 1

    ...it will be known far and wide as the "Octopile o' Crap."

    --
    In Soviet Russia, Chuck Norris will still kick your ass.
  23. special compilers, expert programmer = DOA product by idlake · · Score: 2, Insightful

    If a CPU needs a special compiler in order to give good performance, it's basically dead; there are simply too many different applications that do binary code generation.

    Also, the division into "expert programmer" and "regular programmer" is silly. Most coding is done by people who aren't experts in the cell architecture (or any other architecture). That's not because people are too stupid to do this sort of thing, it's because it's not worth the investment.

    If Cell can't deliver top-notch performance with a simple compiler back-end and regular programmers who know how to write decent imperative code, then Cell is going to lose. Hardware designers really need to get over the notion that they can push off all the hard stuff into software. People want hardware that works reliably, predictably, and with a minimum of software complexity.

    Maybe CISC wasn't such a bad idea after all--you may get less bang for the buck, but at least you get a predictable bang for the buck.

  24. Not really by CaptainCheese · · Score: 1

    keyword: "decent"

    according to the article, the compiler's still in early stages of development...

    --
    -- .sigs are a waste of data...turn them off...
  25. Yay! A new generation, FINALLY! by porkThreeWays · · Score: 2, Interesting

    I'm glad to see some real progress in the processor world. We are so guided by the enterprise market that we've had to support x86 WAY longer than we should have. The Cell looks like it has a real chance of becoming the next big advancement. For one, IBM is working heavily with the open source community. This is possibly one of the best things they could have done to help the Cell. By doing this, you make open source developers happy and more inclined to port over their applications. One of the hardest things to do in getting a new arch out is getting application support, and they've pretty much guaranteed a modest amount of applications by going open source. The Nokia 770 is a perfect example of this. They've supported open source and made available more than enough tools for quick porting of applications, and look at the huge amount available already in the first few months. The Nokia 770 probably sets records in how many applications were ported in such a short period of time.

    Make the developers happy, and they will port their apps. With large amounts of available applications, the consumers will buy. When the consumers buy, you have a successful new arch.

    --
    If an officer ever threatens to taze you, say you have a pacemaker.
    1. Re:Yay! A new generation, FINALLY! by jadavis · · Score: 1

      The cell looks like it has a real chance of becoming the next big advancement.

      It will be interesting to compare the Cell with the UltraSPARC T1 (Niagara). They both have about 8 cores (T1 is 8 cores, Cell is 8+1), but the T1 can do 32 threads of execution simultaneously. The Cell has good floating point performance, but the T1 only has 1 FPU for all 8 cores (it's specifically not designed for FP performance). The T1 has very low power requirements, at about 72 watts (79 peak), while (as far as I can tell from google) the Cell will have high power consumption and they have not disclosed the exact figures yet.

      And both companies are working very closely with the open source community. Sun actually went further, and open sourced the entire SPARC architecture. As far as I know, IBM is not opening up their architecture.

      They clearly have different markets, but they are similar in the multithreading aspect. Whoever does a better job of the multithreading and makes good compilers that can help the programmers write parallel code will then be able to move into the other company's space (if Sun does it better, they can add FPUs, if IBM does better they can remove them). And that success depends on open source involvement, which depends on an architecture that is easy to code for. If open source programmers get heavily involved in a concurrent compiler for one architecture, it will win in the long term.

      So, it's clear why both companies are fighting to get the attention of the open source community, which is becoming (in a lot of ways) the force that drives which technologies are actually used in business. And that's certainly good news.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
    2. Re:Yay! A new generation, FINALLY! by NutscrapeSucks · · Score: 1

      I dunno, it seems like Open Source developers generally produce generic C/C++ code and are unlikely to devote massive amounts of time optimizing for a specialized architecture like Cell. There's been a PS2 Linux kit for years, and how many specialized open source PS2 apps are there?

      The Nokia 770 is hardly comparable because it's really just a standard *nix/X11 box in a handheld form.

      --
      Whenever I hear the word 'Innovation', I reach for my pistol.
    3. Re:Yay! A new generation, FINALLY! by JFMulder · · Score: 1

      Make the developers happy, and they will port their apps.
      I fail to see how your arguments up to this point apply only to open source programmers. I work on a closed source project in a company, and we would love to get a Cell-based computer to run our image processing routines on. We are already threaded to death and scale almost linearly with the number of CPUs (I mean, we do get an almost 2x increase with dual-cores), so something like Cell would be very good for us. Closed source developers can embrace a new platform as well as, if not better than, open source programmers, because financial incentives force the embracing of a new technology. Of course, I'm making the assumption that open source programmers are always operating without any economic pressure, which is not true in the case of applications like Berkeley DB, MySQL, the Linux guys at IBM, etc., but these are the exception, not the norm.

    4. Re:Yay! A new generation, FINALLY! by Anne+Thwacks · · Score: 1

      Do you know where I can get the GTA series for Sparc hardware? You can get Sun E450 Enterprise servers on E-bay for under $1000 (although you'd pay a bit more for one with 8 CPUs installed). It would be seriously cool to use one for great games!

      --
      Sent from my ASR33 using ASCII
    5. Re:Yay! A new generation, FINALLY! by Anne+Thwacks · · Score: 1

      Sun is your friend. Google for "Niagara"

      --
      Sent from my ASR33 using ASCII
    6. Re:Yay! A new generation, FINALLY! by jadavis · · Score: 1

      The SPARC T1 would probably not be very good for 3D graphics and other calculations in a game, because it only has 1 FPU. Of course it could have a good 3D graphics card, but then what are you using the processor for? The T1 is really optimized for an application server or something like that.

      --
      Social scientists are inspired by theories; scientists are humbled by facts.
  26. Tier II programmers by tepples · · Score: 1

    It makes you wonder what the release-titles of the PS3 will be like, if they didn't have a decent compiler until now.

    Obviously titles whose programmers earn a hefty salary premium for having Tier II skills (as defined in The Article). The art might not look as "next-gen" as it could because the developer had to reallocate some of the art budget toward programming.

  27. I corresponded with a Sparc designer. by mosel-saar-ruwer · · Score: 1

    I corresponded with the Sparc designer about this very question, because LabVIEW supports a 128-bit "quad-precision" double for Sparc platforms:
    http://zone.ni.com/devzone/conceptd.nsf/webmain/370DFC6FD19B318C86256A33006BFB78?opendocument
    I sent some email back and forth with one of the dudes on the Sparc design team, and he said that Sparc's 128-bit quad-precision double is a purely software implementation.

    Compare e.g.

    Floating-Point Computing A Comedy of Errors?
  28. what are you talking about? by twitter · · Score: 1
    Sound familiar? "All we need to make it work as advertised is a really slick compiler that doesn't actually exist yet..."

    That's kind of a weird comparison given the differences in innovation, demonstrated results and company attitudes.

    IBM's Cell is a much more radical break from previous chips like Itanium, but the CES demo was reported to be very impressive. IBM has already released the SDK and openly published all specifications. The pace of development has been very rapid and people are predicting the replacement of Intel. The missing piece was a compiler to ease transition. It looks like that's coming along just fine.

    The Itanium, on the other hand, was obsolete at its launch. Even HP dumped it after killing their own better-performing 64-bit processor for it and spending billions of dollars and ten years building it.

    We can only wonder how things would have been if Intel had opened things up like IBM has, instead of making it so people have to figure things out on their own.

    --

    Friends don't help friends install M$ junk.

    1. Re:what are you talking about? by samkass · · Score: 1
      The pace of development has been very rapid and people are predicting the replacement of Intel.


      Sorry, you lost all credibility there. The Cell is a single core with a bunch of DSPs tacked on. It's a great replacement for a general purpose PowerPC in many embedded applications, but won't touch Intel's target market any time soon. In the year and a half since that article was written we've learned how much Intel and AMD can do to keep ahead of the game and how applicable to general-purpose computing the Cell isn't.

      --
      E pluribus unum
    2. Re:what are you talking about? by ianpatt · · Score: 2, Interesting

      >We can only wonder how things would have been if Intel had opened things up like IBM has, instead of making it so people have to figure things out on their own.

      It's not quite as clean as it looks. "Full specifications" doesn't include any information on instruction latencies, cache performance, etc. They've documented the platform itself, but not the specific implementation. This makes optimization difficult.

      I've had to distill information from several publications to determine even basic things like how many cycles it takes to retire a floating point add. So the information /is/ out there, you just need to do a lot of work to get it.

    3. Re:what are you talking about? by jafac · · Score: 1

      It sure as hell HAS touched Intel's target market.

      It forced Intel to actually try and compete. Thus giving us some cool Intel (and AMD) CPU's. I think Intel pretty much ditched Itanium for this reason.

      --

      These are my friends, See how they glisten. See this one shine, how he smiles in the light.
    4. Re:what are you talking about? by samkass · · Score: 1

      I'm sorry, but I disagree. I don't think Intel has changed a single thing in response to Cell in any desktop or server chip. They MAY have tweaked the StrongARM future designs to compete on the DSP level, but I'm almost positive that Cell has had no effect on Itanium. The Itanium was crushed by AMD64 at the low end but is still a very viable and well-respected chip at the very high end (massively parallel number crunchers), where it will compete much more with the POWER chips than with the Cell chips.

      I'd argue AMD has forced Intel to try and compete. The Cell is hardly a blip on the radar, and will likely be about as popular in a desktop as StrongARM, SuperH, or other embedded chips.

      --
      E pluribus unum
  29. Re:special compilers, expert programmer = DOA prod by Anonymous Coward · · Score: 0

    "If a CPU needs a special compiler in order to give good performance, it's basically dead; there are simply too many different applications that do binary code generation."

    Do you mean like the Pentium 4? AFAIK it was quite successful.

  30. CISC? by Billly+Gates · · Score: 1

    Is it just me or is it that we went from cisc to risc and now going back to risc again?

    I assumed less complex chips with optimizations coming from compile time were more efficient or cost effective?

    1. Re:CISC? by tarpitcod · · Score: 2, Interesting

      A key problem with CISC was that doing virtual memory and handling page faults on a CISC processor was so insanely complicated that designing the pipeline drove you mad: a single instruction could throw multiple page faults, and you had a god-awful mess to clean up.

      The problem with the Cell is actually pretty interesting. They decided to go for in-order CPU's for the SPE's which means that to get good performance you sure as hell better know what your dependencies are and take into account memory latency etc.

      OTOH modern RISC CPU's normally do nice out-of-order stuff which whilst making the CPU more complicated makes life easier for the programmer - compiler.

      Itanium took the clean approach - and it flies on FP workloads that the compiler can do a good job on. The PS3 (like Itanium) should rock - once programmers get lots of nice little kernels that do groovy stuff (think super shader programs) in the SPE's. Just that will make the eye candy pretty.

      The counter argument is the 'Look at what happened with the i860'. It had amazing performance on kernels but was just totally evil to program and compiler writers pulled out their hair.

      I don't know enough about modern game programming to know if the PS3 route is a good one to take - and it's easy to bitch at Sony for going too far - OTOH look at the PS2 games now vs at release. The PS3 games should slowly get better and better and better if they don't crash and burn and give up...

      --Tarp

    2. Re:CISC? by Tim+Browse · · Score: 2, Funny
      Is it just me or is it that we went from cisc to risc and now going back to risc again?

      Yeah, but the advantage of doing it this way is that the 2nd transition (from risc back to risc) is really quick!

    3. Re:CISC? by tarpitcod · · Score: 1

      It's not really the complex instructions in CISC that kill you - it's the heaps of addressing modes with lots of indirection. That just kills you if you combine that with virtual memory. If you imagine a hypothetical CISC that has an instruction that does 20x as much work as a RISC instruction - but only works on registers - then it doesn't seem so bad.

      With memory latency being as bad as it is I'm often surprised that more chips don't do CISCY type instruction sequences - you could certainly decrease instruction bandwidth...

      --Tarp

    4. Re:CISC? by dimfeld · · Score: 1

      The PPE on the Cell also doesn't do out-of-order execution. All the silicon they saved in stripping that out probably helped a lot in being able to fit the 8 SPEs in there too.

    5. Re:CISC? by dimfeld · · Score: 1

      One major reason for the move back toward CISC-like technologies is simply that as transistor technology has improved, there's been less of a tradeoff between advanced hardware functionality and the speed at which the hardware can run.

    6. Re:CISC? by tepples · · Score: 1

      With memory latency being as bad as it is I'm often surprised that more chips don't do CISCY type instruction sequences - you could certainly decrease instruction bandwidth...

      Depends. The 'mul' and 'div' instructions are already microcoded on many architectures, and so are a lot of CPU architectures' "legacy" instructions and kernel level instructions. But is adding more complicated instructions worth the microcode memory and other decode logic that would otherwise be assigned to an instruction cache?

    7. Re:CISC? by tarpitcod · · Score: 1

      I see what you're getting at - and taking your point further - isn't it better to have more instruction cache (which is regular and perhaps higher density) and fill it with lots of simpler, more RISCY instructions that can do what the CISCY instructions do anyway?

      If you want to take an extreme case - consider a CPU with an 'escape code' which allowed you to essentially place it in a mode where all the internal functional unit busses were accessible from horizontal microcode.

      The user/compiler could then craft sequences that performed the exact operations they needed without being constrained by the ISA that was exposed by the CPU designers. In this case the bandwidth required for each instruction may grow - I have no idea how many bits you would need to control the functional units of a current CPU. If that was the case then perhaps you could have a huffman encoded table in the processor that allowed the user crafted huffman encoded external instructions to be executed.

      As an example - perhaps my algorithm really wants to do three shifts left and an add by 5, or an AND with 0x07 followed by an ADD of another register.

      The idea being that the CPU is a commodity - and perhaps whilst the ISA designed seems nice for one purpose it's suboptimal for other users.

      Just an idea.

      Other ideas - Load the ISA of your favorite CPU.

      I don't expect this to be faster - actually I'd guess it may be considerably slower loading the ISA and having the configurability but it would be interesting to expose all of the hardware to the problem vs constrain the solution to using the ISA that's popular at the moment.

      --Tarp

    8. Re:CISC? by tepples · · Score: 1

      If you want to take an extreme case - consider a CPU with an 'escape code' which allowed you to essentially place it in a mode where all the internal functional unit busses were accessible from horizontal microcode.

      Shove it up your FPGA ;-)

      perhaps my algorithm really wants to do three shifts left and an add by 5

      Then perhaps it should be running on an ARM CPU, where a shift comes free with every ALU instruction.

    9. Re:CISC? by asuffield · · Score: 1

      RISC died a long time ago. Once you get past all the morons on slashdot, the point behind the RISC idea was simply this: We know more about writing compilers than we do about making chips. Design systems so that the compiler does more of the work.

      Since the 1980s we've learned a lot about making chips but not much about making compilers (there have been massive changes in the way chips are designed; compilers have made some minor improvements but basically work the same way as they always have). Nobody's made RISC chips for the mass market in years - all modern chips behave a little like RISC and a little like CISC but mostly like neither of them. The whole RISC/CISC thing is so obsolete that it's not funny any more - it is just not a relevant consideration in modern chip design. Nowadays we're concerned with things like register pressure, speculative out-of-order execution, cache coherency, and retaining the information from the original source code so that the CPU can behave more intelligently with it.

    10. Re:CISC? by tarpitcod · · Score: 1

      Geeze seems like these things need to be programmed in some declarative or functional language vs C then. Programming them in C with multiple threads and getting the timing right must be a god-awful mess.

      I do wonder - optimizing and understanding the timing must be much easier (deterministic) because they aren't out-of-order so when you screw up you readily see the results and it isn't masked by your OOO logic.

      Thank god as far as consoles it isn't a monoculture. - It will be great to see how it all turns out.

      --Tarp

    11. Re:CISC? by tarpitcod · · Score: 1

      I still wish I owned an Archimedes ;-)

    12. Re:CISC? by tepples · · Score: 1

      I still wish I owned an Archimedes ;-)

      Even without an Archimedes, you can get your ARM kicks in Game Boy Advance homebrew programming.

  31. Check out William Kahan at UC-Berkeley. by mosel-saar-ruwer · · Score: 3, Informative

    What benefit does increasing the precision of floats to 128bits bring? 64bits are more than enough for 99.9999% and the remaining cases can be handled in sw emulation. You can still not solve (without massive growth of the error terms) an equation system described by a Hilbert matrix using Gaussian elimination no matter how many bits you make the mantissa.

    Check out some of Professor Kahan's shiznat at UC-Berkeley:

    http://www.cs.berkeley.edu/~wkahan/
    In particular, look at the pictures of "Borda's Mouthpiece" [page 13] or "Joukowski's Aerofoil" [page 14] in the following PDF document:
    How Java's Floating-Point Hurts Everyone Everywhere
    http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
    WARNING: PDF DOCUMENT
    As I understand it, the "wrong" pictures are computed using Java's strict 64-bit requirement; the "right" pictures are computed by embedding the 64-bit calculation within Intel/AMD 80-bit extended doubles, performing the calculations in 80-bits worth of hardware, and then rounding back down to 64-bits to present the final answer.

    MORAL OF THE STORY: Precision matters. You can never have enough of it.

    1. Re:Check out William Kahan at UC-Berkeley. by honkycat · · Score: 1

      Interesting (although unforgivably badly formatted) document.

      The two plots you point out aren't really examples of precision errors. Rather, they are errors brought about by not tracking the distinction between "positive 0" and "negative 0." You'll have this problem to some degree no matter how many bits of precision you've got if you don't track the sign of your numbers that round to 0.
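
      A minimal sketch of the signed-zero effect, using Python's standard math and cmath modules as a stand-in for the Java code in the paper (that substitution is an assumption made purely for illustration). On a branch cut, the sign of a zero component decides which side of the cut you land on, however many mantissa bits are available:

```python
import cmath
import math

# Two inputs that compare equal but differ in the sign of their zero part.
above = complex(-1.0, 0.0)    # approaches the negative real axis from above
below = complex(-1.0, -0.0)   # approaches it from below

# The square root has a branch cut along the negative real axis.
root_above = cmath.sqrt(above)
root_below = cmath.sqrt(below)

# atan2 shows the same jump across the cut.
angle_above = math.atan2(0.0, -1.0)
angle_below = math.atan2(-0.0, -1.0)

print(above == below)             # the inputs are numerically equal
print(root_above, root_below)     # the roots land on opposite sides
print(angle_above, angle_below)   # so do the angles
```

      A complex type that drops the sign of zero picks one side for both inputs, which matches the kind of error the plots show.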

    2. Re:Check out William Kahan at UC-Berkeley. by brunes69 · · Score: 1

      Check out some of Professor Kahan's shiznat at UC-Berkeley...

      As I understand it, the "wrong" pictures are computed using Java's strict 64-bit requirement; the "right" pictures are computed by embedding the 64-bit calculation within Intel/AMD 80-bit extended doubles

      I can see the people responsible for the Java engine now...

      KKKAAAAAAAAAAAAHHHHAAAAAANNNNNN!!!!!!!

    3. Re:Check out William Kahan at UC-Berkeley. by Have+Blue · · Score: 1

      If precision is more important than performance you shouldn't be using fixed length types in the first place. Use a bignum representation whose precision is limited only by RAM.
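
      A rough sketch of that trade-off, assuming Python's fractions.Fraction as the "bignum" type and the Hilbert system from the grandparent post as the test case. The helper names (hilbert, solve, max_error) and the size n = 12 are illustrative choices, not anything taken from the thread:

```python
from fractions import Fraction

def hilbert(n, one):
    """Return the n x n Hilbert matrix, H[i][j] = 1 / (i + j + 1)."""
    return [[one / (i + j + 1) for j in range(n)] for i in range(n)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def max_error(n, one):
    """Largest deviation from the known solution x = (1, 1, ..., 1)."""
    h = hilbert(n, one)
    b = [sum(row) for row in h]   # right-hand side chosen so x is all ones
    return max(abs(xi - 1) for xi in solve(h, b))

float_error = max_error(12, 1.0)           # 64-bit hardware doubles
exact_error = max_error(12, Fraction(1))   # exact rationals, limited only by RAM
print(float_error, exact_error)
```

      The same elimination code runs on both number types; only the arithmetic underneath changes. The rational version is exact but far slower, which is the performance cost being traded for precision.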

    4. Re:Check out William Kahan at UC-Berkeley. by espressojim · · Score: 1

      The problem is that the doc is 8 years old, back when poor old Java was maybe hitting its 3rd birthday. It's come a long way since then, and the VM has changed dramatically. Hell, was BigDecimal (etc) in the API back in '98?

    5. Re:Check out William Kahan at UC-Berkeley. by Rufus211 · · Score: 1

      Did you even read the document? "But when -0 is mishandled..." The paper is entirely about java's handling of special-case conditions like -0, it has nothing to do with precision.

    6. Re:Check out William Kahan at UC-Berkeley. by greg_barton · · Score: 3, Informative

      How Java's Floating-Point Hurts Everyone Everywhere

      Gods.

      This is eight years old, (1998) and has been fixed for five years.

      FIVE YEARS. Join the 21st century, for god's sake.

      java.lang.StrictMath

      How long will people repeat this, even though it's been fixed for five years, in java 1.3? The latest beta VM is 1.6...

    7. Re:Check out William Kahan at UC-Berkeley. by dr_labrat · · Score: 1

      Meh, but it has a tilde in the URL... can't be up to much.... Surely if it was a valid piece of research it would end in ".tv" at least..

      --
      The secret of success is honesty and fair dealing. If you can fake those, you've got it made. (Marx)
    8. Re:Check out William Kahan at UC-Berkeley. by dr_labrat · · Score: 1

      well, I mod this 0 (meh)

      --
      The secret of success is honesty and fair dealing. If you can fake those, you've got it made. (Marx)
    9. Re:Check out William Kahan at UC-Berkeley. by Anonymous Coward · · Score: 2, Insightful

      The age of the paper doesn't matter. The OP was pointing out what a huge difference just 16 bits of precision makes. The fact that you no longer have to deal with this problem in Java doesn't invalidate the point he was trying to make.

      It's like somebody asking why the move from eight bit colour to sixteen bit, and me linking to a 16 bit image versus an 8 bit rendition of that same image. Sure, it isn't all that relevant nowadays, but it still helps to explain the problem.

    10. Re:Check out William Kahan at UC-Berkeley. by woolio · · Score: 1

      MORAL OF THE STORY: Precision matters. You can never have enough of it.

      Well, in embedded applications, precision is an enemy... Do you think your cell phone is performing all the calculations that do the modulation, filtering, and encoding of the transmitted/received signals in **floating point**?

      Nope. Embedded communications/image devices often do everything using integer arithmetic (fixed point). And I think many signal processing applications *can* be implemented with 16bit integers! [but it ain't easy]

      I believe even some 3D graphics applications are implemented in fixed-point arithmetic for faster hardware/software performance.
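
      For anyone who hasn't seen fixed point, here is a small sketch assuming the common Q15 layout (1 sign bit, 15 fractional bits in a 16-bit integer). The function names are made up for this example, and Python integers stand in for the 16-bit and 32-bit registers a real DSP would use:

```python
Q = 15            # number of fractional bits (Q15 format)
ONE = 1 << Q      # 1.0 would be 32768, one past the largest 16-bit value

def to_q15(x):
    """Convert a float in [-1, 1) to a saturated 16-bit Q15 integer."""
    return max(-ONE, min(ONE - 1, int(round(x * ONE))))

def from_q15(n):
    """Convert a Q15 integer back to a float."""
    return n / ONE

def q15_mul(a, b):
    """Multiply two Q15 values using only integer operations."""
    # The raw product has 30 fractional bits; add half an LSB to round,
    # then shift back down to 15 fractional bits.
    product = (a * b + (1 << (Q - 1))) >> Q
    # -1.0 * -1.0 would give +1.0, which does not fit, so saturate.
    return max(-ONE, min(ONE - 1, product))

half = to_q15(0.5)
quarter = q15_mul(half, half)
print(half, quarter, from_q15(quarter))
```

      The "ain't easy" part is everything this sketch skips: choosing scale factors per signal so intermediate sums neither overflow nor lose all their low bits.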

      I suspect the physics applications mentioned in the slides could have been implemented in a more numerically-robust manner. (My impression is that physicists rarely care enough about programming (or have enough time) to do it well, which still works out better than trying to get CS people to do physics stuff)

    11. Re:Check out William Kahan at UC-Berkeley. by hackerjoe · · Score: 1

      Except that OP was wrong. The problem in the diagrams was not lack of precision but instead a broken complex type.

    12. Re:Check out William Kahan at UC-Berkeley. by rodac · · Score: 1

      The OP was wrong. The paper linked had nothing to do with precision at all but an old broken implementation.

      This is exactly the problem. The people who shout the loudest for enhanced precision are the crowd that has no idea about numerical analysis or how floating point works. For them, enhancing the precision is not a solution: it might HIDE the real problem (people with no clue about numerical analysis writing numerical analysis code), but it will not help them at all.

    13. Re:Check out William Kahan at UC-Berkeley. by GreatBunzinni · · Score: 1

      It seems you've entirely missed the point.

      The GP post served to prove (and it was very good at it) that precision really matters. The GP post wasn't a jab at java. It was a demonstration that FP precision is really important, whatever the machine is.

      Next time just take a few seconds before writing a post. Your blind Java fanboy attitude got in the way with this one and it wasn't pretty.

      --
      Slashdot, fix your code or at least hire someone who is competent at it to do it for you.
  32. vcl v2 by Space+cowboy · · Score: 1

    On the PS2, there are two vector units (vu0 and vu1), which are basically where all the grunt work is done - the mips chip is there for housekeeping and non-time-critical code. Each VU has 2 code-paths (the instruction word is 64-bit, and there are two 32-bit instructions in each word). There are limitations on what you can do in each of the two words simultaneously. Sony have a GUI tool (in their professional kit) which allows the programmer to write essentially sequential code, and have it take full advantage of the vector units. According to Sony, it performs as well as a skilled programmer.

    For the linux kit, they only released vcl (a commandline version). It's a bit like a compiler-stage. It takes sequential assembly language for a single VU and re-orders code, inserts wait-states etc. Finally producing another assembly output which is optimised for the dual-issue nature of a VU.
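
    To give a feel for the pairing problem, here is a toy sketch that is far simpler than vcl: it only pairs neighbouring instructions, never reorders, and inserts no wait-states. The Instr fields, the "upper"/"lower" slot labels, the mnemonics and the register names are all assumptions made up for the example, not the real VU instruction set:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    pipe: str          # "upper" or "lower": which 32-bit slot it may occupy
    dest: str          # register written
    srcs: tuple = ()   # registers read

def can_pair(first, second):
    """True if two instructions may share one 64-bit instruction word."""
    if first.pipe == second.pipe:
        return False   # each word has exactly one slot per pipe
    if first.dest in second.srcs or second.dest in first.srcs:
        return False   # one reads what the other writes
    return first.dest != second.dest   # both must not write the same register

def pack(program):
    """Greedily pack a sequential program into words of one or two instructions."""
    words, i = [], 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            words.append((program[i], program[i + 1]))
            i += 2
        else:
            words.append((program[i],))
            i += 1
    return words

program = [
    Instr("mul", "upper", "vf3", ("vf1", "vf2")),
    Instr("load", "lower", "vf4", ("vi1",)),
    Instr("add", "upper", "vf5", ("vf3", "vf4")),
    Instr("move", "lower", "vf6", ("vf5",)),   # depends on the add before it
]
words = pack(program)
for word in words:
    print(" | ".join(instr.name for instr in word))
```

    A real scheduler would also hoist independent instructions from further down to fill the empty slots, which is where the reordering comes in.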

    It strikes me that optimising for constraints over 2 code paths in a single unit isn't too far a stretch from optimising for constraints over 8 code paths in 8 units. The differences are mainly to do with locality of reference. On a VU it was up to the programmer to DMA data into scratch-space RAM, and set flags as semaphores on operation. There's no real reason why a computer program can't do that - a basic approach would be to do it on a function-by-function approach, or use #pragma constraints in the code. There's no need to have the all-singing, all-dancing version of the optimiser as version 1...

    Simon.

    --
    Physicists get Hadrons!
    1. Re:vcl v2 by CableModemSniper · · Score: 1

      It strikes me that optimising for constraints over 2 code paths in a single unit isn't too far a stretch from optimising for constraints over 8 code paths in 8 units.

      I think you missed the word heterogeneous in the article. It's not 8 identical units, it's 8 different units.

      --
      Why not fork?
    2. Re:vcl v2 by Anonymous Coward · · Score: 0

      It's eight identical ones and one different unit. Is Cell new to you or something?

    3. Re:vcl v2 by CreateWindowEx · · Score: 1

      I think the programmer who wrote VCL based it off of some school project he had written. I've gotten the impression that it was sort of a black box that nobody at SCEE really understands... it is very handy, though, and you can even use it as a compiler back end for VU code (e.g., emit "naive" sequential assembly and have VCL fold redundant moves and do all the fancy-pants rescheduling and loop unrolling.) I am pretty skeptical that any sort of auto-parallelizing compiler will be used for PS3 games... although given how infrequently VU0 was fully utilized by PS2 games, I can see how they would like to find a way to get "lazy" programmers to use the SPEs...

    4. Re:vcl v2 by Space+cowboy · · Score: 1

      Um, the cell has 8 SPE's (all identical) and 1 PPC core. This is the same (although on a larger scale) as the PS2 (2 VU's, and 1 MIPS chip)

      Simon.

      --
      Physicists get Hadrons!
  33. Wasn't this the same mistake Sega made? by Lead+Butthead · · Score: 1

    I recall that a common complaint by development houses about Sega consoles was that they were very difficult to code for because of hardware complexity. Isn't Sony now making the very same mistake that doomed Sega's console business? Speaking of which, is XB360 easier to code for than PS3?

    --
    ELOI, ELOI, LAMA SABACHTHANI!?
    1. Re:Wasn't this the same mistake Sega made? by MobileTatsu-NJG · · Score: 2, Interesting

      "I recall a common complaint by development houses about Sega consoles were that they were very difficult to code for because of hardware complexity. Isn't Sony now making the very same mistake that doomed Sega's console business?"

      Sega didn't make a single mistake, they made a LOT of them. I imagine you're thinking of the Saturn. It was supposed to be a SNES killer. In other words, all the fancy technology it had was meant to throw sprites on the screen. Then Sony showed up with its fancy ass 3D architecture, and Sega said oops. So they band-aided some hardware in there to perform 3D functions. Unfortunately, this added another processor to the mix. The result? It was a bitch to program for, and it never really reached the performance levels of the PS. The result? Saturn games looked inferior to PS games. However, in the 2D fighter realm, the Saturn did quite well. As I recall, the Saturn was actually fairly successful in Japan for this.

      The Genesis was pretty easy to program for, at least compared to the SNES. The SNES had a weaker CPU, but it had extra hardware to beef up its graphics. In the end, the SNES won, but not without a couple of years of Genesis superiority. I remember lots of people bitching about the SNES slowing down when it came to a lot of sprites on the screen. This complaint died when Donkey Kong Country hit the scene.

      The Dreamcast... well I don't know as much about it. As I understand it, it wasn't too hard to program for. It even had some great hardware for throwing textures on the screen. This gave the DC an edge against the first generation of PS2 games despite having considerably weaker specs.

      The Saturn definitely hurt Sega. One could attribute this to the difficulty of programming for the system, and they'd likely be correct. PS ports to the Saturn often came many months after the original release, and they simply didn't do as well graphically. Sega had also flooded the market with hardware. Between the Genesis, the Sega CD, the 32X, and the Saturn, the market was pretty confused. Sega wasn't focused where they should have been and it came back and bit them in the keyster.

      Sega was in pretty sad shape financially when the DC was released. I vaguely recall that the president of Sega at the time had given up most of his shares of stock to keep the company afloat. (I want to say it was around 100 million dollars roughly, but I don't recall the specifics. I do remember thinking "wow, that's one dedicated dude.") In the end, though, Sega needed several hundred million dollars in order to get 10 million DCs out there in order to really start raking in money. But they simply didn't have the assets to do it. Kerplunk, the Dreamcast died, and Sega focused on software.

      With all that said, I'm sure a number of people will chime in with their own contributory reasons for Sega's demise. They wouldn't necessarily be wrong, either. It took a number of things to take Sega down, not one key one.

      "Speaking of which, is XB360 easier to code for than PS3?"

      I read an interview with Carmack not too long ago, and his answer was basically 'yes'. He did NOT go on to say whether the difference would be a huge factor or not, though. Frankly, I have difficulty imagining it making all that big of a difference, at least from a financial point of view. As these machines get more powerful, the weight of development shifts more towards the artists than the actual programmers. That is just an opinion, though. I'm a 3D artist by trade. Maybe my view is biased. But I know how much it costs to keep me seated at my desk. I know about how the work piles up by orders of magnitude as projects get more ambitious. And I have a pretty good sense of how artistry in video games has evolved over the last decade. Compare Super Mario 64 to Resident Evil 4 and you'll see what I mean.

      --

      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

    2. Re:Wasn't this the same mistake Sega made? by CarpetShark · · Score: 3, Interesting

      The Cell doesn't seem to be that complex. It's a powerful processor, with multiple elements and associated timing issues that you have to be aware of, but that's nothing like the Gamecube or similar, which had all these weird modes and issues that I can't even recall now, probably because my brain blocked it out ;) It'll be a challenge for people who don't know parallel programming, and it might frustrate some who imagine that a cpu with 8 SPEs should act like 8 entirely independent machines, each with its own SPE. But, I think games developers these days will take it as par for the course. There seems to be a trend now that only the biggest and best games companies actually develop game engines (ie, write low-level optimised code), while the other companies just rent the technology and develop levels and artwork and scripting based on that engine. So, the big question is how many of the engine developers will get on board early and if they'll be sufficiently inspired and up to the task. I think they'll find a way :)

    3. Re:Wasn't this the same mistake Sega made? by ClamIAm · · Score: 1

      There were some compounding issues here as well. Sega released the Sega CD, the 32X, and the Saturn all within a pretty short time period. This confused a lot of people. They also thought they could get a jump on the competition by releasing the Saturn as soon as possible. This meant the hardware wasn't as refined as it could have been, and it drove the price up as well.

  34. I'm totally having deja vu. by Inoshiro · · Score: 2, Interesting

    "All that really matters is whether the launch titles will be 'good' enough. Then the full power of the system can be unleashed over its lifespan."

    Yea, but what's the full power of a system? Prettier graphics?

    The "full power" of the PS1 seemed to be that its games became marginally less ugly as time went on, although FF7 was very well done since it didn't use textured polygons for most of it (the shading methods were much sexier). When I think about FF9, I don't like it more because it uses the PS1 at a fuller power level than FF7, I like it better because the story is cuter.

    I like PGR2 better than PGR3 because PGR2 has cars I know and love from Initial D and my own experience, whereas PGR3 has super cars I've never driven or seen before.

    I don't think Rez taxes the PS2 more than Wild Arms 3, but I like it better than Wild Arms 3. I also like most of the iterations of DDR, and they're not taxing in the slightest.

    The full power of a system is not its graphics capability or how easy it is to control or its controller or its games -- it's the entire package. Does the PS3 have a good package? The Xbox 360 sure doesn't -- the controller power-up button is nice, but there is nothing new or interesting; it's a rehash. The PS3 is a rehash too.

    The Sega Saturn was a rehash of the 8-bit and 16-bit 2D eras. It died. The PS3 and Xbox 360 are rehashes of the 64-bit and 128-bit 3D gaming eras.

    --
    Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
  35. YES! Re:Sadly, not a lotta FPU hardware. by perler · · Score: 1
    Why can't someone invent a chip for math geeks? With 128-bit hardware doubles? Are we really that tiny a proportion of the world's population?

    Yes, in fact you are a really tiny proportion of the world's population!

  36. Simple parallelism? by calambrac · · Score: 1
    I haven't done a lot of multi-threaded programming, so maybe this is actually commonly available, but I think a nice language-level parallelism feature would be something that could handle a really basic "for each" type loop:
    serialCode();

    pfor(element in collection) {
    element.parallelCode();
    }

    serialCode();
    without having to worry about manually setting up the threads, etc - if there are multiple resources available, they get used, if not, then it happens in serial. Is there anything like this out now? btw... how do you get proper indentation using this ecode tag?
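
    A rough sketch of how such a construct might look as a library function, assuming Python's standard concurrent.futures module. The name pfor comes from the pseudocode above; the workers parameter and the serial fallback rule are assumptions added for illustration:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def pfor(collection, body, workers=None):
    """Apply body to every element, in parallel when more than one worker is available."""
    items = list(collection)
    workers = workers or os.cpu_count() or 1
    if workers <= 1 or len(items) <= 1:
        # Serial fallback: same results, no threads created.
        return [body(item) for item in items]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so callers see the same result either way.
        return list(pool.map(body, items))

def square(x):
    return x * x

parallel = pfor(range(8), square, workers=4)
serial = pfor(range(8), square, workers=1)
print(parallel, serial)
```

    This only behaves like the serial loop when body has no side effects on shared state. In CPython, threads mostly help I/O-bound bodies; ProcessPoolExecutor offers the same map interface for CPU-bound work.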
    1. Re:Simple parallelism? by DichotomicAnalogist · · Score: 1

      Several variants of Fortran have something like this. Other concurrent languages... well, a number of other concurrent languages typically don't need or use for, but most likely offer concurrent/distributed map or fold, which are the common "clean" approximation of for.

      --
      This troll is over. You can now resume a normal activity.
    2. Re:Simple parallelism? by Anonymous Coward · · Score: 0

      Sounds kinda like what OpenMP can do; only I'm not sure if it will adjust to serial execution of the parallel section if not enough processors are available.

    3. Re:Simple parallelism? by NathanBFH · · Score: 1
    4. Re:Simple parallelism? by TheRaven64 · · Score: 1

      FORTRAN has a construct somewhat like this. In FORTRAN you can operate on vectors as if they were scalars, which makes it much easier to generate vector unit code from FORTRAN than from (for example) C. This doesn't help with the problem of generating code for SPUs, however. A language that would make it easy to generate SPU code would have to have message passing built in, similar to that found in Erlang, but designed for larger messages and vector features similar to those found in shader languages.

      --
      I am TheRaven on Soylent News
    5. Re:Simple parallelism? by BillKaos · · Score: 1

      That's not easily doable because of side effects. Once you need to check for those, your compiler complexity is almost infinite.

      Functional programming has been trying to address this with varying degrees of success.

    6. Re:Simple parallelism? by calambrac · · Score: 1

      You can screw up a program with side effects even if you're just doing regular serial coding. How is this more complex than that? This isn't meant to be a full-blown replacement for threading - just a simple language construct to introduce a very simple form of parallelism...

    7. Re:Simple parallelism? by calambrac · · Score: 1

      Not exactly what I had in mind, but it looks like an interesting read, thanks!

    8. Re:Simple parallelism? by Anonymous Coward · · Score: 0

      You want OpenMP.

    9. Re:Simple parallelism? by BillKaos · · Score: 1
      The danger in this construction is the implicit parallelism. With one compiler version your program may happen to run fine, while another compiler that implements the construct in a subtly different way can expose a bug.

      Side effects are, more often than not, timing-dependent.
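      To make the timing dependence concrete, here is a small Python sketch. Counter and hammer are names made up for this example, not anything from the thread. The unsafe method does a plain read-modify-write on shared state; the safe one wraps the same steps in a lock.

```python
import threading


class Counter:
    """Shared state that several threads update."""

    def __init__(self):
        self.total = 0
        self._lock = threading.Lock()

    def add_unsafe(self, n):
        # Unsynchronised read-modify-write: another thread can run between
        # the read and the write, and its update is then overwritten.
        current = self.total
        self.total = current + n

    def add_safe(self, n):
        # The lock makes the read-modify-write indivisible for other callers.
        with self._lock:
            current = self.total
            self.total = current + n


def hammer(add, threads=4, iterations=5000):
    """Call add(1) from several threads at once and wait for all of them."""
    def loop():
        for _ in range(iterations):
            add(1)

    workers = [threading.Thread(target=loop) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


safe = Counter()
hammer(safe.add_safe)
safe_total = safe.total        # always threads * iterations

unsafe = Counter()
hammer(unsafe.add_unsafe)
unsafe_total = unsafe.total    # can come up short, depending on scheduling
```

      Whether the unsafe version actually loses updates on a given run depends on how the threads happen to be scheduled, which is exactly why such bugs can appear or vanish when the implementation underneath changes.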

    10. Re:Simple parallelism? by Anonymous Coward · · Score: 0

      One fairly easy way to do this in a multi-threaded oo way is to make all the things you want to do be objects (I think this is called the command pattern), and then shove them into a queue. And then you have a bunch of threads/processors pulling items off the queue (in Java the method to pull the items off the queue would be synchronized so you don't have two (or more) threads pulling the same item off at the same time) and processing them.
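      A rough Python sketch of that queue-of-commands idea. The names SquareCommand, worker and run_all are illustrative. Python's queue.Queue does its own locking, so it plays the role of the synchronized "pull" method described above.

```python
import queue
import threading


class SquareCommand:
    """Illustrative command object: carries its input and stores its own result."""

    def __init__(self, value):
        self.value = value
        self.result = None

    def execute(self):
        self.result = self.value ** 2


def worker(jobs):
    # Each worker pulls commands until it sees the None sentinel.
    while True:
        command = jobs.get()    # thread-safe: no two workers receive the same item
        if command is None:
            break
        command.execute()


def run_all(commands, worker_count=4):
    jobs = queue.Queue()
    workers = [threading.Thread(target=worker, args=(jobs,))
               for _ in range(worker_count)]
    for w in workers:
        w.start()
    for command in commands:
        jobs.put(command)
    for _ in workers:
        jobs.put(None)          # one stop sentinel per worker
    for w in workers:
        w.join()                # returns once every command has been executed


commands = [SquareCommand(n) for n in range(10)]
run_all(commands)
squares = [c.result for c in commands]
```

      Each command writes only to itself, so the workers never contend over shared data apart from the queue.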

      Now, for the for loop over an array type of deal that you are looking at there, the basic problem is who has the collection/array? Do you make copies of it and give the copies to each of the processors? (NB: probably not, especially if the items in the array are complex objects...) But then if you don't copy it, is it shared in a common cache? (In which case do you kiss your sandbox/security model goodbye?). So it isn't copied around, and it isn't in a common cache... so what you need to do is pass a reference to the item... how good is your interprocessor communication?

      If your parallel code boils down to:
      item.setResult(item.x + item.y + item.z);

      then that is four lookups on the item held by the other processor...

      The thing is that what it boils down to is that what you want to do makes sense if you think about your code ordering the processor around.
      Your code says okay, you do this, then you other processor go off and do the next one (etc).

      But code doesn't work like this: code is passive, not active. Processors pull instructions; they don't have them pushed at them.

  37. Time to let C die ? by DichotomicAnalogist · · Score: 2, Interesting
    (Warning : troll venting off.)
    Let me summarize
    1. take one of the most unsafe, slowest-to-compile, pitfall-ish, unspecified languages in existence (ok, I might be exaggerating on the "unspecified" part)
    2. add even more #pragmas and other half-specified annotations which are going to change the result of a program near invisibly
    3. don't provide a debugger
    4. require even more interactions between the programmer and the profiler, just to understand what's going on with his code
    5. add unguaranteed and slow static analysis
    6. ...
    7. lots of money ?
    Am I the only one (with Unreal's Tim Sweeney) who thinks that now might be the right time to let C die, or at least return to its assembly-language niche ? I mean, C is a language based on technologies of the 50s and 60s (yes, I know, the language itself only came around in the early 70s), and it shows. Since then, the world has seen
    • Lisp, Scheme, Dylan, ... -- maximize code reuse and programmer's ability to customize the language, automatic garbage-collection
    • ML, Ocaml, Haskell, ... -- remove all hidden dependencies, give more power to the compiler, make code easier to maintain, check statically for errors
    • Java, C#, VB, Objective-C ... -- remove pitfalls, make programming easier to understand, include a little bit of everything
    • Python, Ruby, JavaScript -- maximize programming speed, make code readable, make writing prototypes a breeze ...
    • Erlang, JoCaml, Mozart, Acute -- write distributed code (almost) automatically, without hidden dependencies, with code migration
    • Fortress -- high-performance low-level computing, with distribution
    • SQL, K, Q -- restrict the field of application, remove most of the errors in existence
    • and probably plenty of others I can't think of at the moment.

    And what are C and C++ programmers stuck with ?
    • a macro system which was already obsolete when it was invented
    • slow compilers
    • no modules or any reasonable manner of modularizing code
    • neither static guarantees nor dynamic introspection
    • no static introspection
    • an unsafe language in which very little can be checked automatically
    • mostly-untyped programming (not to be confused with dynamically-typed programming)
    • about a thousand different incompatible manners of doing just about everything, starting with character strings
    • manual garbage-collection (yes, I know about the Boehm garbage-collector -- but I also know about its limits, such as threads)
    • a false sense of safety with respect to portability
    • extreme verbosity of programs.

    So, now, we hear that IBM is trying to keep C alive on life support. IBM, please stop. Let granddaddy rest in peace. He had his time of glory, but now, he deserves that rest.

    Oh, and just for the record. I program in C/C++ quite often as an open-source developer and my field is distributed computing. But I try to keep these subjects as far away from each other as I can.
    (well, venting off feels good)

    --
    This troll is over. You can now resume a normal activity.
    1. Re:Time to let C die ? by Anonymous Coward · · Score: 0

      1. Port linux to ruby!
      2. Watch your C compile times increase 1000-fold.
      3. Blame C.
      4. ...
      5. PROFIT!

      Seriously, if C bothers you that much, don't use it.

    2. Re:Time to let C die ? by Bazzalisk · · Score: 3, Interesting
      C lacks a lot of features of more modern languages - but I think you'd be hard-pressed to find a modern auto-garbage-collecting, dynamically typed, modular language which can handle low-level programming anything like as well as C.

      Certainly if I'm writing a pleasant little modern desktop application I'm going to write in Objective C or C# - would seem a little silly not to ... but for writing a compiler, a network stack, or gods forbid a kernel I don't know of anything that works even close to as well as C. C still has a niche, can't really change that.

      --
      James P. Barrett
    3. Re:Time to let C die ? by DichotomicAnalogist · · Score: 1

      Seriously, if C bothers you that much, don't use it.
      Seriously ? If I could avoid using C, I would. But there is a perverse logic by which every "serious" application or library seems to be doomed to be written in C or, in the best case, C++, regardless of the application. Don't get me wrong. C or C++ is probably the best language for low-level, embedded and/or kernel development. But not for writing, say, Mozilla (according to my measurements nearly 2/3 of the native code of Mozilla is actually spent mapping a different paradigm on top of C++), or gcc, or evolution. Frankly, there are better languages for this.
      Oh, and I'm not planning to use Ruby for the kernel or for Gcc, mind you. I'm not a specialist of Ruby, but the language looks quite appropriate for user interfaces or dynamic websites. If I had to choose a more modern language for the kernel, I would probably use either D, or OCaml, or a language with explicit continuations or explicit concurrency (Mozart-style). Some use C# and I'm willing to bet that their development will be significantly faster and that the resulting system is not going to be significantly slower.
      I tend to assume that, after a few years of life and real-life profiling and optimization, a properly-designed system implemented in a high-level language would end up rather faster than a C kernel: while many low-level optimizations would have to be removed, a number of high-level optimizations can be introduced once you're sure of your code. Removing memory protection, for instance, is quite possible, if the system only executes programs without pointer arithmetic. In turn, this can be checked when the (possibly typed) binary is loaded, rather than at every single memory access. Let's not quibble about the fact that this check is hard-wired in the CPU. It's still a waste of time and energy somewhere and it's not the only one.

      --
      This troll is over. You can now resume a normal activity.
    4. Re:Time to let C die ? by DichotomicAnalogist · · Score: 2, Insightful

      C still has a niche, can't really change that.
      C definitely has a niche. I, for one, vote to let C return to it.
      Large parts of the kernel, if not the whole kernel, fall into that niche. I'm less convinced about the network stack. Compilers fall quite far away from it. Graph-based or continuous path-finding, artificial intelligence, concurrent programming, interpreters, webservers, webbrowsers, VoIP applications... all that is getting further and further away from that niche.
      But, please, whatever you do, everyone, stop considering C as a general-purpose language. It has been. It is not anymore. It wastes too many precious hours of everyone's life. Which could be better spent trolling on /.

      --
      This troll is over. You can now resume a normal activity.
    5. Re:Time to let C die ? by be-fan · · Score: 1

      The best language in existence for writing a compiler is Lisp. No joke. Ever look at the GCC source code? They go to enormous lengths to make C look like Lisp. Between dynamic typing, generic dispatch, macros, and garbage collection, Lisp has all the features necessary to make a good language for writing compilers.

      I'll give you C for the kernel, but for a network stack, I'm on the fence. Lisp's macros make it very easy to build up primitives for handling structured bytestreams, which abound in network programming. Moreover, macros aren't really an expensive language feature (they are expanded at compile time usually), so performance shouldn't be too far off either.

      --
      A deep unwavering belief is a sure sign you're missing something...
    6. Re:Time to let C die ? by Anonymous Coward · · Score: 0

      For crying out loud... Mod this person down. Anyone who says C is better for writing a compiler than a higher level language is just flat out _wrong_.

    7. Re:Time to let C die ? by Anonymous Coward · · Score: 0

      Unlike in the West, there are many Japanese console game developers who have been using C/ASM to develop games for a long time now, mostly because of performance issues (older consoles have very limited memory). And I don't think they'll switch to a higher-level language anytime soon. After all, why do things differently now when they can stick to their old methods and they simply work(TM)? Of course, they'll need to relearn the opcodes, etc., but that's the life of a console game developer: learn the hardware thoroughly and make full use of it.

    8. Re:Time to let C die ? by mukund · · Score: 1

      JNode is one example of an entire OS which is written in pure Java but for a nano-kernel written in assembly (similar to the assembly code in Linux kernels which is required for setting up the processor before switching to C code and also for some basic libraries such as for outputting to ports). JNode includes a TCP/IP stack and device drivers.

      Eclipse's Java compiler and Sun's J2SE compiler already are written in Java.

      So it's not impossible. It's even better in many ways to write compilers or large device drivers such as SCSI subsystems and IP based protocols in Java, especially where there's complexity and large amounts of code which can be helped by object orientation.

      C certainly has its uses where you want a high level system programming language for small to medium sized projects. But it is certainly over-used in many applications.

      --
      Banu
    9. Re:Time to let C die ? by Anonymous Coward · · Score: 0

      The best language to write a compiler in is ML.

    10. Re:Time to let C die ? by ClamIAm · · Score: 1
      but I think you'd be hard-pressed to find a modern auto-garbage-collecting, dynamically typed, modular language which can handle low-level programming anything like as well as C.

      Does D count?

    11. Re:Time to let C die ? by TheRaven64 · · Score: 1

      What is 'low level programming?' C has a very close mapping to PDP-11 semantics. Modern CPUs are not like PDP-11s. A huge amount of cost and effort has been invested in developing compilers that take code written against PDP-11 semantics and turn it into something that matches modern semantics. This allows 'low level' developers to pretend that they are still working on 1970s technology. There are languages that match up with modern CPU semantics and are good for low-level programming, but C is not one of them.

      --
      I am TheRaven on Soylent News
    12. Re:Time to let C die ? by Anonymous Coward · · Score: 0

      As for compiler development, there are languages much better suited to the task than C. Have a look at the ML family (ML, as in Meta-Language, was designed to handle language parsing and analysis). The *Caml languages, for instance, are very well suited to this job. They take care of memory management and all the other petty things that only matter when you are really doing low-level coding (kernel or system libraries).

    13. Re:Time to let C die ? by Anonymous Coward · · Score: 0

      Don't knock the older languages. Lisp first worked in 1962. C was the early '70s. In other words, C and disco were taking off in popularity *at the same time*. Coincidence? I think not.

    14. Re:Time to let C die ? by DichotomicAnalogist · · Score: 1

      Interesting. Do you have any documentation on this ?

      --
      This troll is over. You can now resume a normal activity.
    15. Re:Time to let C die ? by Anonymous Coward · · Score: 0

      Disclaimer: I've never had to program a PDP-11 myself :)
      If you read http://en.wikipedia.org/wiki/PDP-11 , you will see lots of similarities to how C works. E.g., the autoincrement/autodecrement and the [] operator of C matches one of the PDP-11's addressing modes exactly.

  38. Threads and vectors by tepples · · Score: 1

    What I'd really like to see is someone who takes all the potential for reconfiguration and parallelism and doesn't hide it away but makes it available.

    It's called threads on the one hand and vector data types on the other. Once you have learned how to use those, you're a tier II developer (as defined in The Article) working with a PowerPC based computer connected through low-latency pipes to seven DSPs, and you can just spawn tasks in threads that die when the tasks finish. Trouble is that a lot of development firms that can only afford the lower salaries of tier III and IV programmers don't want to take the time to adapt a 90 percent finished single-threaded PC game to a highly threaded, vectorized environment.

  39. Re:special compilers, expert programmer = DOA prod by Frumious+Wombat · · Score: 1

    Probably not true. Consider the yelling and screaming that went on in the late 90s as code had to become 'thread-safe'. Now that fight is mostly over, so you're already on the right track. Next step is take a page from the technical computing market, and generalize 'thread' to 'non-local access', i.e. your thread may be on another proc, with another cache or memory to access. This gets you to dual-core or OpenMP-type (SMP) systems. One more step, and you're at NUMA, where that other core could be another entire computer, with a longer latency. Usable techniques are known (after all, somebody is using BlueGene, and there are codes such as NAMD which run segmented across hundreds of machines), so compilers have to be taught how to do as much of this automatically as possible, while programmers will have to be up to speed on multi-threaded, hierarchical memory access patterns.

    The key is whether enough processors can be sold to make this investment of time worthwhile. Advances in Windows (quit yelling) have already driven some of those changes, as can be seen if you compare the behaviour of current programs versus those aimed for 3.1/95, but you haven't noticed it much because those changes are incremental. More tasks run asynchronously, dialogues don't lock the entire window manager while waiting for your response, systems wait until idle periods to do heavy I/O. The proposed Cell compiler is just one step beyond multi-threaded, so the transition will, in the end, be less fuss than is currently anticipated.

    Dig into the technical docs for Intel's current Fortran versus its ancestral DEC variants, and you'll see compilers are already doing an amazing amount of work in terms of code reorganization, execution order prediction, etc., that their ancestors didn't. The language the programmer sees is almost identical to the one they saw 20 years ago, and only comes with a few more 'gotchas' to avoid. This has to happen, as the Market has decided that it's cheaper to add cores than design faster ones, so this sort of distributed programming is going to become the norm. You'll look back at simple, imperative code some day soon and say, "How quaint". From the programmer's view, all that their new, miraculous Octopiler has to do is take OpenMP statements within a current language, and they can continue working much as they did before.

    On that note, it's somewhat heartwarming to envision hordes of recent CS grads, soaked in the latest OO paradigms, being told, "there's great money to be made programming for the Cell, but you're going to do it in High-Performance Fortran."

    --
    the more accurate the calculations became, the more the concepts tended to vanish into thin air. R. S. Mulliken
  40. Doesn't work that way... by CarpetShark · · Score: 1

    You engineer programs in a sense similar to cars, yes. But, you interact with your tools on a much higher level than putting in a pedal and a brake pad. I suspect you do in actual car design too: it wouldn't be a huge step to be able to model a car in a 3D app and ask the computer how that shape of car will perform in terms of aerodynamics, gears, engine power and therefore miles per gallon or acceleration etc.

    It's similar with programming. Instead of saying, this is a car, and it goes in that world, and we'll see what happens, you also design the world, and the way they interact, and you do it all at as high a level as you can. So, the compiler can see what you're doing at a fairly high level, and ideally, can understand and optimise that. Similarly, if you're programming multiple processors/cores with threads, then you use a compiler that understands threads. You tell it when threads can run at full speed, and when they need to stop and catch up with each other. Then, the compiler can hopefully examine what needs to be done and when, and what processors are available to do it on, and optimise accordingly. This is nothing new; lots of compilers/APIs do this sort of thing now in various ways.

    What I want to know is... will this just be limited to a single 8-workhorse cell chip, as the name "Octopiler" suggests, or will it use the promised power of Cells, so that a program will spread its workload across all the Cell devices in your home if you have more than one? Somehow I doubt they're there yet.

  41. OT - Sig comment by bohemian72 · · Score: 1

    You neglected "They're - They are."

    --
    The greatest thing you'll ever learn is just to love and be loved in return.
    1. Re:OT - Sig comment by Mike+Savior · · Score: 1

      I tried to fit some more, but the sig cut off.

      --
      space is pretty cool.
  42. Re:special compilers, expert programmer = DOA prod by theJML · · Score: 3, Insightful

    As a programmer, there's only so much that can be done in software. Sure you can parallelize things, and you can come up with newer/faster algorithms, but if we didn't get dual proc systems, that would have been pointless. So with parallel procs, we get better parallel code. Hardware advances will create software advances, and new algorithms will direct hardware futures. This is the way the world works, and I think it's worked out fairly well so far. Let's see what the Cell and processors after it can do!

    --
    -=JML=-
  43. Compiler isn't necessarily serial by CarpetShark · · Score: 1

    The compiler may have pragma instructions or linker bindings for parallelism, which would be easily taken advantage of by higher-level libraries, even if end-users don't know how to use it (though, imho, they can learn easily enough).

  44. Will you need a modchip to make full use of it? by tepples · · Score: 1

    This is possibly one of the best things they could have done to help the cell. By doing this, you make open source developers happy and more inclined to port over their applications.

    It's too bad that the only popular commercial implementation of the Cell processor for several years is going to be in a machine with a lockout chip, a technical measure that prohibits end users from compiling Free software on the machine. Otherwise, game developers could develop a Free engine subsidized by keeping game assets (maps, models, textures, audio, scripts) proprietary, and then sell the resulting game without having to pay Sony per title and per copy. Such an open-source business model would break the console business model, which involves taking a loss on R&D and marketing, breaking roughly even on manufacturing the console, and making up the difference in per-title and per-copy licensing fees.

    1. Re:Will you need a modchip to make full use of it? by heinousjay · · Score: 1

      I'm reminded of Seinfeld:

      Cosmo Kramer: They're redoing the Cloud Club.
      Jerry: Oh, the restaurant on top of the Chrysler building? That's a good idea.
      Cosmo Kramer: Of course it is, it's my idea.
      Jerry: Which part, renovating the restaurant you don't own part, or spending the 200 million you don't have part?

      --
      Slashdot - where whining about luck is the new way to make the world you want.
  45. And make sure that by Travoltus · · Score: 2

    these Octopiler coders are doing their work for the love of coding. If they want a salary for this then they're not worth their weight in salt.

    [/kfg mode off]

    --
    --- Grow a pair, liberals... stop letting the Republicans bully you!
  46. Doesn't leatherman sell them? by FeriteCore · · Score: 1

    An octoplier, that is. I think they've had them for years.

  47. nothing "radical" about it by idlake · · Score: 1

    This radical of a change in architecture

    There's nothing "radical" about it--it's just a bunch of CPUs on a chip. It's about the least radical way in which you can put a bunch of CPUs on a chip, beyond multicore.

  48. Re:special compilers, expert programmer = DOA prod by TheRaven64 · · Score: 2, Insightful
    If a CPU needs a special compiler in order to give good performance, it's basically dead

    Pretty much all modern CPUs need special compilers to give good performance. Unless you can keep track of the number of pipeline stages, the degree of superscalar architecture, etc. you will get sub-optimal code. The P4, for example, can have 126 instructions in flight at once. Can you keep track of your code over a 126-instruction window and make sure there are no hazards? If not, then you're probably better off using a 'special' compiler.

    The days when a compiler could just turn each statement into a fixed instruction sequence are long gone.

    Maybe CISC wasn't such a bad idea after all--you may get less bang for the buck, but at least you get a predictable bang for the buck.

    No, actually, you don't. One of the key features of RISC was that instructions took the same time to execute. On a CISC architecture, instruction timings are far from constant. Some instructions (have you looked at the x86 instruction set? It even has string manipulation instructions) can take several times longer to execute than others, which makes generating code very difficult. For example, you might know that it takes n cycles for a load to complete if accessing from memory and m if accessing from cache. How many instructions is that? That's much, much easier to work out on RISC. To prevent pipeline stalls, you need to make sure that you have a minimum of m cycles' worth of instructions (and ideally n) between your load and your first operation that depends on that data. Try doing that with a fixed-timing instruction set (RISC), and then with a variable-timing instruction set (CISC), and see which is easier.

    --
    I am TheRaven on Soylent News
  49. No by News+for+nerds · · Score: 1

    Dreamcast is one of the easiest game consoles for programmers.

  50. You answered your own question by Anonymous Coward · · Score: 0

    There are lots of languages out there and it's probably not that difficult to invent new ones. But people just don't abandon their current languages the moment a new one comes along. So inventing a new language is probably not the best way to get a lot of applications written and ported to a new processor architecture.

  51. Octopiler? by d.corri · · Score: 1

    Sounds like the name of a battling robot that looks like an octopus and piles up its opponents.

  52. Simpler than that. by CarpetShark · · Score: 1

    Yes, there are things like that now, and much better in fact. Even the most basic of popular compilers will optimise code like that pretty well. But, the better way is much more suited to parallel code.

    Code is object oriented, so all you need to do is say something like PLAYER1 can do his own thing at his own speed, as long as he's not killing another PLAYER. If he is, then he has to slow down and let the other player sync up with him. Semaphores and similar locking techniques allow threads to wait on other threads when necessary, but to continue at their own pace when possible. So, if there are a thousand computer-controlled players, and none of them need to be synced with something, they're free to hit the parallel processing SPEs as fast as they can. Only when they interact with something that isn't able to take the information right away will they need to slow down. The Cell isn't quite THAT parallel though, since the SPEs aren't completely independent (no one would expect them to be probably), so it's not going to get the performance you might expect from eight processors zipping through a thousand characters or a thousand effects or whatever, but when the timing and interdependence issues are taken into consideration, the results should be pretty impressive.

    Parallel-aware compilers are aware of threads, and have specific constructs to say things like this thread must wait, or it can continue as long as X hasn't happened, or whatever. There are nice high-level APIs for this in C++, python, etc. What IBM's compiler seems to do is take a little more of this on board without specifically being told to. Personally, I think that's a bit hyped, and that developers will still have to mark their thread synchronisation points. But, that's really not such a big deal. Debugging is ;)
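    As a hedged illustration of an explicitly marked synchronisation point, here is a Python sketch using the standard threading.Barrier. The player function, PLAYERS and ROUNDS are made up for the example and have nothing to do with IBM's compiler or the Cell SDK.

```python
import threading

PLAYERS = 3
ROUNDS = 4

events = []                        # (round, player) in the order they happened
events_lock = threading.Lock()
sync_point = threading.Barrier(PLAYERS)


def player(player_id):
    for round_no in range(ROUNDS):
        # Free-running part: each player does its own work at its own pace.
        with events_lock:
            events.append((round_no, player_id))
        # Explicit synchronisation point: nobody starts the next round
        # until every player has finished this one.
        sync_point.wait()


threads = [threading.Thread(target=player, args=(i,)) for i in range(PLAYERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

rounds_seen = [round_no for round_no, _ in events]
```

    Within a round the players can appear in any order, but because of the barrier no event from round k+1 can be logged before every event from round k, so rounds_seen never goes backwards.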

  53. ps2 evolution by jakethecake · · Score: 1

    The PS2 had a MIPS R5900 as its main CPU, with two FPUs and two vector units as coprocessors. Isn't the Cell just a continuation of that 'application specific' way of thinking? I think that with Blue Gene IBM finally nailed the network part. That's why the Cell is called 'the Cell Broadband Engine'. Classical scaling killed off Moore's law 3-4 years ago, and this and/or more cache on processors is the response.

    - borked msg alert, swede behind the controls ;)

  54. Re:special compilers, expert programmer = DOA prod by Anonymous Coward · · Score: 0

    So, all Intel processors are dead then are they?

    Holy crap, that means half the processors in the world are a DOA product, whatever the hell that means. Maybe we should all use compilers that produce slower code and ignore special optimised compilers! Yeah that'll help the chip manufacturers ship units!

    Please think before you textually masturbate rubbish.

  55. java by Hakubi_Washu · · Score: 1

    If your "element" "implements Runnable", then starting a thread for it in the foreach ("new Thread(element).start()"; the foreach itself is finally available since Java 1.5) should work just as expected, but be careful: you can't be sure when those threads finish unless you put the main thread on hold until they have. Calling "element.run()" directly would just run it in the calling thread. It should still speed your code up nicely, provided the VM used supports multiple processor cores (And this is the crux, but no one is keeping anyone from building one for Cell :-)

    1. Re:java by calambrac · · Score: 1

      Yeah, Java is what I typically program in, and that's the problem with Runnable. You have to explicitly block the parent thread. Not a big deal, but it seems like extra complexity for something that should be so simple...

    2. Re:java by TheRaven64 · · Score: 1
      If Java's threading API is anything like POSIX, then it should have a join operation that lets you wait on other threads. Of course, there are two things that are obviously wrong with this method:
      1. You don't actually need to wait on the thread until the next time you access the elements it modified.
      2. The number of threads it is optimal to create is roughly the number of processing units at your disposal - something you don't actually know until runtime - which is why this kind of thing is much better handled by the VM and the compiler than the developer.
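      Both points can be sketched with futures, shown here in Python rather than Java. The function slow_square is a stand-in invented for the example. The pool size comes from os.cpu_count() at runtime, and the caller blocks only when it actually asks for a result.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    """Stand-in for real work done on one element."""
    return n * n


# Point 2: size the pool from what the machine reports at runtime.
# os.cpu_count() can return None, hence the fallback to 1.
workers = os.cpu_count() or 1

with ThreadPoolExecutor(max_workers=workers) as pool:
    # submit() returns a Future immediately; nothing blocks yet.
    futures = [pool.submit(slow_square, n) for n in range(6)]

    unrelated = sum(range(100))    # the caller keeps doing other work

    # Point 1: block only at the moment a result is actually needed.
    first = futures[0].result()
    rest = [f.result() for f in futures[1:]]
```

      This still leaves the decision in library code rather than in the VM or compiler, so it only approximates what the comment argues for.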
      --
      I am TheRaven on Soylent News
    3. Re:java by Hakubi_Washu · · Score: 1

      Well, the other option would be having to explicitly unblock. I think that'd be more confusing and in case of the foreach should block the main thread after the first thread is started, killing all your speed improvement, because all threads can then only be processed sequentially (Since it cannot be known how many physical processors there are beforehand, you couldn't predict the code at all if you were to have the VM decide such issues based on available resources, especially if you'd want every thread started and completed before the next). Threads are a little confusing the way they are (Though in my university their problems (deadlock, race conditions, etc.) are handled during "Operating Systems", so anyone with formal CS education should understand threads easily by inference), but I see no way of making them easier than Java already does (Well, maybe "implements Runnable" could be enough (Just like "implements Serializable"), so run() isn't needed as a thread-main(), it should be possible to start any method. But then that would not allow you to call a method without starting a new thread, that has a little overhead, of course. *sigh* Point is: The way it is seems like a good solution for giving the programmer maximum flexibility while keeping stuff simple enough in most cases to be rather trivial)

    4. Re:java by calambrac · · Score: 1
      I think you're misunderstanding what I want. I do want all of the elements in the pfor loop to run in parallel, with the parent thread blocking until all the jobs in the pfor loop are completed. So, if the VM knew how many processing resources were available, it could allocate each element of the pfor loop to an available processing resource.

      The way I handle this situation now is I launch each parallel element in its own thread, using a semaphore to control how many threads are available (I set the thread count with a command-line parameter or environment variable) and a counter to show how many jobs are outstanding. The parent thread monitors the value of the counter and resumes when that hits 0. It's usually not even worth doing...

      I don't intend this as a general model for parallelism or threading, it just seems like a quick and easy way to get really basic parallelism with virtually no programming effort.
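
      The semaphore-plus-counter scheme described above could look roughly like the following. This is only a minimal sketch in Python, not the poster's actual code: the name pfor, the max_workers parameter and the use of a condition variable (instead of polling the counter) are all assumptions made for illustration.

```python
import threading

def pfor(items, body, max_workers=4):
    """Run body(item) for every item in parallel; block until all are done.

    Hypothetical helper: the name and signature are illustrative only.
    """
    slots = threading.Semaphore(max_workers)  # caps how many jobs run at once
    done = threading.Condition()              # guards the outstanding counter
    outstanding = len(items)                  # jobs not yet finished
    results = [None] * len(items)

    def run(index, item):
        nonlocal outstanding
        try:
            results[index] = body(item)
        finally:
            slots.release()                   # free the worker slot
            with done:
                outstanding -= 1
                if outstanding == 0:
                    done.notify_all()         # wake the parent thread

    for index, item in enumerate(items):
        slots.acquire()                       # wait for a free worker slot
        threading.Thread(target=run, args=(index, item)).start()

    with done:                                # parent blocks until counter hits 0
        while outstanding > 0:
            done.wait()
    return results
```

      Calling pfor(list_of_inputs, some_function, max_workers=n) then behaves like a blocking loop whose iterations run on at most n threads. The worker count is passed in explicitly here; reading it from a command-line parameter or environment variable, as described above, would work the same way.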

    5. Re:java by Hakubi_Washu · · Score: 1

      That's an issue for the VM implementation, not for Java itself, I think. BTW, that is exactly the way things are supposed to be handled by the VM, except that you have to block the main thread manually by calling join() on all the threads you started. When (and where!) a thread is actually executed is the job of the VM... I haven't done enough with threads to ever have encountered a problem with thread scheduling, especially not on multi-core machines, so it may be that VMs don't follow expectations here in practice...
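
      The start-everything-then-join pattern mentioned here can be sketched as below. The comment is about Java's Thread.join(); Python's threading module is used here only as a stand-in with the same start/join shape, and run_and_join is a hypothetical helper name.

```python
import threading

def run_and_join(jobs):
    """Start one thread per job, then block the caller until all complete.

    Hypothetical helper: where and when each thread actually runs is left
    to the runtime and the operating system scheduler.
    """
    threads = [threading.Thread(target=job) for job in jobs]
    for t in threads:
        t.start()          # all jobs are now eligible to run in parallel
    for t in threads:
        t.join()           # the caller blocks here, one thread at a time
```

      Unlike the semaphore version, this starts every job at once, so it only makes sense when the number of jobs is small relative to the available processors.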

  56. Re:Am I ignorant or . . . tired? by JesseT · · Score: 1
    Sorry, it's a little too early for me to be replying to threads here. I was up all night! Let's try that again:

    Ever heard of profile-guided optimizing compilers? Many runs of a program that was generated by such a compiler in "profiling" mode produce metrics about how the code is actually executed. Next, the program is recompiled using this information, producing a much more optimized program.

    Many C/C++ compilers today support this feature, including the latest GCC, Intel's C++ compiler, MSVC++ 8.0, and a bunch of others.

    I don't know if IBM's compiler for their Cell architecture supports profile-guided optimizations, but if they ever want to take full advantage of the architecture, I foresee that they will build it into their compiler.

  57. still a million reasons by i+kan+reed · · Score: 1
    1. Intensive memory management. While it seems like a pain, when dealing with fast exchanges between various components of an architecture, nothing is better or faster than memory mapping. The power of C/C++ is its greatest weakness. When you start treating things that really are numbers as abstract notions (like file pointers or functions or just about anything), you're really digging into your capacity to write efficient and platform-specific code. One could make the argument that such platform-dependent things should be part of libraries, but someone still has to write the libraries.
    2. Program-specific optimizations. You mentioned automatic garbage collection yourself, but you can easily not want to do time-intensive collections during time-critical parts of your code.
    3. The fact that anything really, really abstract like you're talking about in games can probably already be done by licensing someone else's engine. The president of Epic Games gave a presentation I attended recently and he mentioned that very point: their PS3 Unreal engine has already been licensed by many companies for making games with, and they haven't even finished it yet.


    the simple fact is that C/C++ is the standard for writing native code.
    1. Re:still a million reasons by dowobeha · · Score: 1
      the simple fact is that C/C++ is the standard for writing native code

      On many operating systems, this is true. But it doesn't have to be this way. Take a look at OS X. By using objective c as the primary language, programmers can get a well-designed pure oo language, and still have access to the low-level functionality that C provides for the few occasions when it is needed.

      --
      I am concerned about any program, any piece of hardware, any treaty, any law that treats me as a consumer, not a citizen
    2. Re:still a million reasons by ObsessiveMathsFreak · · Score: 1

      By using objective c as the primary language, programmers can get a well-designed pure oo language, and still have access to the low-level functionality that C provides for the few occasions when it is needed.

      Objective C, to be realistic, is not exactly that far removed from C/C++.

      Only one thing is going to deliver low-level performance to a high-level language, and that's God's Own Compiler, complete with cross-platform portability, that is mathematically guaranteed to deliver the most efficient, 100% optimal machine-code compilation of your code that can ever be achieved.

      As you can imagine, this won't be available on SourceForge anytime soon. Until then, just pay for the damn ease of use with a few more CPU cycles. It'll all be twice as fast in 18 months anyway.

      --
      May the Maths Be with you!
    3. Re:still a million reasons by Anonymous Coward · · Score: 0

      "It'll all be twice as fast in 18 months anyway." Um... no it won't. You've been paying attention to the trends in microprocessors right?

  58. imagine a boinc client? by way2trivial · · Score: 1

    I'd drop another 30-40$ for a cartridge that would process boinc workunits.. especially if it were so well optimized it really contributed to my total...

    --
    every day http://en.wikipedia.org/wiki/Special:Random
  59. This problem must be solved eventually by joshv · · Score: 2, Interesting

    The problems IBM programmers are having are emblematic of the problems that the PC industry is going to be facing in a few years. Multi-core is the future of PC performance. Increasing GHz and IPC of single processors has pretty much hit a wall. Creating Dual and multi-core CPUs is the best approach we have left for increasing performance with future increases in transistor count/density.

    The problem is that single-threaded programs will run just as slowly on your quad-core 'Core-Quattro' in 2008 as they did on your old Pentium 4, c. 2005. Great, yeah, I know, server loads parallelize very nicely (witness the miracle of Niagara), but consumer-grade CPUs are where the volume is at, and people are going to have to notice a real difference in performance in order to stay on the hardware upgrade treadmill. This necessitates that Intel/AMD/IBM come up with new programming models that make it easy to parallelize existing code. Parallelized libraries and frameworks are all well and good, but it will be 20 years before everyone gets around to recoding the existing codebase to the new platform, and most of them are probably not going to generate optimal code.

    No, what we need are compilers that take programs written in a serial fashion, and emit code that scales well on multiple processors. The problems with the PS3 are only the beginning.

    1. Re:This problem must be solved eventually by Kaldaien · · Score: 1

      The Intel C++ compiler and VTune are handy tools for writing parallelized x86 code. However, until recently such code was only practical for server code, as HyperThreading was the best a typical desktop machine could do running multiple threads. Except for the legacy CPU particle engine I worked on, I never really put much effort into optimizing this way and I know most other game programmers were the same. Time spent tuning code for a CPU architecture that your target hardware spec. does not have does not make a lot of sense.

      I am sure that just as programmable GPUs changed the way 3D engines are designed, multi-core CPUs and the Cell architecture will see a lot of coverage in future GDCs. If you teach new developers to think in terms of parallelization from day one, they probably will not be as overwhelmed by Cell as us dinosaurs are.

    2. Re:This problem must be solved eventually by MonoSynth · · Score: 1

      Most languages nowadays are focused on serial code, so I guess that a new language must be created with the low-levelness of C, but with parallel coding in mind. It takes a whole new mindset to program in such a language (maybe even a completely different approach to text editing and source files), but multi-core processors will kill the traditional programming paradigms anyway....

    3. Re:This problem must be solved eventually by Anonymous Coward · · Score: 0

      The problems IBM programmers are having are emblematic of the problems that the PC industry is going to be facing in a few years. Multi-core is the future of PC performance. Increasing GHz and IPC of single processors has pretty much hit a wall. Creating Dual and multi-core CPUs is the best approach we have left for increasing performance with future increases in transistor count/density.

      My suggestion - use Java. The JVM will distribute different Java apps between the processors, or even different threads within an app if needed. No need to change a line in any of your code, existing or upcoming, I guarantee it.

    4. Re:This problem must be solved eventually by Kupek · · Score: 1

      You missed the point. By using threads, you parallelized the program already. You can get the same effect by using, say, C and POSIX threads.

      Although in Java applications, using multiple threads is often more about good design rather than good performance.

  60. Re:Am I ignorant or . . . tired? by Nomihn0 · · Score: 1

    Very interesting! As I am not a developer, I did not know this feature existed. I should have guessed it, though.

  61. Re:special compilers, expert programmer = DOA prod by Anonymous Coward · · Score: 0

    Also, the division into "expert programmer" and "regular programmer" is silly.
    No it is not. There are average programmers out there. They can hang pretty well. They can generally grasp what is going on. But they do not do the cutting-edge work. They crank out average code for average things.

    Now your 'expert' is the sort of guy out there that lives and breathes it. They like to know that instruction xyz takes 3.9 cycles on average to finish. They live on this sort of thing. They are truly spooky to watch. They just zone and whatever they do comes out awesome. These people are extremely rare. They are usually semi-difficult to work with too, as they do not think like everyone else.

    That's not because people are too stupid to do this sort of thing
    Now there I really disagree. There are truly stupid people out there that program. They should not be near a keyboard at all. They are merely in it for the money, or prestige. They usually do not take the time to learn a system. They whip things out and just let the cards fall where they may. If they bother to comment code you will see things like 'not sure why this works but it does' or 'just in case, not sure why' or 'someone else told me to do this'. Then you, as an average programmer, come by and go "what the hell were they thinking?!"

    Also, this compiler may just be an enabler to get *AT* the features of a CPU. You do need a compiler capable of exploiting things. Will it have some sort of whizzy thing to auto-thread things? Maybe, maybe not... If I didn't have a compiler that could let me at the other 'cells' in the Cell processor I would be rather mad at the people who make the compiler. Wouldn't you?

    But like *MOST* programs, the best thinking is not done by the compiler but by the grey stuff holding your ears apart. The compiler is merely the tool to get things done. It is not the thing that does the work for you. 99% of things out there need 0 optimization. It is that 1% where you will need it the most. The most optimal code in existence is the code that never runs. It uses 0 cycles. Real optimization comes in where you time the code, then decide what needs to change to make something go faster. If you just 'hope' everything will be optimized by the compiler you will always have slow code. You will also never understand why. You are just 'guessing' the compiler is crap. And like most guesses when dealing with computers, you are wrong.

    I highly recommend the book Debugging. It shows many of the common errors people make when trying to build a system and diagnose one. And it is a fairly easy read to boot.

  62. Re:In other news... by Anonymous Coward · · Score: 0

    Hey! Linux is thwell becawth itth pwetty.

  63. But nothing more than that. by Nomihn0 · · Score: 1

    I understand that a program can be compiled with optimization flags specific to one hardware platform or another. What I'm confused by is the implicit claim made by IBM that the Octopiler does something more than this. I had always assumed that the Cell could interpret a flat program and divvy up processing on the fly. That is what one of the cores is for, no? Apparently not. That's what tripped me up. IBM is proud that it has a proprietary compiler preconfigured for development on its Cell chip. Nothing more.

  64. Can't afford them by GunFodder · · Score: 1

    Only the DoD can afford Real Programmers. Game Companies have to settle for Quiche Eaters. On the bright side, Octopiler does a great job with Pascal code.

  65. yeah, you're right about that... by YesIAmAScript · · Score: 1

    RISC is over.

    RISC has some good ideas, but a fair number of drawbacks too.

    For small systems, uniform-sized instructions don't use memory effectively enough. Because of this ARM is abandoning RISC in favor of THUMB 2.
    For families of chips, the idea of exposing the hardware to the compiler turns out not to work because you cannot maintain many assumptions across individual incarnations in a family of processors. For example, look at MIPS. They eschewed interlocked pipeline stages, but had to put them in in their second processor in order to maintain binary compatibility with the first processor.

    I don't get your last comment though. We're still getting tons of optimizations at compile time. That part hasn't changed. The article is all about compiler optimizations!

    Anyway, yeah, strict RISC is dead. But many of the things we learned from RISC are still being employed.

    --
    http://lkml.org/lkml/2005/8/20/95
    1. Re:yeah, you're right about that... by tarpitcod · · Score: 1

      Yup - code density definitely matters lots in embedded systems, but it is much less important in the non-embedded space. Fixed instruction size still seems incredibly important, but I often wonder why more Thumb-like CPUs aren't out there. It seems like the following would be a winning solution:

      1) Maximize code density (as long as you can decode quickly). Store dense instructions in cache - and get more instructions in there.
      2) Allow complex instructions that run extremely quickly exploiting parallel functional units. Avoid any that make dependency analysis too horrible.
      3) Crank the core clock up - devote more chip bandwidth to data fetch vs instruction fetch.

      I also liked the MTA Tera idea - I still think that was really interesting. It looks like Sun is kinda going there.

      The whole MIPS - pipeline interlocks thing was a bit sad - but I do think MIPS/SGI nailed the R10000 tho :-)

      VLIW architectures often got poo-poo'd for the lack of backwards binary compatibility thing - seems like a bit of a moot point now - just ship stuff as intermediate code that gets translated into a binary.

      --Tarp

  66. Whew by packetmill · · Score: 1

    It's a good thing Sony had the good sense to make 8 cores instead of five, otherwise IBM would have given us the PENTOPILER, and I would have boycotted PS3 just for the name.

    I thank IBM, on behalf of the illiterate developer community of the world, for using naming conventions that suit the layman. Or the complete dumbfreak.

    Octopiler...

    1. Re:Whew by Anonymous Coward · · Score: 0

      Of course we could never go with 6 cores...

      SEXOPILER!

  67. Re:special compilers, expert programmer = DOA prod by AcidPenguin9873 · · Score: 1
    There are two kinds of performance we're talking about here: baseline CPU performance, and that few extra percent of performance you can get by doing some fancy compiler tricks.

    The GP was contending that if a fancy compiler is required to achieve good baseline CPU performance (i.e., using all the SPEs on the Cell concurrently), the architecture in question won't be as successful as an architecture that can get good baseline CPU performance without special optimizations.

    In modern CPUs, the out-of-order instruction window is what allows independent instructions to execute when their operands are ready, regardless of the schedule the compiler lays down in the binary. Sure, if you put a load and a use of that load right next to each other, the use is going to have to wait. But meanwhile, other instructions from earlier/later in the stream can execute. Dependencies are resolved on the fly via register renaming and memory disambiguation hardware.

    On the other hand, Cell needs a compiler to figure out where the dependencies are and aren't so it can schedule code to execute independently on different SPEs. Today's compilers could produce code that would execute well on one SPE, but all the rest of them would sit unused. This sort of "optimization" (I wouldn't even call it that, I would call it program transformation) is difficult to do.

    I realize the days of turning high-level languages into a fixed instruction sequence are long gone, but today's CPUs would get within, oh, say 80-85% of their optimal performance if a compiler did do that. The Cell, on the other hand, would see a slowdown of a factor of 4 or 5 (vs. using all the SPEs) without using a parallelizing compiler or writing code in a completely different programming paradigm.

  68. Re:special compilers, expert programmer = DOA prod by qbwiz · · Score: 1

    But it's going to take m cycles to load the data, no matter what, so RISC chips can't decrease the maximum time, but only increase the minimum time. How does that help us make the program faster?

    --
    Ewige Blumenkraft.
  69. What else can I use? (for games at least) by Corngood · · Score: 1

    I would love to find a replacement for C++, and if there is something that will meet my requirements, please let me know.

    First of all, let's ignore the actual existence of the language tools; if there is even anything on the horizon, I'd like to know. In order to replace C++, I need something that will be as efficient and predictable. I need some way to exploit all features of the machine. For example: if the machine has a reciprocal-square-root instruction, I can make a function inline float rsqrt(float) in C++, and use inline asm to emit the instruction. The compiler will then slot that instruction right into the calling code and optimise it extremely well (no branches, no extra loads, etc). I'm not saying that exact method is ideal, but the bottom line is that if the machine has a certain reciprocal-square-root performance, I'd need to be able to at least approach it. The only reason this works in C++ is because you usually have an assembler that can generate any instruction the target machine can execute, and the C++ compiler gives you a window to it.

    Memory management is the other big issue. I need to have more control over memory than just a global garbage-collected heap and a stack. The option of using a garbage collector on a block of memory is not unwelcome, but I need to have more control for some things. In C++ I can specify the address of an object, so, for example, if the machine has scratchpad memory, I can use that memory and still have reasonably maintainable code.

    C++ is not perfect by any means, but right now you can develop a game for all of the major platforms using it, and usually with it alone (with inline asm, and for the main processor at least). I await its successor in this field, but I don't think it's out there just yet.

    PS. I would just be happy if we had more modern multipass compilers for C++ that would unify declaration and definition.

    1. Re:What else can I use? (for games at least) by gnuLNX · · Score: 1

      Have a good look at D. No seriously!

      http://www.digitalmars.com/d/index.html

      --
      what?
    2. Re:What else can I use? (for games at least) by Corngood · · Score: 1

      D is the only language that got my hopes up at all, but there are a few minor worries I have about it. I don't much like losing multiple inheritance, but it's not a dealbreaker. I'm not really sure how the heap works, and if you can ditch the garbage collector, or at least manage the heap manually. It seems not to have the option of static casts for certain types. I definitely like the idea of the language, and if anyone has any experiences with it, it would be nice to hear from you.

      Unfortunately, at this point, for a language to be practical for games it would need to both interface with existing C calling conventions and compile to C as an intermediate language.

    3. Re:What else can I use? (for games at least) by Anonymous Coward · · Score: 0

      I switched to Ada a few years ago. It took me some time to understand it (Ada is really more complex than C++ and you have to think differently), but now there's no way I'm going back.

    4. Re:What else can I use? (for games at least) by gnuLNX · · Score: 1

      It definitely interfaces well with C. I actually don't have that much experience with it. I am a scientific coder and we face a lot of the same issues you games guys do. I would love to start a small project from scratch just to see how D really stacks up. Like you said, right now C++ is pretty much the best we have.

      Cheers

      --
      what?
  70. RISC has never really been constant time... by YesIAmAScript · · Score: 1

    I know that was the idea, but it wasn't true for any popular RISC architecture.

    In the early days of RISC, integer multiplies typically took 3-5 cycles and divides took 33. Loads and stores of course have variable latencies too.

    AMD's 29K architecture turned an integer divide into a 33-instruction sequence to get around this. That also made it impossible to optimize this on later chips in the family, when 17-cycle divides became commonplace (first popularized on Pentium).

    With any modern architecture, RISC or CISC, the instruction scheduling restrictions are bestial. Which is another reason why it baffles me that people continue to use gcc as their compiler. It generates awful code.

    --
    http://lkml.org/lkml/2005/8/20/95
  71. Tier IV programmers and SPE branch stalls... by Kaldaien · · Score: 1

    This is an educated guess at best, but would not programmers in tier IV tend to write code that stands to gain little or nothing from Cell's parallel architecture? I mean, engine programmers would be in tier I - III, the menial tasks would be tier IV. I do not see controller polling and other boring game logic benefiting greatly from using more than one SPE. Matrix and vector math, on the other hand, have potential - but a good optimizing compiler can optimize that stuff without a lot of hints.

    If this compiler is truly intended to make unoptimized tier IV code Cell-friendly, it had better do an extremely good job simplifying branches -- since the SPEs are highly pipelined and branching can stall for up to 18 cycles. A lot of high-level programmers never think about the consequences of branch misprediction even though they write some of the most branch-heavy code. Compiler optimization feedback is nice, but I do not think tier IV programmers look for it or even know what to do with it.

  72. Re:special compilers, expert programmer = DOA prod by gnuLNX · · Score: 1

    Actually I think you got it backwards. Software people need to realize that if we want to keep our jobs then it is in our best interests to become good programmers on the latest hardware. Our current computing paradigm (hate that word) is coming to an end, people. Multi-core processing is here and the CELL is just looking at it from a different perspective. We will ALL be writing parallel code or we will not be writing code. I for one am quite excited about the return of optimized programming.

    --
    what?
  73. Virtual machines to the rescue? by hutchike · · Score: 1
    Surely this is an excellent example of where a well-designed virtual machine (e.g. JVM) could optimize bytecode for the 8 cores? (So long as the app was written to use threads). Does anyone have any JVM benchmarks for Cell?

    Similarly, a good Linux port will share processes over the 8 cores optimally - is Linux for Cell available yet? Benchmarks? I'm keen to see the Cell blade servers coming soon!

    Note that Sun Studio compilers were freely available before their new T1-powered servers were launched.

    Without the right toolsets, hot tech is not so cool. Let's hope Cell and T1 are not buried in the Alpha/Itanium graveyard!

    --
    Zen tips: Pay attention. Don't take it personally. Believe nothing.
    1. Re:Virtual machines to the rescue? by MarcTheLad · · Score: 1

      I hope you know that, unlike the Cell, the UltraSparc T1 has up to 8 general-purpose processors, each of which can handle up to 4 concurrent threads. The Cell SPEs are made for floating-point / SIMD work only (they lack some of the amenities of regular general-purpose processors, like branch predictors).

  74. Hentai by Anonymous Coward · · Score: 0

    Sounds like a hentai anime monster!

  75. Re:special compilers, expert programmer = DOA prod by DimGeo · · Score: 1
    On that note, it's somewhat heartwarming to envision hordes of recent CS grads, soaked in the latest OO paradigms, being told, "there's great money to be made programming for the Cell, but you're going to do it in High-Performance Fortran."

    Nice idea you have there! How about some more readable higher-level language whose compiler front-ends produce Fortran structures for the existing Fortran optimizers? Such a language could easily have the familiar C look&feel, etc...
  76. Mainframes by Wikipedia · · Score: 0

    IBM Mainframes have had this forever!

    --
    P2P Anonymous Distributed Web Search: http://www.yacy.net/
    1. Re:Mainframes by richman555 · · Score: 1

      IBM Mainframes also are slow when running virtual machines and java, both of which give you very very poor performance. Way too slow!

    2. Re:Mainframes by Wikipedia · · Score: 0

      I meant that Mainframes have had a special intermediary chip that translates code from the bytecode into a form readable by the current chip, or something like that, so that there aren't any problems running old code on a newer processor. I forgot what it's called, though. When will PCs get that kind of reliability?

      --
      P2P Anonymous Distributed Web Search: http://www.yacy.net/
  77. intermediate code... by YesIAmAScript · · Score: 1

    Interesting point on the intermediate code front. I think the hardware front ends in the chips that convert instructions into internal representations work very well and they are so small compared to the rest of the chip (even next to the cache) that saving that many transistors just doesn't amount to much.

    Additionally, again on the embedded front, if you translate intermediate code, you may not have a place to put it. For example, Xbox 360 or PS3 can have tens or hundreds of megabytes of code on a ROM but no equivalent amount of R/W storage to put the translated code in.

    That's a special case of course.

    --
    http://lkml.org/lkml/2005/8/20/95
    1. Re:intermediate code... by tepples · · Score: 1

      if you translate intermediate code, you may not have a place to put it.

      In RAM, of course. When your operating system's loader puts code into RAM ordinarily, it has to relocate the program and all libraries that it uses by translating addresses within the file to addresses in memory. The loader could do some lightweight recompilation at the same time.

      Xbox 360 or PS3 can have tens or hundreds of megabytes of code on a ROM

      Nope. Almost all the data on a PS2 disc is assets, not code.

  78. Re:special compilers, expert programmer = DOA prod by idlake · · Score: 1

    Pretty much all modern CPUs need special compilers to give good performance.

    If pretty much all modern CPUs need a particular feature, it's not "special" anymore. What's "special" about Cell is the features that make it different from mainstream CPUs.

    To prevent pipeline stalls, you need to make sure that you have a minimum of m instructions (and ideally m) between your load and your first operation that depends on that data.

    So, in addition to having to wait for the completion of the variable-time operation, which the data-dependent operation has to do anyway, the equivalent RISC instruction sequence is going to fill up the pipeline and the cache with junk. But, hey, at least the pipeline didn't stall--the CPU designers have successfully pushed the problem off their plate. Thanks for illustrating my point.

    In any case, I'm not defending complex instructions at any cost, I'm saying that CPU designers have gone too far in pushing problems off onto compilers. We may not need a string edit instruction, but we do need better support for various forms of parallelism than Cell or Itanium. I expect the evolution of MMX, hyperthreading, and multicore chips is going to be much more important than architectures like Cell or Itanium.

  79. I remember by DSP_Geek · · Score: 3, Interesting

    About ten years ago VM Labs came out with something not too far off conceptually from the Cell - vector instructions, local memory you had to DMA in and out of, 4 processors on a chip. It wasn't floating point, however, and the development tools were best described as rudimentary: the best way of debugging was to deliberately crash the box and examine the register dump barfed back over TCP/IP.

    They called a developer's conference in August 1998, where after the presentation a veteran game coder shrugged: "Another weird British assembler programming cult".

    The Cell strikes me the same way, and for the same reasons, although Big Blue likely has more development tool budget than VM ever did. Not to take anything away from the smart guys at IBM, but I suspect they'll have a fun time working around the Cell's limitations. I can tell them from experience that DMAed local memory will be much more of a pain in the ass than they can imagine, and unless they can guarantee sync in hardware they'll be wasting a bunch of time schlepping spinlocks in and out of memory. The vector stuff will also be nontrivial: the best way to make that usable, apart from having everyone write vector code from the git-go, would be to provide a stonking great math library in the style of the Intel Integrated Performance Primitives.

    As an aside, the PS3 is in the tradition of Sony not caring about who programs their machine: the PS1 was easier to code than the Saturn, which was a true horror, the PS2 upped the difficulty a fair bit, and now even experienced coders are bitching about the PS3. Meanwhile Microsoft is learning from their mistakes: the X360 is easier than the X1, and if you doubt that makes a difference, check out game development budgets and time to delivery. I don't care, really: I eat algorithms and machine code for breakfast, so this just means more jobs and money for me.

    1. Re:I remember by MikeBabcock · · Score: 1

      If you're that experienced, I'd be interested in hearing your opinion after you've had a chance to go through some of the development specs of the Cell, if they ever get publicly released (I'm sure they will). The APIs Sony has been developing are supposedly very complete as well (pre-built physics engines and such) so that game developers can just make the game, not the back-end engine.

      I'm not a full-time assembly programmer though :)

      --
      - Michael T. Babcock (Yes, I blog)
  80. Why the Cell processor is such a pain by Animats · · Score: 4, Interesting
    The basic problem with the Cell processor is that the SPEs each have only 256K of private memory, with uncached, although asynchronous, access to main memory. It's the unshared memory that's the problem.

    This architecture has been tried before, for supercomputers. Mostly unsuccessful supercomputers you've never heard of, such as the nCube and the BBN Butterfly. There's no hardware problem building such machines; in fact, it's much easier than building an efficient shared-memory machine with properly interlocked caches. But these beasts are tough to program. The last time around, everybody gave up, mainly because more vanilla hardware came along and it wasn't worth dealing with weird architectures.

    The approach works fine if you're doing something that looks like "streaming", such as multi-stream MPEG compression or cell phone processing. If you want to do eight unrelated things on eight processors, you're good.

    But applying eight such processors to the same problem is tough. You've got to somehow break the problem into sections which can be pumped into the little CPUs in chunks that don't require access to any data in main memory. The chunks can't be bigger than 50-100K or so, because you have to double buffer (to overlap the transfers to and from main memory with computation) and you have to fit all the code to process the chunk into the same 256K. That's a program architecture problem; the compiler can't help you much there. Your whole program has to be architected around this limitation. That's the not-fun part.
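
    For what it's worth, the double-buffering pattern described above can be sketched in portable C++. This is only an illustration under loud assumptions: `fetch_chunk` is a made-up stand-in for an asynchronous DMA transfer (here it is a plain synchronous copy), `kChunkFloats` is an arbitrary chunk size chosen to sit in the 50-100K range, and none of these names come from any Cell SDK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical chunk size: 16K floats = 64 KB, so two buffers use 128 KB
// and leave room for code in a 256K local store. An assumption, not a spec.
constexpr std::size_t kChunkFloats = 16 * 1024;

// Stand-in for a DMA "get" from main memory into a local buffer.
// On real hardware this would start a transfer and return immediately;
// here it is an ordinary synchronous copy. Returns the element count copied.
std::size_t fetch_chunk(const std::vector<float>& main_mem,
                        std::size_t offset, float* local) {
    assert(offset <= main_mem.size());
    const std::size_t n = std::min(kChunkFloats, main_mem.size() - offset);
    std::copy_n(main_mem.data() + offset, n, local);
    return n;
}

// Sums the squares of all elements while touching main memory only
// through chunk-sized copies into one of two local buffers.
double sum_of_squares_chunked(const std::vector<float>& main_mem) {
    std::vector<float> local[2] = {std::vector<float>(kChunkFloats),
                                   std::vector<float>(kChunkFloats)};
    double total = 0.0;
    std::size_t offset = 0;
    int cur = 0;
    std::size_t cur_n = fetch_chunk(main_mem, 0, local[cur].data());
    while (cur_n > 0) {
        // Request the next chunk into the other buffer first...
        const std::size_t next_offset = offset + cur_n;
        const std::size_t next_n =
            fetch_chunk(main_mem, next_offset, local[1 - cur].data());
        // ...then compute on the current one. With a real asynchronous
        // transfer, this loop would overlap with the fetch above.
        for (std::size_t i = 0; i < cur_n; ++i) {
            total += static_cast<double>(local[cur][i]) * local[cur][i];
        }
        offset = next_offset;
        cur_n = next_n;
        cur = 1 - cur;  // swap buffers
    }
    return total;
}
```

    Note that the compute loop never dereferences a pointer into main memory; everything it needs has been copied in first, which is the restructuring the program has to be built around.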

    You have to make sure that you do enough work on each chunk to justify pumping it in and out of the Cell processor. It's like cluster programming, although the I/O overhead is much less.

    In some ways, C and C++ are ill-suited to this kind of architecture. There's a basic assumption in C and C++ that all memory is equally accessible, that the way to pass data around is by passing a pointer or reference to it, and that data can be linked to other data. None of that works well on the Cell. You need a language that encourages copying, rather than linking. Although it's not general-purpose, OpenGL shader language is such a language, with "in" and "out" parameters, no pointers, and no interaction between shader programs.

    Note that the Cell processors don't do the rendering in the PS3. Sony gave up on that idea and added a conventional NVidia graphics chip. (This guaranteed that the early games would work, even if they didn't do much with the Cell engines.) Since the cell processors didn't have useful access to the frame buffer, that was essential. So, unlike the PS2, the processors with the new architecture aren't doing the rendering.

    It's possible to work around all these problems, but development cost, time, and risk all go up. If somebody builds a low-priced 8-core shared memory multiprocessor, the Cell guys are toast. The Cell approach is something you do because you have to, not because you want to.

    1. Re:Why the Cell processor is such a pain by tepples · · Score: 1

      The approach works fine if you're doing something that looks like "streaming", such as multi-stream MPEG compression or cell phone processing.

      Or processing of transformations and lighting operations on vertices and texels. Or processing of physics on skeletal models.

      There's a basic assumption in C and C++ that all memory is equally accessible, that the way to pass data around is by passing a pointer or reference to it

      Functional programming could change this.

    2. Re:Why the Cell processor is such a pain by kabocox · · Score: 1

      The last time around, everybody gave up, mainly because more vanilla hardware came along and it wasn't worth dealing with weird architectures.

      You need to ask yourself how many programmers ever got a chance to seriously work with those odd machines. Now ask yourself how many game companies will have to pour resources into developing for the PS3. Here is a really odd thought. Did you ever think that IBM's whole secret plot behind the PS3 was to bring the Cell architecture to the most programmers worldwide? Sure, IBM could have had a handful of groups or contractors that were trained on their in-house super high-end Cell boxes, but wouldn't it be much easier if you could make a slightly scaled-down version that introduces the same programming "problems" to the entire video game industry, which is known for trying to get the absolute most out of the hardware?

      It's weird because currently it is a unique game machine. Shortly there will be thousands to maybe millions of them sold. There will be a lot more interest in trying to force the platform to work. I'd bet that in 2 years some odd little programming breakthroughs will be made and licensed out by little-known video game companies that will make this work.

    3. Re:Why the Cell processor is such a pain by imsabbel · · Score: 1

      Well, smarty, guess what the vertex shaders on the GPU of the PS3 do?

      --
      HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
    4. Re:Why the Cell processor is such a pain by johnwbyrd · · Score: 1

      Actually, your example of on-the-fly MPEG is a particularly difficult case for Cell... you don't have access to an entire frame buffer at one time. It's necessary to slice up the HD frame buffer into 256 KB sized chunks to get any kind of decent performance. And remember that you want access to P and B frames most of the time, which means that you leave cache at least twice for encoding or decoding most frames. It can be done, but a naive implementation will suck spectacularly.

    5. Re:Why the Cell processor is such a pain by Animats · · Score: 1
      You need to ask yourself how many programmers ever got a chance to seriously work with those odd machines.

      Very few. (I've used an nCube.)

      What we'll probably see on the PS3, at least for early games, is that one SPE is devoted to audio, one is devoted to something like decompressing textures and geometry from the disk, one is devoted to dealing with the network, one is doing the 2D GUI components, and the rest are idle. The PowerPC CPU will be doing all the gameplay and the nVidia chip will be doing the 3D graphics. This is suboptimal, but not hard to implement.

      More general distributed processing may come later. It would be nice to be able to do physics in the SPEs, but, as the author of the first physics engine that did ragdolls, I can say that making that work comes under the heading of "not fun".

    6. Re:Why the Cell processor is such a pain by ExoticMandibles · · Score: 1
      If somebody builds a low-priced 8-core shared memory multiprocessor, the Cell guys are toast.

      Well, maybe not 8-core, but the XBox 360 has six cores. Three dual-core PPCs.


      larry

    7. Re:Why the Cell processor is such a pain by be-fan · · Score: 1

      The XBox 360 has a single chip with three cores. The cores are dual-threaded, but each has only one set of execution resources.

      --
      A deep unwavering belief is a sure sign you're missing something...
    8. Re:Why the Cell processor is such a pain by awgupta · · Score: 1

      If somebody builds a low-priced 8-core shared memory multiprocessor, the Cell guys are toast
      how does sharing cores address the shared cache coherence problem?

    9. Re:Why the Cell processor is such a pain by Anonymous Coward · · Score: 0

      All SMT implementations share execution units.

    10. Re:Why the Cell processor is such a pain by be-fan · · Score: 1

      I know. The Cell's cores are SMT, not "dual core" as the OP alleged.

      --
      A deep unwavering belief is a sure sign you're missing something...
    11. Re:Why the Cell processor is such a pain by Samurai+Crow · · Score: 1

      The Cell can share memory between SPEs. That's part of how the DMA controller works. The problem with this is that only one of the SPEs can function when the local store is not in "isolate mode". As for conventional caches being able to keep up with local store, have you ever tried to cache a linked-list? You can do this with a DMA controller.

  81. Haskell by tepples · · Score: 1

    and virtually all useful programming languages have global side-effects.

    Haskell being the exception that could break the logjam, right?

  82. Why limit yourself to := ? by tepples · · Score: 1

    If Cell can't deliver top-notch performance with a simple compiler back-end and regular programmers who know how to write decent imperative code, then Cell is going to lose.

    What's so good about imperative code? What's so bad about purely functional languages such as Haskell?

  83. Is there a better free compiler than GCC? by tepples · · Score: 1

    Which is another reason why it baffles me that people continue to use gcc as their compiler. It generates awful code.

    What C compiler generates better code and can be distributed with a Free operating system?

  84. C should die, its brother C++ should live by woolio · · Score: 1
    Most of the problems you cite are only truly applicable to "C".

    In C++,
    • Macros largely unnecessary with the use of templates
    • C++ is strongly typed (much more than C)
    • Character strings unnecessary (object-oriented replacements more generic, more powerful)
    • GC? Well, it can be done a bit more nicely with some object-oriented techniques and programmer discipline
    • Could be extremely portable, but most people seem to not bother to make their programs that way.


    I'm not sure what you mean about "extreme verbosity"... Sounds like you're a Perl programmer (;->

    Also static introspection is possible in C++ using overloading and/or templates... [But it would have been more convenient if it was an explicitly-built-in feature] Also, templates can be used in very powerful ways (template metaprogramming), for example to automatically choose the best sort routine (at compile-time) for a particular data-type. It is even possible to use templates to create highly portable code (e.g. when char,short,int,long vary across architectures and a minimum level of precision is required).
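
    To make the two template tricks above concrete, here is a minimal sketch. The names `best_sort`, `counting_sort`, and `int_at_least` are invented for illustration, and the choice of counting sort for 8-bit keys is just one assumption about what "best" might mean for a given type.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Counting sort: only sensible for tiny key ranges such as 8-bit values.
void counting_sort(std::vector<std::uint8_t>& v) {
    std::array<std::size_t, 256> counts{};
    for (std::uint8_t x : v) ++counts[x];
    std::size_t out = 0;
    for (std::size_t value = 0; value < 256; ++value)
        for (std::size_t k = 0; k < counts[value]; ++k)
            v[out++] = static_cast<std::uint8_t>(value);
    assert(out == v.size());  // every element was written back exactly once
}

// Tag dispatch: the overload is picked at compile time from the element type.
template <typename T>
void best_sort_impl(std::vector<T>& v, std::false_type) {
    std::sort(v.begin(), v.end());  // general-purpose fallback
}
void best_sort_impl(std::vector<std::uint8_t>& v, std::true_type) {
    counting_sort(v);  // specialised routine for 8-bit keys
}

template <typename T>
void best_sort(std::vector<T>& v) {
    best_sort_impl(v, std::is_same<T, std::uint8_t>{});
}

// Picks the smallest standard signed type with at least `Bits` bits,
// regardless of how wide char/short/int/long are on the target.
template <unsigned Bits>
struct int_at_least {
    static_assert(Bits <= 64, "no built-in type is wide enough");
    using type = typename std::conditional<
        (Bits <= 8), std::int_least8_t,
        typename std::conditional<
            (Bits <= 16), std::int_least16_t,
            typename std::conditional<(Bits <= 32), std::int_least32_t,
                                      std::int_least64_t>::type>::type>::type;
};
```

    Calling `best_sort` on a `std::vector<std::uint8_t>` compiles down to the counting sort, while any other element type gets `std::sort`, and `int_at_least<20>::type` resolves to a type of at least 20 bits with no runtime cost.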

    In terms of writing complex, high-performance software, I don't see anything replacing C++ (not even Java). But for applications where performance is not an issue, I find the strong typing features of C++ to be an advantage...

    Think of how many web CGI scripts have security flaws because they are passing un-sanitized data from the GET/POST data to SQL queries or the command-line? Well, these flaws could have been prevented at COMPILE TIME, with a strongly typed language, such as C++. (Strings in different domains could have different types, forcing the programmer to run specialized functions for sanitizing one string before using it in a different domain)
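
    A rough sketch of what that could look like, with one type per string domain. All the names here (`UntrustedString`, `SqlSafeString`, `escape_for_sql`, `build_query`) are hypothetical, and the quote-doubling escape is a toy rule used only to show the type mechanics; real code would normally use the database's parameterized queries.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Raw text as it arrived from a GET/POST parameter. Not usable in a query.
class UntrustedString {
public:
    explicit UntrustedString(std::string s) : value_(std::move(s)) {}
    const std::string& raw() const { return value_; }
private:
    std::string value_;
};

class SqlSafeString;
SqlSafeString escape_for_sql(const UntrustedString& in);

// Text escaped for use inside a single-quoted SQL literal.
// The constructor is private, so escape_for_sql() is the only way in.
class SqlSafeString {
public:
    const std::string& str() const { return value_; }
private:
    explicit SqlSafeString(std::string s) : value_(std::move(s)) {}
    std::string value_;
    friend SqlSafeString escape_for_sql(const UntrustedString& in);
};

// Toy sanitizer (assumption): doubles every single quote.
SqlSafeString escape_for_sql(const UntrustedString& in) {
    std::string out;
    for (char c : in.raw()) {
        if (c == '\'') out += "''";
        else out += c;
    }
    assert(out.size() >= in.raw().size());  // escaping never shortens text
    return SqlSafeString(std::move(out));
}

// Accepts only sanitized text; passing an UntrustedString or a bare
// std::string here is a compile error rather than a runtime hole.
std::string build_query(const SqlSafeString& name) {
    return "SELECT * FROM users WHERE name = '" + name.str() + "'";
}
```

    With this in place, `build_query(UntrustedString("x"))` simply does not compile, so forgetting the sanitizing step gets caught by the compiler instead of by an attacker.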

    My little knowledge of functional-based languages is that they tend to copy data unnecessarily. This doesn't matter in many applications, but it becomes a show-stopper in terms of performance for some.
  85. Number of processing units by tepples · · Score: 1

    The number of threads it is optimal to create is roughly the number of processing units at your disposal - something you don't actually know until runtime

    O rly? I thought that by definition of the Cell processor, you got one PowerPC and 7 DSPs on every PS3.
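
    For code that is not tied to one fixed console, the quoted point can be sketched like this: ask the runtime how many hardware threads it reports, fall back to one if it cannot say, and split the work to match. `worker_count` and `split_ranges` are illustrative names, not part of any console SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Number of workers to use: whatever the runtime reports, or 1 when it
// cannot tell (hardware_concurrency() is allowed to return 0).
unsigned worker_count() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}

// Splits [0, items) into `workers` contiguous half-open ranges whose sizes
// differ by at most one. Ranges are empty when there are more workers
// than items.
std::vector<std::pair<std::size_t, std::size_t>>
split_ranges(std::size_t items, unsigned workers) {
    assert(workers > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t base = items / workers;
    const std::size_t extra = items % workers;
    std::size_t begin = 0;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t len = base + (w < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}
```

    On a machine with a fixed, known layout you could skip `worker_count()` and pass a constant to `split_ranges` instead.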

    1. Re:Number of processing units by TheRaven64 · · Score: 1

      If you are targeting the Cell, then you won't be using Java because there is no JRE for the SPUs. You could write code for the CPU using Java, but that would eliminate the need for parallelism.

      --
      I am TheRaven on Soylent News
    2. Re:Number of processing units by tepples · · Score: 1

      If you are targeting the Cell, then you won't be using Java because there is no JRE for the SPUs.

      First of all, where did I mention Java technology in particular? I was talking about threads in general. Second, what makes you think some PS3 middleware vendor isn't going to come out with some sort of technology that does the same thing as Java technology?

  86. yes, it is by YesIAmAScript · · Score: 1

    But if Sony comes up with the right dev tools and market share, it won't matter. Sega's problem was at least as much poor market share, which made it less profitable to write for their platform, as it was being tough to program for.

    And in Sony's defense, it isn't like 360 is easy to write for either, you have to write multi-threaded games, which isn't quite the norm.

    It's a calculated risk Sony is taking. They had to take some kind of chance, you're not going to deliver the kind of performance it takes to match up to 360 by sticking with conventional design.

    --
    http://lkml.org/lkml/2005/8/20/95
  87. None. But that isn't everything... by YesIAmAScript · · Score: 1

    Apple uses gcc. Their OS isn't free.
    Sony uses gcc for PS2 and currently uses gcc for PS3, and nothing about their platform is free. And their dev kit is FAR from free.

    I do agree, gcc is a good value for the money. But if you have a choice, you can get a lot better on performance than gcc. In many systems, it's worth the cost.

    --
    http://lkml.org/lkml/2005/8/20/95
  88. Yeah, and... by YesIAmAScript · · Score: 1

    In RAM means you spend extra time recompiling every time. That makes no sense. That's not the same as using an intermediate format as a distribution format. You're essentially talking about using a dynamic recompiler. And they just wouldn't match up on performance to a native compiler due to having to recompile it each time you load the code.

    As to your "Nope", a game can have 10MB of code without much difficulty. Yes, almost all the data on the disc is assets, but given that the disc can have 9.0GB, having 0.01GB worth of code would still mean almost all the data on the disc is assets, yet it still has 10MB of code. Even at 100MB of code it'd still be 99% assets. One hundred MB might be a stretch, but it is quite possible, given the ROM size, and your system can't break down when someone uses the system in a way you didn't expect.

    In addition, I said "Xbox 360 or PS3", and you responded with a comment about the PS2. Weak.

    --
    http://lkml.org/lkml/2005/8/20/95
    1. Re:Yeah, and... by tarpitcod · · Score: 1

      For a console it seems like you wouldn't want to ship intermediate code - I was thinking more for desktops where the idea of binary compatibility is touted. Although I do wonder after these last two comments if the code is such a relatively small size, perhaps you could ship IC and Binaries (for exact target platform) on the same Disc.

      That way if you release XBox-720 or Playstation 4 - you could perhaps use the Intermediate Code instead of the raw binaries for PS3... Kinda like a FAT binary - but I would have two separate copies.

      Just a crazy idea.

      --Tarp

  89. Transmeta supplied the tools to IBM by Anonymous Coward · · Score: 0

    Transmeta was contracted to write some of the tools for programming on the Cell. So it wouldn't surprise me if the compiler mainly came from Transmeta. I believe that part of the reason they got the nod for this was their previous experience developing their proprietary "Code Morphing Software" layer. So perhaps this compiler isn't as horrid as everyone seems to believe.

  90. actually, wouldn't it be the by eamonman · · Score: 1

    Septapiler for the PS3?

    --
    0- Eamonman Proud member of DNRC
    1. Re:actually, wouldn't it be the by Anonymous Coward · · Score: 0

      Why would it be septapiler? 1 Main CPU + 7 SPEs (8 SPEs, 1 disabled) = 8 cores... There's at least two ways I can come up with "8" in there...

  91. Don't be a revisionist by Macka · · Score: 2, Interesting
    The Itanium on the other hand was obsolete on its launch. Even HP dumped it after killing their own better performing 64 bit processor for it and spending billions of dollars and ten years building it.
    HP most certainly have not dumped it. If anything they're pushing harder than ever. All I hear from HP these days is Itanium, Itanium, Itanium .... and I've been to a few HP pre-sales events in the last couple of months where they've been pushing it very hard. In a few months they'll be revising their Integrity line and introducing systems that are Montecito ready. Right now HP are saying that for Integrity, they will not be beaten on price. And if you're in the market for an Itanium server you can expect to get some pretty hefty discounts!

    Yes Itanium has failed to grab anything like the market share it was meant to. But that has nothing to do with its architecture. There's an Ars Technica review from last year (I think) which talked about the Itanium architecture, and they were very up beat and complementary about it. The summary of that article was that as fabrication tech improves and die shrinks follow, and it becomes possible to cram more cores and larger and larger caches on to a chip, the Itanium architecture has more scope to grow and perform than any of its current competition. EPIC loves large caches.

    There is only one real reason why Itanium has been such a flop so far, and that's x86-64. Intel had no intention of bolting 64 bit tech onto the x86 architecture. If you wanted 64 bit computing you were meant to go Itanium. End of story. That was the way Itanium was going to get its market share, and large volumes were going to drive the costs down. Intel either didn't see AMD coming, or didn't see what they were doing as a threat until it was too late. The x86-64 bombshell, when it hit, threw Intel into complete disarray. Not only was x86-64 way cheaper than Itanium, but it outperformed it and it offered seamless backward compatibility. The Itanium volume market plan was doomed from that moment on. As a consequence Intel had to scrap their x86 road map and re-draw it with their own 64 bit implementation, i.e. EM64T. They've been playing catch up ever since.

    A side effect of Intel's change in direction and focus has been a change in where they've put their resources. Itanium got starved of the resources it was originally planned to have and as a consequence Montecito is way late and isn't quite the kick ass design it was meant to be. Intel's partners like HP have suffered as a consequence.

    Nevertheless Itanium is not going away, and even though Montecito is late, the current crop of Itanium chips are no slouch. When Montecito arrives it's going to give a much needed boost to HP Itanium sales. That's what they hope for anyway.

  92. What I'd like to see by 21mhz · · Score: 1

    If I were to program games on Cell, I'd rather not use a dumbed down all-in-one compiler. This is the kind of an easy solution for a complex problem that's never going to work well. And programming a heterogeneous, asynchronous, memory-asymmetric architecture is complex.

    Let there be an SPE compiler that produces "tasklets": bits of SPE code plus some positioning information, such as location of DMA areas. The compiler may be for some specialized vector-friendly language to match the units' instruction set well. Then, make a library for the main CPU to facilitate deployment of SPE tasklets, handle synchronization, DMA area management, dynamic unit allocation and so on. You'll be amazed how many programmers turn out ready to work in this model.

    If someone just wants to port existing code, well, there is a whole POWER core there, AltiVec and all! Is it too weak?

    --
    My exception safety is -fno-exceptions.
  93. The 'octa' part is wrong! by Anonymous Coward · · Score: 0

    Although there are 8 cells on each PS3 processor, one is redundant (to improve yields) and one is assigned to the OS. So you 'only' have to handle six different cells.

  94. Performance Improvement: averaging at 22 percent. by guysmilee · · Score: 1

    Based on the below quote ... unless I am missing something, a 22% performance improvement hardly seems worth waiting for ... am I misreading something: "We first evaluate the optimized SPE code generation techniques presented in the section "Optimized SPE code generation." Figure 11 presents the reduction in program-execution time for each optimization relative to the performance of the original compiler. We achieved a reduction which ranged from 11 to 51 percent, averaging at 22 percent."

  95. Re:special compilers, expert programmer = DOA prod by vladmihaisima · · Score: 1

    This may be true for the desktop market, but for embedded the special compiler IS needed, and also not everyone can write a program that will run ok on a mobile phone. The programmers must take into account the architecture specifics so that the code executes acceptably in the constrained environment.

    And the general trend is to take into account more and more variables (like power consumption and different execution units that can perform different tasks).

    The future of Cell depends on IBM, on the developers, and on the users. By itself, needing a complex compiler is not a reason to fail.

  96. Why a sequential language? by Anonymous Coward · · Score: 0

    There are plenty of languages that do much better with parallelism. Erlang, for example. Maybe you want static typing for speed, but the point is, if you use message-passing or dataflow concurrency, instead of the usual resource locking, you make parallel processing a lot easier on programmers, and don't have to build a heroic compiler that somehow figures out parallelism from sequential code.
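
    Even in a sequential language the message-passing style can be approximated with a small queue. A minimal sketch, assuming a hand-rolled `Channel` class (an invented name, not a standard library type) in place of Erlang-style mailboxes: workers only ever exchange copies of messages and never lock each other's data directly.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// A tiny unbounded channel: producers send values, consumers receive them.
// The only shared state is the queue hidden inside the channel itself.
template <typename T>
class Channel {
public:
    // Enqueues one message. Sending on a closed channel is a logic error.
    void send(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);
            queue_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Signals that no further messages will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a message is available or the channel is closed.
    // Returns false only when the channel is closed and fully drained.
    bool receive(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
    bool closed_ = false;
};
```

    A worker would typically sit in a `while (ch.receive(msg)) { ... }` loop on its own thread, which is about as close as C++ gets to the mailbox model without extra libraries.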

  97. Functional Programming by BCGlorfindel · · Score: 1

    Tim Sweeney presented a paper recently on the topic of game engine design for multi core systems. Basically it amounts to changing from c/c++ to a functional language like Haskell for engine development so the language takes care of the task divisions needed. Code complexity is already a problem for engine development, managing threads is just gonna make that worse. When engine development time is as long as it currently is, taking a performance hit in code execution can be offset by a faster time to market.

  98. Multitasking and pre-emption by bLanark · · Score: 1

    All this clever design is going into getting the compiler working well, but surely the cleverest parallel code could be ruined by having to share the processor(s) with other tasks/processes/threads.

    I wonder what the cost of being pre-empted on one or more processors is. A really clever design might allow the programmer to place hints such as "don't preempt this chunk, it's optimised really nicely" or "dedicate a processor to this code until I tell you otherwise".

    --
    Note to ACs: I won't mod you up, even if you are being funny or insightful. So take a chance! It's not real life!
  99. Re:special compilers, expert programmer = DOA prod by MikeBabcock · · Score: 1

    There's a major problem with your argument.

    "Extra few percent"

    I hear that all the time.

    "Instead of compiling with optimizations, I should be able to distribute a debug build and just make them buy better hardware".

    We're talking at least a 25% increase in speed in many CPU-bound applications, and often a several-fold increase with specialized compilers.

    People who do video and audio encoding are not the target here either -- in those cases, optimized builds often make little difference because somebody went to the effort of hand-writing the main loops in optimized assembly already.

    Check out Intel's C compiler versus Microsoft's (or GCC) for simple non-CPU-bound application performance differences.

    --
    - Michael T. Babcock (Yes, I blog)
  100. that's the best post on the topic I've seen by YesIAmAScript · · Score: 1

    And perhaps you're even on the inside, since you know Sony tried to go without the NVidia chip.

    I was around for the last range of machines, HyperCubes and such that you speak of, and you're dead right as to why they were dumped. Most tasks couldn't be divided up well enough to use the hardware effectively.

    I would add a little bit. First, using 8 (7) processors will be a lot easier than using the hundreds in those older machines. Second, given current technology limitations, it isn't likely someone is going to match the potential Cell performance with a shared-memory design at the price Sony pays for Cell.

    Additionally, if you read the article, IBM has a proposed compiler-based solution to the necessity of using 256KB pages. I have to say I'm more than a bit skeptical about this.

    --
    http://lkml.org/lkml/2005/8/20/95
  101. I heard Cthulhu was available... by Senzei · · Score: 1
    ...but he only works in Cobol, I heard he was learning .Net though.

    --
    Slashdot: Where anecdotes and generalizations can be freely substituted for facts, logic, or intelligence
  102. 360 is not easier than Xbox 1. by YesIAmAScript · · Score: 1

    Xbox 1 you could port games to and from it with ease. Using conventional programming.

    360 requires multiple threads to use it well. Additionally, you have to do GPU programming (shader programming) to use it well. Those are huge increases in complexity from Xbox 1, which was quite straightforward to program for.

    The 360 still has the unified memory architecture at least.

    --
    http://lkml.org/lkml/2005/8/20/95
  103. Re:special compilers, expert programmer = DOA prod by AcidPenguin9873 · · Score: 1
    You missed the entire point of my argument, which, after reading it again, wasn't clear enough:

    The Cell fundamentally requires program transformations to be performed by a compiler to make use of most of the chip. The only other CPU that comes somewhat close to that is Itanium.

    Now, we can debate just how much performance loss is seen with unoptimized code on dynamically-scheduled out-of-order superscalars, and you have a point there: it can be significant. But not as significant as only using 1/8th (or 1/nth, where n is the number of processing elements) of a chip.

  104. Remember the Sega Saturn? by Anonymous Coward · · Score: 0

    It's a well known fact that the Sega Saturn was a powerful machine due to its many processors. If I remember correctly, it had 2 processors used for calculations, a sound processor, and maybe three more used in various graphical functions (they assigned specific functions to each). Their reasoning was that by using a much more parallel architecture they could create a much more powerful machine at a lower cost.

    It backfired when they didn't make it easy to code for such a beast and also didn't provide good support documentation to developers (if I remember what I read). The machine ultimately died due to this, which caused a lack of good games to be released.

  105. There is.. it is called OpenMP by Anonymous Coward · · Score: 0

    Can you say OpenMP? (www.openmp.org)
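
    For anyone who hasn't seen it, a minimal sketch of what OpenMP looks like in practice: one pragma on an ordinary loop. The function name `dot` is just an example. This assumes a compiler with OpenMP enabled (for example `-fopenmp` on GCC); without it the pragma is ignored and the loop runs sequentially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of two equally sized vectors. The pragma asks the compiler to
// split the loop across threads and combine the per-thread partial sums.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());
    double sum = 0.0;
#pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        sum += a[k] * b[k];
    }
    return sum;
}
```

    Note that this is shared-memory parallelism; it says nothing about how the data would get into an SPE's local store.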

  106. PS2 Was an Oddball too. by Nazmun · · Score: 1

    The Emotion Engine had 3 parts to it... Programming for it was in no way normal but it looks like the console devs pulled it off.

    --
    Hmmm... Pie...
  107. Octopiler overlords by pentium69 · · Score: 1

    I for one welcome our Octopiler overlords

    --
    Mystika
  108. Publisher, developer, and asset licensor logos by tepples · · Score: 1

    In RAM means you spend extra time recompiling every time. That makes no sense.

    What else is the CPU doing while the optical drive is loading things from disc?

    And they just wouldn't match up on performance to a native compiler due to having to recompile it each time you load the code.

    A lot of programs for video game consoles only load code once, namely at the beginning. Couldn't they do this recompilation while displaying the allegedly legally required unskippable copyright notices and unskippable logos for the publisher, developer, and licensor of the asset franchise (e.g. a movie studio or a sport league) and while loading the asset data for the title screen and main menu?

    One hundred MB [of code that must be recompiled] might be a stretch

    How many source lines of code compile to 100,000,000 bytes of object code, how long would it take for human beings to develop and test that much code, and how would such an enormous project be funded? Does Microsoft Windows even contain that much code?

    In addition, I said "Xbox 360 or PS3", and you responded with a comment about the PS2. Weak.

    The generalization from PS2 to the next generation was left as an exercise for the reader. Xbox 360 media (DVD-9) is not larger than PS2 media (also DVD-9), and just as a lot of PS2 games came on CD-ROM, I expect a lot of PS3 games to come on DVD-9 media due to the higher initial replication cost of BD-ROM.

  109. more compl{e|i}mentary confusion by spage · · Score: 1
    they were very up beat and compl e mentary about it

    Like in 1's and 0's and XOR's? You mean complimentary. Most people garble the other usage, e.g. "A good compiler is an essential complement to advanced hardware".

    Also "upbeat" is one word.

    Without your e-mail address, Ms. Edna Krabappel can only correct in public.

    --
    =S
    1. Re:more compl{e|i}mentary confusion by Macka · · Score: 1


      Thanks. I'll watch out for that one in future and make sure I get it right next time.

  110. Well, yes, but they can both retire. by DichotomicAnalogist · · Score: 1
    Most of the problems you cite are only truly applicable to "C".
    I'm afraid that's not true. C++ does provide crutches for C, and can solve a few problems indeed, but in my experience, in the long run, it just fails.

    * Macros largely unnecessary with the use of templates
    True, in theory. Macros stay largely used in practice.
    * C++ is strongly typed (much more than C)
    I'm afraid you're confusing "static typing" and "strong typing". C++ is a mix of static weak typing, tiny bits of dynamic typing and big chunks of no typing at all. In other words,
    • the type system is invalid (i.e. no subject reduction)
    • just the template type system itself is invalid
    • the type system doesn't guarantee anything -- constants can change, references can be NULL, a variable of type T can contain a value of unrelated type U, etc.
    • except for some aspects of virtual methods, no type errors are caught at run-time.

    * Character strings unnecessary (object-oriented replacements more generic, more powerful)
    Indeed. However, Mozilla has 12 different classes of character strings, Apache Xerces 2 or 3, wxWidgets 4 or 5, and none of them are compatible across projects or with the STL. Which was exactly my point.
    * GC? Well, it can be done a bit more nicely with some object-oriented techniques and programmer discipline
    True. Unfortunately, that's quite hard to do, and it grows nigh-impossible when you're trying to mix two different libraries. And, well, come on, unless I'm writing a critical or semi-critical app, I don't want to spend half of my time, if not more, managing memory. If I'm writing a critical or semi-critical app, I won't use C++ in the first place. I'll probably go Ada or Esterel, depending on the task.
    * Could be extremely portable, but most people seem to not bother to make their programs that way.
    In my experience, it's doomed at low-level, starting with the low-level libraries you will probably need to write your program. Making it portable actually requires fighting against the conventions of these libraries.

    I'm not sure what you mean about "extreme verbosity"... Sounds like you're a Perl programmer (;->
    :)
    I'm a functional programmer. Walking through a tree to collect information should be something I can write in 5 short lines, including type information. Not in 100 lines (assuming a good library), to obtain unsafe, MT-unsafe and harder to read results.

    Also static introspection is possible in C++ using overloading and/or templates... [...]
    True. But comparing this to Lisp-style or MetaOCaml-style static introspection makes me want to weep.
    In terms of writing complex, high-performance software, I don't see anything replacing C++ (not even Java). But for applications where performance is not an issue, I find the strong typing features of C++ to be an advantage...
    I believe it depends on the domain. Given complete choice, for most applications which do not require direct access to the hardware, I would probably use OCaml (if performance matters most) or Haskell (if readability matters most) and now maybe F# (for portability). For distributed applications, I would probably use Mozart. Or my own upcoming language :) For expert systems, I'll use Prolog. For databases, well, either SQL or some embedded SQL (what's the name? link?). Etc. In any case, I will choose a language with actual strong typing rather than a false sense of security.

    Think of how many web CGI scripts have security flaws because they are passing un-sanitized data from the GET/POST data to SQL queries or the command-line? Well, these flaws could have been prevented at COMPILE TIME, with a strongly typed language, such as C++. (Strings in different domains could have different types, forcing the programmer to run specialized functions for sanitizing one string before using it in a different domain)
    I agree
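
    For what it's worth, here is a minimal sketch of that "one type per domain" idea in C++. Every name in it (RawInput, SqlLiteral, quote_for_sql, build_query) is hypothetical and made up for illustration, and the quote-doubling is only a stand-in for real escaping; in practice, parameterized queries would be the safer choice.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical wrapper: text that came straight from GET/POST data.
class RawInput {
public:
    explicit RawInput(std::string s) : value_(std::move(s)) {}
    const std::string& unsafe_value() const { return value_; }
private:
    std::string value_;
};

// Hypothetical wrapper: text that has been quoted for use in SQL.
// The constructor is private, so quote_for_sql is the only way in.
class SqlLiteral {
public:
    const std::string& str() const { return value_; }
private:
    explicit SqlLiteral(std::string s) : value_(std::move(s)) {}
    std::string value_;
    friend SqlLiteral quote_for_sql(const RawInput&);
};

// Illustrative sanitizer: wraps the text in single quotes and doubles
// any embedded single quote.
SqlLiteral quote_for_sql(const RawInput& in) {
    std::string out = "'";
    for (char c : in.unsafe_value()) {
        if (c == '\'') out += "''";
        else out += c;
    }
    out += "'";
    return SqlLiteral(std::move(out));
}

// Accepts only SqlLiteral, so passing a RawInput or a plain std::string
// is rejected by the compiler.
std::string build_query(const SqlLiteral& name) {
    return "SELECT * FROM users WHERE name = " + name.str();
}

// The compile-time guarantees, spelled out.
static_assert(!std::is_convertible<RawInput, SqlLiteral>::value,
              "raw input must not convert to an SQL literal");
static_assert(!std::is_convertible<std::string, SqlLiteral>::value,
              "plain strings must not convert to an SQL literal");
```

    With this, build_query(quote_for_sql(RawInput(form_field))) compiles, while build_query(RawInput(form_field)) does not. The check costs nothing at run time, but it only protects code that actually goes through the wrapper types.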

    --
    This troll is over. You can now resume a normal activity.
  111. what else? by YesIAmAScript · · Score: 1

    What else is the CPU doing while loading things up from disc? Hopefully running the game. Let's not get trapped in the "please wait, loading" metaphor here. Jak 2 showed years ago that you can load while the game is running. So my assumption is that while the DVD is loading stuff, the game is still running and the CPU is not idle.

    About your comments about only loading once at the beginning, well, I don't really believe that. Yeah, I can see it with some games, but not a lot of them. They'll load some specialized code level-by-level. I do hear you that you could do some more work in there, but personally, if the developer is going to take extra effort to figure out how to get some work done behind those time-wasting screens, I'd rather it be real work that will save time later instead of make-work that just gets us back to no time deficit.

    I do agree 100MB is a stretch. Windows does perhaps have that much code, but it isn't all loaded at once. If you count all the apps and all the different drivers, it might come out to that much code. But again, you don't load every driver and every app at once. I don't expect games to have that much code.

    "A lot" of PS2 games didn't come on CD-ROM. All of them did for a year, but after that, it died out quickly. And given that the number of games for the platform in the first year was about the same as the number of games it gets in a month now, I wouldn't say "a lot" of PS2 games came on CD-ROM.

    I do agree on PS3, I think all games will come on DVD-ROM for at least a year. I think that games will stay on DVD-ROM for PS3 longer than they stayed on CD-ROM for PS2 simply because many PS3 games will be 360 games also, and 360 is DVD-only. PS2's only contemporary at launch was Dreamcast, and it already had more storage than a CD-ROM, so developers could "break free" a little. And most of them ignored Dreamcast anyway, giving another reason they could break free.

    Anyway, I still find your arguments uncompelling. Consoles have found little reason to change processors or such in the middle of a product cycle, wouldn't they rather keep with the older one which certainly costs less anyway? At least until the new generation of machines comes out? Perhaps it would be good for long-term backward compatibility. But then again perhaps MIPS code is as good an intermediate distribution format as an arbitrary bytecode anyway.

    --
    http://lkml.org/lkml/2005/8/20/95
    1. Re:what else? by tepples · · Score: 1

      Yeah, I can see it with some games, but not a lot of them. They'll load some specialized code level-by-level.

      Bytecode interpreters work for per-level scripts. I seem to remember some PS2 platformer (Ratchet and Clank? Jak and Daxter?) having most of its game logic written in a Lisp dialect.
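
      To make "bytecode interpreter for per-level scripts" concrete, here is a toy stack machine in C++. It is only a sketch: the opcode set (Push, Add, Mul, Halt) and the names op and run_script are invented for illustration and have nothing to do with what any actual PS2 title shipped.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical opcodes for a tiny per-level script format.
enum class Op : std::uint8_t { Push, Add, Mul, Halt };

// Convenience helper for writing scripts as byte vectors.
constexpr std::uint8_t op(Op o) { return static_cast<std::uint8_t>(o); }

// Runs a script and returns the value on top of the stack at Halt.
// The same bytes run on any CPU the interpreter is compiled for.
int run_script(const std::vector<std::uint8_t>& code) {
    std::vector<int> stack;
    std::size_t pc = 0;

    auto pop = [&stack]() {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        int v = stack.back();
        stack.pop_back();
        return v;
    };

    while (pc < code.size()) {
        switch (static_cast<Op>(code[pc++])) {
        case Op::Push:  // operand is the next byte
            if (pc >= code.size()) throw std::runtime_error("truncated push");
            stack.push_back(code[pc++]);
            break;
        case Op::Add: {
            int b = pop();
            int a = pop();
            stack.push_back(a + b);
            break;
        }
        case Op::Mul: {
            int b = pop();
            int a = pop();
            stack.push_back(a * b);
            break;
        }
        case Op::Halt:
            return pop();
        default:
            throw std::runtime_error("bad opcode");
        }
    }
    throw std::runtime_error("missing halt");
}
```

      A level's script is then just data on the disc: the bytes for Push 2, Push 3, Add, Push 4, Mul, Halt evaluate (2 + 3) * 4 no matter which processor the console uses, at the cost of interpretation overhead.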

      PS2's only contemporary at launch was Dreamcast, and it already had more storage than a CD-ROM, so developers could "break free" a little.

      I agree, but I would like to point out that the GameCube disc's capacity wasn't really any bigger than the Dreamcast GD-ROM's. Would you claim that developers ignore the GameCube for the same reasons that they ignored the Dreamcast?

      Consoles have found little reason to change processors or such in the middle of a product cycle, wouldn't they rather keep with the older one which certainly costs less anyway?

      Does Nintendo still sell N64? Will Sony still sell PSOne when the PS3 comes out?

      But then again perhaps MIPS code is as good an intermediate distribution format as an arbitrary bytecode anyway.

      Some might claim that Thumb code (the 16-bit instruction encoding used on newer ARM CPUs), Java bytecode, or MSIL bytecode has better intermediate properties than MIPS or x86 machine code.

  112. Why some need 64 bit precision, or more by G3ckoG33k · · Score: 1

    A very nice PDF article which shows why high precision is needed, for precision AND repeatability: Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications.
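
    A small C++ sketch of the repeatability problem, not taken from the article: floating-point addition is not associative, so a parallel reduction that adds the same numbers in a different order can return a different answer. The function names are made up for illustration, and the compensated version uses Neumaier's variant of Kahan summation, which is one common technique and not necessarily the one the article uses.

```cpp
#include <cmath>
#include <vector>

// Plain left-to-right summation: the result depends on the order.
double naive_sum(const std::vector<double>& xs) {
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum;
}

// Neumaier's compensated summation: keeps a running total of the
// low-order bits that each addition rounds away.
double compensated_sum(const std::vector<double>& xs) {
    double sum = 0.0;
    double lost = 0.0;  // accumulated rounding error
    for (double x : xs) {
        const double t = sum + x;
        if (std::fabs(sum) >= std::fabs(x)) {
            lost += (sum - t) + x;  // low-order bits of x were dropped
        } else {
            lost += (x - t) + sum;  // low-order bits of sum were dropped
        }
        sum = t;
    }
    return sum + lost;
}
```

    With 64-bit doubles, naive_sum gives 1.0 for {1e20, -1e20, 1.0} but 0.0 for {1e20, 1.0, -1e20}, because 1.0 is smaller than the spacing between doubles near 1e20 and is rounded away. compensated_sum returns 1.0 for both orders. This assumes the compiler does not reassociate floating-point operations (for example, no -ffast-math), and compensation reduces order dependence without guaranteeing bit-identical results for every input.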