Slashdot Mirror


Octopiler to Ease Use of Cell Processor

Sean0michael writes "Ars Technica is running a piece about The Octopiler from IBM. The Octopiler is supposed to be a compiler designed to handle the Cell processor (the one inside Sony's PS3). From the article: 'Cell's greatest strength is that there's a lot of hardware on that chip. And Cell's greatest weakness is that there's a lot of hardware on that chip. So Cell has immense performance potential, but if you want to make it programmable by mere mortals then you need a compiler that can ingest code written in a high-level language and produce optimized binaries that fit not just a programming model or a microarchitecture, but an entire multiprocessor system.' The article also has several links to some technical information released by IBM."

25 of 423 comments

  1. So don't hire mere mortals by ScrewMaster · · Score: 4, Funny

    Hire "Real Programmers". You know, the ones that only code in Assembler, and if they can't do it in Assembler then it isn't worth doing.

    --
    The higher the technology, the sharper that two-edged sword.
    1. Re:So don't hire mere mortals by SkyFire360 · · Score: 3, Funny

      So don't hire mere mortals, Hire "Real Programmers"

      Zeus was booked, Apollo was out of town, Hermes is still learning, Poseidon just signed a 500-year agreement with Apple and Ares was killed off in God of War, so most of the good non-mortal programmers were out of the question. Hades claims to be a writer instead of a programmer, but most of the plot lines he comes up with end up with everyone dead.

    2. Re:So don't hire mere mortals by Kadin2048 · · Score: 3, Funny

      Oh, come on. Everyone knows that Hades isn't a programmer any more, not since he got promoted to Management and got that whole division to run down there.

      --
      "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
    3. Re:So don't hire mere mortals by LostCluster · · Score: 3, Funny

      Help Wanted: Game Programmers

      Must have 5 years experience coding in Assembly for the IBM Cell processor

  2. Makes you wonder by Egregius · · Score: 5, Insightful

    It makes you wonder what the release-titles of the PS3 will be like, if they didn't have a decent compiler until now. And 'the PS3 is due out in 2006.'

  3. Hello, Itanium... by general_re · · Score: 5, Insightful

    Sound familiar? "All we need to make it work as advertised is a really slick compiler that doesn't actually exist yet..."

    --
    ABSURDITY, n.: A statement or belief manifestly inconsistent with one's own opinion.
    1. Re:Hello, Itanium... by Brain_Recall · · Score: 3, Informative
      More familiar than you may think. Some of the first Itanium compilers were spitting out nearly 40% NOPs, which are simply do-nothings. Because the IA-64 is explicitly parallel, instructions are generated and bundled together to be executed in parallel. The problem is branches, which destroy parallelism since they can change the code direction. On average, there are about 6 instructions between branches, so such a design is very costly, since the memory controller will be stuck fetching instructions that are empty. Of course, speculation and branch prediction are generally a good way to increase performance, but like many things on the IA-64, that's left to the compiler to figure out. These are some of the exact same problems with the Cell, although I wish I knew what the instruction set was like. If it's more like Itanium, then they've got all of the problems of the Itanium. If it's more of a direct approach, they may be able to pull it off because of the work in multi-processor systems that is done today. But they simply can't expect the "super-computer" numbers Sony keeps flashing around. It may be good on certain tightly coded scientific calculations, but when it comes down to real-world code, it's stuck with the stripped-down Power4 that is coordinating the Cells.


      They didn't call it the Itanic for nothing...

    2. Re:Hello, Itanium... by timeOday · · Score: 3, Insightful
      Everybody prefers a simpler programming model, there's no doubt about that. But with the recent lack of progress in unicore speeds, something has to give, and apparently that "something" is programming complexity. While the PC world moves from 1 to 2 cores, the PS3 is jumping straight to 8. But going from 1 to 2 threads is a bigger conceptual jump than from 2 to 8 anyways.

      Fortunately for IBM and Sony, games are one place where hand-optimizing certain algorithms is still practical. I doubt they will place all their eggs in the octopiler basket. I can't imagine a compiler will find that much parallelism in code that isn't explicitly written to be parallel. Personally, I think they should instead focus on explicitly parallel libraries for common game algorithms like collision detection.

  4. Sadly, not a lotta FPU hardware. by mosel-saar-ruwer · · Score: 4, Insightful

    'Cell's greatest strength is that there's a lot of hardware on that chip. And Cell's greatest weakness is that there's a lot of hardware on that chip.

    Sadly, there's almost no FPU hardware to speak of: 32-bit single precision floats in hardware; 64-bit double precision floats are [somehow?] implemented in software and bring the chip to its knees.

    Why can't someone invent a chip for math geeks? With 128-bit hardware doubles? Are we really that tiny a proportion of the world's population?

    1. Re:Sadly, not a lotta FPU hardware. by stedo · · Score: 3, Insightful

      The basic purpose of the Cell is to make the PS3 work. The basic purpose of the PS3 is to play games. Games, as a rule, don't give a damn about 64-bit floating point. Games can get away with 32-bit because they don't need to be incredibly accurate, they just need to be fast. No gamer will care whether or not the trajectory of the bullet was out by 0.000000000023~ as long as it moves fluidly. So, in making a chip for gaming, you are far better off making 32-bit really fast than spending time and die space on perfecting useless 64-bit.

    2. Re:Sadly, not a lotta FPU hardware. by Animats · · Score: 3, Interesting
      Games, as a rule, don't give a damn about 64-bit floating point.

      You wish. In a big 32-bit game world, effort has to be made to re-origin the data as you move. Suppose you want vertices to be positioned to within 1cm (worse than that and you'll see it), and you're 10km from the origin. The low order bit of a 32-bit floating point number is now more than 1cm.

      It's even worse for physics engines, but that's another story.

      If the XBox 360 had simply been a dual- or quad-core IA-32, life would have been much simpler for the game industry.
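
      The precision argument above can be checked with a little arithmetic. The sketch below is an illustration rather than anything from the post: it assumes world coordinates are stored in metres (the post does not name a unit), and `float32_spacing` is a hypothetical helper that reports the gap between adjacent 32-bit floats at a given distance from the origin.

```python
import struct

def float32_spacing(x):
    """Gap between x (rounded to float32) and the next larger float32.

    Valid for positive finite x: adding 1 to the bit pattern steps to the
    next representable value.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return above - here

# Assumption: one unit is one metre.
for metres in (1.0, 100.0, 10_000.0, 200_000.0):
    spacing_cm = float32_spacing(metres) * 100
    print(f"{metres:>9.0f} m from origin: spacing {spacing_cm:.4f} cm")
```

      Under that assumption the spacing at 10 km comes out at about 1 mm, and it only exceeds 1 cm once coordinates pass roughly 131 km (2^17 m). Rounding error also builds up through chains of transforms, so re-origining can pay off well before the raw spacing reaches the visible threshold.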

    3. Re:Sadly, not a lotta FPU hardware. by OldManAndTheC++ · · Score: 3, Funny
      Are we really that tiny a proportion of the world's population?

      You math geeks need to multiply. :)

      --
      Soylent Green is peoplicious!
  5. Anyone having flashbacks? by SmallFurryCreature · · Score: 4, Insightful
    I seem to remember that the PS2 was a bitch to code for as well and that many of the early titles did not make full use of its capabilities. So?

    All this meant that as the PS2 aged it could 'keep up' because the coders kept getting better and better.

    Mere mortals do not write the latest graphics engines. I think there are a lot more tier1 people running around than /. seems to think. They are just too busy to comment here.

    All that really matters is whether the launch titles will be 'good' enough. Then the full power of the system can be unleashed over its lifespan.

    If you're a game company and you're faced with the choice of either making just another engine OR spending some money on the kind of people that code for super computers and get an engine that will blow the competition out of the water then it will be a simple choice.

    Just because some guy on a website finds it hard doesn't mean nobody can do it.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

  6. compilers ... by dioscaido · · Score: 4, Insightful

    ... can get you only so far. You need to have parallelism in mind when you write the high-level code, otherwise it may end up with needless dependence on serial execution that a compiler may not be able to break, reducing the benefits of such an architecture. It will be interesting to see how well games are suited for concurrent execution. Logically there are lots of computations that can be performed independently (AI, physics) but all of it has inherent interaction with a central data source (the game world).
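
    A minimal sketch of the distinction, with hypothetical function names chosen for illustration. The first loop carries a dependence from one iteration to the next; the second has none, so its iterations can be handed to workers in any order. Python threads are used only to show the structure, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def running_total(values):
    """Loop-carried dependence: step i needs the result of step i - 1."""
    out, total = [], 0
    for v in values:
        total += v          # reads the total written by the previous iteration
        out.append(total)
    return out

def square(v):
    return v * v

def squares_parallel(values, workers=4):
    """No dependence between elements, so the work can be split freely."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))  # map preserves input order

print(running_total([1, 2, 3, 4]))
print(squares_parallel([1, 2, 3, 4]))
```

    The running total can be restructured as a parallel scan, but that is a change of algorithm made by the programmer, not a transformation to count on a compiler making for arbitrary code.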

  7. No, it's there alright by Daath · · Score: 4, Informative

    Nah, it's there. Download it, if you want ;)

    --
    Any technology distinguishable from magic, is insufficiently advanced.
  8. here's the real article... by advocate_one · · Score: 4, Informative
    --
    Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
  9. Re:Far too complex? by stedo · · Score: 3, Insightful

    Your average C programmer will not be developing the core code. Most likely, a group of very good coders will create a game engine, and the average C programmers can use the API that the highly-skilled, highly-paid engine coders created to hide unnecessary implementation details.

  10. Re:A summary of the idea here... by irexe · · Score: 4, Insightful
    Hypothesis: A compiler can be developed that takes serially written programs and auto-transforms them into parallel programs to exploit the benefits of parallelism.

    Parallel programming and automated parallelization have already been researched exhaustively throughout the last thirty years of the 20th century. The outcome of all this research is that it is not feasible/tractable to create a compiler that is capable of recognising parallelism, as you suggest. Compilers that can do this are sometimes called 'heroic' compilers, for the reason that the required transformations are so incredibly difficult, and heroic compilers that actually work (well) simply don't exist.

  11. Check out William Kahan at UC-Berkeley. by mosel-saar-ruwer · · Score: 3, Informative

    What benefit does increasing the precision of floats to 128bits bring? 64bits are more than enough for 99.9999% and the remaining cases can be handled in sw emulation. You can still not solve (without massive growth of the error terms) an equation system described by a Hilbert-matrix using Gaussian elimination no matter how many bits you make the mantissa.

    Check out some of Professor Kahan's shiznat at UC-Berkeley:

    http://www.cs.berkeley.edu/~wkahan/
    In particular, look at the pictures of "Borda's Mouthpiece" [page 13] or "Joukowski's Aerofoil" [page 14] in the following PDF document:
    How Java's Floating-Point Hurts Everyone Everywhere
    http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
    WARNING: PDF DOCUMENT
    As I understand it, the "wrong" pictures are computed using Java's strict 64-bit requirement; the "right" pictures are computed by embedding the 64-bit calculation within Intel/AMD 80-bit extended doubles, performing the calculations in 80-bits worth of hardware, and then rounding back down to 64-bits to present the final answer.

    MORAL OF THE STORY: Precision matters. You can never have enough of it.
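
    The Hilbert-matrix remark quoted at the top of this comment is easy to try out. The sketch below is an illustration, not a reproduction of Professor Kahan's figures: it solves H x = b, where b is chosen so the answer should be all ones, once in 64-bit floats and once in exact rational arithmetic. The helper names (`hilbert`, `solve`, `worst_error`) are made up for this example.

```python
from fractions import Fraction

def hilbert(n, num=float):
    """n x n Hilbert matrix, H[i][j] = 1 / (i + j + 1), in the given number type."""
    return [[num(1) / num(i + j + 1) for j in range(n)] for i in range(n)]

def solve(a, b):
    """Gaussian elimination with partial pivoting; a and b are left untouched."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def worst_error(n, num=float):
    """Solve H x = H * ones and report how far x strays from all ones."""
    h = hilbert(n, num)
    b = [sum(row) for row in h]
    return max(abs(float(xi) - 1.0) for xi in solve(h, b))

for n in (4, 7, 10):
    print(n, worst_error(n), worst_error(n, Fraction))
```

    In floats the error grows quickly with n, while the rational version stays exact. More mantissa bits push the breakdown out to larger n but do not remove it, which is roughly what both sides of this exchange are arguing about.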

    1. Re:Check out William Kahan at UC-Berkeley. by greg_barton · · Score: 3, Informative

      How Java's Floating-Point Hurts Everyone Everywhere

      Gods.

      This is eight years old, (1998) and has been fixed for five years.

      FIVE YEARS. Join the 21st century, for god's sake.

      java.lang.StrictMath

      How long will people repeat this, even though it's been fixed for five years, in java 1.3? The latest beta VM is 1.6...

  12. Re:special compilers, expert programmer = DOA prod by theJML · · Score: 3, Insightful

    As a programmer, there's only so much that can be done in software. Sure you can parallelize things, and you can come up with newer/faster algorithms, but if we didn't get dual proc systems, that would have been pointless. So with parallel procs, we get better parallel code. Hardware advances will create software advances, and new algorithms will direct hardware futures. This is the way the world works, and I think it's worked out fairly well so far. Let's see what the Cell and processors after it can do!

    --
    -=JML=-
  13. Re:Wasn't this the same mistake Sega made? by CarpetShark · · Score: 3, Interesting

    The Cell doesn't seem to be that complex. It's a powerful processor, with multiple elements and associated timing issues that you have to be aware of, but that's nothing like the Gamecube or similar, which had all these weird modes and issues that I can't even recall now, probably because my brain blocked it out ;) It'll be a challenge for people who don't know parallel programming, and it might frustrate some who imagine that a cpu with 8 SPEs should act like 8 entirely independent machines, each with its own SPE. But, I think games developers these days will take it as par for the course. There seems to be a trend now that only the biggest and best games companies actually develop game engines (ie, write low-level optimised code), while the other companies just rent the technology and develop levels and artwork and scripting based on that engine. So, the big question is how many of the engine developers will get on board early and if they'll be sufficiently inspired and up to the task. I think they'll find a way :)

  14. Re:Time to let C die ? by Bazzalisk · · Score: 3, Interesting
    C lacks a lot of features of more modern languages - but I think you'd be hard-pressed to find a modern auto-garbage-collecting, dynamically typed, modular language which can handle low-level programming anything like as well as C.

    Certainly if I'm writing a pleasant little modern desktop application I'm going to write in Objective C or C# - would seem a little silly not to ... but for writing a compiler, a network stack, or gods forbid a kernel I don't know of anything that works even close to as well as C. C still has a niche, can't really change that.

    --
    James P. Barrett
  15. I remember by DSP_Geek · · Score: 3, Interesting

    About ten years ago VM Labs came out with something not too far off conceptually from the Cell - vector instructions, local memory you had to DMA in and out of, 4 processors on a chip. It wasn't floating point, however, and the development tools were best described as rudimentary: the best way of debugging was to deliberately crash the box and examine the register dump barfed back over TCP/IP.

    They called a developer's conference in August 1998, where after the presentation a veteran game coder shrugged: "Another weird British assembler programming cult".

    The Cell strikes me the same way, and for the same reasons, although Big Blue likely has more development tool budget than VM ever did. Not to take anything away from the smart guys at IBM, but I suspect they'll have a fun time working around the Cell's limitations. I can tell them from experience that DMAed local memory will be much more of a pain in the ass than they can imagine, and unless they can guarantee sync in hardware they'll be wasting a bunch of time schlepping spinlocks in and out of memory. The vector stuff will also be nontrivial: the best way to make that usable, apart from having everyone write vector code from the git-go, would be to provide a stonking great math library in the style of the Intel Integrated Performance Primitives.

    As an aside, the PS3 is in the tradition of Sony not caring about who programs their machine: the PS1 was easier to code than the Saturn, which was a true horror, the PS2 upped the difficulty a fair bit, and now even experienced coders are bitching about the PS3. Meanwhile Microsoft is learning from their mistakes: the X360 is easier than the X1, and if you doubt that makes a difference, check out game development budgets and time to delivery. I don't care, really: I eat algorithms and machine code for breakfast, so this just means more jobs and money for me.

  16. Why the Cell processor is such a pain by Animats · · Score: 4, Interesting
    The basic problem with the Cell processor is that the SPEs each have only 256K of private memory, with uncached, although asynchronous, access to main memory. It's the unshared memory that's the problem.

    This architecture has been tried before, for supercomputers. Mostly unsuccessful supercomputers you've never heard of, such as the nCube and the BBN Butterfly. There's no hardware problem building such machines; in fact, it's much easier than building an efficient shared-memory machine with properly interlocked caches. But these beasts are tough to program. The last time around, everybody gave up, mainly because more vanilla hardware came along and it wasn't worth dealing with weird architectures.

    The approach works fine if you're doing something that looks like "streaming", such as multi-stream MPEG compression or cell phone processing. If you want to do eight unrelated things on eight processors, you're good.

    But applying eight such processors to the same problem is tough. You've got to somehow break the problem into sections which can be pumped into the little CPUs in chunks that don't require access to any data in main memory. The chunks can't be bigger than 50-100K or so, because you have to double buffer (to overlap the transfers to and from main memory with computation) and you have to fit all the code to process the chunk into the same 256K. That's a program architecture problem; the compiler can't help you much there. Your whole program has to be architected around this limitation. That's the not-fun part.

    You have to make sure that you do enough work on each chunk to justify pumping it in and out of the Cell processor. It's like cluster programming, although the I/O overhead is much less.
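
    The chunk-size arithmetic and the double-buffering pattern can be sketched as follows. This is an illustration only: the 56 KiB figure for code and stack is an assumption picked for the example, `max_chunk_bytes` and `process_stream` are hypothetical names, and a worker thread stands in for the DMA engine.

```python
from concurrent.futures import ThreadPoolExecutor

LOCAL_STORE = 256 * 1024  # bytes of private memory per SPE, per the post

def max_chunk_bytes(code_bytes, buffers):
    """Largest chunk that fits once code and all data buffers share local store."""
    return (LOCAL_STORE - code_bytes) // buffers

def process_stream(data, chunk_bytes, kernel):
    """Run kernel over data chunk by chunk, fetching chunk i+1 while computing chunk i."""
    def fetch(offset):  # stands in for a DMA transfer from main memory
        return bytes(data[offset:offset + chunk_bytes])

    out = bytearray()
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(fetch, 0)
        offset = 0
        while offset < len(data):
            chunk = pending.result()              # wait for the current chunk
            nxt = offset + chunk_bytes
            if nxt < len(data):
                pending = dma.submit(fetch, nxt)  # start the next transfer early
            out += kernel(chunk)                  # compute overlaps that transfer
            offset = nxt
    return bytes(out)

def invert(chunk):
    return bytes(b ^ 0xFF for b in chunk)

# Assumed 56 KiB of code and stack; 4 buffers = 2 in + 2 out, 2 = in-place.
print(max_chunk_bytes(56 * 1024, 4), max_chunk_bytes(56 * 1024, 2))
print(len(process_stream(bytes(range(256)) * 1000, 50 * 1024, invert)))
```

    With those assumed numbers the budget works out to 50 KiB per chunk with separate input and output buffers and 100 KiB when working in place, which matches the 50-100K range given above.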

    In some ways, C and C++ are ill-suited to this kind of architecture. There's a basic assumption in C and C++ that all memory is equally accessible, that the way to pass data around is by passing a pointer or reference to it, and that data can be linked to other data. None of that works well on the Cell. You need a language that encourages copying, rather than linking. Although it's not general-purpose, OpenGL shader language is such a language, with "in" and "out" parameters, no pointers, and no interaction between shader programs.

    Note that the Cell processors don't do the rendering in the PS3. Sony gave up on that idea and added a conventional NVidia graphics chip. (This guaranteed that the early games would work, even if they didn't do much with the Cell engines.) Since the cell processors didn't have useful access to the frame buffer, that was essential. So, unlike the PS2, the processors with the new architecture aren't doing the rendering.

    It's possible to work around all these problems, but development cost, time, and risk all go up. If somebody builds a low-priced 8-core shared memory multiprocessor, the Cell guys are toast. The Cell approach is something you do because you have to, not because you want to.