Octopiler to Ease Use of Cell Processor
Sean0michael writes "Ars Technica is running a piece about The Octopiler from IBM. The Octopiler is supposed to be a compiler designed to handle the Cell processor (the one inside Sony's PS3). From the article: 'Cell's greatest strength is that there's a lot of hardware on that chip. And Cell's greatest weakness is that there's a lot of hardware on that chip. So Cell has immense performance potential, but if you want to make it programmable by mere mortals then you need a compiler that can ingest code written in a high-level language and produce optimized binaries that fit not just a programming model or a microarchitecture, but an entire multiprocessor system.' The article also has several links to some technical information released by IBM."
Hire "Real Programmers". You know, the ones that only code in Assembler, and if they can't do it in Assembler then it isn't worth doing.
The higher the technology, the sharper that two-edged sword.
It makes you wonder what the release titles of the PS3 will be like, if they didn't have a decent compiler until now. And 'the PS3 is due out in 2006.'
Sound familiar? "All we need to make it work as advertised is a really slick compiler that doesn't actually exist yet..."
ABSURDITY, n.: A statement or belief manifestly inconsistent with one's own opinion.
'Cell's greatest strength is that there's a lot of hardware on that chip. And Cell's greatest weakness is that there's a lot of hardware on that chip.
Sadly, there's almost no FPU hardware to speak of: 32-bit single precision floats in hardware; 64-bit double precision floats are [somehow?] implemented in software and bring the chip to its knees.
Why can't someone invent a chip for math geeks? With 128-bit hardware doubles? Are we really that tiny a proportion of the world's population?
All this meant that as the PS2 aged it could 'keep up' because the coders kept getting better and better.
Mere mortals do not write the latest graphics engines. I think there are a lot more tier-1 people running around than /. seems to think. They are just too busy to comment here.
All that really matters is whether the launch titles will be 'good' enough. Then the full power of the system can be unleashed over its lifespan.
If you're a game company and you're faced with the choice of either making just another engine or spending some money on the kind of people who code for supercomputers and getting an engine that will blow the competition out of the water, then it will be a simple choice.
Just because some guy on a website finds it hard doesn't mean nobody can do it.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
... can get you only so far. You need to have parallelism in mind when you write the high-level code, otherwise it may end up with needless dependence on serial execution that a compiler may not be able to break, reducing the benefits of such an architecture. It will be interesting to see how well games are suited for concurrent execution. Logically there are lots of computations that can be performed independently (AI, physics) but all of it has inherent interaction with a central data source (the game world).
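To make the dependence point concrete, here is a minimal C sketch (the function names and the toy physics update are made up for illustration, not taken from the article). The first loop carries a dependence from one iteration to the next, so it has to run in order as written. The second touches each element independently, so the index range can be handed out to separate workers.

```c
#include <assert.h>
#include <stddef.h>

/* Serial by construction: step i needs the result of step i - 1
   (a loop-carried dependence), so as written the iterations
   cannot run side by side. */
void integrate_chain(float *pos, const float *vel, size_t n, float dt)
{
    assert(pos && vel);
    for (size_t i = 1; i < n; ++i)
        pos[i] = pos[i - 1] + vel[i] * dt;
}

/* Independent: each element depends only on its own inputs, so
   disjoint [begin, end) ranges can be given to different workers. */
void integrate_independent(float *pos, const float *vel,
                           size_t begin, size_t end, float dt)
{
    assert(pos && vel && begin <= end);
    for (size_t i = begin; i < end; ++i)
        pos[i] += vel[i] * dt;
}
```

Calling integrate_independent once per sub-range gives the same result as one call over the whole array, which is exactly the property a parallelizing compiler or a human needs to find.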
Nah, it's there. Download it, if you want ;)
Any technology distinguishable from magic, is insufficiently advanced.
enjoy... :)
Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
Your average C programmer will not be developing the core code. Most likely, a group of very good coders will create a game engine, and the average C programmers can use the API that the highly-skilled, highly-paid engine coders created to hide unnecessary implementation details.
Parallel programming and automated parallelization have already been researched exhaustively throughout the last thirty years of the 20th century. The outcome of all this research is that it is not feasible/tractable to create a compiler that is capable of recognising parallelism, as you suggest. Compilers that can do this are sometimes called 'heroic' compilers, for the reason that the required transformations are so incredibly difficult, and heroic compilers that actually work (well) simply don't exist.
What benefit does increasing the precision of floats to 128 bits bring? 64 bits are more than enough for 99.9999% of cases, and the remaining cases can be handled in software emulation. You still cannot solve (without massive growth of the error terms) an equation system described by a Hilbert matrix using Gaussian elimination, no matter how many bits you make the mantissa.
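The Hilbert-matrix claim is easy to try at home. Below is a rough C sketch, assuming the usual definition H[i][j] = 1/(i+j+1) with 0-based indices and a right-hand side chosen so that the exact solution is all ones. The function name hilbert_solve_error and the size cap are invented for this example. It solves the system in 64-bit doubles by Gaussian elimination with partial pivoting and reports how far the computed answer drifts from 1.

```c
#include <assert.h>

#define HILBERT_MAX_N 12   /* arbitrary cap for this sketch */

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Builds H x = b with exact solution x = (1, ..., 1), solves it in
   double precision, and returns the largest |x[i] - 1|. */
double hilbert_solve_error(int n)
{
    double a[HILBERT_MAX_N][HILBERT_MAX_N + 1];  /* augmented matrix [H | b] */
    double x[HILBERT_MAX_N];
    double worst = 0.0;

    assert(n >= 1 && n <= HILBERT_MAX_N);

    for (int i = 0; i < n; ++i) {
        a[i][n] = 0.0;
        for (int j = 0; j < n; ++j) {
            a[i][j] = 1.0 / (double)(i + j + 1);
            a[i][n] += a[i][j];          /* b[i] = sum of row i */
        }
    }

    /* Forward elimination with partial pivoting. */
    for (int k = 0; k < n; ++k) {
        int p = k;
        for (int i = k + 1; i < n; ++i)
            if (abs_d(a[i][k]) > abs_d(a[p][k]))
                p = i;
        for (int j = 0; j <= n; ++j) {
            double t = a[k][j];
            a[k][j] = a[p][j];
            a[p][j] = t;
        }
        for (int i = k + 1; i < n; ++i) {
            double m = a[i][k] / a[k][k];
            for (int j = k; j <= n; ++j)
                a[i][j] -= m * a[k][j];
        }
    }

    /* Back substitution. */
    for (int i = n - 1; i >= 0; --i) {
        double s = a[i][n];
        for (int j = i + 1; j < n; ++j)
            s -= a[i][j] * x[j];
        x[i] = s / a[i][i];
    }

    for (int i = 0; i < n; ++i)
        if (abs_d(x[i] - 1.0) > worst)
            worst = abs_d(x[i] - 1.0);
    return worst;
}
```

Comparing hilbert_solve_error(3) with hilbert_solve_error(12) shows the error growing by many orders of magnitude as the matrix gets larger, even though the arithmetic width never changes.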
Check out some of Professor Kahan's shiznat at UC-Berkeley:
In particular, look at the pictures of "Borda's Mouthpiece" [page 13] or "Joukowski's Aerofoil" [page 14] in the following PDF document: As I understand it, the "wrong" pictures are computed using Java's strict 64-bit requirement; the "right" pictures are computed by embedding the 64-bit calculation within Intel/AMD 80-bit extended doubles, performing the calculations in 80 bits' worth of hardware, and then rounding back down to 64 bits to present the final answer. MORAL OF THE STORY: Precision matters. You can never have enough of it.
As a programmer, there's only so much that can be done in software. Sure, you can parallelize things, and you can come up with newer/faster algorithms, but if we didn't get dual-proc systems, that would have been pointless. So with parallel procs, we get better parallel code. Hardware advances will create software advances, and new algorithms will direct hardware futures. This is the way the world works, and I think it's worked out fairly well so far. Let's see what the Cell and processors after it can do!
-=JML=-
The Cell doesn't seem to be that complex. It's a powerful processor, with multiple elements and associated timing issues that you have to be aware of, but that's nothing like the Gamecube or similar, which had all these weird modes and issues that I can't even recall now, probably because my brain blocked it out ;) It'll be a challenge for people who don't know parallel programming, and it might frustrate some who imagine that a CPU with 8 SPEs should act like 8 entirely independent machines, each with its own SPE. But I think games developers these days will take it as par for the course. There seems to be a trend now that only the biggest and best games companies actually develop game engines (i.e., write low-level optimised code), while the other companies just rent the technology and develop levels and artwork and scripting based on that engine. So the big question is how many of the engine developers will get on board early and whether they'll be sufficiently inspired and up to the task. I think they'll find a way :)
Certainly if I'm writing a pleasant little modern desktop application I'm going to write in Objective C or C# - would seem a little silly not to ... but for writing a compiler, a network stack, or gods forbid a kernel I don't know of anything that works even close to as well as C. C still has a niche, can't really change that.
James P. Barrett
About ten years ago VM Labs came out with something not too far off conceptually from the Cell - vector instructions, local memory you had to DMA in and out of, 4 processors on a chip. It wasn't floating point, however, and the development tools were best described as rudimentary: the best way of debugging was to deliberately crash the box and examine the register dump barfed back over TCP/IP.
They called a developer's conference in August 1998, where after the presentation a veteran game coder shrugged: "Another weird British assembler programming cult".
The Cell strikes me the same way, and for the same reasons, although Big Blue likely has more development tool budget than VM ever did. Not to take anything away from the smart guys at IBM, but I suspect they'll have a fun time working around the Cell's limitations. I can tell them from experience that DMAed local memory will be much more of a pain in the ass than they can imagine, and unless they can guarantee sync in hardware they'll be wasting a bunch of time schlepping spinlocks in and out of memory. The vector stuff will also be nontrivial: the best way to make that usable, apart from having everyone write vector code from the git-go, would be to provide a stonking great math library in the style of the Intel Integrated Performance Primitives.
As an aside, the PS3 is in the tradition of Sony not caring about who programs their machine: the PS1 was easier to code than the Saturn, which was a true horror, the PS2 upped the difficulty a fair bit, and now even experienced coders are bitching about the PS3. Meanwhile Microsoft is learning from their mistakes: the X360 is easier than the X1, and if you doubt that makes a difference, check out game development budgets and time to delivery. I don't care, really: I eat algorithms and machine code for breakfast, so this just means more jobs and money for me.
This architecture has been tried before, for supercomputers. Mostly unsuccessful supercomputers you've never heard of, such as the nCube and the BBN Butterfly. There's no hardware problem building such machines; in fact, it's much easier than building an efficient shared-memory machine with properly interlocked caches. But these beasts are tough to program. The last time around, everybody gave up, mainly because more vanilla hardware came along and it wasn't worth dealing with weird architectures.
The approach works fine if you're doing something that looks like "streaming", such as multi-stream MPEG compression or cell phone processing. If you want to do eight unrelated things on eight processors, you're good.
But applying eight such processors to the same problem is tough. You've got to somehow break the problem into sections which can be pumped into the little CPUs in chunks that don't require access to any data in main memory. The chunks can't be bigger than 50-100K or so, because you have to double buffer (to overlap the transfers to and from main memory with computation) and you have to fit all the code to process the chunk into the same 256K. That's a program architecture problem; the compiler can't help you much there. Your whole program has to be architected around this limitation. That's the not-fun part.
You have to make sure that you do enough work on each chunk to justify pumping it in and out of the Cell processor. It's like cluster programming, although the I/O overhead is much less.
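For anyone who hasn't written this kind of code, the double-buffering pattern looks roughly like the C sketch below. Everything here is a stand-in: memcpy plays the role of the DMA transfer (on real hardware it would be an asynchronous get followed later by a wait), and CHUNK_FLOATS, fetch_chunk, and sum_squares_chunked are hypothetical names and sizes picked for illustration, not anything from IBM's toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk size: 4096 floats = 16 KB per buffer, so the
   two buffers together leave most of a 256K local store for code. */
#define CHUNK_FLOATS 4096

static float local_buf[2][CHUNK_FLOATS];   /* pretend local store */

/* Stand-in for starting a DMA "get" from main memory. memcpy is
   synchronous; a real transfer would run in the background. */
static size_t fetch_chunk(float *dst, const float *src,
                          size_t offset, size_t total)
{
    size_t count = total - offset;
    if (count > CHUNK_FLOATS)
        count = CHUNK_FLOATS;
    memcpy(dst, src + offset, count * sizeof *dst);
    return count;
}

/* Sums the squares of `total` floats, touching main memory only
   through chunk transfers into the two local buffers. */
double sum_squares_chunked(const float *main_mem, size_t total)
{
    double acc = 0.0;
    size_t offset = 0;
    size_t cur_count;
    int cur = 0;

    if (total == 0)
        return 0.0;
    assert(main_mem);

    cur_count = fetch_chunk(local_buf[cur], main_mem, 0, total);
    while (cur_count > 0) {
        size_t next_offset = offset + cur_count;
        size_t next_count = 0;

        /* Kick off the transfer of the next chunk into the other buffer. */
        if (next_offset < total)
            next_count = fetch_chunk(local_buf[cur ^ 1], main_mem,
                                     next_offset, total);

        /* Compute on the current buffer while that transfer would be in flight. */
        for (size_t i = 0; i < cur_count; ++i)
            acc += (double)local_buf[cur][i] * (double)local_buf[cur][i];

        /* Real hardware: wait for the pending transfer to finish here. */
        offset = next_offset;
        cur_count = next_count;
        cur ^= 1;
    }
    return acc;
}
```

Note that the loop body never dereferences main_mem directly. That restriction is what forces the whole program to be organized around chunks, which is the architectural cost described above.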
In some ways, C and C++ are ill-suited to this kind of architecture. There's a basic assumption in C and C++ that all memory is equally accessible, that the way to pass data around is by passing a pointer or reference to it, and that data can be linked to other data. None of that works well on the Cell. You need a language that encourages copying, rather than linking. Although it's not general-purpose, OpenGL shader language is such a language, with "in" and "out" parameters, no pointers, and no interaction between shader programs.
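A small C sketch of the copying-versus-linking distinction, with invented names (struct node, flatten_list, scale_kernel). The linked list only makes sense where its pointers are valid, so the data is first copied into a flat buffer. The kernel then reads only its "in" array and writes only its "out" array, loosely imitating the shader-style discipline.

```c
#include <assert.h>
#include <stddef.h>

/* Linked data: the `next` pointers are only meaningful in the
   address space where the nodes live. */
struct node {
    float value;
    struct node *next;
};

/* Copies values out of the linked structure into a flat buffer that
   could be shipped elsewhere as one block. Returns how many values
   were copied (at most `capacity`). */
size_t flatten_list(const struct node *head, float *out, size_t capacity)
{
    size_t n = 0;
    assert(out || capacity == 0);
    for (; head && n < capacity; head = head->next)
        out[n++] = head->value;
    return n;
}

/* "in"/"out" style kernel: no pointers are followed inside the data,
   and nothing outside the two arrays is touched. */
void scale_kernel(const float *in, float *out, size_t n, float factor)
{
    assert((in && out) || n == 0);
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * factor;
}
```

The flatten step is pure overhead on a shared-memory machine, which is part of why this style feels unnatural in C.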
Note that the Cell processors don't do the rendering in the PS3. Sony gave up on that idea and added a conventional NVidia graphics chip. (This guaranteed that the early games would work, even if they didn't do much with the Cell engines.) Since the cell processors didn't have useful access to the frame buffer, that was essential. So, unlike the PS2, the processors with the new architecture aren't doing the rendering.
It's possible to work around all these problems, but development cost, time, and risk all go up. If somebody builds a low-priced 8-core shared memory multiprocessor, the Cell guys are toast. The Cell approach is something you do because you have to, not because you want to.