Octopiler to Ease Use of Cell Processor
Sean0michael writes "Ars Technica is running a piece about The Octopiler from IBM. The Octopiler is supposed to be compiler designed to handle the Cell processor (the one inside Sony's PS3). From the article: 'Cell's greatest strength is that there's a lot of hardware on that chip. And Cell's greatest weakness is that there's a lot of hardware on that chip. So Cell has immense performance potential, but if you want to make it programable by mere mortals then you need a compiler that can ingest code written in a high-level language and produce optimized binaries that fit not just a programming model or a microarchitecture, but an entire multiprocessor system.' The article also has several links to some technical information released by IBM."
Hire "Real Programmers". You know, the ones that only code in Assembler, and if they can't do it in Assembler then it isn't worth doing.
The higher the technology, the sharper that two-edged sword.
It makes you wonder what the release-titles of the PS3 will be like, if they didn't have a decent compiler untill now. And 'the PS3 is due out in 2006.'
Sound familiar? "All we need to make it work as advertised is a really slick compiler that doesn't actually exist yet..."
ABSURDITY, n.: A statement or belief manifestly inconsistent with one's own opinion.
'Cell's greatest strength is that there's a lot of hardware on that chip. And Cell's greatest weakness is that there's a lot of hardware on that chip.
Sadly, there's almost no FPU hardware to speak of: 32-bit single precision floats in hardware; 64-bit double precision floats are [somehow?] implemented in software and bring the chip to its knees.
Why can't someone invent a chip for math geeks? With 128-bit hardware doubles? Are we really that tiny a proportion of the world's population?
Reading this is making me nostalgic for LISP machines and interpreter environments that let programmers really play with the machine instead of abstracting it away. What I'd really like to see is someone who takes all the potential for reconfiguration and parallelism and doesn't hide it away but makes it available.
All this meant that as the PS2 aged it could 'keep up' because the coders kept getting better and better.
Mere mortals do not write the latest graphics engines. I think there are a lot more tier1 people running around then /. seems to think. They are just to busy to comment here.
All that really matters is wether the launch titles will be 'good' enough. Then the full power of the system can be unleashed over its lifespan.
If your a game company and your faced with the choice of either making just another engine OR spending some money on the kind of people that code for super computers and get an engine that will blow the competition out of the water then it will be a simple choice.
Just because some guy on website finds it hard doesn't mean nobody can do it.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
... can get you only so far. You need to have parallelism in mind when you write the high-level code, otherwise it may end up with needless dependence on serial execution that a compiler may not be able to break, reducing the benefits of such an architecture. It will be interesting to see how well games are suited for concurrent execution. Logically there are lots of computations that can be performed independently (AI, physics) but all of it has inherent interaction with a central data source (the game world).
You were lucky. We had to write our own microinstructions using a 12 bit ALU with no barrel shifter, and then burn them into ROM using a magnifying glass to vaporise the aluminium interconnect. And you had hard disks? We had to hand code on paper tape using a leather punch to make the holes. And we thought we were lucky. Next door, the guys in Alan Turing's department were having to stick together infinite paper tapes for some machine he made in the 30s.
Pining for the fjords
Cells big programming problem goes right down to each SPE: The assembler commands for which cannot actually address main memory! Every time information is read into / out of the 256K "local storage" on each SPE, a dma command must be issued. Now, while this is Cell's greatest asset (Execution continues while seriously slow memory movement occurs), it is also difficult to work with.
Your average C programmer doesn't take architecture into account, and so there's no user indication of whether a variable can be paged to maim memory, if code needs to be fetched, and crucially: how far in advance data can be pre-loaded into the local storage, to avoid the SPE hanging on a memory operation.
I'd guess that this new compiler will try to address these issues, which is suggested by the article.
Nah, it's there. Download it, if you want ;)
Any technology distinguishable from magic, is insufficiently advanced.
enjoy... :)
Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
Parallel programming and automated parallelization have already been researched exhaustively throughout the last thirty years of the 20th century. The outcome of all this research is that it is not feasible/tractable to create a compiler that is capable of recongising parallelism, as you suggest. Compilers that can do this are sometimes called 'heroic' compilers, for the reason that the required transformations are so incredibly difficult, and heroic compilers that actually work (well) simply don't exist.
Your post reminds me of the old adage, "Any sufficiently advanced fanboyism is indistinguishable from trolling."
English is easier said than done.
If a CPU needs a special compiler in order to give good performance, it's basically dead; there are simply too many different applications that do binary code generation.
Also, the division into "expert programmer" and "regular programmer" is silly. Most coding is done by people who aren't experts in the cell architecture (or any other architecture). That's not because people are too stupid to do this sort of thing, it's because it's not worth the investment.
If Cell can't deliver top-notch performance with a simple compiler back-end and regular programmers who know how to write decent imperative code, then Cell is going to lose. Hardware designers really need to get over the notion that they can push off all the hard stuff into software. People want hardware that works reliably, predictably,and with a minimum of software complexity.
Maybe CISC wasn't such a bad idea after all--you may get less bang for the buck, but at least you get a predictable bang for the buck.
I'm glad to see some real progress in the processor world. We are so guided by the enterprise market that we've had to support x86 WAY longer than we should have. The cell looks like it has a real chance of becoming the next big advancement. For one, IBM is working heavily with the open source community. This is possibly one of the best things they could have done to help the cell. By doing this, you make open source developers happy and more inclined to port over their applications. One of the hardest things to do in getting a new arch out is getting application support, and they've pretty much guaranteed a modest amount of applications by going open source. The nokia 770 is a pefect example of this. They've supported open source and made available more than enough tools for quick porting of applications, and look at the huge amount available already in the first few months. The nokia 770 probably sets records in how many applications were ported in such a short period of time.
Make the developers happy, and they will port their apps. With large amounts of available applications, the consumers will buy. When the consumers buy, you have a successful new arch.
If an officer ever threatens to taze you, say you have a pacemaker.
What benefit does increasing the precision of floats to 128bits bring? 64bits are more than enough for 99.9999% and the remaining cases can be handled in sw emulation. You can still not solve (without massive growth of the error terms) an equation system described by a Hilbert-matrix using Gaussean-elimination no matter how many bits you make the mantissa.
Check out some of Professor Kahan's shiznat at UC-Berkeley:
In particular, look at the pictures of "Borda's Mouthpiece" [page 13] or "Joukowski's Aerofoil" [page 14] in the following PDF document: As I understand it, the "wrong" pictures are computed using Java's strict 64-bit requirement; the "right" pictures are computed by embedding the 64-bit calculation within Intel/AMD 80-bit extended doubles, performing the calculations in 80-bits worth of hardware, and then rounding back down to 64-bits to present the final answer.MORAL OF THE STORY: Precision matters. You can never have enough of it.
"All that really matters is wether the launch titles will be 'good' enough. Then the full power of the system can be unleashed over its lifespan."
Yea, but what's the full power of a system? Prettier graphics?
The "full power" of the PS1 seemed to be that its games became marginally less ugly as time went on, although FF7 was very well done since it didn't use textured polygons for most of it (the shading methods were much sexier). When I think about FF9, I don't like it more because it uses the PS1 at a fuller power level than FF7, I like it better because the story is cuter.
I like PGR2 better than PGR3 because PGR2 has cars I know and love from Initial D and my own experience, whereas PGR3 has super cars I've never driven or seen before.
I don't think Rez taxes the PS2 more than Wild Arms 3, but I like it better than Wild Arms 3. I also like most of the iterations of DDR, and they're not taxing in the slightest.
The full power of a system is not its graphics capability or how easy it is to control or its controller or its games -- it's the entire package. Does the PS3 have a good package? The Xbox 360 sure doesn't -- the controller power-up button is nice, but there is nothing new or interesting; it's a rehash. The PS3 is a rehash too.
The Sega Saturn was a rehash of the 8-bit and 16-bit 2D eras. It died. The PS3 and Xbox 360 are rehashes of the 64-bit and 128-bit 3D gaming eras.
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
A key problem with CISC was that doing virtual memory and handling page faults on a CISC processor was so incredibly insanely complicated that you ended up going insane and designing your pipeline could throw multiple page faults on one instruction and you had a god-awful mess to clean up.
The problem with the Cell is actually pretty interesting. They decided to go for in-order CPU's for the SPE's which means that to get good performance you sure as hell better know what your dependencies are and take into account memory latency etc.
OTOH modern RISC CPU's normally do nice out-of-order stuff which whilst making the CPU more complicated makes life easier for the programmer - compiler.
Itanium took the clean approach - and it flies on FP workloads that the compiler can do a good job on. The PS3 (like Itanium) should rock - once programmers get lots of nice little kernels that do groovy stuff (think super shader programs) in the SPE's. Just that will make the eye candy pretty.
The counter argument is the 'Look at what happened with the i860'. It had amazing performance on kernels but was just totaly evil to program and compiler writers pulled out their hair.
I don't know enough about modern game programming to know if the PS3 route is a good one to take - and it's easy to bitch at Sony for going too far - OTOH look at the PS2 games now vs at release. The PS3 games should slowly get better and better and better if they don't crash and burn and give up...
--Tarp
Yeah, but the advantage of doing it this way is that the 2nd transition (from risc back to risc) is really quick!
Let me summarize
- take one of the most unsafe, slowest-to-compile, pitfall-ish, unspecified languages in existence (ok, I might be exagerating on the "unspecified" part)
- add even more #pragmas and other half-specified annotations which are going to change the result of a program near invisibly
- don't provide a debugger
- require even more interactions between the programmer and the profiler, just to understand what's going on with his code
- add unguaranteed and slow static analysis
- ...
- lots of money ?
Am I the only one (with Unreal's Tim Sweeney) who thinks that now might be the right time to let C die, or at least return to its assembly-language niche ? I mean, C is a language based on technologies of the 50s 60s (yes, I know, the language itself only came around in the late 60s), and it shows. Since then, the world has seenAnd what are C and C++ programmers stuck with ?
So, now, we hear that IBM is trying to maintain C alive, under perfusion. IBM, please stop. Let granddaddy rest in peace. He had his time of glory, but now, he deserves that rest.
Oh, and just for the record. I program in C/C++ quite often as an open-source developer and my field is distributed computing. But I try to keep these subjects as far away from each other as I can.
(well, venting off feels good)
This troll is over. You can now resume a normal activity.
"I recall a common complaint by development houses about Sega consoles were that they were very difficult to code for because of hardware complexity. Isn't Sony now making the very same mistake that doomed Sega's console business?"
Sega didn't make a single mistake, they made a LOT of them. I imagine you're thinking of the Saturn. It was supposed to be a SNES killer. In other words, all the fancy technology it had was meant to throw sprites on the screen. Then Sony showed up with it's fancy ass 3D architecture, and Sega said oops. So they band-aided some hardware in there to perform 3D functions. Unfortunately, this added another processor to the mix. The result? It was a bitch to program for, and it never really reached the performance levels of the PS. The result? Saturn games looked inferior to PS games. However, in the 2D fighter realm, the Saturn did quite well. As I recall, the Saturn was actually fairly successful in Japan for this.
The Genesis was pretty easy to program for, at least compared to the SNES. The SNES had a weaker CPU, but it had extra hardware to beef up its graphics. In the end, the SNES won, but not without a couple of years of Genesis superiority. I remember lots of people bitching about the SNES slowing down when it came to a lot of sprites on the screen. This complaint died when Donkey Kong Country hit the scene.
The Dreamcast... well I don't know as much about it. As I understand it, it wasn't too hard to program for. It even had some great hardware for throwing textures on the screen. This gave the DC an edge against the first generation of PS2 games despite having considerably weaker specs.
The Saturn definitely hurt Sega. One could attribute this to the difficulty of programming for the system, and they'd likely be correct. PS ports to the Saturn often came many months after the original release, and they simply didn't do as well graphically. Sega had also flooded the market with hardware. Between the Genesis, the Sega CD, the 32X, and the Saturn, the market was pretty confused. Sega wasn't focused where they should have been and it came back and bit them in the keyster.
Sega was in pretty sad shape financially when the DC was released. I vaguely recall that the president of Sega at the time had given up most of his shares of stock to keep the company afloat. (I want to say it was around 100 million dollars roughly, but I don't recall the specifics. I do remember thinking "wow, that's one dedicated dude.") In the end, though, Sega needed several hundred million dollars in order to get 10 million DCs out there in order to really start raking in money. But they simply didn't have the assets to do it. Kerplunk, the Dreamcast died, and Sega focused on software.
With all that said, I'm sure a number of people will chime in with their own contribuatory reasons for Sega's demise. They wouldn't necessarily be wrong, either. It took a number of things to take Sega down, not one key one.
"Speaking of which, is XB360 easier to code for than PS3?"
I read an interview with Carmack not too long ago, and his answer was basically 'yes'. He did NOT go one to say that the difference would be a huge huge factor or not, though. Frankly, I have difficulty imagining it making all that big of difference, at least from a financial point of view. As these machines get more powerful, the weight of development shifts more towards the artists than the actual programmers. That is just an opinion, though. I'm a 3D artist by trade. Maybe my view is biased. But I know how much it costs to keep me seated at my desk. I know about how the work piles up by orders of magnitude as projects get more ambitious. And I have a pretty good sense of how artistry in video games has evolved over the last decade. Compare Super Mario 64 to Resident Evil 4 and you'll see what I mean.
"I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)
As a programmer, there's only so much that can be done in software. Sure you can parallize things, and you can come up with newer/faster algorthms, but if we didn't get dual proc systems, that would have been pointless. So with parallel procs, we get better parallel code. Hardware advances will create software advances, and new algorthms will direct hardware futures. This is the way the world works, and I think it's worked out fairly well so far. Lets see what the Cell and processors after it can do!
-=JML=-
these Octopiler coders are doing their work for the love of coding. If they want a salary for this then they're not worth their weight in salt.
[/kfg mode off]
--- Grow a pair, liberals... stop letting the Republicans bully you!
Pretty much all modern CPUs need special compilers to give good performance. Unless you can keep track of the number of pipeline stages, the degree of superscalar architecture, etc. you will get sub-optimal code. The P4, for example, can have 140 instructions in-flight at once. Can you keep track of your code over a 140 instruction window and make sure there are no hazards? If not, then you're probably better off using a 'special' compiler.
The days when a compiler could just turn each statement into a fixed instruction sequence are long gone.
Maybe CISC wasn't such a bad idea after all--you may get less bang for the buck, but at least you get a predictable bang for the buck.
No, actually, you don't. One of the key features of RISC was that instructions took the same time to execute. On a CISC architecture, instruction timings are far from constant. Some instructions (have you looked at the x86 instruction set? It even has string manipulation instructions) can take several times longer to execute than others, which makes generating code very difficult. For example, you might know that it takes n instructions for a load to complete if accessing from memory and m if accessing from cache. How many instructions is that? That's much, much easier to work out on RISC. To prevent pipeline stalls, you need to make sure that you have a minimum of m instructions (and ideally m) between your load and your first operation that depends on the that data. Try doing that with a fixed-timing instruction set (RISC), and then with a variable-timing instruction set (CISC), and see which is easier.
I am TheRaven on Soylent News
The Cell doesn't seem to be that complex. It's a powerful processor, with multiple elements and associated timing issues that you have to be aware of, but that's nothing like the Gamecube or similar, which had all these weird modes and issues that I can't even recall now, probably because my brain blocked it out ;) It'll be a challenge for people who don't know parallel programming, and it might frustrate some who imagine that a cpu with 8 SPEs should act like 8 entirely independent machines, each with its own SPE. But, I think games developers these days will take it as par for the course. There seems to be a trend now that only the biggest and best games companies actually develop game engines (ie, right low-level optimised code), while the other companies just rent the technology and develop levels and artwork and scripting based on that engine. So, the big question is how many of the engine developers will get on board early and if they'll be sufficiently inspired and up to the task. I think they'll find a way :)
Let's also not lose sight of the big picture with regard to the Cell: the 8 parallel vector processors are coupled with a single CPU core derived from the PowerPC chip. So the overarching structure of the Cell isn't all that different conceptually from a typical CPU-GPU setup in most PCs today.
The problems IBM programmers are having are emblematic of the problems that the PC industry is going to be facing in a few years. Multi-core is the future of PC performance. Increasing GHz and IPC of single processors has pretty much hit a wall. Creating Dual and multi-core CPUs is the best approach we have left for increasing performance with future increases in transistor count/density.
The problem is that single threaded programs will run just as slowly on your quad-core 'Core-Quattro' in 2008, as they did on your old Pentium 4 - c. 2005. Great, yeah, I know, server loads parallelize very nicely (witness the miracle of Niagra), but consumer grade CPUs are where the volume is at, and people are going to have to notice a real difference in performance in order to stay on the hardware upgrade treadmill. This necessitates that Intel/AMD/IBM come up with new programming models that make it easy to parallelize existing code. Parallelized libraries and frameworks are all well and good, but it will be 20 years before everyone gets around to recoding the existing codebade to the the new platform - and most of them are probably not going to generate optimal code.
No, what we need are compilers that take programs written in a serial fashion, and emit code that scales well on multiple processors. The problems with the PS3 are only the beginning.
>We can only wonder how things would have been if Intel had opened things up like IBM has, instead of making it so people have to figure things out on their own.
/is/ out there, you just need to do a lot of work to get it.
It's not quite as clean as it looks. "Full specifications" doesn't include any information on instruction latencies, cache performance, etc. They've documented the platform itself, but not the specific implementation. This makes optimization difficult.
I've had to distill information from several publications to determine even basic things like how many cycles it takes to retire a floating point add. So the information
About ten years ago VM Labs came out with something not too far off conceptually from the Cell - vector instructions, local memory you had to DMA in and out of, 4 processors on a chip. It wasn't floating point, however, and the development tools were best described as rudimentary: the best way of debugging was to deliberately crash the box and examine the register dump barfed back over TCP/IP.
They called a developer's conference in August 1998, where after the presentation a veteran game coder shrugged: "Another weird British assembler programming cult".
The Cell strikes me the same way, and for the same reasons, although Big Blue likely has more development tool budget than VM ever did. Not to take anything away from the smart guys at IBM, but I suspect they'll have a fun time working around the Cell's limitations. I can tell them from experience that DMAed local memory will be much more of a pain in the ass than they can imagine, and unless they can guarantee sync in hardware they'll be wasting a bunch of time schlepping spinlocks in and out of memory. The vector stuff will also be nontrivial: the best way to make that usable, apart from having everyone write vector code from the git-go, would be to provide a stonking great math library in the style of the Intel Integrated Performance Primitives.
As an aside, the PS3 is in the tradition of Sony not caring about who programs their machine: the PS1 was easier to code than the Saturn, which was a true horror, the PS2 upped the difficulty a fair bit, and now even experienced coders are bitching about the PS3. Meanwhile Microsoft is learning from their mistakes: the X360 is easier than the X1, and if you doubt that makes a difference, check out game development budgets and time to delivery. I don't care, really: I eat algorithms and machine code for breakfast, so this just means more jobs and money for me.
This architecture has been tried before, for supercomputers. Mostly unsuccessful supercomputers you've never heard of, such as the nCube and the BBN Butterfly. There's no hardware problem building such machines; in fact, it's much easier than building an efficient shared-memory machine with properly interlocked caches. But these beasts are tough to program. The last time around, everybody gave up, mainly because more vanilla hardware came along and it wasn't worth dealing with wierd architectures.
The approach works fine if you're doing something that looks like "streaming", such as multi-stream MPEG compression or cell phone processing. If you want to do eight unrelated things on eight processors, you're good.
But applying eight such processors to the same problem is tough. You've got to somehow break the problem into sections which can be pumped into the little CPUs in chunks that don't require access to any data in main memory. The chunks can't be bigger than 50-100K or so, because you have to double buffer (to overlap the transfers to and from main memory with computation) and you have to fit all the code to process the chunk into the same 256K. That's a program architecture problem; the compiler can't help you much there. Your whole program has to be architected around this limitation. That's the not-fun part.
You have to make sure that you do enough work on each chunk to justify pumping it in and out of the Cell processor. It's like cluster programming, although the I/O overhead is much less.
In some ways, C and C++ are ill-suited to this kind of architecture. There's a basic assumption in C and C++ that all memory is equally accessable, that the way to pass data around is by passing a pointer or reference to it, and that data can be linked to other data. None of that works well on the Cell. You need a language that encourages copying, rather than linking. Although it's not general-purpose, OpenGL shader language is such a language, with "in" and "out" parameters, no pointers, and no interaction between shader programs.
Note that the Cell processors don't do the rendering in the PS3. Sony gave up on that idea and added a conventional NVidia graphics chip. (This guaranteed that the early games would work, even if they didn't do much with the Cell engines.) Since the cell processors didn't have useful access to the frame buffer, that was essential. So, unlike the PS2, the processors with the new architecture aren't doing the rendering.
It's possible to work around all these problems, but development cost, time, and risk all go up. If somebody builds a low-priced 8-core shared memory multiprocessor, the Cell guys are toast. The Cell approach is something you do because you have to, not because you want to.
Yes Itanium has failed to grab anything like the market share it was meant to. But that has nothing to do with its architecture. There's an arstechnia review from last year (I think) which talked about the Itanium architecture, and they were very up beat and complementary about it. The summary of that article was that as fabrication tech improves and die shrinks follow, and it becomes possible to cram more cores and larger and larger caches on to a chip, the Itanium architecture has more scope to grow and perform than any of its current competition. EPIC loves large caches.
There is only one real reason why Itanium has been such a flop so far, and that's x86-64. Intel had no intention of bolting 64 bit tech onto the x86 architecture. If you wanted 64 bit computing you were meant to go Itanium. End of story. That was the way Itanium was going to get its market share, and large volumes were going to drive the costs down. Intel either didn't see AMD coming, or didn't see what they were doing as a threat until it was too late. The x86-64 bomb shell, when it hit, threw Intel into complete disarray. Not only was x86-64 way cheaper than Itanium, but it out performed it and it offered seamless backward compatibility. The Itanium volume market plan was doomed from that moment on. As a consequence Intel had to scrap their x86 road map and re-draw it with their own 64 bit implementation, i.e. EM64T. They've been playing catch up ever since.
A side effect of the Intel's change in direction and focus has been a change in where they've put their resources. Itanium got starved of the resources it was originally planned to have and as a consequence Montecito is way late and isn't quite the kick ass design it was meant to be. Intel's partners like HP have suffered as a consequence.
Never the less Itanium is not going away, and even though Montecito is late, the current crop of Itanium chips are no slouch. When Montecito arrives it's going to give a much needed boost to HP Itanium sales. That's what they hope for anyway.