Unleashing the Power of the Cell Broadband Engine
An anonymous reader writes "IBM DeveloperWorks is running a paper from the MPR Fall Processor Forum 2005 that explores programming models for the Cell Broadband Engine (CBE) Processor, from the simple to the progressively more advanced. With nine cores on a single die, programming for the CBE is like programming for no processor you've ever met before."
The Cell Architecture grew from a challenge posed by Sony and Toshiba to provide power-efficient and cost-effective high-performance processing for a wide range of applications, including the most demanding consumer appliance: game consoles. Cell - also known as the Cell Broadband Engine Architecture (CBEA) - is an innovative solution whose design was based on the analysis of a broad range of workloads in areas such as cryptography, graphics transform and lighting, physics, fast-Fourier transforms (FFT), matrix operations, and scientific workloads. As an example of innovation that ensures the clients' success, a team from IBM Research joined forces with teams from IBM Systems Technology Group, Sony and Toshiba, to lead the development of a novel architecture that represents a breakthrough in performance for consumer applications. IBM Research participated throughout the entire development of the architecture, its implementation and its software enablement, ensuring the timely and efficient application of novel ideas and technology into a product that solves real challenges.
"I just want to draw a flowchart and have the compiler and realtime scheduler distribute processes and data among the hardware resources. If we are getting a new architecture and new "programming models", and therefore new compilers and kernels, how about a new IDE paradigm."
Bingo, sir.
"Derp de derp."
Damn you marketing droids! This has nothing to do with broadband at all.
So yes, I want a Cell-based devkit now, 'cuz this sounds like _fun_ :-)
Regards,
John
Falling You - beautiful
From the article: if the PS3's Cell CPU is even half the processor this monster is, I say game companies will need a lot of real programmers to make really good games (as if they cared).
And I have prayed unto You, O Lord U**X in the time of the Will of Linux.
It's Saturday night and I'm all alone here, cut me some slack...
It's when you take old code from previous things and then try to do a direct port that you will see performance hits. But if the code is designed from the ground up for a Cell environment (or ANY CPU architecture), it is all in the hands of the few top-level software design architects to properly structure the overall workings of the game's code. Once the structure is correct, sending the bits and pieces that need to be made to the rest of the code monkeys is no problem; they just need to follow the UML or whatever other design docs they are specifically supposed to implement.
We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
The PS3 has 512M of memory by default. It is half Rambus XDR and half GDDR3, but both segments of memory can be addressed by both the processor and the GPU.
Anonymous Luddite: "What do you think of the dehumanizing effects of the Internet?"
Andy Grove: "Not Much."
Just cut Sony out of the loop, and have IBM do the work. They could re-revolutionize the desktop PC market.
What do you think Page 0 was for?
Damn, nothing gets me fired up on a Saturday night like the thought of a nine way!
can it do infinite loops in 5 seconds?
Waaaaiiit a minute. This is the same DRM the heck out of everything Sony we are talking about here right? There is no chance they are going to allow a linux distribution to run easily on this platform. They are probably encrypting everything like Microsoft is doing with the XBox360.
People keep forgetting that Sony and Microsoft are in absolutely no way interested in providing you with a cheap computing platform for your Linux cluster endeavours at their loss. They make money off of selling games for these things. If people find ways of loading their own software on these boxes, you better believe they are going to start filing lawsuits. Not that I agree with that, but that is what will happen.
Every PS3 hard drive is shipping with Linux onboard.
... of the promotional material for the Sega Saturn from a few years back?
I remember right about the time it came out, there was a lot of hype about its architecture. Two main processors and a bunch of dedicated co-processors, fast memory bus, etc., etc. I don't remember any more specifics, but at the time it seemed very impressive. Of course it flopped spectacularly, because apparently the thing was a huge pain in the ass to program for and the games never materialized. Or at least that's the most often spoken reason that I've heard.
Anyway, and I'm sure I'm not the first person to have realized this, Cell is starting to sound the same way. The technical side is being hyped and seems clearly leaps and bounds ahead of the competition, but one has to wonder what Sony is doing to prevent themselves from producing another Saturn on the programming side.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
IBM will also be releasing Cell-based Blade servers next year, so pick one up if you're serious about development!
The Cell machines are about equally painful to program, but because they're cheaper, they have more potential applications than the nCube did. Cell phone sites, multichannel audio and video processing, and similar easily-parallelized stream-type tasks fit well with the cell model. It's not yet clear what else does.
Recognize that the cell architecture is inherently less useful than a shared-memory multiprocessor. It's an attempt to get some reasonable fraction of the performance of an N-way shared memory multiprocessor without the expensive caches and interconnects needed to make that work. It's not yet clear if this is a price/performance win for general purpose computing. Historically, architectures like this have been more trouble than they're worth. But if Sony fields a few hundred million of them, putting up with the pain is cost-justified.
It's still not clear if the cell approach does much for graphics. The PS3 is apparently going to have a relatively conventional nVidia part bolted on to do the back end of the graphics pipeline.
I'm glad that I don't have to write a distributed physics engine for this thing.
You can run a 68000 or 80386 emulator in each of the SPUs, or just run lots of native processes in parallel.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Both Sony and MS realized they couldn't make a single true general-purpose CPU with the performance they wanted for a price they could afford to sell in their consoles.
Sony went to a CPU, GPU and 7 co-processors (Cell).
MS went to 3 CPUs with vector-assist and a GPU.
Both companies are going to need to spend a lot of time and money on developer tools to help their developers more easily take advantage of their oddball hardware, or else they will end up right where Saturn did.
I guess the good news for both companies is that there is no alternative (like PS1 was to Saturn) which is straightforward and thus more attractive.
PS2 requires programming a specialized CPU with localized memory (the Emotion Engine) and it seems to get by okay. So developers can adapt, given sufficient financial advantage to doing so.
http://lkml.org/lkml/2005/8/20/95
Time to port the Lego Mindstorms development environment to the Cell processor!
I've wondered where the spammers have been hiding all this time. You fuckers spammed the hell out of my guestbook while I made no promises not to delete any messages. Here you have Slashdot, one of the world's most popular discussion forums and they do not delete (practically) any comments. Sure you will be moderated to -1 in no time, but the message will still be there for everyone to see, if they choose to.
Usually you are much more inventive. What the hell took you shitheads this long this time?
Note to moderators: the user "5, Troll" likes to cut and paste posts from other sites to gain karma. This one was found on the DeveloperWorks site with a quick google search.
Isn't this basically the dataflow paradigm ?
I think it should be possible to make any programming language automatically spread its work across multiple processors simply by analyzing which operations depend on which to generate "wait until operation X is complete" points (essentially the same as current processors do to feed their multiple pipelines from a single stream of instructions, but it might work better since the compiler has more info than the processor); how efficient this kind of system would be is another matter.
Or simply include support for constructs like "future result = operation(); doSomeThing(); doSomeThingElse(result);"
where the keyword "future" tells the compiler to perform "operation()" in a separate thread if possible, and block at "doSomeThingElse()" if result is not yet available (or, better yet, block in doSomeThingElse() at the line where result is actually needed). This would allow for extremely easy thread programming - but it would probably still be necessary to include a separate threading facility.
You can do something like this in Java with the Future interface, FutureTask class and the Executor interface, but it requires a lot of extra work for the programmer. It would be nice if the compiler could generate the code for me. It would be even better if the JVM included support for such constructs, since it would allow the exact same bytecode to be run as a single-threaded application in uniprocessor machine without any overhead, and still take advantages of multiple processors when available.
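For comparison, here is roughly what the hand-written version looks like with the java.util.concurrent classes mentioned above. This is only a sketch: operation(), doSomeThing() and doSomeThingElse() are the names from the post, but their bodies and the class name FutureSketch are made up for illustration, and the method reference needs a newer Java than the one being discussed here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FutureSketch {
    // Stand-in for an expensive computation (hypothetical body).
    static int operation() {
        int sum = 0;
        for (int i = 1; i <= 1000; i++) {
            sum += i;
        }
        return sum;
    }

    // Unrelated work the calling thread can do in the meantime (hypothetical body).
    static String doSomeThing() {
        return "initialized";
    }

    // Consumer of the result; get() blocks only if operation() has not finished yet.
    static int doSomeThingElse(Future<Integer> result) {
        try {
            return result.get() * 2;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Hand-written equivalent of the proposed "future result = operation();" construct.
    static int run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = FutureSketch::operation;
            Future<Integer> result = pool.submit(task); // starts on a worker thread
            doSomeThing();                              // runs concurrently with operation()
            return doSomeThingElse(result);             // waits here only if needed
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Everything except the three calls inside run() is plumbing, which is the "extra work for the programmer" a keyword would hide.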
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
The first core could be the main processor, handling processes, and the second core could just be there to be interrupted by dedicated threads executing on the SPEs and to communicate with them. The main problem would come from the memory bandwidth used by the core that handles the 8 SPEs; it should be designed to minimize the impact on the first core.
A solution to this could be to have a Cell processor and a traditional single-core processor, both of them using HT to improve memory bandwidth. But that seems complicated. Anyway, this Cell processor could be interesting as a thread management unit.
Another idea would be to double the memory on each SPE and prefetch the next context while another thread is running on it, then, once the context switch is done, retrieve the data from the previous thread; this could be managed by the PPE. And if you combine this solution with non-synchronized timer interrupts on each SPE, I bet you can get a pretty good improvement in the memory bandwidth consumed by a Cell unit...
With all these basic ideas, I think there is plenty of room to use those Cell processors efficiently.
IANAGP (game programmer), but it would seem to me that physics and lighting calculations should be easily parallelizable. Each processor can compute the physics for a separate set of objects / pixels / etc. Same for AI for each agent, if the companies actually bothered to put some effort into gameplay over graphics. On the other hand, I would guess that things like fluids (e.g. Far Cry) would be more difficult to do in parallel, due to the less local nature of the interactions.
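A minimal sketch of the "separate set of objects per processor" idea, using plain Java threads rather than anything Cell-specific. The one-dimensional pos += vel * dt update and every name here (ParallelPhysicsSketch, step) are assumptions for illustration, not how any real engine does it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelPhysicsSketch {
    // Advance every object by one time step: pos += vel * dt.
    // Objects are split into contiguous slices, one per worker, so no two
    // workers ever write the same array element and no locking is needed.
    static void step(double[] pos, double[] vel, double dt, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            int chunk = (pos.length + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(pos.length, w * chunk);
                final int to = Math.min(pos.length, from + chunk);
                tasks.add(() -> {
                    for (int i = from; i < to; i++) {
                        pos[i] += vel[i] * dt;
                    }
                    return null;
                });
            }
            // invokeAll returns once every slice is done; get() surfaces any failure.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] pos = {0, 1, 2, 3, 4};
        double[] vel = {1, 1, 1, 1, 1};
        step(pos, vel, 0.5, 2);
        System.out.println(java.util.Arrays.toString(pos));
    }
}
```

This only works so cleanly because each object's update touches nothing but its own state. Once objects interact (collisions, fluids), slices need to read their neighbours' data and the easy partitioning goes away.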
...and don't forget to give everything a new, academic, name.
Mod me down if you wish but I think the CBE architecture is bound to fail. The reason is that you don't design your software model around a new processor. It should be the other way around. You first come up with a software model and then design a processor optimized for the new model. This way you are guaranteed to have a perfect fit. Otherwise, you're asking for trouble.
The primary reason that anybody would want to devise a new software model is to address the single most pressing problem in the computer industry: unreliability. The reason that software is unreliable is that it is based on the algorithm. Switch to a non-algorithmic, signal-based, synchronous model and the problem will disappear. Unfortunately current processor architectures, including the CBE, are optimized for the algorithm. Click on the link below for details on a new software model designed to solve the reliability problem.
If every PS3 was networked and Sony rented out your redundant core to the DoD, how fast would the world's most powerful supercomputer be?
It's called the Revolution.
Twinstiq, game news
the first is that they don't deal well with resource contention. No language, or any other thing for that matter, does.
When you fork N processes on N objects and you have N-M processors, it costs you computationally, which translates into lost efficiency.
It's one thing to think of this situation as a bunch (N) of ball-bearings going through a bunch of holes (N-M) with each ball-bearing having its state information local to it. (Any kind of concept of a sieve can serve as a 'gedanken' experiment.)
The situation becomes hopelessly confused when there is any dependency on external data or process sources.
The mechanisms for handling that confusion are all basically ones of reducing the many threads down into a single thread and meting out the shared resource piece-meal.
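To make "reducing the many threads down into a single thread" concrete, here is a small sketch in Java: N tasks run on fewer workers, and every touch of the shared value goes through one lock, so that part of the work is effectively single-threaded no matter how many processors there are. The class name, the counter and the parameters are all hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class ContentionSketch {
    // Runs `tasks` jobs on `workers` threads; each job bumps one shared counter
    // `perTask` times. The lock hands the counter out one thread at a time.
    static long run(int tasks, int perTask, int workers) {
        final ReentrantLock lock = new ReentrantLock();
        final long[] shared = {0};
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int t = 0; t < tasks; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perTask; i++) {
                    lock.lock();          // every other thread waits here
                    try {
                        shared[0]++;      // the serialized section
                    } finally {
                        lock.unlock();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        lock.lock();
        try {
            return shared[0];
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(8, 1000, 3));
    }
}
```

The answer comes out right, but only because the threads take turns; the more time a task spends inside the lock, the less the extra processors buy you.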
A sufficiently evolved schema is capable of handling replication of a shared 'read-only' resource but, despite the efficiencies inherent in that situation, it merely shifts the burden of resource access up one level. There will be a stiffer computational penalty to be encountered when 'access starvation' is reached.
Hopefully the replication penalty will be acceptable, and there are ways to mitigate the computational cost of that penalty, but the trade-off is an instance-level, existential sort of thing and exists at run-time and can only be guess-timated at algorithm/method design-time.
The second fault is one of design of the languages themselves.
They are not designed to operate within a schema. Actually, no language is, so the efficiencies to be gained from using a schema are bolted on to the application and not an inherent part of it.
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
Although Nintendo isn't even talking about the hardware specs, so we can't be sure.
But I didn't include the Revolution because Nintendo is saying the same thing they did with the Gamecube, that they don't need 3rd party developers. Revolution seems largely like a platform for Nintendo to sell you their older games again. Additionally, if Revolution is sufficiently underpowered compared to the other two, it may be that 3rd parties just plain cannot port their games to this platform, or else have to "dumb down" their game in such a way which might make the game uncompetitive with games that don't work on Revolution.
So, basically, N is downplaying new development so much on the Revolution that I simply left it out as a platform which would attract developers who were fed up with the other two. But probably I shouldn't have done so.
By the way, with all of this, I want to mention I'm a huge N fan. I have three GBAs, a DS and a Gamecube, plus all their other consoles back to the SNES. I just think that N is concentrating on 1st/2nd party development more than 3rd party development.
http://lkml.org/lkml/2005/8/20/95
I haven't really done much programming since college and none of those programs have been multithreaded, so maybe I don't have the right background to comment. But, all I can say is wow. This is crazy compared to the Sparc processors that I learned assembly on. As somebody pointed out, not only do these processors have multiple cores, but apparently each one has 128 registers?! Processor design has come a long way.
That said, I see a lot of comments reflecting on how hard it will be for programmers to adjust to programming on this architecture. While I agree that there may be some learning that will have to take place, shouldn't most of the optimization take place on the compiler level? I mean, that's partly the point of languages such as C/C++: write a minimum amount of architecture specific code and let the compiler do the rest.
Anyway, I find this new architecture very impressive and can't wait to see devices take advantage of this hardware.
If Murphy's Law can go wrong, it will.
Since most of the inter-processor "interconnects" would be consumer-grade DSL/Cable links, it'd have phenomenal capacity to process chunks of data but serious latency issues in distributing work units. Commercial cluster data-processing units probably use gigabit ethernet or faster connections to get around this.
I have been a user for about 10 years. This ends Feb 2014. The site's been ruined. I'm off. Dice, FU
The part of Sony that has been providing Linux kits for the PS2 since 2002.
The console homebrew scene is rather big, and Sony and Microsoft can do nothing about it.
Frankly, I don't see why they couldn't just use flash memory instead...everyone's doing it these days.
"programming for the CBE is like programming for no processor you've ever met before"
Which is exactly why it will never take off.
Another horrible early processor was the TMS9900, which pretended to have 16 16-bit registers but they were just mapped memory. And that too didn't have a proper subroutine call and return. It really wasn't better in the old days.
Pining for the fjords
Real geeks don't use IDEs! ;)
I'll use an IDE when they invent one that is as fast, flexible, and chrome-free as a normal bash command prompt. I have hated every IDE I've tried due to slowness and their insistence on making an ugly bloated interface. I don't need the IDE to try to guess what I'm typing, to offer code debugging while I'm typing, to FTP my files for me, etc.
Basic project management, checking my code for errors (when I ask.. not constantly), good search and replace (with regular expressions and across multiple files), and built-in reference guides for everything I'm doing (language, protocols, etc) would be good enough. Possibly an interface builder if it works well and stays out of the way when not needed. Much more than that and you're just letting time wasted playing with an interface replace actual coding time.
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
Enjoy your wallow in the 1970s. My coding experience, which started there, includes typing hex machine code into the Apple ][+ machine language monitor. And it also includes working with customers, graphic artists and mathematicians. Which is why productivity is clearly highest throughout the cycle with flowcharts of reusable schematic objects. We've got lots of native topological intuition and skills we almost use while coding and debugging, but which we're not skilled in operating by typing.
I want to keep the lexical tools you love, especially for geeks who can't use the multidimensional tools, but also just to keep their proven value. So what I'm looking for is a callgraph visualizer for your stuff, and a flowchart compiler that produces procedural code like C and Java. Then we can each use the machines for maximum productivity among all our individual idiosyncrasies.
--
make install -not war
Individual idiosyncrasies are nice. I hate being forced to use a given environment to work on something. I keep working on my ideal environment, which is of course ideal only to me. Fancy tools with low overhead. ;)
I like subclassing and embedding :).
My biggest fetish is making lots of extensible small programs/libraries and building other programs out of them.. so I guess I like subclassing and embedding too. I was once accused of creating my own custom programming language for everything I programmed. I do that less now but do like to expose the internals in ways that let my programs be modified in all sorts of interesting ways.
cellular broadband coverage is spotty at best in my area, and the damned providers charge too much per minute for the airtime. :)
help me i've cloned myself and can't remember which one I am
Your programming example would remove the need for mutexes, but that's about all. Spawning a thread, while faster than an entire process, has cost.
In your example, the context of the current thread would have to be duplicated (how does the compiler know what data operation() will use?). They can't run from the same data - what if doSomeThing() modifies a field that operation() uses?
Java's FutureTask requires extra work, but it protects you from concurrent data modification errors. (Or, it appears to. I'm not familiar with it...)
Your idea is good. We're eventually going to need a revolutionary way to parallelise computation on a larger scale (larger than instruction-level, anyway). But don't forget - at some point, all abstractions leak.
Even if we could split all function/method calls out to separate threads, there will still be waste. Just as instructions must wait for their inputs, so must your functions/methods. To get a well-optimised program, a software engineer will still be stuck with managing the critical path of execution.
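A small sketch of that point using Java's CompletableFuture: a and b have no inputs and can run in parallel, but c has to wait for both and d has to wait for c, so the chain through c and d is the critical path however many cores are free. The stage names and the toy arithmetic are invented for illustration.

```java
import java.util.concurrent.CompletableFuture;

final class CriticalPathSketch {
    static int run() {
        // Independent stages: free to run on separate threads at the same time.
        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> 2);
        CompletableFuture<Integer> b = CompletableFuture.supplyAsync(() -> 3);

        // c needs both inputs, so it cannot start until the slower of a and b is done.
        CompletableFuture<Integer> c = a.thenCombine(b, (x, y) -> x * y);

        // d needs c; extra processors cannot shorten this dependency chain.
        CompletableFuture<Integer> d = c.thenApply(x -> x + 1);

        return d.join(); // blocks until the whole chain has completed
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The library handles the waiting, but deciding what goes into a, b, c and d so that the chain stays short is still the engineer's job.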
Well, it seems like it is (almost) your lucky day today.
The PPE is already hyperthreaded, posing as a dual core.
send + more == money?
"Every PS3 hard drive is shipping with Linux onboard."
I sure hope that's true, but I'm pretty skeptical. Sony makes butt loads of cash off the software sales for the PS2 so if Sony actually does give us Linux it's going to have some strings attached. For example, I very much doubt that Sony will give us access to the GPU.
What gives me the most hope for a relatively unencumbered PS3 Linux distro is the Blu-Ray format. All Blu-Ray players will have a Java layer for interactive crud, which should be enough for stupid little games (e.g. Tetris, Puzzle Bobble, Metal Slug). Sony almost has to concede the non-3D market.
I'm personally really excited about the BD-J stuff, and the homebrew scene that will grow up around it.
No, you'd still need mutexes. My example is simply a clean and simple way of expressing "this task can be run on a separate worker thread".
Then you obviously need to synchronize accesses to those fields.
You seem to have misunderstood my basic idea. The idea is not to make different threads not require synchronization. The idea is to make an easy, simple and clean code construct for marking tasks as executable on a background thread - a bit like the "for (SomeType t : someCollection) { t.doSomething(); }" construct for iterating over a Collection (it gets expanded into something like "for (Iterator<SomeType> i = someCollection.iterator(); i.hasNext(); ) { SomeType t = i.next(); t.doSomething(); }" by the compiler).
It would be very nice for something like loading data in the background (especially if they're loaded over the Net) while doing some other initialization tasks on the main thread.
So no, it's not a magical nonsynchronized thread, it's just a slightly less painful interface for using threading. Just a compiler macro, not unlike autoboxing; a little something to lessen the need for typing and make code a little easier to read.
Which is why we get these things called brains, to figure out which are useful abstractions and which aren't :). But I'm not really sure that this is an abstraction, more like a way to express your intent to the compiler in less typing, and make code cleaner and easier to read at the same time.