IBM Releases Cell SDK
derek_farn writes "IBM has released an SDK running under Fedora Core 4 for the Cell Broadband Engine (CBE) Processor. The software includes many GNU tools, but the underlying compiler does not appear to be GNU based. For those keen to start running programs before they get their hands on actual hardware, a full system simulator is available. The minimum system requirement specification has obviously not been written by the marketing department: 'Processor - x86 or x86-64; anything under 2GHz or so will be slow to the point of being unusable.'"
But does it run Linux?
Oh. Well, okay then.
A B A C A B B
Well, we know the answer to that. Next we want to know, will it kill Intel?
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Not knowing too much about the Cell processor, I read the Wikipedia article. I came across this: "In other ways the Cell resembles a modern desktop computer on a single chip."
Why?
What if the entire Universe were a chrooted environment with everything symlinked from the host?
Just to clarify.
No. In our insanely litigious society, a company has graciously allowed another to create and market a different processor by the same exact name.
Yup, it is.
My favorite quote from TFA...
...the Cell processor is an upcoming PowerPC variant that will be used in the PlayStation 3. It's great at DSP but terrible at branch prediction, and would not make a very good Mac. If you want to know full tech specs, Hannibal is da man.
That's great news, but as an embedded systems designer and eternal tinkerer, where will I be able to buy a handful of these processors to experiment with, without having to dismantle loads of games machines? ;o)
Open Source Drum Kit, LPLC deve board - mjhdesigns.com
As the Cell is basically a PPC processor, I find it strange that the SDK is for x86 processors. Fedora Core 4 (PowerPC), also known as ppc-fc4-rpms-1.0.0-1.i386.rpm, is listed as one of the files you need to download. Maybe it's just because of the large installed base of x86 machines.
It'd be nice if IBM released a PPC SDK for Fedora; it would have the potential to run much faster than an x86 SDK and simulator.
infested with jello like fishes no melotron wishes
The software includes many GNU tools, but the underlying compiler does not appear to be GNU based.
Is this any surprise? My understanding was that the Cell's a vector processor, and despite the recent upgrades to GCC, it's still fairly awful at autovectorisation.
Can anyone clarify?
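For anyone wondering what "autovectorisation" means in practice, here is a minimal C99 sketch (the function name is our own): a loop a compiler could in principle turn into 4-wide single-precision SIMD code. Whether a given GCC release actually vectorizes it depends on the version, the target, and flags such as -O2 -ftree-vectorize; it says nothing about what IBM's own Cell compiler does.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i]. The `restrict` qualifiers promise the arrays do
   not overlap, one of the conditions an autovectorizer typically needs
   before it will process several elements per SIMD instruction. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The loop has no data-dependent branches and no dependence between iterations, which is exactly the shape vector units like the SPEs are built for.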
Why Fedora is so often considered the default target distribution, I don't know. Even the project page states it's an unsupported, experimental OS, and one that is now comparatively marginal in terms of user numbers.
Must be a case of 'brand leakage' from a distant past, one that held Red Hat as the most popular desktop Linux distribution.
Shame, I guess IBM is missing out on where the real action is.
I dunno - telling people they have to upgrade their PC to run the SDK for a new PC architecture seems like a marketer's job.
--
make install -not war
I not get mine run. Please send exact instruction how downloaded PS3 games play can?
which would be why he responded in such a sarcastic manner, and why everyone accused him of "trolling"
That I am in the UK, although I don't think that will make much difference :o)
But I would like to know.
Mike.
Give me a nice clean distro like Gentoo any day. I can't stand that a Fedora install requires 5 CDs and installs some 600 packages that I will never use. Why do I need so many text editors, etc.? I get lost and nervous in the Applications menu. Sure, I tried 30 text editors before I found the one I wanted, but that's all I install on my box during a reinstall or upgrade.
BTW, this parent might be off-topic, but he is no troll. Shame on you, mods!
The real question is whether the PS3 will have a Linux hard disk option like the PS2. If that is the case, it may be the cheapest way to get actual development hardware.
HPC for Primates. Read Cluster Monkey
How does one get a hold of a real CBE-based system now? It is not easy: Cell reference and other systems are not expected to ship in volume until spring 2006 at the earliest. In the meantime, one can contact the right people within IBM to inquire about early access.
By the end of Q1 2006 (or thereabouts), we expect to see shipments of Mercury Computer Systems' Dual Cell-Based Blades; Toshiba's comprehensive Cell Reference Set development platform; and of course the Sony PlayStation 3.
OK, so what they're saying is "it's slow to emulate a PPC variant on an x86 variant". Duh.
But Apple seems to have cooked up (or at least licensed) something wonderful in this vein in the form of Rosetta, the tech that lets Mac OS X for x86 run Mac OS X PPC binaries very fast.
Sony has several metric fucktons of money. Can't they license the Rosetta technology, or pay for it to be basically "ported" from its current state of PPC-on-x86 to Cell-on-x86? Cell is PPC-based, so it shouldn't be so hard, no?
With spending like this, exactly what are "conservatives" conserving?
I wonder if it'll take advantage of multi-core chips? Might make sense to do so, especially since that's also (sort of) similar to the hardware being simulated.
Weaselmancer
ridiculous.
In Cell and other IBM PPE designs, there is no dynamic prediction hardware, so the CPU makes the same guess every time, even when it's obvious to a human that it's wrong. This costs in performance for code like AI, where the chances of taking the jump vary while the game is running.
I appear to have a blog. Odd.
Imagine that running on a Beowulf cluster of Cell processors, running Bochs to run... never mind
Wow, you could not be more wrong. See the Wikipedia article on branch misprediction. You should probably read up on exactly what RISC means as well. I have the "SPU assembly language" document here from IBM (can't remember where I got it from, sorry). The branch instructions (not JUMP) can jump to any location stored in any 32-bit register, minus the two least significant bits. It is a RISC CPU after all. Or it can branch relative to the current PC using an 18-bit direct value. Considering the first-generation Cells have 256KB of local addressable memory per SPU, that's half the available memory in a relative jump. And most of that memory is probably going to be used by data anyway. So no, JUMPs do not have to be small. This is not your dad's SIMD computer; this is a pretty general RISC processor with vector extensions.
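A quick check of the relative-branch arithmetic. This sketch assumes (our assumption, not something quoted from the IBM document) that the "18-bit direct value" is a signed 16-bit word displacement that the hardware scales by 4, since instructions are word aligned. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Per-SPU local store on first-generation Cell, as stated above. */
#define LOCAL_STORE_BYTES (256 * 1024)

/* Farthest forward/backward byte offsets reachable with a signed 16-bit
   word displacement scaled by 4 (i.e. an 18-bit byte offset). */
int32_t max_forward_reach(void)  { return INT16_MAX * 4; }  /* 131068 */
int32_t max_backward_reach(void) { return INT16_MIN * 4; }  /* -131072 */

/* Byte address targeted by a relative branch at `pc`. */
int32_t branch_target(int32_t pc, int16_t word_offset) {
    assert(pc % 4 == 0);  /* branch sources are word aligned */
    return pc + (int32_t)word_offset * 4;
}
```

Under that assumption a relative branch reaches about 128KB in either direction, so forward and backward reach together span the whole 256KB local store.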
Has any CPU ever had a mechanism for the user to hint to the CPU whether or not the branch will be taken? Perhaps just another branch instruction that hints to the CPU that it is very likely for a branch to be taken.
This way the compiler could insert the appropriate optimization depending on the situation (or we could even allow #pragma type statements so a programmer could tell the compiler which way to hint!)
Granted, most of the time the compiler could decide; or it doesn't matter, so you could just use the same simple rules that the CPU might use (or just defer to the CPU as we do now). However, we've all been stuck in a situation where we end up with an if statement inside of a for loop. It'd be nice to be able to tighten a loop like that up in the rare situation where it matters.
.plan!! what plan?
The cell processors can do DMA to and from main memory while computing. As IBM puts it, "The most productive SPE memory-access model appears to be the one in which a list (such as a scatter-gather list) of DMA transfers is constructed in an SPE's local store so that the SPE's DMA controller can process the list asynchronously while the SPE operates on previously transferred data." So the cell processors basically have to be used as pipeline elements in a messaging system.
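The IBM quote describes DMA lists processed asynchronously. A simpler pattern built on the same idea is double buffering: fetch chunk i+1 while computing on chunk i. The sketch below shows only the control flow. fake_dma_get is hypothetical and just calls memcpy, so nothing actually overlaps here; on a real SPE the fetch would be a DMA transfer that must be waited on before its buffer is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* elements per transfer; tiny so the sketch is easy to trace */

/* Hypothetical stand-in for a DMA "get" into local store. Here it is a
   synchronous memcpy; real hardware would complete it in the background. */
static void fake_dma_get(float *local, const float *main_mem, size_t n) {
    memcpy(local, main_mem, n * sizeof *local);
}

/* Sum an array in main memory CHUNK elements at a time using two local
   buffers: the next chunk is requested before the current one is processed. */
float stream_sum(const float *main_mem, size_t n) {
    assert(n % CHUNK == 0);  /* keep the sketch free of tail handling */
    float buf[2][CHUNK];
    size_t chunks = n / CHUNK;
    float total = 0.0f;
    int cur = 0;

    if (chunks == 0)
        return 0.0f;
    fake_dma_get(buf[cur], main_mem, CHUNK);  /* prime the first buffer */
    for (size_t i = 0; i < chunks; i++) {
        int next = cur ^ 1;
        if (i + 1 < chunks)  /* request the following chunk into the spare buffer */
            fake_dma_get(buf[next], main_mem + (i + 1) * CHUNK, CHUNK);
        for (size_t j = 0; j < CHUNK; j++)  /* compute on the current chunk */
            total += buf[cur][j];
        cur = next;
    }
    return total;
}
```

The point of the two buffers is that the SPE never computes on memory that a transfer is still writing, which is the "pipeline element" style of programming the comment describes.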
That's a tough design constraint. It's fine for low-interaction problems like cryptanalysis. It's OK for signal processing. It may or may not be good for rendering; the cell processors don't have enough memory to store a whole frame, or even a big chunk of one.
This is actually an old supercomputer design trick. In the supercomputer world, it was not too successful; look up the nCube and the BBN Butterfly, both of which were a bunch of non-shared-memory machines tied to a control CPU. But the problem was that those machines were intended for heavy number-crunching on big problems, and those problems didn't break up well.
The closest machine architecturally to the "cell" processor is the Sony PS2. The PS2 is basically a rather slow general purpose CPU and two fast vector units. Initial programmer reaction to the PS2 was quite negative, and early games weren't very good. It took about two years before people figured out how to program the beast effectively. It was worth it because there were enough PS2s in the world to justify the programming headaches.
The small memory per cell processor is going to be a big hassle for rendering. GPUs today let the pixel processors get at the frame buffer, dealing with the latency problem by having lots of pixel processors. The PS2 has a GS unit which owns the frame buffer and does the per-pixel updates. It looks like the cell architecture must do all frame buffer operations in the main CPU, which will bottleneck the graphics pipeline. For the "cell" scheme to succeed in graphics, there's going to have to be some kind of pixel-level GPU bolted on somewhere.
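Some rough numbers behind the memory concern. This assumes a 32-bit-per-pixel frame buffer and uses 1080p and 720p purely as example resolutions; neither is specified above. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LOCAL_STORE_BYTES (256u * 1024u)  /* per-SPE local store, first-generation Cell */

/* Bytes for a w x h frame at bpp bytes per pixel. */
uint32_t frame_bytes(uint32_t w, uint32_t h, uint32_t bpp) {
    return w * h * bpp;
}

/* Whole scanlines that fit in local store, ignoring the space
   that code, stack and other data would also need. */
uint32_t rows_that_fit(uint32_t w, uint32_t bpp) {
    assert(w > 0 && bpp > 0);
    return LOCAL_STORE_BYTES / (w * bpp);
}
```

A 1920x1080 frame at 4 bytes per pixel is 8,294,400 bytes, roughly 32 times the 256KB local store, and only 34 full scanlines (51 at 1280 pixels wide) fit even before any code is loaded.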
It's not really clear what the "cell" processors are for. They're fine for audio processing, but seem to be overkill for that alone. The memory limitations make them underpowered for rendering. And they're a pain to program for more general applications. Multicore shared-memory multiprocessors with good caching look like a better bet.
Read the cell architecture manual.
This is exactly what the Cell SPEs have. The SPE compiler inserts "branch hints", and the programmer can influence them with the GCC builtin "__builtin_expect". Take a look at the "SPU C/C++ Language Extensions" document that was released a while back by the Cell team.
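A minimal sketch of the "if inside a for loop" case from the question above, using GCC's __builtin_expect to mark the branch as rarely taken. The UNLIKELY macro and count_negative function are our own names, not part of any SDK. How much the hint helps is target-dependent: on the SPU toolchain it may steer branch-hint generation, while on other targets it mostly affects code layout.

```c
#include <assert.h>
#include <stddef.h>

/* Tell GCC the condition is expected to be false. `!!` normalizes any
   nonzero value to 1 so it can be compared against the expected 0. */
#define UNLIKELY(cond) __builtin_expect(!!(cond), 0)

/* Count negative samples, assuming negatives are rare in the input. */
size_t count_negative(const int *v, size_t n) {
    assert(n == 0 || v != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0))
            count++;
    }
    return count;
}
```

The hint never changes the result, only the compiler's guess about which path to favour, so a wrong hint costs speed but not correctness.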
Most of the other posters have no idea what they are talking about. The PPE is a fully PowerPC-compliant two-way SMT processor and absolutely has a branch predictor. It is the SPEs (SIMD vector units) that do not have branch prediction, but they do have branch hints. A tacit assumption in the SPE design is that the vector code running on the SPEs will not have too many branches to begin with.
Zigbee Central: A Zigbee weblog
I wonder if anyone has considered using cell processors to run large neural network simulations; the SPEs would churn through node calculations at an incredible rate. You wouldn't need any greater accuracy than single precision.
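A "node calculation" here is essentially a weighted sum plus an activation function. The sketch below (hypothetical names; ReLU chosen only to keep it free of libm) computes one fully connected layer in single precision. The inner multiply-add loop is the part a 128-bit SIMD unit such as an SPE could process four floats at a time.

```c
#include <assert.h>
#include <stddef.h>

/* out[j] = relu(bias[j] + sum_i w[j * n_in + i] * in[i]), all in float.
   Weights are stored row-major: one row of n_in weights per output node. */
void layer_forward(const float *w, const float *bias, const float *in,
                   float *out, size_t n_in, size_t n_out) {
    assert(n_out == 0 || (w && bias && in && out));
    for (size_t j = 0; j < n_out; j++) {
        float acc = bias[j];
        for (size_t i = 0; i < n_in; i++)  /* multiply-add: the SIMD-friendly part */
            acc += w[j * n_in + i] * in[i];
        out[j] = acc > 0.0f ? acc : 0.0f;  /* ReLU activation */
    }
}
```

Because each output node is independent, a large network could be split across several SPEs by giving each one a slice of the output nodes.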
It would be an interesting application.
Once again, the Cell is not a PPC processor. It is not PPC based. The Cell going into the PlayStation 3 has a POWER-based PPE (Power Processing Element) that is used as a controller, not a main system processor. Releasing an SDK for Macs would not give any advantage over an x86-based SDK, because you are still emulating another platform.
Wiki
But the PowerPC core does have a more limited pipeline and branch prediction logic than some of the other Power chips. I believe this simplification was to make room for other "stuff."
Agreed, the PPE core only has a 4KB by 2-bit BHT(branch history table). Note that the PPE pipeline depth is only 23 stages (i.e. branch misprediction penalty is 23 cycles), so a misprediction penalty is comparable to designs that run at far, far slower clocks. I am not sure if the main motivation was making chip real estate available for other things: The recent IBM Journal of R & D paper by Kahle et al. is an excellent read to gain insight into the design decisions they took, and I believe they were confident that a more sophisticated branch predictor was unnecessary considering the elegant PowerPC core that they had in their hands (23 FO4 PPE pipeline depth, 11 FO4 SPE pipeline depth, can run at 4GHz plus!).
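To see why a branch history table matters when "the chances of taking the jump vary while the game is running", here is a toy comparison of a single fixed guess against a table of 2-bit saturating counters. The 4K table size echoes the comment; indexing by PC bits is our simplification and not the PPE's actual scheme. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BHT_ENTRIES 4096

/* Counter values 0-1 predict "not taken", 2-3 predict "taken". */
typedef struct { uint8_t ctr[BHT_ENTRIES]; } bht_t;

static size_t bht_index(uint32_t pc) { return (pc >> 2) % BHT_ENTRIES; }

static int bht_predict(const bht_t *t, uint32_t pc) {
    return t->ctr[bht_index(pc)] >= 2;
}

/* Saturating update: move one step toward the actual outcome. */
static void bht_update(bht_t *t, uint32_t pc, int taken) {
    uint8_t *c = &t->ctr[bht_index(pc)];
    if (taken && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}

/* Replay one branch's outcomes (0 or 1) and count misses for the
   2-bit table and for a fixed guess that never changes. */
void count_misses(const int *outcomes, size_t n, uint32_t pc, int fixed_guess,
                  size_t *dyn_misses, size_t *fixed_misses) {
    bht_t t = {{0}};
    *dyn_misses = *fixed_misses = 0;
    for (size_t i = 0; i < n; i++) {
        assert(outcomes[i] == 0 || outcomes[i] == 1);
        if (bht_predict(&t, pc) != outcomes[i]) (*dyn_misses)++;
        if (fixed_guess != outcomes[i]) (*fixed_misses)++;
        bht_update(&t, pc, outcomes[i]);
    }
}
```

For a branch that is taken 10 times and then not taken 10 times, either fixed guess misses 10 times while the 2-bit table misses 4 (two warm-up misses per phase). At the 23-cycle penalty quoted above, those 6 extra misses would cost about 138 cycles.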
I'm very excited about this project, and even spec'd out a new Dell to handle it. But before I can lay down the cash, I just wonder: why?
Is the Cell processor expected to go anywhere past the PS3? There is obviously no OS port planned, and I have no access to the PS3 game SDK. I have read some pretty awesome posts regarding the technical details of Cell vs. x86 or Mac architectures, but none that would encourage me to download, install, and play around with this in the hope of ever making a buck.
Here's to finally giving Bush his exit strategy in November
the "Cell" well, as far as I am concerned. They seem to be totally unremorseful regarding their music CD DRM (aka rootkit). At one point I considered the purchase of a PS3 in order to gain experience with the Cell Processor. Today, I would not consider the purchase of ANYTHING with Sony's name on it, regardless of how "geeky" it might be.
Purchasing IBM's (or perhaps Mercury Computer's) reference CBE-based platform is now my only choice. Sony's NRE for the PS3 might make their platform a "best buy" price-wise because of the manufacturing volume. But between their heavy involvement in the MPAA, the RIAA, and this DRM issue that makes customers' computers extremely vulnerable, there is no longer any compulsion to give Sony anything other than a "loud, wet raspberry".
I'd hardly call a CPU with a 23-stage pipeline and no out of order execution 'elegant'. Maybe to a hardware guy, but all I see when I look at Cell is absolutely atrocious integer performance.
A deep unwavering belief is a sure sign you're missing something...
I am a hardware guy, and the design is far more elegant and simpler than most of the competing CPUs out there, mainly as a result of the push to get it to work in the 4GHz+ frequency range.
I think it's very early to talk about the integer performance of Cell. I have been working on Cell for a few months now, and all I can say is that the integer performance of the PPE core is on par with the competition; and it beats them handily using hand-written code to take advantage of the SPEs.
On par with what? I'm a Lisp guy/compiler enthusiast. I like processors with out-of-order execution that don't care about code scheduling, have excellent branch prediction, have low memory latency, etc. Basically, my ideal processor is an Opteron. It's all about perspective, hence my criticism of your use of the word "elegant".
SCEA press release:
SONY COMPUTER ENTERTAINMENT INC. AND NVIDIA ANNOUNCE JOINT GPU DEVELOPMENT FOR SCEI'S NEXT-GENERATION COMPUTER ENTERTAINMENT SYSTEM
TOKYO and SANTA CLARA, CA
DECEMBER 7, 2004
"Sony Computer Entertainment Inc. (SCEI) and NVIDIA Corporation (Nasdaq: NVDA) today announced that the companies have been collaborating on bringing advanced graphics technology and computer entertainment technology to SCEI's highly anticipated next-generation computer entertainment system. Both companies are jointly developing a custom graphics processing unit (GPU) incorporating NVIDIA's next-generation GeForce(TM) and SCEI's system solutions for next-generation computer entertainment systems featuring the Cell* processor".