IBM to use Cell in Blade Servers
taskforce writes "IBM announced on Wednesday that it would be putting versions of its Cell processor inside its increasingly popular low-power blade servers by this summer. From the article: 'For Cell to gain wide acceptance, IBM needs to spur outside programmers to write software that takes advantage of Cell's prowess. That could prove more challenging than usual because Cell's architecture is so different.
IBM hopes this summer's release of the Cell-based servers kick-starts work by third-party programmers.'" Also covered in a PCPro article.
That could prove more challenging than usual because Cell's architecture is so different. IBM hopes this summer's release of the Cell-based servers kick-starts work by third-party programmers.'"
Deja vu?
Take a peek at http://www.research.ibm.com/cell/patents_and_publications.html to see the patents and whitepapers for Cell technology. One interesting point is the Online Game Prototype white paper on there.
Sun Microsystems has decided to include the Gohan chip to combat IBM's Cell chips.
It being command-line compatible with (or simply a back-end of) an existing compiler like gcc is even better.
Add a port of a good OS, and your platform is suddenly incredibly attractive to developers.
Considering they've already got Linux on Cell and a proposed model for making userland apps to take advantage of the SPUs, and have had these since last summer, I wouldn't be surprised if some open source code is already in the process of being ported.
Anyone know of any specific server apps?
Juhi Jotwani, IBM's director of Blade Center and xSeries solutions, holds the company's new Cell processor during a presentation yesterday in New York.
She said, "Come on, juh know jouwant it!"
As I understand it, the various pipelines of the Cell chip tend to be more specialized than the CoolThreads technology Sun is using in its new T1 processor. However, even with 32 hardware threads (eight cores running four threads each), Sun is also concerned about whether its chips will be put to good use.
I'm not quite sure what IBM is planning to do, but Sun has started a contest to see who can build the coolest program that takes advantage of its new CoolThreads technology. The prize is a cool $50,000, so Sun seems to be serious about this. The results of the contest may very well show whether the new parallel technologies have a future.
Blades in Cells are usually a Bad Thing. Apparently Cells in Blades are a good thing! Go figure...
It's a hell of a paradigm shift for programmers to go from writing code that targets one CPU to code that deliberately splinters tasks across a bank of specialized processors.
It's fun to bash the Cell as a general purpose CPU when no one has actually suggested it's designed for that.
All of the above being true, it remains to be seen what gains IBM's POWER/Cell system actually offers above present architectures -- RISC was the next big thing, too, until Intel internalized part of it into the x86 architecture.
Flyover landscape graphics demos are a shopworn rabbit pulled out of a threadbare hat: convert fractals into craggy vertical displacements with extremely primitive lighting/mapping. Show me an architecture that can *realtime* render Incredibles-caliber cloth/hair simulations and I'll get a hard-on while ATI and nVidia executives slit their wrists.
This probably means that the PS3 will either actually make its "spring" release, or that any delay comes from problems with the Blu-ray drives/discs rather than a Cell shortage. Otherwise I can't imagine Sony allowing IBM to use even one Cell for something other than a PS3 in the first three months.
Marketing people are quick to describe the 0 to >0 transition as "increasingly popular". Of course the rest of the world considers it statistical noise. : )
We've had blades with Cell CPUs on them for quite a while. They're a lot different from any other architecture, resembling the pSeries layout more so than others. One thing I don't like about the prototypes is that the Cell CPUs, along with the BGA memory they use, are fused directly to the logic board. There were a few pictures released to the public about a year ago on The Register, but I cannot find them now. Other than that, they are seriously fast and very clusterable.
IBM has opened the spec for their blade chassis design. Does anybody know if somebody is trying to make a 'desktop' blade chassis? Rather than buying a huge box that holds 14 blades, something that might only hold two.
This doesn't mean making a desktop out of a blade, because as I understand it, so far the JS20s (IBM's PPC 970 blade) don't even have video cards. You have to set them up over the serial port and run them over the network.
But does anybody have a development sized unit you don't need a server rack and new power circuits for?
When I'm done, I pull it out and play.
I didn't need to know that.
Sun's new processor is designed for many-connection business server applications. Web stuff.
The Cell is designed for image processing and other high-volume number crunching.
The design decisions both companies made were heavily influenced by their target markets for these specific processors, and those target markets are very different.
These are apples and oranges.
They can't use that name; Freescale (Motorola) already has it, and killed the line:
http://www.freescale.com/webapp/sps/site/taxonomy.jsp?nodeId=0162468rH3YTLCvL2v
If you knew this, then fwoosh went the joke over my head.
In my opinion, this thing will run games well, but that's about it. So far I've seen two presentations by IBM about the Cell processor (at (micro-)architecture conferences). Both times, the question on everybody's mind was "How do you program these things?" The answer was pretty much a hand-wavy "oh hmmm, well, blah blah blah manual".
Won't the Cell reception be poor inside the metal cabinets?
*looks bright*
you hand optimize (and design) your program for the cell.
Every parallel architecture I've ever programmed for had nice APIs for offloading and directing tasks to the various available processing units. There shouldn't be much 'hand-optimization' involved in the sense you're implying.
Developers who write code that takes advantage of GPUs in modern gaming PCs are already familiar with this style of programming, and the ones who understand the architecture, instead of memorizing the APIs or programming out of a cookbook, should have no trouble adapting.
Itanium offers such a giant increase in performance (for some applications) compared to rival RISC products that you will see people investing time and money to take advantage of it. In addition, with Intel, SGI and HP all with product plans and thus the related volume and eco-system surrounding development tools, etc., I think the Itanium is positioned far better than Alpha to succeed.
D'oh.
Actually, the bigger difference is in how the architecture changed. Cell processor is more along the lines of multi-core DSPs. The instruction set is different from that of general computing cores, and there are many of them. The key is that these cores are disjoint. You can run one application on one core and another application on another core.
The Itanium is different than this in that it required instructions to be passed to the CPU as "bundles". Any of the instructions in a bundle could be executed in any order, but these instructions were all from the same application. Thus, in order to extract speed from the Itanium, the compiler was forced to extract parallelism from within functions. This is very difficult since most programming is fairly sequential. The Cell, on the other hand, allows you to execute different tasks and so puts this control back on the programmer instead of extra work for the compiler.
Itanium was (is) a great idea from compiler theory perspective, but doesn't work out all that well (yet) in the real world.
To the programmer, communicating with the SPU is abstracted as file I/O operations. Go check out IBM's developerWorks pages for lots of info.
Developers who write code that takes advantage of GPUs in modern gaming PCs are already familiar with this style of programming,
:)
But you can probably count on your fingers the number of developers who are using GPUs for anything other than rendering pixels, or at most some simple vectorizable simulations like water or cloth.
Taking an arbitrary program and turning it into something that would run well on a GPU (or a Cell SPU) usually requires a significant redesign of the algorithms and data structures as compared to what you would naively and straightforwardly do in C...or it won't get anywhere near peak performance and may even run slower. It's certainly possible to do, but you won't be re-using any of that originally written code, and it's a different way of thinking from what 95% of programmers are used to. I'm speaking from experience as someone who earns his living by being in the remaining 5%.
As the original poster said: you hand optimize (and design) your program for the cell.
Why go with SPEs anyhow? The whole problem with coding for the Cell involves the differences between the PPE and the SPE. The SPE doesn't have branch predictors, making it virtually useless for any sort of flow control.
Why didn't IBM just pack in a smaller number of PPEs? The PPE already seems to be a very lightweight general-purpose processing core, unless I'm missing something. It is about the same size as an SPE. So why not just put 9 PPEs on a Cell chip instead of 1 PPE and 8 SPEs?
If you had 9 PPEs on the chip, any multithreaded code (servers for example) would see massive benefits without having to rewrite it to try to find aspects of the program that could run on what is effectively a DSP. While everybody else was fooling around with 2-core processors, they'd have a 9-core processor on the market. Sure, slower per-core, but 9 of them, with that number going up in the future.
Or am I missing something here?
PPEs are bigger. Also, a dedicated slave processor doesn't have to worry about interrupts and context switches and OS crap, it can spend all its cycles on number crunching. Cell SPEs are all about moving large amounts of data and doing a whole lot of compute on that data. They're simpler and more efficient at what they're designed for.
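The "move a big block of data in, crunch it, move the next one in" style described above is usually paired with double buffering: while the SPE computes on one buffer in its local store (256 KB per SPE), a DMA transfer is already filling the other. Here is a minimal sketch of that overlap pattern in Python. It assumes a single background thread stands in for the DMA engine, and `fetch_block`, `crunch` and `BLOCK` are hypothetical placeholders. It is not the Cell SDK API, only the shape of the technique.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # hypothetical block size in elements, meant to fit a small local buffer

def fetch_block(data, i):
    """Hypothetical stand-in for a DMA 'get': copy block i out of main memory."""
    return data[i * BLOCK:(i + 1) * BLOCK]

def crunch(block):
    """Hypothetical stand-in for the SPE kernel: sum of squares of one block."""
    return sum(v * v for v in block)

def double_buffered_sum(data):
    """Process data block by block, prefetching block i+1 while computing on block i."""
    n_blocks = (len(data) + BLOCK - 1) // BLOCK
    total = 0
    with ThreadPoolExecutor(max_workers=1) as dma:  # one background "DMA engine"
        pending = dma.submit(fetch_block, data, 0) if n_blocks else None
        for i in range(n_blocks):
            current = pending.result()  # wait until buffer i has arrived
            # Start filling the other buffer before computing on this one.
            if i + 1 < n_blocks:
                pending = dma.submit(fetch_block, data, i + 1)
            total += crunch(current)
    return total

if __name__ == "__main__":
    print(double_buffered_sum(list(range(10000))))
```

In CPython the GIL means this won't actually run faster. The point is the structure: the next transfer is always in flight before the current compute starts, which is what keeps a dedicated number-cruncher from stalling on memory.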
You're close to correct. The Cell processor does have a bunch of cores that are basically DSPs (no virtual memory, etc.) BUT there's also another core that's basically a full-blown Power processor. That core is meant to rule the others.
So while you do still have to program differently for a cell with 8+1 cores than you would for a computer with 9 Power processors, it's still not like being stuck with just 9 DSPs.
Actually, the bigger difference is in how the architecture changed. Cell processor is more along the lines of multi-core DSPs.
Standard computer graphics are RGB color at 24-bits per pixel [2^24 = 16777216], i.e. about 16 million colors.
Standard thinking in the graphics bidness is that: If our triangles will only be displayed in 24-bits worth of color, then why do we need to perform triangle-arithmetic in anything higher than maybe 32-bits worth of floating points?
Hence floating point calculations are 24-bit in the ATi world, and 32-bit in the nVidia and PlayStation 3/Cell world.
Boy, I hope they're upping that floating point number for these "server" chipsets, cause 32-bit single-precision floats are essentially worthless for even something as trivial as computing interest on a bank statement.
On the other hand, a "Cell" server CPU with a 128-bit FPU would be something to drool over. The problem, though, is that transistor counts on FPU's tend to increase as n^2, so each time you double the FPU bit-count [to 64-bits, then to 128-bits], your transistor count goes through the roof.
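To make the single-precision point concrete: an IEEE 754 single-precision float has a 24-bit significand, so above 2^24 = 16777216 it can't even hold every integer, and at around a million dollars the spacing between representable values is already coarser than a cent. This small Python sketch shows it by round-tripping values through the 4-byte `struct` format. The account balance is a made-up figure for illustration.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest IEEE 754 single,
    then widen it back so the rounding error is visible."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_spacing(x: float) -> float:
    """Gap from float32(x) to the next larger float32 (assumes x positive, below the max float32)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_f32(x)

if __name__ == "__main__":
    balance = 1234567.89         # hypothetical account balance, in dollars
    print(to_f32(balance))       # prints 1234567.875: the cents are gone
    print(f32_spacing(balance))  # prints 0.125: smallest step float32 can take here
    print(to_f32(16777217.0))    # prints 16777216.0: 2**24 + 1 is not representable
```

Doubles (53-bit significand) push the problem out far enough for ordinary balances. Real financial code usually sidesteps binary floating point entirely with integer cents or decimal arithmetic.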
IBM needs to release two SIMPLE tutorials if they want programmers to bother porting code specifically to the cell:
1. A Cell program that solves linear equations Ax=b efficiently using SPEs. This would help those with data-intensive problems.
2. A Cell program that speeds up depth-first search (e.g. for SAT, graph coloring, MAX-CLIQUE) by using the SPEs. This would help those programming CPU-intensive problems.
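As a rough idea of what the first tutorial's decomposition might look like, here is a sketch that splits a Jacobi iteration for Ax=b row-wise across a pool of workers. Jacobi is used instead of Gaussian elimination because each row's update reads only the previous iterate, so rows split cleanly across independent processing units. This is a minimal sketch under several assumptions. Plain Python threads stand in for SPEs, so it shows the data partitioning, not Cell speedups. A is assumed strictly diagonally dominant, which guarantees Jacobi converges. `jacobi_rows` and `jacobi_parallel` are hypothetical names, not IBM SDK functions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat

def jacobi_rows(A, b, x, rows):
    """One Jacobi update for a contiguous slice of rows, reading only the previous iterate x."""
    return [
        (b[i] - sum(A[i][j] * x[j] for j in range(len(x)) if j != i)) / A[i][i]
        for i in rows
    ]

def jacobi_parallel(A, b, workers=4, max_iters=500, tol=1e-12):
    """Solve Ax = b by Jacobi iteration, splitting rows across `workers` (stand-ins for SPEs).
    Assumes A is strictly diagonally dominant, so the iteration converges."""
    n = len(b)
    if n == 0:
        return []
    x = [0.0] * n
    step = -(-n // workers)  # ceiling division: rows per worker
    chunks = [range(k, min(k + step, n)) for k in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_iters):
            # Every worker reads the same old x and writes only its own rows.
            parts = pool.map(jacobi_rows, repeat(A), repeat(b), repeat(x), chunks)
            new_x = [v for part in parts for v in part]  # map keeps chunk order
            done = max(abs(u - v) for u, v in zip(new_x, x)) < tol
            x = new_x
            if done:
                break
    return x

if __name__ == "__main__":
    A = [[4.0, 1.0], [2.0, 5.0]]
    b = [1.0, 2.0]
    print(jacobi_parallel(A, b, workers=2))  # close to [1/6, 1/3]
```

On real hardware the interesting part would be streaming each worker's rows of A into its local store, which this sketch deliberately leaves out.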
Parent not funny, it's insightful!
They could come up with ATX or miniATX boards at really cheap prices, able to take your average DDR DIMMs, power supplies, IDE, etc. Give it maybe 3 PCI slots... or 1 if it's miniATX.
Sold for under $100, and they're making money off it while spreading the love that will grow the developer market for the Cell architecture.
It goes like this. Make a new architecture. Release a good compiler for free, with awesome documentation, sample programs and libraries. Allow people to buy evaluation boards for low prices. Once you get people hooked enough, sell the chips themselves at high prices. It's the Microchip (TM) model. Their chips don't really do much for the high costs (compared to Atmel, TI, etc.), but since everyone knows how to work them, they sell sell sell. Rabbit Semiconductor, however, is trying hard to get into the market, and their dev tools are cheap. It'll take time.
IBM can't release a couple of PDFs and one tough software suite and expect the world to jump on it. There's a reason why there's so much momentum behind the Power architecture, and the Cell is different.
If you put the onus on the programmers, this chip won't get widespread acceptance.
If you can write a PC program that uses 10 threads, then you can write a program that uses the Cell processor's PPC and 7 DSPs. Trouble is that most computer science education in universities doesn't cover practical use of threads.
There were a couple that would be really helpful:
1. An implementation of zlib for the SPE architecture, with a speed comparison to the PPE. (Hopefully, the SPE is very fast...)
2. Examples of direct SPE-to-SPE streaming.
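Something like the following shows the general shape of a two-stage streaming pipeline in which one stage feeds the next directly, with zlib compression as the first stage's work. It is only a sketch under stated assumptions. Python threads stand in for SPEs, and small bounded `queue.Queue` objects stand in for the SPE-to-SPE transfer buffers. `compress_stage`, `verify_stage` and `run_pipeline` are hypothetical names, not Cell SDK calls. Each chunk is compressed independently, so the output is a series of separate zlib streams.

```python
import queue
import threading
import zlib

SENTINEL = None  # end-of-stream marker passed down the pipeline

def compress_stage(inbox, outbox):
    """Stage 1 (one stand-in SPE): compress each chunk and stream it onward."""
    while (chunk := inbox.get()) is not SENTINEL:
        outbox.put(zlib.compress(chunk))
    outbox.put(SENTINEL)  # tell the next stage the stream is finished

def verify_stage(inbox, results):
    """Stage 2 (a second stand-in SPE): consume stage 1's output directly."""
    while (blob := inbox.get()) is not SENTINEL:
        results.append((len(blob), zlib.crc32(zlib.decompress(blob))))

def run_pipeline(chunks):
    feed = queue.Queue(maxsize=2)        # small bounded buffers, loosely like local-store slots
    stage_link = queue.Queue(maxsize=2)  # the direct stage-1 -> stage-2 channel
    results = []
    stages = [
        threading.Thread(target=compress_stage, args=(feed, stage_link)),
        threading.Thread(target=verify_stage, args=(stage_link, results)),
    ]
    for t in stages:
        t.start()
    for chunk in chunks:
        feed.put(chunk)  # blocks when stage 1 falls behind (back-pressure)
    feed.put(SENTINEL)
    for t in stages:
        t.join()
    return results

if __name__ == "__main__":
    print(run_pipeline([b"abc" * 1000, b"hello world " * 50]))
```

The bounded queues are the important design choice: they keep only a couple of chunks in flight between stages, much as a real SPE pipeline would be limited by local-store space.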
One Cell to rule them all, One Cell to find them, One Cell to bring them all and in the darkness bind them.