Prospects For the CELL Microprocessor Beyond Games
News for nerds writes "The ISSCC 2005, the "Chip Olympics", is over and David T. Wang at Real World Technologies has put up a very objective review of the CELL processor (the slides for the briefing are also available), covering all the aspects disclosed at the conference. Besides the much-touted 256 GFlops single-precision floating point performance, the CELL processor has 25-30 GFlops in double-precision, which is useful enough for scientific computation. Linus seems interested in CELL, too."
Sony so badly wants its next-generation game console to offer a super-realistic "virtual reality" experience that the company will design and build its own advanced 128-bit processor to realize this goal.
...
Processors inside game consoles usually toil away in anonymity, derided as poor cousins to desktop chips such as Intel's Pentium line. But with Sony Computer Entertainment's ambitious plan, its chips could outclass the offerings of the world's largest chipmaker--if all goes well.
The system is so advanced, MicroDesign Resources analyst Keith Diefendorff wrote in a report that the system "has the potential to swipe a chunk of the low-end market from under the noses of PC vendors." He wrote that the platform may "signal the company's intention to move upscale from current game consoles, cutting a wider swath through the living room," with its abilities to function like a stand-alone DVD player and Internet set-top box.
Sony puts on game face with new chip
Published: May 5, 1999, 1:25 PM PDT
By Jim Davis
Staff Writer, CNET News.com
There are places where the networks are not touching, and there are places where they are - Boeing's Lori Gunter
Some time ago Chuck Moore proposed the 25x, a single chip holding a 5x5 array of simple processors. That's what this reminded me of when I first read about it. As Mr. Moore said in that Slashdot interview, "[...] the 25x is a solution looking for a problem." Cell theoretically has a lot of performance, and we're talking FLOPS not MIPS. It will certainly be useful or even revolutionary in televisions and game computers, as well as for scientific calculations. I don't see it making your desktop or server much faster though. Those don't need more FLOPS, they need more I/O bandwidth and faster peripherals, and perhaps more MIPS. I can see Cell workstations, but in the same way as you have SPARC workstations and laptops now: as development tools for the "real" hardware.
...might be used to run the PS3 (assuming this is true). Outside of a weighty OS (assuming you use Windows, Mac or a Linux GUI with that nVidia) they should do better.
Besides, 256 GFlops in single-prec. can't be too bad either...can it?
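For a sense of where a number like 256 GFlops can come from, here is a rough back-of-the-envelope sketch. The breakdown used (8 SPEs, 4 single-precision lanes per 128-bit register, a multiply-add counted as 2 flops, a 4 GHz clock) is an assumption for illustration and is not stated in this thread; the function name is hypothetical.

```python
def peak_gflops(units, lanes, flops_per_lane_per_cycle, clock_ghz):
    """Theoretical peak: every lane of every unit busy on every cycle."""
    return units * lanes * flops_per_lane_per_cycle * clock_ghz

# Assumed figures (not from the thread): 8 SPEs, 4 single-precision lanes
# each, a multiply-add counted as 2 flops, and a 4 GHz clock.
single_precision_peak = peak_gflops(
    units=8, lanes=4, flops_per_lane_per_cycle=2, clock_ghz=4.0
)
print(single_precision_peak)
```

A peak like this only says what the hardware could do if nothing ever stalled; real code will land somewhere below it.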
You can hold down the "B" button for continuous firing.
But what it can do is provide backup horsepower as a math co-processor.
I see great potential for the STI Cell Processor as a SETI@Home accelerator.
Seriously though, there may be good scientific uses for these exactly as you envisioned - in a coprocessor role. From folding proteins and weather simulations to cryptanalysis, these could provide a great entry for distributed scientific computing.
I've been reading about the Cell processor for a few weeks now, and there is never any discussion about the operating system architecture necessary to get this thing to perform.
As I see it, it's a PowerPC core of OK quality with 8 subsidiary processors optimised for operating a relatively simple task on a relatively small amount of memory.
So - port Linux to it? But how? Relatively easily, to make use of the main processor, but what sort of subsystem do you build so that the subsidiary processors get used to their full potential? Perhaps part of X could be configured to run on these processors - but that would be a very manual tweak to make use of the architecture. And with the best will in the world, these processors would then sit around unused for most of the time.
What you need is a more general concept, probably at the programming language level, in which algorithms can be expressed in such a way that the operating system can detect that they can be loaded into these subsidiary processors to be executed.
But there doesn't seem to be anything about that in the news out there. Presumably Sony is going to do something for the PS3, but what? And is it going to be general purpose, given that for their purposes much of the benefit will be as a super motion-graphics processor for games?
Until we understand what the software infrastructure to make use of the architecture of this new chip will be, I can't see how we can make predictions of its success in the more general processor market. Before then it's just marketing hype.
it will be lacking deep pipelines, caches and other bits
And that is the whole point of this processor. The G5 NEEDS those pipelines and caches in order to feed the multiple execution units, reorder instructions and avoid reading slow host memory.
The CELL, on the other hand, will have the instruction ordering done in software. All those 'bits' you describe are replaced with software: a much smarter compiler.
Yes this processor will perform poorly with today's code. With appropriately written code it will scream.
This chip is not going to compete with other general purpose CPUs. It's going to compete with custom ASICs and FPGAs.
-S
I keep looking around expecting to see a school of their lawyers circling, biding their time before a patent lawsuit frenzy.
I'd be more worried about that if they DIDN'T use Rambus's technology. Rambus can't sue someone who's licensing their tech... they can only sue someone they THINK is using tech too similar to theirs without licensing it. If Cell used some sort of DDR or maybe an in-house memory tech instead, maybe then Rambus would try to sue.
The real promise of these Cells is Internet MPP. IBM (and Sony) claim that Cell PCs will be able to cluster "natively" across Internet-latency TCP/IP networks, like broadband. If they deliver on that, then performance questions will revolve around interoperable network apps, not just the raw CPU HW.
Intel's Pentium architecture was built to accommodate 6-way direct CPU interconnects. The idea was to build "cubic" structures for MPP computers. It took until the P4 to really deliver any of those, almost 10 years after the architecture was released. And the software is still bleeding-edge, and hand-rolled for each install. MPP SW techniques have evolved a lot since then, so perhaps the Cell will actually deliver on these "distributed supercomputer" promises.
--
make install -not war
Transmeta isn't doing the low heat processors anymore. Quoted from http://arstechnica.com/news.ars/post/20050105-4501.html.
Just because they aren't manufacturing anymore doesn't mean they're exiting the business entirely. There just might not be a "Transmeta" anymore. Instead there will be something like an "Intel Pentium 5 using low-power Transmeta Technology" (well, probably not, but you get the idea).
Transmeta will be doing R&D for low power processors for years to come, I'm quite sure.
Besides, 256 GFlops in single-prec. [realworldtech.com] can't be too bad either...can it?
Unfortunately, the single-precision numbers ignore certain rounding conventions in order to boost the speed. You'll get super fast single-precision results, but they won't be as accurate as on other systems. That probably won't matter for physics or rendering in a video game (Sony's Emotion Engine did the same thing), but it could make a big difference when applied to general purpose situations.
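To see why a relaxed rounding convention can matter, here is a minimal sketch. It assumes the shortcut in question is rounding toward zero (truncation) instead of the usual round-to-nearest; the comment above does not say which convention is dropped, so treat that as an assumption. The helper names are hypothetical, and single precision is emulated with Python's struct module.

```python
import struct

def to_f32_nearest(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_f32_truncated(x):
    """Round toward zero: drop the bits that do not fit instead of rounding."""
    nearest = to_f32_nearest(x)
    if abs(nearest) <= abs(x):
        return nearest  # nearest did not round away from zero
    # Nearest rounded away from zero, so step back one unit in the last place.
    bits = struct.unpack("<I", struct.pack("<f", nearest))[0]
    return struct.unpack("<f", struct.pack("<I", bits - 1))[0]

def accumulate(step, count, rounder):
    """Add `step` to a running total `count` times, rounding after every add."""
    total = 0.0
    single_step = rounder(step)
    for _ in range(count):
        total = rounder(total + single_step)
    return total

nearest_sum = accumulate(0.1, 10000, to_f32_nearest)
truncated_sum = accumulate(0.1, 10000, to_f32_truncated)
print(nearest_sum, truncated_sum)
```

With truncation every rounding error points the same way, so the errors pile up instead of partly cancelling. In this toy run the truncated total should come out lower than the round-to-nearest one, and both fall short of the exact answer of 1000.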
5 years ago the "Emotion Engine" from Sony was supposed to "steal a chunk" of the PC processing market. Didn't happen. Won't happen.
But look at the graphics in PS2 games now compared to 1st gen titles. The improvement is incredible! The hardware hasn't changed: it's still just a 300MHz cpu with 4MB graphics and no pixel shading. I think we'll see the same maturation process with Cell/PS3, where the 1st gen games don't live up to the hype but more and more of the Cell's enormous potential is realized with successive generations.
The question is whether Sony decides that part of the slow evolution in efficient PS2 programming was because of the small, exclusive development community. I would love to see Sony push a Linux PS3 similar to the version of Linux PS2 they released.
My view of the Cell chip is that it's actually 2 different kinds of chips put together. It has a general processor (the POWER5 core), and essentially co-processors that are optimized for a totally different class of programs. The POWER5 core would let it run your normal office applications, but the SPEs allow the chip to do things like graphics processing, audio processing, simulations, etc.: all those problems that lend themselves naturally to a vectorized solution. Together, the 2 kinds of cores on a single chip have the potential to do a lot. But there have to be tools that allow developers to make use of that potential. Especially as vectorized programs are not easy to write and optimize, the quality of the development tools will be very important in deciding the success of the chip.
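As a rough illustration of what a "vectorized solution" means here, the sketch below computes a*x + y over two arrays twice: once element by element, and once in groups of four. The group size assumes four single-precision values per 128-bit register, which is an assumption for illustration; the function names are hypothetical, and plain Python lists stand in for vector registers.

```python
LANES = 4  # assumption: four single-precision values per 128-bit register

def saxpy_scalar(a, xs, ys):
    """One multiply-add per element, one element per step."""
    return [a * x + y for x, y in zip(xs, ys)]

def saxpy_vectorized(a, xs, ys):
    """Same arithmetic, organized as one step per group of LANES elements."""
    out = []
    steps = 0
    for i in range(0, len(xs), LANES):
        x_vec = xs[i:i + LANES]
        y_vec = ys[i:i + LANES]
        # This line stands in for a single SIMD multiply-add instruction.
        out.extend(a * x + y for x, y in zip(x_vec, y_vec))
        steps += 1
    return out, steps

xs = [float(i) for i in range(16)]
ys = [1.0] * 16
vector_result, vector_steps = saxpy_vectorized(2.0, xs, ys)
print(vector_result == saxpy_scalar(2.0, xs, ys), vector_steps)
```

The point is the step count: 16 elements take 16 scalar steps but only 4 vector steps. Code with data-dependent branches or irregular memory access does not regroup this neatly, which is where the tooling problem comes from.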
There are 10 kinds of people in the world - those that know binary, and those that don't.
(1) fetching and prefetching (multiple P4 stages) because the extra processors on Cell can directly address their local 256KB of memory.
(2) decoding x86 instructions into microops - since the extra processors are running code directly rather than running kludgy x86 code on a non-x86 microcore
(3) branch prediction (since the load penalty is a lot lower due to local 256KB of memory and shallower pipeline, these stages are unnecessary)
(4) scheduling the microops isn't necessary as Cell will require that to be done in software during compilation (à la VLIW)
(5) retirement (since Cell isn't doing out-of-order execution, no reordering and retranslation from the microop to the x86 world is necessary)
So given that potentially half of the 20 P4 stages (later P4s have 31) are unnecessary, that saves a lot of logic and allows the same clock speed with fewer stages. There has (apparently) been a lot of architecture work here to think through what adds the extra hardware and how to avoid that... the result is the ability to use higher clock speeds without having the same types of penalties the IA-32 processors encounter.
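The link between pipeline depth and branch penalties in the list above can be put into a toy model. This is only a sketch: the formula is the usual textbook approximation, and every number in it (branch fraction, misprediction rate, flush penalties) is a made-up placeholder, not a measurement of the P4 or of Cell.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction once pipeline flushes are counted."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles

# Hypothetical workload: 20% of instructions are branches, 10% of those mispredict.
deep_pipeline = effective_cpi(1.0, 0.20, 0.10, flush_penalty_cycles=20)
shallow_pipeline = effective_cpi(1.0, 0.20, 0.10, flush_penalty_cycles=10)
print(deep_pipeline, shallow_pipeline)
```

With the same misprediction rate, the shallower pipeline pays less per mistake. That is the trade being described: fewer stages to refill means the chip can afford simpler prediction hardware.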
The PS3 should not have nearly the problems that the PS2 had in regard to its difficulty of development (a.k.a. lazy developers). Because Cell is a joint project by IBM, Toshiba and Sony, it will have a much larger install base. Rather than being a specialized chip for a specialized system, it is to be a general chip usable in many systems. This means more people will be programming for it, not just game developers, who are notorious for their lack of desire to change (hence why the 68000, 6502 and Z80 were so popular for so long). Cell chips should end up making it into systems designed for scientific computing, where developers (a.k.a. computer scientists) will be willing to take more chances and dig deeper into the architecture.
We will see some of the typical ramp-up time in Cell programs, but since the Cell, if you believe what you read, is so far above and beyond other modern processors (and since lazy developers for the PS3 can always let the NVIDIA GPU carry the load in a more traditional fashion), we should see program performance improve by leaps and bounds fairly quickly.
It's worth noting that various research papers have done analysis to determine the optimum level of pipelining, and found about 6 to 8 FO-4 gate delays* per stage is optimal - Intel's cancelled Tejas processor was apparently around there and would likely have run at similar clock speeds to the Cell processor. Note that in the real world, you hit other limitations earlier - right now, the main issue is power: chips that fast just run too hot.
*a FO-4 gate delay is a "fan-out of 4 gate delay" - it's the amount of delay from one inverter (NOT gate) which drives 4 identical inverters as load.
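To turn FO-4 counts into a clock speed, multiply the number of FO-4 delays per stage by the delay of one FO-4 gate in the process being used. The sketch below does that arithmetic; the 25 ps FO-4 delay is a hypothetical placeholder rather than a figure for any real process, and the function name is made up for illustration.

```python
def clock_ghz(fo4_per_stage, fo4_delay_ps):
    """Clock frequency if one pipeline stage takes fo4_per_stage FO-4 delays."""
    cycle_ps = fo4_per_stage * fo4_delay_ps
    return 1000.0 / cycle_ps  # 1000 ps in a ns, and 1 / ns is GHz

ASSUMED_FO4_DELAY_PS = 25.0  # hypothetical value, for illustration only

for fo4 in (6, 8):
    print(fo4, clock_ghz(fo4, ASSUMED_FO4_DELAY_PS))
```

Halving the FO-4 count per stage doubles the clock on paper, which is why the power limit mentioned above tends to bite before the theoretical optimum is reached.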
My server
Most of you are thinking of today's applications...but what about things like eye/head tracking, voice recognition, face recognition, telepresence, real-time cinema-quality CGI, etc...those are tasks requiring large-scale numerical computation, and they all might appear on your desktop in the not-too-distant future thanks to chips like CELL and its future descendants.
All is Number -Pythagoras.
My apologies,
I am the editor of Real World Tech, and I tried to warn the folks at our hosting company, but apparently they got caught with their pants down : )
A good slashdotting never gave us any trouble before...but with our new hosts, something gave out...
Check it out, it's a damn good read.
David Kanter
Editor
Real World Technologies
Well, considering that there's going to be a dedicated graphics chip from nVidia in the PS3 too, I'd imagine that the SPUs are designed specifically with all that stuff in mind...
Advanced users are users too!