Prospects For the CELL Microprocessor Beyond Games
News for nerds writes "The ISSCC 2005, the "Chip Olympics", is over and David T. Wang at Real World Technologies has posted a very objective review of the CELL processor (the slides for the briefing are also available), covering all the aspects disclosed at the conference. Besides the much-touted 256 GFlops of single-precision floating point performance, the CELL processor manages 25-30 GFlops in double precision, which is useful enough for scientific computation. Linus seems interested in CELL, too."
This is a very positive review for the cell processor. It does seem like a really exciting new piece of technology. It promises a lot, and if it will do everything people say it will do, it really has the possibility to give the entire industry a big leap forward.
That being said, I think it's important not to get too excited about it... it's hard to say if it will live up to everything that people have written about it. I'm a bit skeptical. Until I see some production units doing amazing things, I'm cautiously optimistic.
I store my recipes online (the way nature intended)
Why should Linus be interested in the cell when he has the Transmeta Crusoe?
That's just sick (I think). Even cooler for Mac users who'll like the "dual threaded" PowerPC core of it, no? Can't wait for that PS3...
You can hold down the "B" button for continuous firing.
...playing The Game of Life.
Sony so badly wants its next-generation game console to offer a super-realistic "virtual reality" experience, the company will design and build its own advanced 128-bit processor to realize this goal.
...
Processors inside game consoles usually toil away in anonymity, derided as poor cousins to desktop chips such as Intel's Pentium line. But with Sony Computer Entertainment's ambitious plan, its chips could outclass the offerings of the world's largest chipmaker--if all goes well.
The system is so advanced, MicroDesign Resources analyst Keith Diefendorff wrote in a report that the system "has the potential to swipe a chunk of the low-end market from under the noses of PC vendors." He wrote that the platform may "signal the company's intention to move upscale from current game consoles, cutting a wider swath through the living room," with its abilities to function like a stand-alone DVD player and Internet set-top box.
Sony puts on game face with new chip
Published: May 5, 1999, 1:25 PM PDT
By Jim Davis
Staff Writer, CNET News.com
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
The reason: Linux would be ported to a gaming platform or scientific platform. (The current PS2 already runs Linux.)
.. Imagine playing Cell Games on a cell (based) game.
Why
Because they can.
Depending on Sony's marketing, think of the DBZ tie ins
The marketing got you.
What the Cell has is a "PowerPC Processing Element" which is a stripped down version of the Power5. It runs at a high frequency, but each instruction takes longer to process (more RISCy than the standard Power5).
It isn't a POWER5. It is more like a 64-bit variant of the 750VX with SMT, a chip that never appeared but otherwise looks rather similar to what has been described as the PowerPC portion of Cell.
But isn't the point of the Cell processor a distributed model?
From the reviews I've seen, they are touting it as if the Cell communicates with other Cells to handle all the processor-intensive stuff.
So where one Cell would not be as powerful as an x86 CPU, two Cells would be. And the way they have designed the things is as a separate computer on a chip, so you can basically upgrade your ?? just the same way you upgrade your memory.
Or have I gotten the wrong end of the stick, and they are designing these things for pointless fun?
Huhuhuh! Or, like, put it on a SPEEDBOAT or something...
Man, that would be SWEET!
I thought I read, within the last 2 days, that linux is the only OS they admitted to have running on these things. But I didn't find the source of that, does anyone know?
What keeps me going is my inertia.
Some time ago Chuck Moore proposed the 25x, a single chip holding a 5x5 array of simple processors. That's what this reminded me of when I first read about it. As Mr. Moore said in that Slashdot interview, "[...] the 25x is a solution looking for a problem." Cell theoretically has a lot of performance, and we're talking FLOPS not MIPS. It will certainly be useful or even revolutionary in televisions and game computers, as well as for scientific calculations. I don't see it making your desktop or server much faster though. Those don't need more FLOPS, they need more I/O bandwidth and faster peripherals, and perhaps more MIPS. I can see Cell workstations, but in the same way as you have SPARC workstations and laptops now: as development tools for the "real" hardware.
I'm a little concerned about the clock rate. How are they getting this sort of speed? There are ways to do this, but most of them would reduce the efficiency or increase size. Hardly seems to make sense to do this when it's a lot easier to simply add more processing cores.
Sheesh, /. might as well make a Cell image & category, they post so many articles about it!
From what I've seen, it will be rather low horsepower compared to the current G5s, since it will be lacking deep pipelines, caches and other bits that give the G5 much of its speed. That's not to say that it's not really a G5; it sounds like it will support the full G5 instruction set (including Altivec) and be a true 64 bit processor core, just not a particularly fast one.
The role of the G5 cores seems to be to handle higher order logic that prepares and parses out tasks to the very fast vector units (SPEs).
So it probably does make more sense to have it as a coprocessor in a Mac, at least until compilers and software writers routinely target the cell's SPEs -- if that day ever comes. More likely specialized code will need to be written, and particular subtasks pulled out.
I suspect things like physics libraries, sound & video processing libraries, plus apps like SETI@home would be quickly written to use the SPEs, but most other software wouldn't be.
And a good one. Someone actually modded this person as Interesting. :)
Having said that, if the original poster of this thread truly does think it's underpowered, one should provide a bit more elaboration besides a trollish reference to the IBM/Sony marketing machine.
I like the fact that the presenters didn't remember/know what all the acronyms were in the cell diagram. I like the interview technique too. Get em drunk and watch em talk.
I was wondering why the article was so in depth.
Quoth
"
After some discussion (and more wine), it was determined that the ATO unit is most likely the Atomic (memory) unit responsible for coherency observation/interaction with dataflow on the EIB. Then, after the injection of more liquid refreshments (CH3CH2OH), it was theorized that the RTB most likely stood for some sort of Register Translation Block whose precise functionality was unknown to those outside of the SPE. However, this theory would turn out to be incorrect.
"
Thank you.
I've been saying something like this to my friends until I'm blue in the face, but it's amazing how well products benchmark on paper to those who have limited technical knowledge.
My prediction for the PS3 is that the games will look graphically gorgeous - pushing the bar in terms of animation complexity and polygon count - but that's going to be just about it. New look, same old games that play exactly the same as the ones from the previous generation.
...might be used to run the PS3 (assuming this is true). Outside of a weighty OS (assuming you use Windows, Mac or a Linux GUI with that nVidia) they should do better.
Besides, 256 GFlops in single-prec. can't be too bad either...can it?
You can hold down the "B" button for continuous firing.
Is it compatible with x86 in any way?
Only in the same way that a G5 is. Through emulation.
What good is a new chip, no matter how fast it is, if you can't run anything on it?
No use at all. What's your point? ;) But seriously, we can expect to see software written for this. It has a lot of potential applications, and most serious number crunching hardware has a custom OS.
How fast will this chip be at general purpose stuff? Who cares if it can do 100GFLOPS on a couple operations.
That's a good question. Vector units are optimised for a certain class of operations - those where exactly the same set of operations is run on a large number of items. For a graphical application with procedural textures we can expect very good performance, but this will fall off considerably for general purpose desktop application type stuff. This probably doesn't matter too much. Those applications don't actually need the sort of performance modern chips can offer.
Yes, that is helpful indeed. Now, can you get me a few of these processors, so I can go port my own version?
What keeps me going is my inertia.
What good is a new chip, no matter how fast it is, if you can't run anything on it?
There is this really neat group of operating systems called Unix/Linuxes. They have a major advantage in that you only need a small amount of assembler to get going on a new chip, then the rest can be ported over in C/C++. This has been the situation for decades - Unix (and now Linux) has been the initial OS for almost all new chips.
How fast will this chip be at general purpose stuff? Who cares if it can do 100GFLOPS on a couple operations.
Reasonable point, but FLOPs are a good general measure of the speed, as they are pretty complex operations. We all used to measure speed in MIPS (Million Instructions Per Second), but as chips got so diverse, one chip's instruction could not be easily compared with another's (particularly if RISC chips were involved, where the instructions could be very minimal). FLOPs are a better measure, as a divide is a divide and a multiply is a multiply no matter what chip architecture you use.
It would be compatible with PowerPC software.
Which means that the vast majority of the software I use every day would work just fine on it.
Although it would be slow... Cell isn't optimized for general purpose work, and the extra SPEs add another 128 registers to the PowerPC and VMX ISAs, which wouldn't get used by normally compiled PowerPC code.
You would have to have GCC worked over to provide 'vectorized' code that uses as much of these SPEs as possible for single-threaded applications, and even then you wouldn't get much more performance out of it than a normal G4-class PowerPC processor.
Then you have memory management problems to work out, probably through an extensive firmware-based controller, which would add to execution time and slow things down a little bit more.
The advantage would be that if I was doing extensive multimedia or 3D work or special types of scientific research, I could use a familiar environment (Linux) as a platform to run special applications that themselves would benefit from the tremendous performance capabilities of a few of these Cells.
It would make a great chip for an embedded multimedia player (at lower clock rates) and would be great for something like a non-linear video editor, but a Wintel killer it definitely wouldn't be.
It would probably be somewhat useful for normal desktop usage as more and more applications are multimedia in nature, but it's not going to be substantially faster than an Intel or AMD processor to the end user.
But what it can do is provide backup horsepower as a math co-processor.
I see great potential for the STI Cell Processor as a SETI@Home accelerator.
Seriously though, there may be good scientific uses for these exactly as you envisioned - in a coprocessor role. From folding proteins and weather simulations to cryptanalysis, these could provide a great entry for distributed scientific computing.
Just like games on PCs, Macs, Xbox, etc...
That has to do with poor imagination at the game-producers and nothing to do with the performance of the cell-cpu.
/.Mattsson - My native language is not English, so please don't whine over linguistic errors. (That's lame anyway...)
Here is a Coral Cache of the images page.
Then again, today's CPUs are way overpowered for the jobs they are actually doing. Most of the power is used for sometimes important, sometimes pointless stuff around the edges such as antialiasing fonts and making icons bounce up and down.
A chip designed to be able to cooperate with others should have an advantage in that kind of environment. If the CPU can concentrate on actually running the word processor, and efficiently coordinate with others doing the peripheral activities for it, that should be a big win.
_O_
.|< The named which can be named is not the true named
Also keep in mind that the SPEs, the 8 smaller secondary cores, are called 'vector' units, but they are more general purpose than that.
They aren't really vector processors, they just act like it. They are actually general purpose cores, just very, very 'RISC'.
For instance they can run integer operations, too. They are SIMD and 128 bits wide, similar to (but not compatible with) VMX/Altivec.
On one core you can run 4 single precision floating point operations at once, OR 4 32bit-sized integers OR 8 word-sized operations, OR 16 byte-sized operations...
So this chip all at one time can process on the SPEs a total of something like 256 32bit operations in a single clock... if your application was that parallel, which is unlikely.
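For anyone who wants to check the lane arithmetic, here is a minimal sketch. The 128-bit width and the 8 SPEs come from the comment above; the 4 GHz clock and counting a multiply-add as 2 operations are my assumptions, not something confirmed in this thread. Under those assumptions the lane count alone gives 32 32-bit operations per clock across the SPEs, and the 256 figure only comes out as GFLOPS once the clock and the multiply-add are factored in.

```python
# Back-of-the-envelope SIMD arithmetic for the figures quoted in this thread.
# Assumptions (not confirmed by the thread): a multiply-add counts as
# 2 floating-point operations, and the chip is clocked at 4 GHz.

REGISTER_BITS = 128  # SIMD width mentioned in the comment above
SPE_COUNT = 8        # number of SPEs mentioned in the comment above


def lanes(element_bits: int) -> int:
    """How many elements of the given width fit in one 128-bit register."""
    return REGISTER_BITS // element_bits


def peak_gflops(clock_ghz: float, flops_per_lane: int = 2) -> float:
    """Peak single-precision GFLOPS over all SPEs, under the assumptions above."""
    return SPE_COUNT * lanes(32) * flops_per_lane * clock_ghz


if __name__ == "__main__":
    for bits in (32, 16, 8):
        print(f"{bits}-bit elements: {lanes(bits)} lanes per SPE")
    print("32-bit lanes per clock, all SPEs:", SPE_COUNT * lanes(32))
    print("assumed peak GFLOPS at 4 GHz:", peak_gflops(4.0))
```

Change the clock or the flops-per-lane assumption and the headline number moves with it, which is part of why peak figures are hard to compare.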
Gameplay is not a function of hardware. It's a function of the game designer. So basically you're saying the CELL will do exactly what they are saying it will? You just don't think the game designers are going to be designing games you want to play. This is cleverly offtopic.
If you see spelling or grammatical errors don't blame me. I tried to preview but IE here at work borked the CSS
I've been reading about the Cell processor for a few weeks now, and there is never any discussion about the operating system architecture necessary to get this thing to perform.
As I see it, it's a PowerPC of OK quality with 8 subsidiary processors optimised for running a relatively simple task on a relatively small amount of memory.
So - port Linux to it? But how? Relatively easily, to make use of the main processor, but what sort of subsystem do you build so that the subsidiary processors get used to their full potential? Perhaps part of X could be configured to run on these processors - but that would be a very manual tweak to make use of the architecture. And with the best will in the world, these processors would then sit around unused for most of the time.
What you need is a more general concept, probably at the programming language level, in which algorithms can be expressed in such a way that the operating system can detect that they can be loaded into these subsidiary processors to be executed.
But there doesn't seem to be anything about that in the news out there. Presumably Sony are going to do something for the PS/3 - but what? And is it going to be general purpose, since for their purposes much of the benefit will come from using it as a super motion graphics processor for games?
Until we understand what the software infrastructure for making use of this new chip's architecture will be, I can't see how we can make predictions of its success in the more general processor market. Before then it's just marketing hype.
don't forget that this time ibm is part of the whole show. they aren't going to risk their reputation with cheap tricks, that's their main business after all
does it run NetBSD? had to do it... sorry.
Unless you are computing digital orreries, whether it has 256GFlops or 256TFlops makes little difference if the memory bandwidth isn't substantially increased, and people don't increase the memory bandwidth because that has expensive consequences all over the system.
On the whole, my impression is that current mainstream CPUs have a pretty reasonable balance between CPU power and all the other system components. Changing just the CPU without making substantial (and expensive) changes to the rest of the system will not magically give you more performance.
it will be lacking deep pipelines, caches and other bits
And that is the whole point of this processor. The G5 NEEDS those pipelines and caches in order to feed the multiple execution units, reorder instructions and avoid reading slow host memory.
The CELL on the other hand will have the instruction ordering done in software. All those 'bits' you describe are replaced with software: a much smarter compiler.
Yes this processor will perform poorly with today's code. With appropriately written code it will scream.
This chip is not going to compete with other general purpose CPUs. It's going to compete with custom ASICs and FPGAs.
-S
folks need to keep in mind these are max figures assuming software is perfectly written to take care of parallelization (does that word exist?). this means that most computer programs will hit nowhere near these rates, but super optimized versions of things like SETI@home and an mpeg encoder/decoder could take advantage of it.
just remember how many developers complained about the Emotion Engine from the PS2 and how it was such a bitch to program for. this will be worse. it's first gonna require a special compiler or at least a tool to farm the code out to all the independent mini-procs and reorder all the instructions to take advantage of its little quirks. they seem to be a bit different from pipelines, but some of the same concepts with regards to stalls will apply. so if you're working heavily on one set of data, it's quite possible only one of these mini-procs will be used, and the rest will stand there and do nothing.
i think this is something that'll work much better on a video card and maybe a sound card than as a main processor, except in the cases where mostly only media processing is required: settop boxes, game consoles, tvs, stereo systems, etc.
On one core you can run 4 single precision floating point operations at once, OR 4 32bit-sized integers OR 8 word-sized operations, OR 16 byte-sized operations...
All well and good, but they must be non-dependent. If operation 2 depends on the result of operation 1, or we have a lot of branching, then you're dividing that performance by 4 or 8 or 16. This sort of dependency is not all that common in most applications that need this sort of performance, but it does happen.
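To put rough numbers on the dependency point, here is a toy cycle count. It is an idealized model of my own (one SIMD instruction per cycle, no memory stalls, no branch penalties), not a description of how the real SPE pipeline behaves.

```python
import math


def cycles(num_ops: int, lanes: int, dependent: bool) -> int:
    """Idealized cycle count for num_ops scalar operations on a SIMD unit.

    Assumes one SIMD instruction issues per cycle and nothing else stalls.
    """
    if dependent:
        # Each operation needs the previous result, so only one lane
        # does useful work per instruction.
        return num_ops
    # Independent operations can be packed 'lanes' at a time.
    return math.ceil(num_ops / lanes)


if __name__ == "__main__":
    for lanes in (4, 8, 16):
        packed = cycles(64, lanes, dependent=False)
        chained = cycles(64, lanes, dependent=True)
        print(f"{lanes} lanes: {packed} cycles packed, {chained} cycles chained")
```

In this model a fully dependent chain costs the same no matter how wide the unit is, which is the "divide by 4 or 8 or 16" effect described above.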
It looks like the Cell will be remarkably crappy for general purpose calculations (regular integer math, etc), but does that matter? Anything that can be vectorized and/or parallelized will run really well on the Cell.
My school's 2000MHz machine running XP feels slower to me than my old 200MHz machine running 98. IF they write/pick the OS/software for the Cell appliances correctly I could see it making some headway as a desktop replacement. If most monitors/TVs are shipping with a good office suite and web browser then how many people are going to spring for a regular computer? I doubt it will happen but it's not out of the realm of possibility.
I did my own analysis: I think this is going to be a big deal. Get it from one of the following:
http://homepage.mac.com/dke/.cv/dke/Public/CellProcessor.pdf
http://www.mymac.com/fileupload/CellProcessor.pdf
http://www.igeek.com/CellProcessor.pdf
The real promise of these Cells is Internet MPP. IBM (and Sony) claim that Cell PCs will be able to cluster "natively" across Internet-latency TCP/IP networks, like broadband. If they deliver on that, then performance questions will revolve around interoperable network apps, not just the raw CPU HW.
Intel's Pentium architecture was built to accommodate 6-way direct CPU interconnects. The idea was to build "cubic" structures for MPP computers. It took until the P4 to really deliver any of those, almost 10 years after the architecture was released. And the software is still bleeding-edge, and hand-rolled for each install. MPP SW techniques have evolved a lot since then, so perhaps the Cell will actually deliver on these "distributed supercomputer" promises.
--
make install -not war
If use of this processor became common, it could change the way we approach common problems, problems whose current solutions we take for granted. Instead of doing the same stuff faster, we might be doing the same stuff differently. Fractal based compression might come into normal everyday use, for example.
>> What good is a new chip, no matter how fast it is, if you can't run anything on it?
:
This is the good ol' anti-new-architecture speak. A new architecture is not necessarily a bad thing, provided
1) it's massively scalable to its targeted size and hopefully beyond (either large or small)
2) it's easily portable
3) its architecture doesn't have a super bottleneck (namely, the x87 floating point stack)
Apple managed to embrace MacOS from 68K to PowerPC.
HP wrote HP-UX for Itanium (non-emulation mode).
Digital went from VAX to Alpha.
also, just because an architecture can run everything on it doesn't mean it's successful. say, Transmeta. They're 100% capable of x86 execution, and promised support of multiple architectures through virtual emulation onto the native 256-bit Crusoe system. end result? a plain ol' x86 architecture with an emulation layer padded on.
Apple has good reason to embrace Cell, primarily because they wanted their machine to be a multimedia hub, and the Cell processor is perfect for that goal. Different cells will process different items of the system, and share idle resources. This doesn't mean Apple *needs* to switch MacOS totally over to Cell. Keep a generic PowerPC as the general purpose processor, but distribute multimedia code to different cells, thus freeing up the main PowerPC for non-vectorizable tasks.
Besides, 256 GFlops in single-prec. [realworldtech.com] can't be too bad either...can it?
Unfortunately the single precision operations ignore certain rounding conventions in order to boost the speed. You'll get super fast single precision results, but they won't be as accurate as on other systems. That probably won't matter for physics or rendering in a video game (Sony's Emotion Engine did the same thing), but it could make a big difference when applied to general purpose situations.
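As a rough illustration of why precision matters outside games, here is a small sketch that accumulates the same value in single and double precision. Note the assumption: it uses ordinary round-to-nearest single precision via Python's struct module, not whatever shortcut the Cell actually takes, so it only shows the general single-versus-double gap, and the helper names are mine.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, count: int, single: bool) -> float:
    """Add 'step' to a running total 'count' times.

    With single=True every intermediate result is rounded to single
    precision, imitating a float32 accumulator.
    """
    if single:
        step = to_float32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_float32(total)
    return total


if __name__ == "__main__":
    exact = 10000.0  # 0.1 added 100000 times, in exact arithmetic
    err32 = abs(accumulate(0.1, 100000, single=True) - exact)
    err64 = abs(accumulate(0.1, 100000, single=False) - exact)
    print("single precision error:", err32)
    print("double precision error:", err64)
```

For a few frames of game physics the drift is invisible; for a long-running simulation that keeps summing, it is the kind of thing you have to design around.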
5 years ago the "Emotion Engine" from Sony was supposed to "steal a chunk" of the PC processing market. Didn't happen. Won't happen.
All I can get from the links in the headers is a page in Chinese.
The CELL on the other hand will have the instruction ordering done in software. All those 'bits' you describe are replaced with software: a much smarter compiler.
Yes this processor will perform poorly with today's code. With appropriately written code it will scream.
Hmmm... seems like I've heard this before... oh yeah... Intel's Itanium.
IF they write/pick the OS/software for the Cell appliances correctly I could see it making some headway as a desktop replacement.
Which is the key, exactly. As Linus wrote in one of his linked forum posts (from the blurb), it's gonna be a pain to program general purpose code for those vector units (SPEs).
However, judging from the main review, it doesn't look like the PowerPC Element was castrated too much. It looks like it'll suffer from Pentium 4 syndrome (boosting the frequency doesn't do as much as it used to), so it might not be as good as an equally clocked Power5 based processor, but I think you're looking too much at the SPEs when considering whether or not it'll compete with the x86 and Power5.
Right now, there aren't x86 and Power5 chips at 4+Ghz, and looking at Intel and AMD's roadmaps, there probably won't be for quite a while. Even if this thing is horribly inefficient for general tasks, it'll be great for Graphical/Video work, great for Physics/Scientific work, and probably at least as fast for everything else as a single core P4 3.8Ghz (which does a better job melting candles than it does holding them, most of the time).
Writing code for the secondary processors will most likely be writing microcode that will be downloaded to the processors along with the data to process. It will be completely different from writing your typical application code. I'm sure Apple, Adobe, and the large 3d software companies will have the ability to make use of it, but the only way most of us will make use of it will be through libraries providing very specific functionality. That is, of course, if they ever release enough tech details about the processors to allow for us "norms" to develop on it.
I think initially it will be libraries like the SIMD library mentioned here the other day (http://www.pixelglow.com/macstl/) that might make use of it. However, unlike AltiVec or Intel's SIMD functions, I don't think it will be possible for GCC to automatically make use of the extra processors. We could probably write an amazingly fast MP3 encoder, but if it's only single precision floating point, then maybe we won't.
Anyhow, don't get your hopes up that this magic CPU will make all your compiles go faster.
Yeah, that 25x reminds me of a CM-2 (ConnectionMachines), the main difference being that I (and others) actually wrote and ran code on the CM-2.
You may not like Michael Kanellos usually, but I think he's hit the nail on the head here.
This is a bigger, hotter, less stable chip with an exotic and hard to write-for architecture. That's fine for a gaming system with a dedicated revenue stream and no competition. It's not gonna make it outside that domain.
Substantial changes, maybe. Expensive? Perhaps not. This all depends on the base assumptions from which you operate. One of the fundamental assumptions in today's existing systems is that any and all work should be done to maximize the utilization of the CPU. However, when considering how to design other types of systems, such may not be true (it may make sense to minimize the memory footprint, for example).
If you've ever done some detailed algorithm work, you will quickly realize that there are many algorithms where you can make tradeoffs between memory and CPU time. The 'simplest' of these are the algorithms that are breadth first vs. depth first, which can trade off exponential in memory vs. exponential in time. [For a 'trivial' example, try forming the list of all operational assignments containing 6 variables and which use %, +, -, *, /, ^, &, ~, and ()... less than 50 lines of perl and you'll quickly blow through the 32-bit memory limit if written depth first, or take overnight to run breadth first]
The significant question which has been brought up - and which remains unanswered - is what software development tools will be made available. Once this is better answered, we will all be in a better position to determine what fundamental assumptions have been changed, and therefore how we can follow the new assumptions through to conclusions about the net performance of the processor and machine in which it is contained.
The ColdFusion server with linus's comments is down. Is this any surprise? What did he have to say?
what it says. Surely this is an implicit admission that Moore's Law has finally been laid to rest.
It's just not economically viable spending time trying to squeeze more power out of the current methodologies.
Patriotism is a virtue of the vicious
Beware the coming SonyOS PC. If you've ever used any of Sony's PC software (bundled with Vaios, Camcorders, and such), you'll know exactly what I mean.
Same story I heard about Itanium. (It will be really super turbo ultra fast if we rewrite everything and toss out everything we own.) Niche detected.
No?
Please don't feed the trolls.
You may think me a tired, old, cynic. I'd have to disagree about the tired bit.
Since IBM is now involved, should it be called the PS/3 instead of the PS3?
My view of the Cell chip is that it's actually 2 different kinds of chips put together. It has a general processor (the POWER5 core), and essentially co-processors that are optimized for a totally different class of programs. The POWER5 core would let it run your normal office applications, but the SPEs allow the chip to do things like graphics processing, audio processing, simulations, etc.: all those problems that lend themselves naturally to a vectorized solution. Together, the 2 kinds of cores on a single chip have the potential to do a lot. But there have to be tools that allow developers to make use of the potential. Especially as vectorized programs are not easy to write and optimize, the quality of the development tools will be very important in deciding the success of the chip.
There are 10 kinds of people in the world - those that know binary, and those that don't.
SoCs with a general purpose core attached to special purpose logic are the future of computing. More and more companies are licensing ARM and PPC IP to put in FPGA fabric to control their own custom I/O, DSP logic, etc. Several things don't bode well for the Cell. It appears to be made to work only with Rambus memory - not a good sign. Size, heat and power consumption are the dominant factors when it comes to choosing a processor for embedded apps, and the Cell looks like it's gonna have plenty of all three. Finally, there doesn't seem to have been any parallel work on compiler technology to support this chip - just the standard "ohhh, we'll fix it in software, later" mindset.
Is it the first BOGOTIPS(1,000,000 BOGOMIPS) chip ever?
Being the chip that breaks the record for the amount of nothing done per second surely is sweet!
People seem to think this is leaps and bounds above everything else, but they're missing the details. In order to obtain that much performance, you'll need a task which parallelizes well so it can be broken up into chunks for the 8 SPEs. Graphics rendering falls into this set of tasks, but a lot of general applications just don't gain that much from parallel processors. Even when you have a task that does parallelize, writing parallel code is quite a bit harder than writing code for just a single thread of execution.
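The usual way to quantify this is Amdahl's law, which is my addition here rather than something the parent cited. A minimal sketch, assuming the work splits cleanly into a serial part and a perfectly parallel part spread over the 8 SPEs:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: overall speedup when only part of a job parallelizes.

    parallel_fraction is the share of the original run time that can be
    spread across 'workers'; the rest stays serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    for fraction in (0.5, 0.9, 0.99, 1.0):
        speedup = amdahl_speedup(fraction, workers=8)
        print(f"{fraction:.0%} parallel on 8 SPEs: {speedup:.2f}x")
```

Even a job that is half parallel gets less than a 2x speedup from 8 units in this model, and that is before counting the extra effort of writing the parallel code.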
I've seen a lot of hype about having the Cell in your laptop talk to the Cells in your desktop, microwave, and TiVo, but you have to consider real-world limitations. When you set up a network like that (presumably wireless), you're going to be limited to around 100Mbps. In computer clusters and supercomputers, one of the main limitations on performance is the communication bandwidth available between processors, and the latency of the network. To build a "home supercomputer", you not only need a task that parallelizes well, but one that doesn't require so much inter-node communication that it's held back by a slow network. You can't work around this problem with hardware magic - if the task you're working on requires lots of communication bandwidth, you're going to be held back.
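A crude way to see where the network starts to dominate is the sketch below. The workload numbers are made up for illustration; only the 100Mbps link and the 256 GFLOPS peak come from this discussion, and the model ignores latency and protocol overhead entirely.

```python
def step_times(flop: float, node_gflops: float,
               bytes_exchanged: float, link_mbps: float) -> tuple:
    """Return (compute_seconds, transfer_seconds) for one work step on a node.

    Very rough model: the node runs at its peak rate, and the data moves
    at the full link rate with no latency or protocol overhead.
    """
    compute_seconds = flop / (node_gflops * 1e9)
    transfer_seconds = bytes_exchanged * 8 / (link_mbps * 1e6)
    return compute_seconds, transfer_seconds


if __name__ == "__main__":
    # Hypothetical step: 256 GFLOP of work that must swap 100 MB with peers.
    compute, transfer = step_times(flop=256e9, node_gflops=256.0,
                                   bytes_exchanged=100e6, link_mbps=100.0)
    print(f"compute:  {compute:.1f} s")
    print(f"transfer: {transfer:.1f} s")
    print("network bound" if transfer > compute else "compute bound")
```

With those made-up numbers the node spends 8 seconds on the wire for every second of math, so adding more Cells to the house would not help that particular job.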
So how much beyond a modern PC is 250GFLOPS anyway? Not much! A GeForce FX at 500MHz does 200 gigaflops. An AMD Athlon's peak performance is 2.4 GFLOPS at 600 MHz... if we scale this up to 2.2 GHz (high-end Athlon), that's 8.8GFLOPS (note: As we're talking about theoretical performance, nonlinear factors like bus speeds can be ignored). Basically, if the Cell dedicates most of its power to graphics rendering, you'll have computation power in the same range as a fast PC of today. Given that we're not going to see any products based on the Cell for a while, this isn't going to be the end of the world for Intel and nVidia (let alone the fact that Cell isn't x86).
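The Athlon scaling above is plain linear clock scaling of a peak figure. A minimal sketch of that arithmetic, using only the numbers quoted in the comment (the function name is just for illustration):

```python
def scale_peak_gflops(gflops: float, from_mhz: float, to_mhz: float) -> float:
    """Scale a theoretical peak linearly with clock speed.

    This ignores bus speeds and every other nonlinear factor, exactly as
    the comment above does.
    """
    return gflops * to_mhz / from_mhz


if __name__ == "__main__":
    athlon = scale_peak_gflops(2.4, from_mhz=600, to_mhz=2200)
    print(f"Athlon peak scaled to 2.2 GHz: {athlon:.1f} GFLOPS")
    print(f"Cell's quoted 256 GFLOPS is about {256 / athlon:.0f}x that")
```

These are peak numbers on both sides, so the ratio says little about what real code would see.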
Consoles using the Cell will have the advantage of only having to render for TV resolutions - at most 1080 lines, while PCs will be rendering at up to 1600x1200. But if you look at recent history, you can compare the Xbox to a then-good PC with a GeForce3 (which came out at around the same time): the Xbox looked better, but PCs did catch up and surpass its performance, and it didn't take all that long. Consoles have to be very high-end when they're released, because the platform doesn't change for 2-3 years, and they still need to be "good enough" after a couple of years, before the next generation is released.
My server
...in the form of PC games.
Anyone else notice a bit of a drop in the number of titles being produced for the PC in the last few years (or at least being severely delayed to bring out on PS2 first)?
Visual feedback and font quality may be unimportant to you, but to a large number of people not running the applications from text-only terminals, they are very important elements. Along with things like alpha channels and drop shadows. People want these things.
Imagine if you will something like the proposed PS3, but built with only peripherals for which there is excellent Linux support. Now imagine the cost savings of building millions of them, perhaps with Walmart backing the project. If this chip lives up to the promise, then we could very well have our killer commodity desktop computer.
POWER5 is not the same as PowerPC 970 (G5). POWER5 is a really really expensive high performance mainframe chip. G5 is a server/desktop chip.
I actually thought immediately of Cellular Automata when I read some of the specs on the new Cell, and the name may just be a coincidence, but maybe not. It would be interesting to see a Cell architecture where there are 27 Cell sub-processors, because my Life is more than two dimensional.
Letter To Iran
I am dumping my MSFT shares this afternoon before Bill dumps even more of his shares. I have friends in Redmond who told me two weeks ago that the rumor mill has it that STI (Sony, Toshiba, IBM) declined Microsoft's request to collaborate on porting Windows to the Cell platform. The Cell has the potential to become the big new thing, and even if Bill does not want to admit it, he will try to get some of his money out in time.
This week's I, Cringely also touts cell processors as the NextBigThing.
http://www.openmp.org
The CELL SPUs might be wonderful for all of the graphics processing, but what about the other stuff? Are they at all suitable for collision detection, AI, pathfinding? If all of those tasks need to remain on the central processor, then games will just become limited by the central processor in AI code, instead of polygon rasterizing.
I agree that the potential is exciting and have a further thought. Could the development of the Cell processor have anything to do with IBM Opens Their Patent Portfolio to Open Source? This would seem to foster porting to the new arch...
How much will a PS3 cost to manufacture?
If I was a computer company, I could buy them without the game-specific stuff, load on linux, and sell them as cheap alternative computers.. but that's just me. (assuming linux and friends are compiled for CELL in the next few months of course).
The problem with that is a PS3 won't be anywhere close to a GP machine. It's going to require a lot of driver tweaks and a load of hardware reconfiguration, and you'd have to defeat the DRM. By the time someone figures out how to do that cheaply, computers will already be more powerful at a similar cost, so there's no incentive except nerd prestige.
"There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy."
The article also points out that the SP floats aren't truly 754-compliant, as they round-toward-zero on cast to int.
As far as I remember from implementing the spec years ago, the rounding mode can be varied. Indeed there are C runtime functions on many platforms that set this and other properties for floating point operations.
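For readers who have not met the rounding modes being argued about, here is a small sketch of the four IEEE 754 rounding directions applied when converting to an integer. It uses Python's standard `math` functions and built-in `round` purely as stand-ins; it does not change any hardware rounding mode, and the sample values are arbitrary.

```python
import math

values = [2.7, -2.7, 2.5, -2.5]

toward_zero = [math.trunc(v) for v in values]      # what a truncating cast does
to_nearest_even = [round(v) for v in values]       # ties go to the even neighbour
toward_neg_inf = [math.floor(v) for v in values]
toward_pos_inf = [math.ceil(v) for v in values]

print("toward zero:    ", toward_zero)
print("nearest (even): ", to_nearest_even)
print("toward -inf:    ", toward_neg_inf)
print("toward +inf:    ", toward_pos_inf)
```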
My exception safety is -fno-exceptions.
If you're going to rip the links out of one of my Ars news posts and submit them to slashdot (in the same order in which I linked them, no less), then at least credit your source.
Senior CPU Editor | Ars Technica | http://arstechnica.com/
A FLOP is a FLOP, except that the SPE's don't quite round in IEEE approved ways in order to decrease logic complexity... so it is a *little* different.
Although this looks awesome on paper now, we all have to remember that this won't be out for some time. I have no doubt this will be fast, but in the end it will all be decided by its scalability and price.
/. that say dual cpu/core amds available soon? Better than any more of this sony hype.
By the time the cell processor is on shelves and has an operating system that can somehow thread off or pipeline everything effectively through it, it'll be 2 years. In 2 years, there will be dual core, quad core, and probably 8 core chips, and there will still be SMP. So we could very well have anything between 2 and 16 cores in the future from AMD or Intel.
I personally like the idea of having lots of chips that are full of cores that can handle complex tasks. The competitors feel the heat, and what cost Billions to develop years ago, now costs millions. Sony made the step, but Amd and Intel have never felt bad about copying each other or ibm in the past, and trust me, by the time cell is a commercial threat, there will be plenty of reasonable competition.
Where are the
This might not be an issue for a game console, but for a workstation, wouldn't the Cell's context-switching overhead be rather huge? From what I have read so far, it seems each SPE has 256 KB of local storage, plus another 2 KB for the main register file (128 16-byte registers) and whatever other state information it needs. Since there are 8 of these things, we're talking over 2 MB of state to swap in and out. Would that still be considered a drop in the bucket for most scheduling schemes?
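The state-size estimate in that comment works out as follows. This sketch counts only the local store and the main register file quoted above; the "whatever other state information" is not included because the comment gives no figure for it.

```python
KB = 1024
MB = 1024 * KB

local_store = 256 * KB       # per-SPE local storage quoted in the comment
register_file = 128 * 16     # 128 registers of 16 bytes each
spe_count = 8

per_spe = local_store + register_file
total = spe_count * per_spe

print(f"register file: {register_file} bytes")
print(f"per SPE: {per_spe / KB:.0f} KB")
print(f"all {spe_count} SPEs: {total / MB:.3f} MB")
```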
Can you please post a non borked, slashdotted link to this please? Christ, the top fucking like, 5+ stories are all non-mirrored and screwed up, just my luck when there is actually some shit that's worth reading on this little DoS operation disguised as a news for nerds portal.
Stanford may be in a tizzy over their "stream processor" concept but people will have to learn to grasp the reality that not everything will benefit from that kind of architecture.
How many programs, even parallel ones, have small algorithms that work on large streams of data? Even supercomputers need large amounts of RAM per CPU.
This cell is more reminiscent of older DSPs. Great at what they do, but you aren't going to accelerate M$ Word and destroy Intel with it.
Linus in interested in giving anal to Miko Lee, too, but that doesnt mean anything is going to happen because of it.
...if it does work with iLife.
Most of you are thinking of today's applications...but what about things like eye/head tracking, voice recognition, face recognition, telepresence, real-time cinema-quality CGI, etc...those are tasks requiring large-scale numerical computation, and they all might appear on your desktop in the not-too-distant future thanks to chips like CELL and its future descendants.
All is Number -Pythagoras.
This chip is not going to compete with other general purpose CPUs. It's going to compete with custom ASICs and FPGAs.
No, it won't. Who uses an ASIC or FPGA to make just a processor? No one, that's who. Processors are often embedded in ASICs (and sometimes even FPGAs) along with lots of other goodies. If you just need a processor, you buy an off-the-shelf ARM, VR, or any of the dozens available. You don't spend the bucks to make an ASIC. This may compete with off-the-shelf processors and some ASSP (App Specific Standard Product) but not ASIC.
everything in moderation
"Cell was co-designed by IBM which has an interest in selling workstations etc with that chip..."
That's conjecture... IBM makes money designing, and fabbing chips more than in PCs, as the selling of the division attests. But, could Sony be one of the PC outfits interested in licensing a compatible version of OS X for the living room? Network workstations running the beast might be of interest to IBM however. Does your cash register really need Windows?
How hot will this thing be? From TFA:
One unconfirmed report claims that at the extreme end of the frequency/voltage/power spectrum, one sample CELL processor was observed to operate at 5.6 GHz with 1.4 V Vdd and consumed 180 W of power.
I'm not sure about the die dimensions or what kind of cooling system will be used, but isn't that a lot of heat to dissipate? I may have to trade my P4/toaster oven in for a Cell if they're hot enough.
I've yet to see an article on how this "supercomputer on a chip" could play a role in the supercomputer market (extra-supercomputer? hypercomputer?). If you can have 25-30 GFLOP/chip, what would a System X-like cluster look like? The raw number is 25 GFLOP/chip * 2 chips/server * 1000 servers = 50 TFLOP. Assuming 80% efficiency, that is still 40 TFLOP for a low-price supercomputer with low power requirements. What if IBM actually builds supercomputers (not clusters) using these chips?
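The cluster estimate above is just a product of the quoted figures. A minimal sketch, with a made-up function name and the 80% efficiency taken as the comment's own assumption:

```python
def cluster_tflops(gflops_per_chip, chips_per_server, servers, efficiency=1.0):
    """Aggregate throughput in TFLOPS for a homogeneous cluster."""
    return gflops_per_chip * chips_per_server * servers * efficiency / 1000.0

peak = cluster_tflops(25, 2, 1000)
sustained = cluster_tflops(25, 2, 1000, efficiency=0.8)

print(f"raw peak: {peak:.0f} TFLOPS")
print(f"at 80% efficiency: {sustained:.0f} TFLOPS")
```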
Any comments from supercomputer experts? Does Cell make good supercomputing processors?
I'm not going to claim I completely understood the XDR memory controller section, but limiting the whole chip (SPEs + PPE) to 256MB seems to effectively rule out any high-end workstation in the immediate future.
Additionally, although there is much speculation about what the processor can run, it is pretty obvious it will not run x86 code. So in order to compete (or take over) in the PC business, it will have to do what every other new architecture in its position has failed at -- overcome the x86 existing compatibility requirement. How could that happen... Hmmm...
1) have a better instruction set -- uh, no, other processors have essentially the same instruction set. And while the relative beauty of an instruction set is often a matter of preference, few people find x86 beautiful.
2) have a higher clock rate? Well, maybe, but not really. Due to a nearly complete (albeit intentional) lack of branch predictive hardware with an 18 cycle flush penalty, it seems clear that a P4 at 4.0GHz will smash an individual SPE at 4.0GHz.
3) be cheaper and cooler? Here, we have a cell processor at 4GHz and about half the power consumption. A definite potential win. Die size seems comparable, so production costs will be, too, in all likelihood. Packaging might be more complex for the cell, I'm not sure.
4) be more parallelizable? this is the only area in which cell can stomp every other chip. Sure, an 8-way opteron system will beat a single cell processor, but who builds 8-way opterons? And who can do it for the price of one cell system?
I don't think anyone doubts the potential power of these processors. But I think there is a long software bootstrap process to be undertaken before we see mainstream cell desktops. Businesses won't write cell code for consumers unless consumers have cell machines, and consumers aren't going to buy cell machines unless the machines run their apps. The catch-22 that has doomed every new and superior instruction set since the x86 original (backwards compatibility with current software) will likely hold back the cell as well. However, the cell brings something new to the table -- the promise of more raw power.
I think that this bootstrapping process will therefore most likely be driven by the big vendors who need the processing power. Alice and Bob can instant message just fine without one, but if you're building a renderfarm, the power and saved time is maybe going to justify the costs of porting your key apps.
As a side note, if it really does run linux soon after release, one must wonder if the killer app some have been looking for for years to move people from windows to linux may in fact be a killer processor. If you can beat the performance of any x86 windows box by an order of magnitude with a cell linux box, that's an argument that really hasn't been made before.
But right now, it looks like the only real software development market for the cell is going to be the high end workstation / performance chasers, and they need more memory than cell can deliver. I'm not holding my breath for the moment.
Either way, it'll be interesting. :)
In Soviet Russia, us are belong to all your base.
It sounds like an ideal candidate for a proprietary hobby operating system with real time multitasking, preferably coded in asm.
No more time slicing the CPU, let 1 SPE do the network, 1 SPE for sound and so on, all in parallel and real time. Leave out 1 SPE to be time sliced between all other non-important background programs.
8 simultaneous processes is plenty for a personal computer.
My apologies,
I am the editor of Real World Tech, and I tried to warn the folks at our hosting company, but apparently they got caught with their pants down : )
A good slashdotting never gave us any trouble before...but with our new hosts, something gave out...
Check it out, it's a damn good read.
David Kanter
Editor
Real World Technologies
I believe he is selling so that AIDS patients in Africa can live longer and spread the disease even further, allowing the virus to mutate and becoming more virulent in the process.
He's very clever that Bill Gates.
The Opteron has stunning performance, except that its floating point unit is lackluster. Instead of having a dual core Opteron, you could have a single core Opteron and replace the second core with a cell processor, giving a processor with outstanding I/O performance, 8 way SMP (without glue chips), and an integrated Northbridge chip (memory address controller). With the addition of being able to gang processor registers together (much like IBM's VIVA) so as to provide a Virtual Vector processor (like IBM's), you could get outstanding 512 bit Vector performance too (instead of relying on SIMD or streaming SIMD extensions (SSE)).
Is there a Linus watch somewhere so mere mortals can try learn from the master, by following his web contributions to forums and presentations and email lists?
The closest I know is:
http://marc.theaimsgroup.com/?l=linux-kernel
A Richard Stallman watch would be good too.
An rss feed, or maybe just a microphone pinned to him, though the keyboard clicks would get annoying, maybe a video feed of his screen....
No joking matter, it is only a matter of time, even if it will be a distraction from the business of coding!
Be Free: Free Software Tuition
The Cell is going to mean the death of x86. The Cell will also, IMO, be the chip where the Mythical Convergence actually happens. I'm not trolling. Everything starts out small in the beginning. If you read the Cell specs, you will see that the chip can run MULTIPLE OS's at the SAME TIME. Also, Microsoft owns VirtualPC which is an x86 emulator for the PPC Architecture. Microsoft is also using IBM iron in their new XBox 2, which is incidentally developed for on a G5. Chew on that for a while, then go back to the IBM literature on the Architecture.
Actually, Forth can be ported faster than C.
Thanks, are there any other forums, mailing lists and the like that Linus Torvalds or Richard Stallman contribute to?
http://marc.theaimsgroup.com/?a=105701892400001&r=1&w=2 [linux kernel list]
Besides:
http://www.realworldtech.com
and
ht
Maybe I am just too lazy to trawl through google or slashdot search but I have not seen this information pop up on my standard web graze...
Thank you for your time.
Be Free: Free Software Tuition
It actually makes economic sense to put vastly more processing power in a chip than it has bandwidth to supply that power ... even if most of the time that processing power isn't being used, the fact that a small percentage of the time it will be still makes you come out ahead.
... but their relative costs need to be taken into account when determining that balance; just balancing for the average workload is a very naive approach. See the Merrimac design paper, it makes the point better than I can.
Processing power and bandwidth should be balanced
So we have this neat modular CPU/CELL thing, which is rather faster than anything available for game hardware right now.
But I understand that CELL partners with the latest and greatest Nvidia graphics chipset, so the real video performance we will get will reside as much in Nvidia's technological capability as in the new base CELL.
So why is nobody talking about Nvidia's role in this hardware and how this will translate in real life performance?
If you've ever done some detailed algorithm work, you will quickly realize that there are many algorithms where you can make tradeoffs between memory and CPU time.
No, sorry, it doesn't work that way. It isn't the total amount of memory that the algorithm uses that matters, it's locality of reference. And people already maximize locality of reference in their performance critical implementations because it already pays back handsomely on current processors.
I see where you're coming from, as there are reasons why locality helps (cache lines fetched in their entirety, bursts of memory from DRAM to caches, charging of different areas on a DRAM) but I disagree that there are not tradeoffs that can COMPLETELY trade memory access for CPU time, regardless of the value of locality. In addition, locality becomes vitally important on Cell, where the additional processors primarily address their 256K of local memory. It will also become more and more important elsewhere as the penalty of going to main memory increases (and you can't achieve a 100% hit rate in an L1 cache).
Let me expand my original example to show you two (potentially extreme and thus stupid, but nonetheless illustrative) implementations:
Let's assume we want every assignment a = b _ c _ d _ e where each _ is either +, -, * or /. I will ignore everything else, even though this is not exhaustive (since without parentheses, it is subject to a language's order of operations). [Note this is actually a 'real' example, in the sense that I was part of a project where we were exhaustively testing a compiler for these simple assignments.]
One algorithm might calculate all possibilities of x _ y (there are 4) and then substitute b for x, c for y and d for x and e for y for each of the permutations (4 x 4 = 16 total permutations). The storage of each of the "sub equations" is done in memory, and while this fits in an L1 cache for this trivial example, with more operators and variables, it doesn't (especially if you go to 8 variables and are therefore caching permutations of w _ x _ y _ z)
Another algorithm will treat this like counting. let + = 00, - = 01, * = 10 and / = 11. Start at 000000 and increment up to 111111, at each number using the first pair of numbers for the sign between b and c, the second pair for c _ d, the third pair for d _ e.
If you expand these examples to a much larger example (do 8 variables instead of 4; N symbols + parentheses instead of 4), you will quickly realize that storing all of the permutations by doing the precalculation (in the first example) not only blows out the memory, but also (just due to size and the need to merge the permutations) destroys any ability to do the calculation locally in memory. However, the second example, at the expense of recalculating the same thing (specifically, the patterns x + y, x - y, etc.) COMPLETELY eliminates any duplicate reference to memory (since there is one access to 'store the answer' and what this does use - the counter and the code - would all fit in an L1 cache), but this is done at the expense of run time (it will take MUCH longer to run... in the real example I cite above, it took about 100X the runtime, but about 1% of the memory, since we wrote the final results directly to disk in both cases). Yes, I realize there are middle cases that are probably 'more ideal', but that requires one to think about the target architecture, programming language, and lots of details not valuable here.
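The two approaches described above can be sketched roughly like this. The function names are made up, and the "stored" variant is a simplification that just materializes the whole list (it is not the sub-equation cache from the real project); the counting variant follows the 2-bits-per-operator scheme from the comment and keeps only a counter in memory.

```python
from itertools import product

OPS = "+-*/"  # + = 00, - = 01, * = 10, / = 11

def expressions_by_counting(variables=("b", "c", "d", "e")):
    """Yield each expression by decoding a counter, two bits per operator slot."""
    slots = len(variables) - 1
    for n in range(4 ** slots):
        parts = [variables[0]]
        for i in range(slots):
            shift = 2 * (slots - 1 - i)           # first slot uses the top bit pair
            parts.append(OPS[(n >> shift) & 0b11])
            parts.append(variables[i + 1])
        yield " ".join(parts)

def expressions_stored(variables=("b", "c", "d", "e")):
    """Build the whole table in memory before anything is consumed."""
    table = []
    for ops in product(OPS, repeat=len(variables) - 1):
        parts = [variables[0]]
        for op, var in zip(ops, variables[1:]):
            parts += [op, var]
        table.append(" ".join(parts))
    return table

for expr in expressions_by_counting():
    pass  # e.g. emit "a = " + expr to the compiler test file here

print(len(expressions_stored()), "expressions, last one:", expr)
```

Both produce the same 64 assignments for four variables; the difference is whether they sit in memory all at once or are regenerated from a counter one at a time.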
This - as pointed out in my first post - is certainly a trivial example. However, there are many other places where this type of tradeoff is valid. Geometric applications and work on graphs [graph theory] (which is where I spend my programming time now) can make a lot of effort to cache values to avoid lots of calls to sin, cos, and tan (geometry) or paths through the graph (graph theory). However, with a surfeit of computing power, should that be done, or should the memory footprint be minimized (to hopefully get better locality of reference)? Those tradeoffs are actively being made with what I am working on. After all, what's the value of code profiling tools if you don't use them :)
There is a lot of confusion in the discussion. First, Cell won't be used for graphics performance. Nvidia is developing the next generation GPU for PS3. Second, the 250 GFLOPS achieved by the 'magic' Cell at 4.7 GHz can be achieved by overclocking the current GeForce FX at 500 MHz a bit higher. GeForce FX at 500 MHz can already reach 200 GFLOPS. The excitement about Cell is probably the high communication bandwidth it will bring with Rambus technologies. All of these technologies are available today. High bandwidth switching buses have been used in high performance workstations and servers for years. The real contribution of Cell is probably to bring these wonderful technologies to a mass market level with Sony's PS3 launch.
so it might not be as good as an equally clocked Power5 based processor
Man, you do realize the Power5 is F#$%ing FAST. The "PowerPC Processing Element" looks like it won't have much in common with the Power5 at all. I doubt you will get 1/100th the speed of the current Power5s for generic PowerPC programs on the Cell.
Cell processor does not yet exist, therefore the post was not Offtopic. You should have marked it Flamebait or Troll.
This - as pointed out in my first post - is certainly a trivial example. However, there are many other places where this type of tradeoff is valid
The bread-and-butter problems of high-performance computing usually deal with datasets that are much larger than the cache, and they need to access them repeatedly. That's true for dense matrices, sparse matrices, graphs, speech signals, images, etc. Problems whose runtime is dominated by calls to time consuming single-variable functions that are not themselves dependent on a lot of data are quite rare.
These relationships are not a lucky coincidence; rather, chip designers look at existing codes and they do simulations on them. Based on that, they decide how much chip real-estate to devote to multipliers, the evaluation of trig functions, cache memory, etc.
You want copyright on hyperlinks...
Take your medicine, seriously.