Cell Architecture Explained
IdiotOnMyLeft writes "OSNews features an article written by Nicholas Blachford about the new processor developed by IBM and Sony for their Playstation 3 console. The article goes deep inside the Cell architecture and describes why it is a revolutionary step forwards in technology and until now, the most serious threat to x86. '5 dual core Opterons directly connected via HyperTransport should be able to achieve a similar level of performance in stream processing - as a single Cell. The PlayStation 3 is expected to have 4 Cells.'"
It's not like we haven't heard it before. It usually turns out to be halfish-truish for some restricted subset of operations in a theoretical setting, you know, the kind where you discount busses, memory and latencies.
Yeah, but can my inkjet print them?
Be relentless!
a DBZ reference: "Part 4: Cell Vs the PC"
..it probably is.
was the ps2 the supercomputer it was said to be...?
the author goes on to suggest that cell workstations would smoke x86 counterparts.. but says at the same time that there probably won't be that many of them.
wtf? though, reading between the lines at the end, you can see that he also thinks a single g5-cpu workstation would 'smoke' x86s...
world was created 5 seconds before this post as it is.
Something that has always confused me in gaming consoles is that, despite incredibly powerful hardware (processors, graphical chips, etc.), the system developers seemingly always neglect to put in enough RAM for most games to perform to their potential. Many PC ports often have portions compromised due to the lack of RAM, and system speeds also suffer because of this.
Seeing how RAM is increasingly becoming cheaper, is it possible that new systems like the PlayStation3 might be able to provide RAM that actually allows games to reach their potential along with this new cell hardware?
I'll believe it when I see it. Sony made outrageous claims with the PS2 in the year or so before launch, I see no reason to believe this will be any different.
On paper the Emotion Engine was supposed to destroy everything, but achieving maximum throughput was difficult and other constraints such as I/O and memory hampered performance. Programmers had to learn a very different way of programming to make full use of the processor and its two vector units.
A Cell might be a killer chip on paper, but real-world hardware with I/O latency and memory constraints will bring things down to a more reasonable level. Don't forget that multiprocessor programming is *hard*.
Hopefully, developing software for Cell chips will be easier than in the early days of the PS2; Sony already said as much a few months ago.
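A common way to see how memory constraints pull a chip below its paper figure is the roofline model: attainable throughput is the lower of peak compute and memory bandwidth times arithmetic intensity (FLOPs performed per byte moved). A minimal sketch; the peak and bandwidth values are placeholders chosen for illustration, not Cell specifications.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Roofline estimate: the lower of the compute ceiling and the memory ceiling."""
    memory_bound = bandwidth_gbs * flops_per_byte  # GB/s * FLOP/B = GFLOP/s
    return min(peak_gflops, memory_bound)

# Placeholder figures (assumptions, not measured or published Cell numbers):
PEAK = 256.0       # GFLOP/s "on paper"
BANDWIDTH = 25.6   # GB/s to main memory

for intensity in (0.25, 1.0, 4.0, 16.0):
    print(f"{intensity:5.2f} FLOP/B -> {attainable_gflops(PEAK, BANDWIDTH, intensity):6.1f} GFLOP/s")
```

At low arithmetic intensity (data streamed through once with little work per byte), the memory bus rather than the ALUs sets the speed, which is the gap between the paper figure and the real one.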
Quotes from article:
"GPUs will provide the only viable competition to the Cell but even then for a number of reasons I don't think they will be able to catch the Cell."
Did this guy forget that NVidia is designing the GPU for PS3? If Cell is so almighty, why does Sony use an NVidia GPU instead of using more Cells for graphics processing?
"There is another reason I don't think Nvidia or ATI will be able to match the Cell's performance anytime soon."
Of course, Cell-based products won't be available anytime soon either. According to the current rumors, PS3 will be available in Japan in Spring 2006 and elsewhere in Autumn 2006. One and a half years equals a generation in the GPU world...
I love this kind of article, where some future products are compared against current ones and declared clear winners...
"No Apps"? Try every single video game publisher in the world.
.. Sony or MSFT... I'd say it's absolutely no contest. Sony would crush MSFT. They have better interface design, fewer conflicting platform goals, and they'll put a PS3 in your living room for a fraction of what MSFT could.
And besides, this isn't about "Office" style apps. It's about games, and more importantly: it's about home media centers. I think the Windows MCE is going to have its rear end handed to it by the PS3.
When you consider that a cell-based PS3 could have a computational power of *several times* a 3 GHz Pentium...
You have to ask, what's more likely: that Intel can get around IBM/Toshiba patents in time for Windows to conquer the living room with a faster box? (That's if they can even build a secure, stable OS with a decent UI.) Or that Sony, now armed with the world's fastest consumer-computing platform, an enormous user base and years of TiVo experience, will own the living room media center market.
If I had to bet on who builds a better media-center PC
------ The best brain training is now totally free : )
listen to the wise old man:
With great power comes loads of software
http://www.livejournal.com/users/metricmusic
This sounds like a little PVM-cluster-on-a-chip. It also sounds like it's a pain to program and will, in the short term, suffer from the same problems that Intel's Itanium suffers from: it tries to push too much work on the compiler or software developer.
In the long term, it's nice that companies are exploring these kinds of architectures. It's not nice that they are trying to monopolize what are pretty straightforward architectural choices with patents. This may be a new CPU, but there is little that is new about having a bunch of fast processors interconnected via a reconfigurable network; these just happen to be on the same chip.
Well, perhaps "cool!" is not the correct response...
-------
Warning: Slashdot may contain traces of nuts.
Or does the logical extension of this chart:
http://www.blachford.info/computer/Cells/Cell_Distributed.gif
Make it look a little more like a HAL than a Cell?
Indeed, it sounds to me like Sony's marketing behemoth is getting into top gear promoting Cell in any way possible, although this might not be directly connected to Sony. Wild claims and theoretical performance papers have been wrong in the past when it came to yet another product with mind-blowing specs (Crusoe, anyone?).
* 4.6 GHz
* 1.3v
* 85 Celsius operation with heat sink
In toasters.. ovens..
End Communication.
Now if they can be made very fast and only a few (2-8) are coupled together... well, as was said, that is what a nice Opteron machine does anyway nowadays.
One question which was not addressed fully in the article was how do you compile/test programs for this thing.
The potential of parallel architectures has never been in doubt since the early days of the Cray monsters - but how to compile code to use all the features efficiently has.
I don't believe we will see the full advantage of these types of architecture exploited without some similar breakthrough in software tools.
Mind you the hardware rocks...
No matter how well a processor or group of processors can run tasks concurrently, it will always come down to the fact that most tasks are serial in nature and will not scale on a concurrent processing architecture. Aside from this, developing multithreaded software is extremely difficult and rife with problems. Just ask any developer about the hardest kind of bug to find and debug. It is pain incarnate, and some MT bugs can take 5+ days to track down. People design serially because a lot of tasks are essentially serial, and until this design paradigm undergoes a major shift and we design parallel-only software [LOL], Cell has no future.
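The limit described here is usually stated as Amdahl's law: if a fraction p of the work can be spread over N processors and the rest stays serial, the speedup is 1 / ((1 - p) + p / N). The processor counts below are illustrative choices (eight loosely echoes the "8 vector units per cell" mentioned elsewhere in the thread), not measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: overall speedup when only part of a task can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)

# Illustrative parallel fractions and processor counts (assumptions).
for p in (0.5, 0.9, 0.99):
    print(p, round(amdahl_speedup(p, 8), 2), round(amdahl_speedup(p, 32), 2))
```

Even with 90% of the work parallel, eight processors give less than 5x. The counterpoint raised elsewhere in the thread is that stream-style media and graphics work sits much closer to p = 1.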
Who cares? Mac OS X and Linux will provide all the applications required. Windows apps will likely be available under emulation. The Windows market will still dominate, but there will be a gradual migration when people realise there are cheaper/better realistic alternatives available at last.
It's not crap; we produced release versions of our graphics software for Windows on x86, PowerPC, MIPS and Alpha at one point. Shipped some, too. We had machines for all four architectures (still have them, in fact, though the Alpha and PowerPC's are mothballed), development tools, and working Windows OS's on all of them, and they all ran Windows NT, approximately the same version. Perfect, definitely not -- but Windows under x86 isn't perfect either. It worked well, certainly no worse than the x86 versions. We still use one of the MIPS machines as a backup file server. It refuses to die.
Now, I'm no fan of Windows, but if you think MS couldn't port Windows to an architecture other than x86, you're only fooling yourself. They can any time they want to; they already have, three times that I know of for certain, not counting whatever credit you want to give the Windows CE ports, if any, and there you have it. For all I know there may have been ports to 68k architectures... I wouldn't be in the least bit surprised.
You have to consider that MS has more money than anyone, and if they decide to go this route, there is no reason to think they cannot do it. I doubt there is any market force, including Sony and the largest governments in the world, that could put a serious roadblock in front of them in this arena.
I've fallen off your lawn, and I can't get up.
I'm sorry, but Sony can kiss my ass.
This is from the company that said the Playstation 2 would have Toy Story quality graphics, and be able to render FF8 quality FMVs in real time (thus making FMVs no longer required). It was essentially that bullshit hype that killed the Dreamcast... so yeah, now they're at it again.
Maybe I'll be proven wrong, but I doubt their system will be able to do anywhere near what they say it can in practical application.
I'm willing to believe that a 4.6 GHz chip with 8 ALUs and high bandwidth memory would be fast, but even in bulk, there's no way they can afford to put 4 of them in a sub-$500 game console.
I've been reading PR about the Cell for years, and nothing I've ever read has seemed even remotely plausible. Is there any objective information that even comes close to substantiating any of these claims?
i didn't understand any of the document, but damn it looks fast
Nothing costs nothing
It only performs like 20 Opterons in highly parallelisable tasks. That excludes almost every task performed on the average PC, with the exception of some gaming graphics tasks (which, incidentally, are performed on specialised GPUs that vastly outperform x86 cores for their tasks anyway). Most of the time, a single Cell will perform pretty much identically to the single Power chip that controls it.
Wonder if IBM looks into the future and doesn't see PCs anywhere? Intriguing possibility.
While I tend to agree the Cell is an impressive architecture, this article is a steaming pile of B.S.
No cache for CPUs? A breakthrough? Hello! Both the PSone and the PS2 have the so-called scratchpad, which is what the Cell seems to have: a cache which has to be managed explicitly by the programmer. Breaking news: this is a royal pain in the ass. And calculating bandwidth when reading from these tiny scratchpads makes about as much sense as calculating the speed at which an x86 processor can execute MOV EAX, EBX.
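To make "a cache which has to be managed explicitly by the programmer" concrete: with a scratchpad, code has to copy each chunk of data in from main memory, work on it, and copy it back, where a hardware cache would do the same implicitly. A minimal Python simulation; `LOCAL_STORE_WORDS`, `dma_get` and `dma_put` are hypothetical stand-ins for illustration, not a real Cell or PS2 API.

```python
LOCAL_STORE_WORDS = 4  # hypothetical scratchpad capacity, in elements

def dma_get(main_memory, start, count):
    """Hypothetical transfer: copy a chunk of main memory into the scratchpad."""
    return main_memory[start:start + count]  # list slicing makes a copy

def dma_put(main_memory, start, local):
    """Hypothetical transfer: copy a processed chunk back to main memory."""
    main_memory[start:start + len(local)] = local

def scale_in_place(main_memory, factor):
    """Process an array larger than the scratchpad one explicitly staged chunk at a time."""
    transfers = 0
    for start in range(0, len(main_memory), LOCAL_STORE_WORDS):
        local = dma_get(main_memory, start, LOCAL_STORE_WORDS)
        transfers += 1
        for i in range(len(local)):
            local[i] *= factor  # compute only ever touches the local copy
        dma_put(main_memory, start, local)
        transfers += 1
    return transfers

data = list(range(10))
print(scale_in_place(data, 2), data)
```

A real implementation would also overlap the next transfer with the current computation (double buffering), which is where much of the extra programming effort goes.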
Magically "the OS solves everything", and, in an obvious attempt to automatically get OSS-crowd support (is that "slashdot-trolling" or "slashdot-baiting"?) the triumph of Linux is predicted, because it's portable. Good luck getting the Linux kernel and GCC compiled, let alone running well on a massively parallel array of tiny CPUs without cache.
You have not read it. It will be on a specific class of tasks. It is similar to modern GPUs, which are faster than 10 Opterons on a specific task.
Back to the article. The guy seems to understand hardware, but he does not understand shit about software. Once he got past the first 3 parts he started babbling: Linux on Cell, and so on and so forth. If he had just read his own previous parts, he should have hit himself on the head. The only type of Linux this can run is uClinux. There is no memory protection as such. So no Linux, no Windows past 2000, no MacOS past X, and so on and so forth.
Similarly, it is all well and good to talk about Cell software beasties forming herds by themselves and cooperating on a task. I am going to be a spoilsport and ask a nasty question: err... what about a security model? Memory protection? A privilege model for communications? And so on and so forth...
To continue on this, the power of a modern general purpose OS is the task switching. How long does it take to load and store the context of the vector processing units? Doing so requires moving their dedicated memory to main memory. This will take ages.
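The "this will take ages" claim can be sanity-checked with simple arithmetic: switching tasks on the vector units means writing each unit's dedicated memory out and reading another task's in, so the cost is at least twice the total local-store size divided by the bandwidth. The local-store size and bandwidth below are assumptions chosen for illustration; the thread gives no figures for either.

```python
def context_switch_seconds(local_store_bytes: int, units: int, bandwidth_bytes_per_s: float) -> float:
    """Rough lower bound: save every unit's local store, then restore another task's."""
    bytes_moved = 2 * local_store_bytes * units  # save + restore
    return bytes_moved / bandwidth_bytes_per_s

# Assumed figures, for illustration only:
LOCAL_STORE = 128 * 1024   # bytes per vector unit (assumption)
UNITS = 8                  # vector units per chip, as discussed in the thread
BANDWIDTH = 25.6e9         # bytes per second to main memory (assumption)

t = context_switch_seconds(LOCAL_STORE, UNITS, BANDWIDTH)
print(f"{t * 1e6:.1f} microseconds")
```

Under these assumed figures a full switch costs tens of microseconds; whether that counts as "ages" depends on how often the OS actually has to preempt vector work.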
Overall, this is a design similar to Cray 1 initial design. Cray initial design smashed the IBM, DEC (and lesser fish) monopoly on big computing iron to bits. Unfortunately the next thing the people buying the Cray asked for was "can we share this resource between two people?". The answer was provided eventually, but by the time Cray could do all the nifty time sharing and memory management tricks necessary to do this its advantage was no longer phenomenal. And all people who could use Crays for single tasks with manual scheduling actually continued to use it that way. But it did not even dent the general purpose big iron market.
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Since the main goal of the chip is to pump through graphics, regardless of what device its in, a GPU is better grounds for comparison.
From TFA: "Existing GPUs can provide massive processing power when programmed properly, the difference is the Cell will be cheaper and several times faster."
It's supposed to do 250 GFLOPs when? 2 years from now? Apparently the GeForce 6800 Ultra will do 40 GFLOPs, and that's today... extrapolate with some doubling here and there and it seems a lot more reasonable.
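The "extrapolate with some doubling" step can be made explicit: getting from the quoted 40 GFLOPs to the quoted 250 GFLOPs takes log2(250/40), about 2.6 doublings. The months-per-doubling values below are assumptions for illustration, not a measured GPU trend.

```python
import math

def doublings_needed(current: float, target: float) -> float:
    """Number of performance doublings needed to go from `current` to `target`."""
    return math.log2(target / current)

def months_to_reach(current: float, target: float, months_per_doubling: float) -> float:
    return doublings_needed(current, target) * months_per_doubling

GEFORCE_6800_ULTRA = 40.0  # GFLOPs, as quoted in the comment
CELL_CLAIM = 250.0         # GFLOPs, as quoted in the comment

print(round(doublings_needed(GEFORCE_6800_ULTRA, CELL_CLAIM), 2))
for months in (12, 18):  # assumed doubling periods
    print(months, round(months_to_reach(GEFORCE_6800_ULTRA, CELL_CLAIM, months), 1))
```

Under those assumptions a GPU on a 12- to 18-month doubling curve would need roughly two and a half to four years to match the quoted Cell figure, so the comparison hinges on which doubling rate you believe.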
So the big thing is that it comes down to programming. It came up a few times in the article: "Doing this will make it faster but will make for one hell of a time for the programmers." It may have huge potential, but it may take a while to get everything running as efficiently as Sony would like. It reminds me of when the GF3 first came out and was beaten by the GF2U in some tests. IIRC it took a while for games to come out that took advantage of its programmability. It'll be interesting to see how well the programmers fare between now and Cell's release.
AROS probably could run on it.
Change is certain; progress is not obligatory.
There are several assumptions that lead to tremendous theoretical performance figures. The simple fact is that like the Itanium, the Cell processor depends on some rather complicated software that will solve issues like parallelism, coherency etc. The article clearly states that the Cell architecture is a combination of software and hardware (1st page). This is good because performance can always increase (via a better OS or microcode) but it is also bad because it means that initial versions may not stand up to their performance claims.
;-)
Also, let's not forget that developers will be unable to keep up unless some highly sophisticated libraries and languages are made available. I really don't expect the majority of developers to be able to cope with massive parallelism from the beginning (not just 2x SMP or hyperthreading; this needs a totally different mindset).
To sum this up: the hardware will deliver, but the software is a critical unknown in the equation. I have faith in IBM
P.
I read all five sections at once, intending to stream each chapter through separate phases from character recognition to criticism. Unfortunately, every time the article used "it's" in a predicative sense, everything ground to a halt.
Fortunately, cell reading meant I hardly noticed the claim that hardware would compete with the x86 because, unlike the x86, cell computers need all their software written for the specific hardware.
I like how "hardware-specific" becomes "OS-independent". Great, I can plug my HDTV into my G/F's "electrically powered adult novelty device" and harness the extra computing power to find out we are really alone in the world. Of course, no firmware will stand in the way.
I'm also surprised that, in pandering to all the OS underdogs in the slashdot crowd (Great day for Apple, since they like G5s; Great day for Linux, since many obsessive-compulsive coders work on Linux projects anyway), he left out a true lightweight OS designed from the ground up for just this sort of multitasking: Amiga OS 4.0. To get something like this to actually work, you'll need more than iPod huggers, OSX preachers or Linux fans. You need genuine madwomen and madmen. You need AmigaOS.
There is memory protection. Read the whole thing. What I think tripped you up is that he said there was no virtual memory... and even then his wording is confusing, since virtual memory is just swapping pages of memory out as you need more. That can be done on the Cell. What I think he is talking about is address translation: the paging hardware may not implement a full LogicalAddr==>LinearAddr==>PhysicalAddr paging/segmentation unit (I have not read the patent myself). He mentions that at run time the address must be physical/real and that access may be restricted when running on an APU. I must confess, though, that I am no expert either. The OS is in for quite a bit of work when dispatching apulets; I can see adjusting addressing and the like being as interesting as (or more interesting than) the different scheduling mechanisms in today's systems. Getting a secure system out of this will require protected memory, and if I remember correctly the Cell may be capable of running multiple OSs in parallel VMs. This can be explained by IBM having its own software layer that an OS would talk to (at least the article made it seem that way). It's almost like having a microkernel (or, in some ways, an exokernel) with the real OSes attached to it, Linux for example. Linux can already be run in user mode and even on top of the L4 microkernel. Linux has shown itself to be portable enough (as has most good modern software), so I have no doubt we will see this happen with IBM.
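One simple way an OS could enforce access restrictions without full address translation is a base-and-bounds check on every physical address a task touches. This is a hypothetical sketch of that general technique; `Window`, `check_access` and `ProtectionFault` are invented names, and nothing here describes what the Cell patent actually specifies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Window:
    """Hypothetical per-task grant: a physical address range plus a write permission."""
    base: int
    limit: int      # size of the range in bytes
    writable: bool

class ProtectionFault(Exception):
    pass

def check_access(window: Window, address: int, size: int, write: bool) -> None:
    """Raise ProtectionFault unless [address, address + size) lies inside the window."""
    if size <= 0:
        raise ValueError("size must be positive")
    inside = window.base <= address and address + size <= window.base + window.limit
    if not inside:
        raise ProtectionFault(f"address {address:#x} outside granted window")
    if write and not window.writable:
        raise ProtectionFault(f"write to read-only window at {address:#x}")

grant = Window(base=0x10000, limit=0x4000, writable=False)
check_access(grant, 0x10010, 16, write=False)  # allowed: a read inside the window
try:
    check_access(grant, 0x10010, 16, write=True)
except ProtectionFault as fault:
    print(fault)
```

The check is cheap enough to do in hardware on every access, which is why base-and-bounds schemes predate full paging units; the cost is that each task sees one contiguous physical window rather than its own virtual address space.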
ruby -le"32.times{|y|print' '*(31-y),(0..y).map{|x|~y&x>0?'
Maybe you (and others) haven't noticed, but the desktop PC is a deer in the headlights. Game machines will take over before you can say 'service contract'.
Pft. People have been saying this every time a new console generation is coming. When the upcoming Playstation 2 was hyped, some people were claiming it would easily emulate a PC at many times the speed of an x86. When it came, people couldn't take full advantage of the hardware. When they could some years later, PC hardware had surpassed it. Besides, people value the flexibility of a PC. In other words, bs then, bs now.
Being bitter is drinking poison and hoping someone else will die
Nicholas Blachford is an idiot. Do not read any of his articles. Just to give you the best of Nicholas, read his antigravity article and visit his web site:
;)
http://www.blachford.info/quantum/gravity.html
Also, look at the nose pictures of him
http://www.blachford.info/other/me.html
Seriously, the guy has burned most of his sane braincells.
For a serious laugh, read his article series 'building the next generation' on OSNews. I really got good laughs from that 4-part series.
Also, it didn't take long to spot a totally idiotic statement in today's slashdotted article:
> Parallel programming is usually complex but in this case the OS will look at the
> resources it has and distribute tasks accordingly, this process does not
> involve re-programming.
Here Nicholas misses the core problem of parallel programming. The program algorithms _always_ have to be made parallel. The OS can't do it.
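A standard illustration of why the algorithm itself has to change: a running total has a loop-carried dependency (each output needs the previous one), so no OS or scheduler can simply hand its iterations to different processors. Restructured as a blocked prefix sum, the heavy steps become independent per chunk. This sketch runs the chunks serially; the point is only that they no longer depend on each other.

```python
from itertools import accumulate

def serial_prefix_sum(values):
    """Each output depends on the previous one: inherently sequential as written."""
    out, total = [], 0
    for v in values:
        total += v
        out.append(total)
    return out

def blocked_prefix_sum(values, chunks):
    """Same result, restructured so the heavy steps are independent per chunk."""
    size = max(1, -(-len(values) // chunks))  # ceiling division
    blocks = [values[i:i + size] for i in range(0, len(values), size)]
    # Step 1 (parallelizable): scan each block on its own.
    local = [list(accumulate(b)) for b in blocks]
    # Step 2 (short and serial): the running offset contributed by earlier blocks.
    offsets = [0] + list(accumulate(b[-1] for b in local))[:-1]
    # Step 3 (parallelizable): add each block's offset to its elements.
    return [x + off for block, off in zip(local, offsets) for x in block]

data = [3, 1, 4, 1, 5, 9, 2, 6]
print(blocked_prefix_sum(data, 3))
```

The programmer, not the OS, had to notice that addition is associative and reorganize the work around it; that is the step the quoted passage skips.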
This part I agree with. His statements regarding abstraction are just flat out incorrect. Is this going to be programmed in assembly only? I think not...and if not there is significant abstraction involved. The thing that's closest to his point is that multiple *layers* of abstraction tend to add significant overhead. That doesn't mean that program-level abstractions do.
Once he got past the first 3 parts he started babbling: Linux on Cell, and so on and so forth. If he had just read his own previous parts, he should have hit himself on the head. The only type of Linux this can run is uClinux. There is no memory protection as such. So no Linux, no Windows past 2000, no MacOS past X, and so on and so forth.
There is memory protection if the PU is in fact "something like a G5". IBM would have to be insane not to include an MMU, and it has already stated that it's going to build workstations based on the Cell architecture.
All in all, interesting stuff...we'll see how it plays out. :-)
To continue on this, the power of a modern general purpose OS is the task switching. How long does it take to load and store the context of the vector processing units? Doing so requires moving their dedicated memory to main memory. This will take ages.
This, of course, depends on how many cells are in the box (with 8 vector units per cell) and how many tasks need vector units. The main purpose of the vector units in an interactive workstation will be multimedia processing. How many multimedia applications can you view at once? For me, the answer is one. The vector units may be useful for other things like engineering simulation and pattern matching, but once again how many different tasks using those features will be running at once? Plus if the processors are cheap enough to put 4 in a Playstation, one hopes the workstations will have 8 to 32 of them.
Overall, this is a design similar to Cray 1 initial design. Cray initial design smashed the IBM, DEC (and lesser fish) monopoly on big computing iron to bits. Unfortunately the next thing the people buying the Cray asked for was "can we share this resource between two people?". The answer was provided eventually, but by the time Cray could do all the nifty time sharing and memory management tricks necessary to do this its advantage was no longer phenomenal. And all people who could use Crays for single tasks with manual scheduling actually continued to use it that way. But it did not even dent the general purpose big iron market.
Two points. First, this is based on an already successful processor - the Power series. It already multitasks :-) and is used in a wide range of applications. Second, this will be a low-cost part. Crays were a super high-end system, which cost millions of dollars. Your analogy doesn't work.
Galileo: "The Earth revolves around the Sun!"
Score: -1 100% Flamebait
Secondly: anyone that buys a PC to play games on has more money than sense and is quickly parted from the latter.
TWW
"Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
There are two operating systems Microsoft have developed called Windows. DOS/Windows, the original one, was based on an x86 clone of CP/M that Microsoft bought. The first version, "Windows 1.0", was released in 1985. The last version, called "Windows Me", was released in 2000, IIRC. This OS was always x86-only, originally ran on archaic CPUs without memory protection and never supported full protected memory, symmetric multiprocessing or other (now) basic OS features.
The second OS developed by Microsoft that's marketed as Windows is Windows NT (now just called "Windows"). It was started in 1988, and never had any relation to DOS/Windows, except insofar as it can (to some extent) emulate it for compatibility reasons (including an x86 emulator on hardware that can't natively execute x86 code). Windows NT was developed on the MIPS platform, not the x86. The original plan had been to use the Intel i860 (an LIW architecture completely different from the x86) as the development platform, but the i860 hardware never met its promise, so MIPS was chosen instead.
The first version of Windows NT was released in 1993, and called "Windows NT 3.1" (3.1 was used for marketing reasons, since that was the latest version of DOS/Windows at the time). Like UNIX, it was mostly written in C, with assembly at the low level to handle hardware dependencies. At its release, Windows NT 3.1 ran on 32-bit MIPS (the development platform) and 32-bit x86 (the first port).
The second version of Windows NT (3.5) was released in 1994, and planned to add 64-bit Alpha (in a semi-crippled, 32-bit mode) and 32-bit PowerPC. However, IBM and Motorola ran into problems with the hardware (in part because of ongoing disagreements with Apple, who wanted to use their own, proprietary platform), so Windows NT 3.5 only added Alpha support. In 1995, after IBM and Motorola had managed to (mostly) sort out their problems (but with Apple declining to follow the IBM/Motorola PReP standard), the PowerPC port of Windows NT was completed, and released as version 3.51. At this point, the OS ran on MIPS, x86, Alpha and PowerPC.
In 1996, the user interface of Windows NT was upgraded to match the user interface of the popular 4.0 release of DOS/Windows (called Windows 95). Windows NT 4.0, which copied the user interface of DOS/Windows 4.0, ran on MIPS, x86, Alpha and PowerPC.
By the late 1990s, as Microsoft continued work on version 5.0 of Windows NT, the market had lost confidence in non-x86 systems for general-purpose PCs (apart from Apple Macs, which didn't follow the PReP standard, so couldn't run OSes ported to it, like AIX and Windows NT). As a result, Microsoft and the vendors of MIPS and PowerPC workstations agreed to cease development and marketing of NT 5.0 for those platforms. Windows NT 5.0 continued to be developed for the x86 and DEC Alpha architectures, into the beta releases.
DEC (which was taken over by Compaq) had continued to have hope for the Alpha as a general-purpose alternative to the x86, but financial difficulties led to the project being abandoned towards the end of the development cycle for Windows NT 5.0 (marketed as "Windows 2000"). As a result, Windows NT 5.0, completed at the end of 1999, was the first version of NT that only ran on one platform (the x86).
A port of Windows NT 5.0 to the 64-bit Intel Itanium, including 64-bit versions of the Windows APIs (unlike the earlier Alpha port), was released in 2001, but only to select customers.
Windows NT 5.1 (marketed as "Windows XP") was also released in 2001, and again only ran on the x86, apart from another 64-bit limited release for Itanium (in 2002, IIRC).
Windows NT 5.2 (marketed as "Windows Se
One question which was not addressed fully in the article was how do you compile/test programs for this thing. The answer is OpenMP. OpenMP is a multithreading API which can hide parallelization from the user almost completely. It's embarrassingly easy to use: a single line of code is enough to parallelize a loop, and all thread creation and synchronisation stays hidden from the user. It's extremely efficient too; I was never able to achieve the same level of performance when doing the multithreading myself.
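OpenMP itself is a directive-based API for C, C++ and Fortran; in C the one-line form described here is a `#pragma omp parallel for` placed above a loop. The sketch below is only a Python analogy using the standard library's `concurrent.futures`, not OpenMP: the loop body stays the same and a single call spreads independent iterations over workers. `brightness` is a hypothetical stand-in for per-element work.

```python
from concurrent.futures import ThreadPoolExecutor

def brightness(pixel: int) -> int:
    """Independent per-element work: no iteration reads another iteration's result."""
    return min(255, pixel * 2)

def serial(pixels):
    return [brightness(p) for p in pixels]

def parallel(pixels, workers=4):
    # The per-element function is unchanged; only the line that drives the loop differs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brightness, pixels))  # map preserves input order

pixels = list(range(0, 256, 16))
print(parallel(pixels) == serial(pixels))
```

Like OpenMP's loop directive, this is only valid because the iterations are independent. In CPython, threads also won't speed up CPU-bound work because of the global interpreter lock, so `ProcessPoolExecutor` is the usual substitute when real speedup matters.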
Actually, CPU speed has a lot to do with graphics speed. If you look at recent performance charts for nVidia's high-end GPUs in SLI setups, you will find that their performance levels off unless you run at the very highest resolution with top filtering and antialiasing settings. In fact, the high-end cards are still CPU-limited even at the highest settings in many games, apart from the most recent ones. [Tom's Hardware Guide]
In addition, programmers will always find things to do with additional CPU power. Ray traced occlusion culling to reduce the number of polygons sent to the GPU is one idea if you have extreme amounts of processing power just sitting around. That in turn would allow you to use extremely advanced pixel shaders as overdraw is almost eliminated. It would also allow you to add a few more polygons to every scene, knowing that most polygons are correctly culled.
http://www.siliconvalley.com/mld/siliconvalley/10323259.htm
IBM has made the Cell for servers and embedded applications. I don't know much about the author of the article, but the Cell will change computing.
Here's my analysis on why Apple will use the Cell http://www.siliconvalley.com/mld/siliconvalley/10323259.htm
My link to the analysis of Apple's use of the Cell was wrong. http://www.tweet2.org/wordpress/index.php?p=13
This was not a technology article. That was a "I for one, welcome our new cell processor overlords.." article.
I don't see anything in the Cell architecture that would fundamentally make the same number of transistors at the same speed operate faster. I see lots of bottlenecks, I/O overhead and wasted transistors. If there is some magical powerful thing that these can do SO much better than the current x86 instruction set and hardware, guess what, it'll adapt.
x86 adapted to RISC being "wildly faster" and, in the end, became a better RISC than RISC was by translating more memory-efficient x86 instructions onto a RISC backend. It adapted to SIMD (Single Instruction, Multiple Data) efficiency issues by adding MMX/MMX2/SSE/SSE2 and 3DNow. It adapted to the reality of 64-bit address spaces and the need for more registers with the new x64 instruction set extensions. AMD and Intel could add Cell hardware and instructions too if they offered anything special, which I highly doubt they will.
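"Translating x86 instructions onto a RISC backend" refers to decoders that split one complex instruction into several simple micro-operations. The toy below shows the idea for a read-modify-write add; the instruction syntax and micro-op names are invented for illustration and do not match any real x86 decoder.

```python
def decode(instruction: str) -> list[str]:
    """Toy decoder: split a memory-operand add into load / add / store micro-ops."""
    opcode, operands = instruction.split(maxsplit=1)
    dst, src = (part.strip() for part in operands.split(","))
    if opcode != "add":
        raise ValueError(f"toy decoder only handles 'add', got {opcode!r}")
    if dst.startswith("["):  # memory destination: read-modify-write
        return [f"load tmp, {dst}", f"add tmp, tmp, {src}", f"store {dst}, tmp"]
    return [f"add {dst}, {dst}, {src}"]  # register destination: a single micro-op

print(decode("add [ebx], eax"))
print(decode("add ecx, eax"))
```

The compact encoding stays in the instruction stream (and the instruction cache), while the execution core only ever sees simple, fixed-shape operations, which is the trade-off the comment describes.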
set softtabstop=4 shiftwidth=4 expandtab nocp worlddomination
Since the Cell processors are basically arrays of vector processors, quite similar to the shader units in GPUs, I suspect NVidia will just implement the specialized low-level 3D stuff and leave all the shader work to be done by the Cell processors.
So basically you'll have a fixed graphics core which isn't likely to change (since it hasn't for the last couple of years) and an extremely flexible and powerful array of shader units.
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Considering how much IBM has invested and banked on Java, wouldn't you think they would try to design a virtual machine that would take advantage of this architecture? I wouldn't expect the JIT to be able to parallelize (???) everything, but I would think it would know how to detect and translate certain segments of code which are easy to translate to a parallel architecture.
I don't know about you, but when I first heard about cell processors (and that fact that IBM was behind it), I immediately began speculating how IBM would exploit this architecture in their server market. This sounds like the sort of thing that will enable them to sell 256 processor monsters running AIX, DB2 and J2EE.
Even if designing to take advantage of this architecture is terribly difficult, just porting your web server, database server and transaction server will solve the scalability issues for most Web/client/server applications.
Wonder if they have also been working to optimize Linux for the CELL processor? I for one will be watching this very closely...
The NSA: The only part of the US government that actually listens.
I'm not actually surprised that so-called journalists, especially the technical kind, get good salaries. If you look at the painful clowns running the show at ZDNet, and at most technical publications for that matter, including such wonder rags as the Register, you know that the Agenda is almost the most important thing. The actual realities of the tech world be damned as long as you have someone passing you your monthly wad of cash.
And this story is no different.
As many have noted, Sony did exactly this kind of hyping the last time around when the PS2, with its emotion engine, was supposed to be the future of all things computing. As everyone knows, the PS2 was a real pain to code for, and the actual performance was not better than the PC's of the day. The Cell will undoubtedly suffer from the same problems when it comes to coding real applications. Concurrency and parrallelism do not an easier coding experience make.
I have no doubt that this thing will be good, but I absolutely doubt that it will have much or any effect on the x86 world of computing. The G4 processor, when it came out with the Altivec SIMD processsor, which was apparently better than SSE at the time didn't turn Apple into the next Microsoft overnight either, did it?
So, I expect that the x86 world will continue to thrive, and that Apple will put some of these Cell processors (which, like the G5, are built around a PowerPC core) in some of their machines and make the usual wild RDF claims about how hot they are, while in reality only a small fraction of Mac developers will actually use them. After all, the Mac has to maintain backward compatibility only slightly less than the x86 world does.
In other words, it'll be business as usual.
Lots of people have been working on auto-parallelizing compilers. The idea is to take existing code that isn't parallel and during compile time (or run time) make those decisions intelligently and speed up processing. So far, there have been zero successes at it without explicit user directives to tell the compilers where good targets for parallelization are and how to do it (specifically creating threads and/or marking loops that can be parallelized).
If you (or anyone) can solve this problem well, you'd be famous and wealthy beyond the dreams of avarice (assuming you patent it and license it out :))
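To make the "explicit user directives" point concrete, here is a minimal Python sketch; the function names are hypothetical, made up for illustration. The per-element loop has no shared state, so a programmer can assert that its iterations are independent and hand them to a worker pool, while the running total has a loop-carried dependency that a naive split would get wrong. Telling those two cases apart automatically is exactly what auto-parallelizers struggle with. ThreadPoolExecutor is only there to show the shape of an explicit directive; under CPython's GIL it won't actually speed up CPU-bound Python code.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(pixel: int) -> int:
    """Pure per-element work: reads only its argument, writes nothing shared."""
    return min(pixel + 40, 255)

def process_serial(pixels):
    # What an auto-parallelizer sees: a plain loop. The hard part is proving
    # that no iteration depends on another (no aliasing, no carried state).
    return [brighten(p) for p in pixels]

def process_parallel(pixels, workers=4):
    # The explicit directive: the programmer asserts the iterations are
    # independent and hands them to a pool. Executor.map keeps input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten, pixels))

def running_total(values):
    # Counter-example: each iteration reads the previous result (a loop-carried
    # dependency), so splitting it across workers like the above would be wrong.
    out, acc = [], 0
    for v in values:
        acc += v
        out.append(acc)
    return out

print(process_parallel([0, 100, 200, 250]))  # [40, 140, 240, 255]
```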
The current processing bottleneck, and the reason for caches in the first place, is the bandwidth between the processor and the memory. A "normal" memory bus cannot keep up. This is why you see so many attempts to speed up this particular part of the system: Rambus, DDR, even HyperTransport.
What these guys are trying to do is move the processor to the memory rather than the inverse. Having fast expensive caches near the processor is an attempt to get the memory closer to the proc. What has been happening of late is that lots and lots of on-chip transistors have been spent on the cache. The Cell architecture is a step in the other direction. They want to spend those transistors on processors instead of memory.
At the limit of this idea you would see something like a super-granulated architecture with a processor on each memory chip. Imagine a PC with 32/64/whatever cell processors *and* no classic "processor socket" on the motherboard, just some DIMM-like "cell" slots. Each proc would have exclusive access to the memory on its own chip, and all would communicate via some sort of bus or fabric of links. So, instead of one mega proc with tens of millions of transistors (perhaps half of them cache) running at 4GHz behind a single 400MHz memory bus that is 32, 64, 128 or whatever bits wide, you'd have maybe 64, 128 or 256 simple ARM-like procs at 400MHz each, with something like 400MB/s or more of memory bandwidth per proc.
Of course the extreme limit would be to have millions of 1-bit processors, but I don't think anyone is proposing that just yet. Things do get more and more neuron-like as you approach this limit, interesting, eh?
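A quick back-of-the-envelope check of the bandwidth numbers in the comment above, as a Python sketch. It assumes one transfer per clock and no protocol overhead, so these are theoretical peaks only; the 400MB/s-per-proc figure is the commenter's own guess, not a measured number.

```python
def bus_bandwidth_bytes(clock_hz, width_bits):
    """Theoretical peak bytes/s, assuming one transfer per clock, no overhead."""
    return clock_hz * width_bits / 8

def aggregate_bandwidth(n_procs, per_proc_bytes):
    """Total bytes/s if each proc streams from its own memory independently."""
    return n_procs * per_proc_bytes

MEGA = 1_000_000

# One big processor behind a single 400MHz memory bus of various widths.
for width in (32, 64, 128):
    peak = bus_bandwidth_bytes(400 * MEGA, width)
    print(f"one {width}-bit bus @ 400MHz: {peak / 1e9:.1f} GB/s")

# Many small procs, each with ~400MB/s to the memory on its own chip.
for n in (64, 128, 256):
    total = aggregate_bandwidth(n, 400 * MEGA)
    print(f"{n} procs x 400MB/s: {total / 1e9:.1f} GB/s aggregate")
```

Under those assumptions a single 64-bit bus at 400MHz peaks at 3.2 GB/s, while 64 procs at 400MB/s each add up to 25.6 GB/s in aggregate. That gap is the whole appeal of the idea, provided the workload actually splits cleanly across the procs.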
Now you either need:
a) A really intelligent compiler
or
b) A really intelligent programmer
or
c) A language and corresponding underlying concurrency theory that allows you to design and analyze complex interacting multithreaded systems with ease.
It is all a big misunderstanding between Sony and IBM.
IBM told Sony it was going to "Sell" its PC business. Sony has been telling everyone about IBM's "Cell" PC ever since.
Seriously though: For all we know, the PS3 may have four cells. (One CPU core, and three "APU" cells.) One APU for the boobs, one APU for animated low polygon count "hair", and one for inane dialog.
Maybe the new splice() based pipes in Linux can be used to move data between APUs.
Ever notice how much the GameCube and the Mac Mini have in common?
In other words, media data and processing algorithms will be behind an impenetrable DRM hardware wall. "Cell programs" (the little vectorizable data manipulators) will be trade secrets. Outsiders who want to program something new will only be able to string together DRM-approved cells. For example, there might be an approved MPG6 cell that reports the metadata initially found in an MPG6 stream, but Rights Management interests will never permit any cell that exports all of the MPG6 data.
Why does the recommended single-chip PE (processing element) include 8 DPUs? My guess is that a certified library of Cell programs will not allow anything to be sent off chip unless it is strongly encrypted. Thus one might have an 8-DPU chip where 3 DPUs decrypt the input, 2 do the actual processing, and 3 encrypt the output. This off-chip disadvantage is a strong reason for putting multiple PUs and their 8 DPUs on one chip: if intercommunication between Cells cannot be detected externally, then there is no need for the encryption/decryption stuff.
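Purely to illustrate the speculated 3/2/3 split, here is a toy Python pipeline in which threads stand in for DPUs and queues stand in for on-chip links. Everything in it is hypothetical: the XOR "cipher" is a placeholder and not real encryption, the uppercase step stands in for "the actual processing", and none of it reflects any real Cell API.

```python
import queue
import threading

KEY = 0x5A  # toy XOR "cipher": a placeholder, NOT real encryption

def xor_bytes(data: bytes) -> bytes:
    """Stand-in for both decrypt and encrypt (XOR is its own inverse)."""
    return bytes(b ^ KEY for b in data)

def start_stage(fn, inbox, outbox, workers):
    """Start `workers` threads (stand-ins for DPUs) applying fn to each item."""
    def worker():
        # None is the shutdown sentinel; anything else is (index, payload).
        while (item := inbox.get()) is not None:
            idx, payload = item
            outbox.put((idx, fn(payload)))
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads

def run_pipeline(blocks, split=(3, 2, 3)):
    """Decrypt -> process -> encrypt, with `split` workers per stage (3+2+3 = 8)."""
    fns = (xor_bytes, bytes.upper, xor_bytes)
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    stages = [start_stage(fn, queues[i], queues[i + 1], n)
              for i, (fn, n) in enumerate(zip(fns, split))]
    for i, block in enumerate(blocks):
        queues[0].put((i, block))
    # Shut stages down front to back: once a stage's workers have exited,
    # everything it produced is already queued for the next stage.
    for i, threads in enumerate(stages):
        for _ in threads:
            queues[i].put(None)
        for t in threads:
            t.join()
    results = [queues[-1].get() for _ in range(queues[-1].qsize())]
    # Workers finish out of order, so restore input order by index.
    return [payload for _, payload in sorted(results)]
```

Since XOR is its own inverse, `xor_bytes(run_pipeline([xor_bytes(b"cell")])[0])` gives back `b"CELL"`: the plaintext only ever exists between the decrypt and encrypt stages, which is the property the comment speculates about.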