Wow, I always thought Anand was a bit of an Intel fanboy, but does he ever gush over Intel in that one.
Some fun out-of-context quotes:
"adds a level of value to the development community that will absolutely blow away anything NVIDIA or AMD can currently (or will for the foreseeable future) offer."
"Larrabee could help create a new wellspring of research, experimentation and techniques for real-time graphics, the likes of which have not been seen since the mid-to-late 1990s."
"Larrabee is stirring up in the Old Guard of real-time 3D computer graphics, having icons like Michael Abrash on the team will help "
I read Anand quite a bit, but I think he missed the 'objective' aspect of journalism just a bit in this one.
"Lighting", normal mapping, etc. are shortcut approximations tweaked to look good with current graphics systems. You simply wouldn't use these techniques if you had a ray-tracing engine.
To get your beloved 'material fidelity' you would need a machine fast enough to ray-trace with a larger number of 'rays' and be able to 'bounce' and 'absorb' those rays per material.
Larrabee could be more of a hedge against IBM's Cell and Nvidia's GPUs for compute-heavy workloads, with the graphics angle being a page from Nvidia's book: get gamers to fund their HPC conquests.
Choice quote at the bottom:
However, a WGA representative said, "The strike is NOT over -- as you know, we are under a press blackout, but I can tell you that the strike is NOT over."
I think you may be exaggerating a bit. While you're correct that the SPEs have a limited amount of memory and you have to swap data in and out yourself, it's neither difficult nor inefficient, since you can copy data in while working on a different set.
Whether a specific workload will run well on Cell has more to do with overall system memory, whether the workload parallelizes well, and whether it needs double precision.
I think the fact that you can't buy the chips individually probably has more to do with profitability than with some secret pact with Sony.
What's with the 'tude? Do you work for Nvidia or something?
The truth is, if Nvidia wanted CUDA to be a standard, they should have opened it up to Khronos or whoever to make it generalized. They didn't, and it will stay in the niche it's in.
I'm no Apple fan, but at least they recognized a huge hole and did something to fill it.
Would a Windows-only GPGPU standard really become the de facto standard? Maybe for games that use it, but who does HPC on Windows?
You didn't include enough "cheekiness" in your comment there. You're just coming off as an asshole.
Central banks and spending your way out of a downturn are Keynesian ideas. That certainly doesn't cover "All Economists", although the approach has been quite popular over the last 50 years or so.
Keynesian economics stands in stark contrast to the Monetarist and Austrian schools of thought, which basically blame central authority as the root of downturns: the more meddling, the worse things get.
In America, the Dems are strictly Keynesian, although that seems to stem less from specific economic ideals than from a social elitist power trip.
Republicans flip-flop on whatever theory seems to work best at the moment.
Either way, it's rather obvious that Congress doesn't make its decisions based on any economic theory. They have interest groups to feed, after all.
The Keynesians also believe in desperately kick-starting inflation in times like these, which explains all the money being printed lately (the amount of printed money has *doubled* in the last year). It's an attempt to stave off evil impending deflation. Let's hope that doesn't backfire.
It all seems rather desperate. At least we'll find out if the Keynesians are right...
It could be that Via's license is non-transferable, but who knows when it comes to that legal mucky-muck.
If you replace "PS3" with "GameCube", and KZ2 and Uncharted with Wind Waker and Eternal Darkness, you start to get a very familiar story of a third-place console.
For what it's worth, there's something to be said for not being the console targeted for every lame shovelware game that's released.
'Brief' is right, although they should have added "somewhat accurate hearsay" in front of it.
Seriously, the Itanium stuff is all things we've heard before, with no real details. The IBM section is completely wrong:
(Though [PowerPC] has been reincarnated as IBM's Cell processor that powers Sony's PlayStation and the architecture still powers IBM servers.)
wtf? No mention of POWER? Reincarnated as Cell? huh?
And who cares about Puma? Where are the ugly details of PA-RISC giving up the ghost because Intel looked at it funny? What market are we even talking about? Desktops? Servers? Very different chips are needed for different markets.
Sorry y'all, this one's dead in the water.
As long as we're being specific, for whiskey to be called 'Bourbon', it must be made from a mash of at least 51% corn and aged in new, charred oak containers; to be called 'straight bourbon', it must also be aged at least 2 years.
It's funny how you call people stuck up for drinking Scotch, then go on to blab about stuck-up ways to drink.
People can drink it however they like.
I was at a Bourbon tasting with Fred Noe (Jim Beam's master distiller and a descendant of Jim Beam), and he likes to drink it with a small amount of water (about as much as a straw holds with your finger over the end); the water opens up the bouquet. He also said that anyone who tells you the "correct" way to drink is drinking for the wrong reasons. I tend to agree.
Isn't it a bit early to be hitting the Scotch? I guess it's 5 o'clock somewhere...
Maker's is actually a Jim Beam brand. The spelling is probably just a marketing thing, but who knows.
I love Whiskey/Whisky/Bourbon debates, though!
A lot of people I know prefer American whiskey/Bourbon (I'm American) because they're not used to the smoky/peaty taste, but I like all kinds.
I agree about Talisker: very good. It's *really* smoky/peaty. I got to visit the Talisker Distillery last year. The tour guide was very interesting; we asked her what her preference was, and she said that after Talisker she liked some American Bourbons. She also said that any Scotch aged more than 12 years is too woody, which I agree with.
Drink on!
It seems like it would be cheaper to build a cluster out of commodity PC parts, and maybe use GPUs+CUDA to get more muscle without having to completely hand-roll your own accelerated computation code (since CUDA is roughly C).
I don't see how that would be cheaper, since high-end video cards cost as much as a PS3. Also, CUDA may be *roughly* C, but the code for Cell *is* C, just with familiar SIMD extensions à la MMX, AltiVec, etc. There's none of the "load matrix / compute matrix" craziness you get in CUDA.
Of course there is SPU management code, which can be a pain (no one said it's perfect), but the point is that writing CUDA code is at least as much hassle as writing Cell code. I think your only valid argument is the RAM limit.
A small handful of workloads for which the Cell chip does very well:
Mathematical Acceleration Subsystem (MASS, MASSV)
Various Fast Fourier Transforms (FFT)
Basic Linear Algebra Subprograms (BLAS)
Image/video encoding
Monte Carlo algorithms
It's worth noting that even with CUDA, workloads running on a video card have awful utilization. And general purpose CPUs are not set up to grind out embarrassingly parallel problems.
Of course, things are different with supercomputers. If you have 1000 'processing units', where each PU consists of, say, 32 cores and a few GB of RAM on a single die, that creates a memory wall between 'local' and 'remote' memory. The on-die section of main memory would be accessible at near-CPU speed, while main memory belonging to other PUs would be 'remote' and slow. Hey wait, that sounds like a compute cluster of some kind... (so scientists already know how to deal with it).
It also sounds like you're describing the Cell processor setup. Each SPU has its own on-die local store but can't operate directly on main memory (remote). Each SPU also has a DMA engine that pulls data from main memory into its local store. The good thing is that you can overlap DMA transfers with computation, so the SPUs are constantly burning through work.
This helps against the memory wall, and it's a big reason why Roadrunner is so damn fast.
The funny thing is Microsoft probably couldn't be happier if ISO became irrelevant. That would mean no more pesky governments relying on some standards body to decide what to buy.
Hell, they probably tried their hardest to make the whole OOXML thing look like a sham. MS wins either way. They either get their standard in, or ISO falls and the smaller ISO replacements wouldn't have the clout to fight back.
I use www.die.net for my Linux/C reference needs.
McCain attacks Obama for his lack of experience and then picks Palin as his running mate, who is inexperienced but young and female, to attract some of Hillary's crew.
Obama attacks McCain for being too "Old Washington" with no connection to actual folks, then picks Biden, who has been around longer than McCain and is one of the biggest partisan 'attack dogs' in the Democratic Party, but who brings experience and credibility.
That's politics for ya. Goes to show what each camp thought about the effectiveness of the other's attacks.
And bunnies!
Speaking of Ray Tracing...
Check out this video shown at SIGGRAPH this week of the University of Virginia Rome model being ray traced in real time by a Cell blade:
http://www.youtube.com/watch?v=YZnbMWy9A0Y
Nifty!
Um, you do realize the SPUs have 128-bit registers, right? That means you can operate on four 32-bit floats at a time.
I'm not sure what kind of 'scientific' computations you were doing, but it sounds like you were not using the SPUs properly.
What a rambling bunch of nonsense. While the 'negatives' you post are true, they in no way make the design a failure.
The chip was not designed to replace any type of general-purpose processor in the enterprise space. Cell is a big parallel FP number cruncher. So no, databases, file servers, and other solutions that basically just move data around between storage and the network are not a good fit.
Real-time video/audio transcoding and distribution, on the other hand, if needed in an enterprise shop somewhere, would be a perfect fit if done correctly.
The PPU is *not* a POWER5, and while DMA transfers do have inherent latency, it isn't really a problem. If you schedule your DMAs correctly, you can hide almost all of that latency. Messaging between the PPU and the SPUs has very little latency (they're on the same chip!).
All your bitching about programming difficulty is grounded in fact, but compare these techniques to programming in CUDA or writing fake OpenGL programs to tap a GPU's floating-point power, and you'll realize that while Cell is difficult, at least it runs straight C, even on the SPUs, and you can optimize from there as much as you want.
You seem misinformed.