Power efficiency is a good reason. If you want to play a video, your average Atom or ARM CPU might have enough headroom to play it, but it will run close to its maximum draw doing so. A GPU, on the other hand, is specialized hardware that can do things like video playback with ease. Even rendering a media player visualization will heavily tax a CPU, while a GPU may hardly notice the load. It's easy to generalize from these examples: for nearly any task that is highly parallelizable, there are enormous power gains to be had by putting it on special-purpose, massively parallel hardware.
I think right now you're locked in the "a GPU is an expensive PCI Express add-on card" mentality. In the future, the GPU will be part of the same die as the CPU. Actually, screw the future: the iPod has a PowerVR GPU built in. What the iPod does would not be possible without the GPU.
And you are further right that the implementation is relatively trivial... with a couple of exceptions. I'm not sure what behavior the synchronous/asynchronous flag needs to toggle, and I'm unsure what validation needs to be performed on realm. What if www.hackmyaccount.com says its realm is gmail.com?
The simple mechanics of what they propose are easy enough, but I don't feel like it's well spec'd enough to be reliable or production-worthy.
That's a little extreme. Overhead monitors and split keyboards require less effort.
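To make the realm worry concrete, here is a minimal sketch of one possible check. The rule itself is my assumption, not something the proposal specifies: accept a realm only if the requesting host is that realm or a subdomain of it. The name realm_matches_host is hypothetical.

```python
def realm_matches_host(realm: str, host: str) -> bool:
    """Assumed rule: the host must equal the realm or be a subdomain of it."""
    realm = realm.strip().lower().rstrip(".")
    host = host.strip().lower().rstrip(".")
    if not realm or not host:
        return False
    # The leading dot stops "notgmail.com" from passing as "gmail.com".
    return host == realm or host.endswith("." + realm)


if __name__ == "__main__":
    print(realm_matches_host("gmail.com", "www.hackmyaccount.com"))
    print(realm_matches_host("gmail.com", "mail.gmail.com"))
```

Even this is not enough on its own: a realm of "com" would match every .com host, so a real check would also need something like a public suffix list. That is exactly the kind of detail the spec would have to pin down.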
Use BitTorrent to distribute git blobs. They are immutable and the object store is append-only, which makes them perfect for something like BitTorrent. All you'd really need is a good means of syndication via Atom, and end users capable of understanding SCM.
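A quick sketch of why blobs suit this kind of distribution: in git's SHA-1 object format, a blob's name is the hash of a small header plus its content, so a peer can verify any blob it receives without trusting the sender. The helper names below (git_blob_id, verify_blob) are mine, and nothing here touches BitTorrent itself.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id git assigns to a blob: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def verify_blob(expected_id: str, content: bytes) -> bool:
    """True if the received bytes really are the blob named by expected_id."""
    return git_blob_id(content) == expected_id


if __name__ == "__main__":
    data = b"hello world\n"
    oid = git_blob_id(data)
    print(oid)
    print(verify_blob(oid, data))
    print(verify_blob(oid, data + b"tampered"))
```

Because the id changes whenever the content does, an existing blob can never be edited in place, only joined by new ones. That is the property a swarm wants.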
AMD desperately needs to realize the synergy between its graphics and CPU factions. They've had plans for GPU-integrated CPUs for a while, and they must deliver in a good way. If they do, they'll be in a fantastic spot.
I'm confident AMD will hold up fine against NVIDIA. OpenCL should level the playing field that NVIDIA has dominated in GPGPU: AMD has >2x the double-precision floating-point performance, and with a common spec for using it, people hopefully will. AMD should do fine in the graphics space; they already have quite a lead in the mid tier with extremely cheap 4850s.
I'm really worried that AMD's given up on ATI's Imageon IP. The embedded world is adopting graphics hardware at an extraordinary pace; AMD bought themselves into one of the most important markets in the world and I'm worried they're going to squander it.
Intel's answer was to do what everyone else is doing: buy Imagination's PowerVR technology.
AMD will be moving to HT3 in 2009: it's a new socket. But yes, the initial release is socket-compatible.
I don't understand why SMT has been so under-utilized for keeping wide cores fed. Sun's doing it with Niagara 2 and getting amazing results; why isn't it an integral part of x86?
I fail to see the benefit of Explicitly Parallel Instruction Computing (EPIC) over SMT. Why decide in advance what has to be run together, when really all you want to do is keep all your functional units in use? I never saw the appeal of IA-64. There's been Linux IA-64 support for years, and aside from some number-crunching HPC clusters no one's used it.
That being said, I would very much like to see x86 die.
I'd far prefer PPC; it upset me to no end that Apple went from giving up on PPC to nailing the coffin shut by buying P.A. Semi and killing their PPC development. P.A. Semi went ground-up with their PPC design, whereas most of the players using PPC now are recycling and modifying the same old legacy G3 and G4 designs. P.A. Semi was making a monster, and you could see the fear in the eyes of the other embedded players.
Unfortunately it looks like ARM is going to be the main contender. And I just don't see ARM as a viable desktop replacement.
Hmm, except Niagara 2. Niagara 2 has done just about everything right for a server play: huge interconnects, speedy specialized units, extremely super-fantastically wide processors... I'd love to see that catch on, but for that to happen it would have to stop being the exclusive domain of Sun, and I don't see that as likely. If Sun were serious they'd try to make it HT-compatible or some such, so other people could make their own mobos and systems.
re: DAoC, I cited games / dedicated gamers as a place where CPU is still important. Also, the game is probably poorly coded if the CPU affects lag.
re: CPU with enough power to stand in for the GPU, I think this is a common fallacy. Silicon is free and getting freer (cite: Mr. Gordon Moore). What's not free is power. Why would you spend so much more power running data-parallel computation on a batch of single-threaded monsters? A GPU is a simpler machine, but it's one extremely tailored to data-intensive ops, and it's far more power-efficient at them.
re: Oracle's Flash move, again this is something for the GPU. Flash is dumb as a brick and uses the CPU for everything. Use something like hardware-accelerated OpenVG and even the most basic Intel integrated crap would eat it for brunch.
CPUs are built for complex control flow. GPUs are built for data processing. There will be a symbiosis of both eventually, but right now the CPU is over-used and the GPU is under-utilized. Moving more data operations to data-processing-oriented processors is a huge boon for power and utilization efficiency.
Gaming is an exception here; there's a lot of control flow. But video editing and (almost certainly) speech input are both data-intensive tasks that are far better suited to the GPU. Video's already decoded on the GPU, and it's only a matter of investing the necessary work to get encoding there too. There are already some commercial solutions, and they boast incredible speedups. There are constant murmurings about the video card companies and/or Apple getting ready to release the tech for public consumption, and once that happens people will forever wonder why we spent so many years waiting for our slow-as-nails CPUs to encode video. Speech recognition has less incentive, as it's a real-time process, so putting it on faster hardware only helps if the software uses the additional capability well enough that the consumer perceives the difference. Still, I'm almost certain that the data-intensive calculations of speech recognition would be better suited to the GPU.
Actually, come to think of it, we have seen it happen again. We just didn't notice.
The modern PC is rarely used as more than a thin client: a thin client to the web, where colossal server farms feed us and do our work for us. Sure, it's not VNC, but if you look at where the cycles are getting spent, it's typically far, far away.
HT has a good deal to do with AMD's standing in the Top500 as well, I believe. And again, Intel has an answer-by-imitation in QPI, whose viability we have yet to see.
Core i7's theoretical memory numbers are almost legendary... 48GB/s with triple-channel DDR3-2000. That's very near echelons formerly reserved for video cards.
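For anyone wondering where the 48GB/s comes from, here's the back-of-the-envelope arithmetic as a small sketch. It assumes the standard 64-bit DDR3 channel width and counts theoretical peak only, not anything you'd measure; the function name is mine.

```python
def peak_bandwidth_gb_s(channels: int, mega_transfers_per_sec: float,
                        bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Triple-channel DDR3-2000: 3 channels x 2000 MT/s x 8 bytes per transfer.
    print(peak_bandwidth_gb_s(3, 2000))
    # The same memory in dual channel, for comparison.
    print(peak_bandwidth_gb_s(2, 2000))
```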
Intel's making up all the platform work AMD's been schooling them on for years. Virtualization's supposedly far improved on Nehalem too, but again, as with QPI, I haven't seen any numbers.
I'd like to de-emphasize the death of the desktop and highlight the ascendancy of hyperfunctional smaller-form-factor systems. There's a good chance the thin client of the future may be your cell phone talking to your server, with UWB or HDMI and Bluetooth keyboard interfaces.
Yes, his cunning plan was to let OEMs continue to ship absolutely craptacular Intel integrated video (as opposed to the mildly craptacular Intel integrated?) with stripped-down Vistas, causing an uproar 18 months down the road, inciting Windows 7 to revert to software rendering, skipping the video card altogether, thereby inflating the need for powerful CPUs. What a diabolical plan!
You're still talking about best-case chipsets that consistently drain 3x more power than the CPU running full tilt, and that have the most atrocious graphics support imaginable.
Intel's solution to this irredeemable state of affairs was to buy PowerVR IP from Imagination and to ditch their own graphics core entirely. I have yet to see any 3D benches of the PowerVR kit, but I fully expect it to outperform Intel's solutions.
But if Intel's specs are to be believed, it would seem the overall system power budget nod I'd given to AMD does not exist, except for notebooks with the truly crap chipsets.
Your perspective's demented, because you think CPU performance still matters for end users. CPU performance has always been a rat race; the difference is that it's fast enough now.
It's not the number of computers or supercomputers you should be counting, it's the number of cores. Google runs data centers with >50,000 computers; they're working on data center #20 in the States now. Yahoo, Microsoft, Sun, IBM, eBay, Amazon, Pixar... they all need these colossal systems to support their business. These are huge volume sales. Ask how much CPU any of these companies wants and they'll ask how much you can give them.
The desktop, on the other hand, is increasingly irrelevant. The die area of the average desktop CPU is going to shrink considerably; Atom is Intel trying to carve out room for x86 in a class of much smaller devices. Consumers won't need the 6-core or 12-core CPUs AMD's putting out next; most can barely use the dual core they have now. In another decade I am 100% certain most desktops will have been subsumed into phones: phones with Bluetooth keyboards and some HDMI analogue. Frame buffer limitations aside, we're almost at that power level already.
In the workplace, virtualization and increasing computing power will probably lead to thin clients again. Why give everyone a $900 workstation when $250 terminals and a couple of heavily virtualized servers are easier to maintain?
What my grandparent and I are saying is, if you want to build big fast machines, you need someone who has a use for those super machines. And frankly I don't see any commitment aside from dedicated gamers and the businesses for whom computing is life.
Yes, Intel laptop sales are better.
And I find it hilarious: Intel definitely makes consistently better mobile CPUs, but everything else they do in the mobile space reeks to high heaven. To this day it's nearly impossible to buy an Atom netbook without an Intel GMA-based chipset: that's a 2 watt CPU and a 12-25 watt chipset. If you buy a normal laptop, it's probably a 45W or 35W chip, even though the Pxx00 series is 25W and almost the same price, and again it comes with an absolutely worthless video card that sucks down >10 watts.
AMD certainly doesn't have as nice a processor offering. Their power is close (31W) but the performance just isn't as good. But in my mind they more than make up for it by always having power-thrifty chipsets boasting really good graphics capabilities. AMD's gone even further by offering PowerXpress and CrossFireX, allowing users to switch between integrated and discrete video cards or to use both at once (respectively). I'll take the unnoticeable CPU speed hit for huge power savings and the boon of good integrated video.
The biggest thing keeping AMD down in the mobile world is the systems. When OEMs put together Athlon systems, they tend to slap together something in a cheap case missing half the plugs you'd expect.
I can't resist, on account of all the people who look at the Core i7 benches and think it's all over: considering that the best "review" of a Nehalem EP (the dual-socket variant) is a couple of guys with a single screenshot of spec_fp, I'd say the battle's too early to call. All we've seen are single-socket Nehalems, and that's not been AMD's strong suit for some time.
Even considering that Intel's single-socket game has been largely better for a while, there are some key areas where AMD systems perform better. HPC, render farms, some web serving, virtualization... in all these places where people need a lot of CPUs, AMD has stayed in the running or maintained a lead (it depends a lot on just what you're running). Unfortunately the benchmarks usually published don't factor in these kinds of workloads much at all. Cinebench is the only benchmark in the review anywhere near the above. I think if we ran some VMware benchmarks, things would look drastically different.
But the real question here is Intel: Intel is just now doing the infrastructure AMD did in early spring '03: QPI to AMD's HT, similar onboard memory setups... and thus far, aside from some spec_fp numbers, we have no idea whatsoever how well their implementation is going to work. Once Intel releases Nehalem EP for testing, we'll have an idea.
Using a single noise input and a single function is what I blame for procedural generation getting a bad name: it makes terrain incredibly homogeneous and bland in character. Real land is made up of faults and upswells and erosion and gullies. Even quarter-way decent procedural generation needs the ability to mix and match different factors and features, otherwise you arrive at the same bland failure we've had for a dozen years.
Procedural isn't synonymous with homogeneous and flat. You can seed different attitude and parameters for different areas. Rather than imagining a world composed of a single function, it'd be closer to a world of aggregated functions, where each area has a stronger or weaker impact from the various functions. Each "function" just describes a particular flavor of terrain.
EVE Online used a procedural world generator. Sometime midway through development, they realized they had to tear it all down so that the answer (seed) to life, the universe, and everything would indeed be 42. Thus EVE was reborn, and it was... well, it's EVE.
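Here's a rough sketch of what I mean by aggregated functions. Everything in it is made up for illustration: the three flavors (plains, hills, ridges), the hash-based value noise, the seeds and the scale factors. The point is only the structure: each flavor is its own function, and a slow-moving mask per flavor decides how strongly it applies in a given area.

```python
import math


def _lattice(ix: int, iy: int, seed: int) -> float:
    """Deterministic pseudo-random value in [0, 1) for an integer grid point."""
    h = (ix * 374761393 + iy * 668265263 + seed * 1442695041) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    h ^= h >> 16
    return h / 2**32


def _smooth(t: float) -> float:
    return t * t * (3.0 - 2.0 * t)


def value_noise(x: float, y: float, seed: int) -> float:
    """Smoothly interpolated lattice noise in [0, 1]."""
    ix, iy = math.floor(x), math.floor(y)
    tx, ty = _smooth(x - ix), _smooth(y - iy)
    top = _lattice(ix, iy, seed) * (1 - tx) + _lattice(ix + 1, iy, seed) * tx
    bottom = _lattice(ix, iy + 1, seed) * (1 - tx) + _lattice(ix + 1, iy + 1, seed) * tx
    return top * (1 - ty) + bottom * ty


# Each "flavor" is its own height function with its own character.
def plains(x: float, y: float) -> float:
    return 0.10 + 0.05 * value_noise(x * 0.5, y * 0.5, seed=1)


def hills(x: float, y: float) -> float:
    return 0.25 + 0.35 * value_noise(x, y, seed=2)


def ridges(x: float, y: float) -> float:
    # Folding the noise around its midpoint gives sharp crests.
    return 0.40 + 0.60 * (1.0 - abs(2.0 * value_noise(x * 2, y * 2, seed=3) - 1.0))


# (flavor function, seed of the low-frequency mask that places it in the world)
FLAVORS = ((plains, 101), (hills, 202), (ridges, 303))


def flavor_weights(x: float, y: float) -> list:
    """Per-area influence of each flavor, normalized to sum to 1."""
    raw = [value_noise(x * 0.05, y * 0.05, mask_seed) ** 4 + 1e-6
           for _, mask_seed in FLAVORS]
    total = sum(raw)
    return [r / total for r in raw]


def height(x: float, y: float) -> float:
    """Blend every flavor by its local weight."""
    return sum(w * f(x, y) for w, (f, _) in zip(flavor_weights(x, y), FLAVORS))


if __name__ == "__main__":
    for x in (0.0, 40.0, 80.0, 120.0):
        print(x, [round(w, 2) for w in flavor_weights(x, 10.0)], round(height(x, 10.0), 3))
```

Raising the mask to a power sharpens the regions so one flavor tends to dominate an area, while the borders still blend. Swap in erosion passes or fault lines as extra flavors and the same blending still applies.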
Was this prompted by Microsoft supporting Silverlight and Moonlight on 64-bit platforms from day one?
MS basically got it for free, since it's all on the .NET platform, which runs non-native / virtual machine code from the start.
The pressure's been on for a while: Adobe continually tries to market Flash and now AIR as "the" cross-platform rich application platform, yet their software runs on an ever-shrinking number of machine architectures. Most people have 64-bit these days, and you'll see the same problem on 64-bit Windows with any browser that doesn't merrily downgrade your plugins to 32-bit when they ask for it (see: IE). I'd say the demand far predates Moonlight.
Yes yes yes and yes.
However, AMD already said it was backing OpenCL. I'm pissed as fuck I didn't hear anything about OpenCL this press cycle, but they're the only major graphics company to have ever stated they were getting behind OpenCL: I'm holding onto hope.
You're right: no one uses Brook. Trying to market it as in any way part of the future is a joke and a mistake: a bad one, hopefully brought on by a $2.50 share price and pathetic marketing sods. On the other hand, I think people using CUDA are daft too; it's pre-programmed obsolescence, marrying yourself to proprietary tech that one company, no matter how hard they try, will never prop up all by themselves.
OpenCL isn't due out until Snow Leopard, which is rumored to be next spring. There's still a helluva lot of time.
Theres a lot of tall claims here, but the one that sticks out as most needing some kind of justification is that "The industry seems to have decided that the best approach to parallel computing is to mix two incompatible parallel programming models (vector SIMD and CPU multithreading)". GPU's mix these models fine and I havent seen anyone bitching about the thread schedulers on them or bitching about not being able to use every transistor on a single stream processor at the same time. How you can claim these models are incompatible, when in fact its the only working model we have and it works fine for those using it, is beyond me. You criticize the SIMD model, but the GPU is not SIMD: it is a host of many different SIMD processors, and that in turn makes it MIMD.
Moving on to what you suggest, I fail to see how superscalar out-of-order execution is not MIMD, and we've been doing that shit for years. The decoder pulls in a crap ton of things to do, assigns them to work units, and they get crunched in parallel. Multiple instructions, multiple data sources, smart CPU to try to crunch it all. Intel tried to take it a step further with EPIC (explicitly parallel instruction computing) and look how that fucking turned out: how many people here know what Itanium even fucking is?
The "how to solve the parallel programming crisis" link is pretty hilarious. Yes, lists of interlocking queues are badass. Unfortunately the naive implementation discussed at the link makes no allowance for cache locality. In all probability the first implementation will involve data corruption and crap performance. Ultimately the post devolves into handwaving bullshit that "the solution I am proposing will require a reinvention of the computer and of software construction methodology". This is laughable. Just because stream processing isn't insanely easy doesn't mean we have to reinvent it just so we aren't burdened with dealing with multiple tasks. Even if you do reinvent it, as say XMT has done, you still have to cope with many of the same issues (XMT's utility in my mind is as a bridge between vastly-superscalar and less-demanding EPIC).
Good post, I just strongly disagree. The GPU is close to the KISS philosophy: the hardware is dumb as a brick and extremely wide, and it's up to the programmers to take advantage of it. I find this to be ideal. I've seen lots of muckraking shit saying "this is hard and we'll inevitably build something better/easier", but a lot of people thought the internet was too simple & stupid to work too.
OpenCL is supposed to be it; it's officially endorsed by Khronos (of OpenGL fame) and Apple. Release date is still unknown, drivers are even less known. It's promising a general purpose stream processing API that can inter-operate with OpenGL.
I've been scouring today's press releases for OpenCL, and thus far I've been extremely disappointed to hear numerous promises about Brook+ (the proprietary stream API AMD originally backed, which I don't give a crap about) and nothing about OpenCL. AMD better fucking not renege, or their hardware is going to be useless as all fuck.
Power efficiency is a good reason. If you want to play a video, your average Atom or ARM CPU might have enough headroom to do it, but it will consume close to its maximum draw along the way. A GPU, on the other hand, is specialized hardware that can do things like video playback with ease. Even rendering a media player visualization will heavily tax a CPU, but a GPU may hardly notice the load. It's easy to generalize these examples: for nearly any task that is highly parallelizable, there are enormous power gains to be had by putting it on special purpose massively parallel computing hardware.
I think right now you're locked in the "a GPU is an expensive PCI Express add-on card" mentality. In the future, the GPU will be part of the same die as the CPU. Actually, screw the future: the iPod has a PowerVR GPU built in. What the iPod does would not be possible without the GPU.
Yes, NX is open-sourced and the most active developer is FreeNX. What I'd like to see though is documentation of the core protocol.
And you are further right that the implementation is relatively trivial... with a couple of exceptions. I'm not sure what behavior the synchronous/asynchronous flag needs to toggle, and I'm unsure what validation needs to be performed on realm. What if www.hackmyaccount.com says its realm is gmail.com?
The simple mechanics of what they propose are easy enough, but I don't feel like it's well spec'd enough to be reliable or production worthy.
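For what it's worth, here's a sketch of the kind of realm check I'd expect to see spelled out. To be clear, this rule is my own assumption and not anything the proposal specifies, and `realm_covers_host` is a hypothetical helper: a site may only claim a realm that is its own hostname or a parent domain of it.

```python
def realm_covers_host(realm: str, host: str) -> bool:
    """Return True if `host` is the realm itself or a subdomain of it.

    Assumed rule, not taken from any spec: the requesting site's hostname
    must equal the claimed realm or end with "." + realm.
    """
    realm = realm.strip().lower().rstrip(".")
    host = host.strip().lower().rstrip(".")
    if not realm or not host:
        return False
    return host == realm or host.endswith("." + realm)


# The hackmyaccount case from above gets rejected under this rule.
spoof_allowed = realm_covers_host("gmail.com", "www.hackmyaccount.com")
```

Even this leaves holes: a realm of plain "com" would cover every .com host, so a real implementation would need some notion of registrable domains on top. That's exactly the kind of detail I'd want the spec to nail down.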