Nvidia's Chief Scientist on the Future of the GPU
teh bigz writes "There's been a lot of talk about integrating the GPU into the CPU, but David Kirk believes that the two will continue to co-exist. Bit-tech got to sit down with Nvidia's Chief Scientist for an interview that discusses the changing roles of CPUs and GPUs, GPU computing (CUDA), Larrabee, and what he thinks about Intel's and AMD's futures. From the article: 'What would happen if multi-core processors increase core counts further though, does David believe that this will give consumers enough power to deliver what most of them need and, as a result of that, would it erode away at Nvidia's consumer installed base? "No, that's ridiculous — it would be at least a thousand times too slow [for graphics]," he said. "Adding four more cores, for example, is not going anywhere near close to what is required.""
It won't work, since linux will never get the drivers for it going.
Pretty good read; interesting that this guy is talking to press a lot more:
http://www.pcper.com/article.php?aid=530
Must be part of the "attack Intel" strategy?
Everything will be integrated into one chip, and we will call it the PU.
I would never have expected nVidia's chief scientist to say that nVidia's products would not soon be obsolete.
A CPU-based GPU will not work as well as long as it has to use the main system RAM; heat will also limit its power. NVIDIA should start working on an HTX video card, so the video card sits on the CPU bus but is still on a card where you can put RAM and big heat sinks on it.
From TFA> The ability to do one thing really quickly doesn't help you that much when you have a lot of things, but the ability to do a lot of things doesn't help that much when you have just one thing to do. However, if you modify the CPU so that it's doing multiple things, then when you're only doing one thing it's not going to be any faster.
David Kirk takes 2 minutes to get ready for work every morning because he can shit, shower and shave at the same time.
There wasn't a horizon given on his predictions. What he said about the important numbers being "1" and "12,000" means consumer CPUs have about, what, 9 to 12 years to go before we get there? At which point it'd be foolish /not/ to have the GPU be part of the CPU. Personally, I think it'll be a bit sooner than that. Not next year, or the year after; but soon.
Those who fail to understand communication protocols, are doomed to repeat them over port 80.
Graphics card man says that CPU's not a threat to his business. I'm shocked!
Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
..there's discrete chips, but on the low end there's already integrated chipsets and I think the future is heading towards systems on a chip. A basic desktop with hardware HD decoding and 3D enough to run Aero (but not games) can be made in one package by Intel.
Live today, because you never know what tomorrow brings
He then quipped, "Go away kid, ya bother me!"
"If the market wants to move away from GPU integration, let it, but we're not going to help it along..."
So I am going with whichever manufacturer has the best drivers for my platform of choice, Linux. So if the future doesn't hold this for Nvidia, it doesn't really interest me.
"Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
Instead of 4 CPU cores on a quad-core chip, why not put 2xCPU cores and 2xGPU cores?
Right, come back in 5 years when we have multi core processors with integrated spe-style cores, GPU and multiple memory controllers.
NVidia are putting a brave face on it but they're not fooling anybody.
I was using the Old English sense of the phrase "away from", which actually means "toward".
... core processor? I don't understand the author's logic. Now, suppose it's 2012 or so and multiple core processors have gotten past their initial growing pains and computers are finally able to use any number of cores each to their maximum potential at the same time.
A logical improvement at this point would be to start specializing cores to specific types of jobs. As the processor assigns jobs to particular cores, it would preferentially assign tasks to the cores best suited for that type of processing.
Truthfully, the only real application for the GPU/CPU hybrid would be in laptops, where they can get away with using lower-end GPU chips.
The more Nvidia gets sassy with Intel, the closer they seem to inch toward VIA.
This has been in the back of my mind for a while... Could NV be looking at the integrated roadmap of ATI/AMD and thinking, long term, that perhaps they should consider more than a simple business relationship with VIA?
Your mind is clear / The things that you fear / Will fade with how much you / Believe what you hear
Yeah... so all you have to do is turn every problem into one that GPUs are good at... lots of parallelism and lots of data... but not all problems are like that (heck, the majority of problems aren't like that). GPU stream processors do fairly simple jobs compared to what a (general purpose) CPU does *and* what they do is extremely parallel (embarrassingly parallel). All that OOOE, branch prediction, memory management, and all those other features take silicon to make fast. That's the reason general purpose CPUs have few cores per die.
Stream processors are very simple in comparison and don't require nearly as much silicon to implement, which is why we have over 100 of them on some chips. When you add the complexity that the general purpose CPU has to deal with to the GPU processors, you will eventually be in the same boat.
Or maybe perhaps NVIDIA has been showing their graphics cards running a database engine? or even an OS as we are used to using (memory protection, etc.) What about compiling source code?
The future is asymmetric cores on a single die. The DSPs and Cell are early forms of this but still too hard to deal with. OS kernels and compilers have to become smarter: the OS knows which cores can do what and the compiler can tell what kinds of things a program expects to do and puts that into the executable, the OS matches the executable with the cores that best satisfy what the program needs (closest minimal match), perhaps even dynamically as different sections of a program are 'marked' by the compiler to let an OS know when to schedule the process for a different type of core.
Today, they are explicitly programmed... the 'main' CPU makes library calls, basically, that use the other cores to do stuff, more like coprocessors. All this stuff will eventually need to be done automatically.
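To make the "closest minimal match" idea above a bit more concrete, here is a rough Python sketch. It is only a toy model under my own assumptions: the core names, the capability labels and the `pick_core` helper are all made up for illustration, and no real OS scheduler is claimed to work this way. Each core advertises a set of capabilities, the program declares what it needs (as a compiler might record in the executable), and the chooser picks the core that covers the needs with the fewest unused extras.

```python
# Toy model: all names and capability labels below are hypothetical.

def pick_core(required, cores):
    """Return the name of the core that covers `required` with the
    fewest extra capabilities (the 'closest minimal match'), or None."""
    candidates = [
        (len(caps - required), name)
        for name, caps in cores.items()
        if required <= caps  # the core must support everything needed
    ]
    if not candidates:
        return None
    # Fewest unused capabilities wins; ties fall back to name order.
    return min(candidates)[1]

CORES = {
    "general": {"int", "branch", "fp", "mmu"},
    "stream": {"fp", "simd"},
    "dsp": {"int", "simd"},
}

print(pick_core({"fp", "simd"}, CORES))     # stream
print(pick_core({"int", "branch"}, CORES))  # general
print(pick_core({"crypto"}, CORES))         # None
```

The dynamic version described above would presumably call something like this again whenever the program enters a section the compiler has marked with different needs.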
...stink.
There's the sun reflecting off the cars, there's the cars reflecting off each other, there's me reflecting off the cars. There's the whole parking lot reflecting off the building. Inside, there's this long covered walkway, and the reflections of the cars on one side and the trees on the other and the multiple internal reflections between the two banks of windows is part of what makes reality look real. AND it also tells me that there's someone running down the hall just around the corner inside the building, so I can move out of the way before I see them directly.
You can't do that without raytracing, you just can't, and if you don't do it it looks fake. You get "shiny effect" windows with scenery painted on them, and that tells you "that's a window" but it doesn't make it look like one. It's like putting stick figures in and saying that's how you model humans.
And if Professor Slusallek could do that in realtime with a hardwired raytracer... in 2005, I don't see how nVidia's going to do it with even 100,000 GPU cores in a cost-effective fashion. Raytracing is something that hardware does very well, and that's highly parallelizable, but both Intel and nVidia are attacking it in far too brute-force a fashion using the wrong kinds of tools.
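For anyone wondering what the basic operation behind all those reflections looks like, here is a minimal Python sketch of a single mirror bounce. It assumes a perfect mirror and a unit-length surface normal, and the `reflect` helper is my own illustration, not code from any particular ray tracer.

```python
def reflect(d, n):
    """Reflect direction d about unit normal n: r = d - 2 (d . n) n."""
    dot = sum(di * ni for di, ni in zip(d, n))
    return tuple(di - 2.0 * dot * ni for di, ni in zip(d, n))

# A ray heading down and to the right hits a floor whose normal points up.
print(reflect((1.0, -1.0, 0.0), (0.0, 1.0, 0.0)))  # (1.0, 1.0, 0.0)
```

A ray tracer repeats this at every reflective surface a ray hits, so the cars-reflecting-cars case means following each ray through several bounces.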
Because if processing power goes up way past what you generally need for even heavy apps, Nvidia still want you to believe that you need a separate graphics card. If that model were to change at some point it would be death for graphics card manufacturers. Of course, they could very well be right. What the hell do I know :P
The Long Now Foundation
The pattern set by the whole CPU / Math Co-Processor integration showed the way. For those old enough to remember, once upon a time the CPU and Math Co-Processor were separate socketed chips. Specifically you had to add the chip to the MOBO to get math functions integrated.
The argument back then is eerily similar to the one proposed by the NV chief, namely that the average user wouldn't "need" a Math Co-Processor. Then came along the Spreadsheet, and suddenly that point was moot.
Fast forward to today: if we had a dedicated GPU integrated with the CPU, it would eventually simplify things so that the next "killer app" could make use of a commonly available GPU.
Sorry, NV, but AMD and INTEL will be integrating the GPU into the chip, bypassing bus issues and streamlining the timing. I suspect that VIDEO processing will be the next "Killer App". YouTube is just a precursor to what is coming shortly.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
How many more pixels do you think you need? I'm glad they are looking ahead to the point when graphics is sitting on chip.
(current high end boards will push an awful lot of pixels. Intel is a generation or two away from single chip solutions that will push an awful lot of pixels. Shiny only needs to progress to the point where it is better than our eyes, and it isn't a factor of 100 away, it is closer to a factor of 20 away, less on smaller screens)
Nerd rage is the funniest rage.
We already have multi-core systems and they rarely improve gaming. Why? Because almost all games are not coded to take advantage of a SMP environment.
Even Carmack himself will tell you that it is very challenging to develop a truly multi-threaded app, especially when a real-time component exists.
Most games operate in a synchronous state machine type fashion with rendering just another step in the cycle. This does not lend itself to parallelism. In order to truly take advantage of SMP, most of the big game engines in use would have to be re-written from the ground up.
Unless we move the engine to the GPU, then Intel would really be in trouble. We're already moving in that direction piecemeal... shaders (well, sort of; dynamic code on the GPU counts in my book), Ageia PhysX anyone?
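The synchronous cycle described above can be sketched roughly like this in Python. It is a toy loop under my own assumptions (the stage names and the `run_frames` helper are made up, and no real engine is this simple); the point is only that each stage waits for the previous one, so extra cores have nothing to pick up.

```python
def run_frames(num_frames):
    """Run a fixed number of frames; each stage waits for the previous one."""
    log = []
    state = {"tick": 0}
    for _ in range(num_frames):
        log.append("input")    # 1. read player input
        state["tick"] += 1     # 2. advance AI/physics/game state
        log.append("update")
        log.append("render")   # 3. draw the state that was just computed
    return state["tick"], log

ticks, log = run_frames(2)
print(ticks)  # 2
print(log)    # ['input', 'update', 'render', 'input', 'update', 'render']
```

Rendering depends on the update, and the update depends on the input, which is why splitting this across cores usually means restructuring the loop rather than just adding threads.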
Despite what some major 3D game engine creators have to say if real-time ray tracing comes sooner than later, at about the time an eight core CPU is common, I think we might be able to do away with the graphics card especially considering the improved floating point units going in next gen. cores. Consider Intel's QuakeIV Raytraced running at 99fps at 720P on a dual quad-core Intel rig at IDF 2007. This set-up did not use any graphic card processing power and scales up and down. So, if you think 1280x720 is a decent resolution AND 50fps is fine you can play this now with a single quad-core processor. Now imagine it with dual octo-cores which should be available when? Next year? I hazard 120fps at 1080P on your (granted) above average rig doing real time ray tracing some time next year IF developers went that route AND still playable resolutions and decent fps with "old" (by that time) quad-cores.
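A quick back-of-the-envelope check of those numbers, as a Python sketch. It assumes (and this is a big assumption) that frame rate scales linearly with core count and inversely with pixel count, and that "1080P" means 1920x1080; the `scaled_fps` helper is just for illustration.

```python
def scaled_fps(base_fps, base_res, base_cores, new_res, new_cores):
    """Estimate fps assuming throughput is linear in cores and pixel count."""
    base_pixels = base_res[0] * base_res[1]
    new_pixels = new_res[0] * new_res[1]
    return base_fps * (new_cores / base_cores) * (base_pixels / new_pixels)

# Figures quoted above: 99 fps at 1280x720 on two quad-cores (8 cores).
print(round(scaled_fps(99, (1280, 720), 8, (1280, 720), 4), 1))    # 49.5
print(round(scaled_fps(99, (1280, 720), 8, (1920, 1080), 16), 1))  # 88.0
```

So the single quad-core estimate of about 50fps at 720P holds up under this naive model, while dual octo-cores at 1080P would land nearer 88fps than 120fps unless per-core performance also improves.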
'He was a dreamer, a thinker, a speculative philosopher... or, as his wife would have it, an idiot.' - Douglas Adams
I'm more interested in the future of open gpu drivers... that fancy hardware is next to useless without them unless you are pretty technical and don't mind poisoning your system, or use that other OS.
ATI will win if nVidia doesn't follow suit. My next card will be ATI on OSS drivers unless nVidia does something similar to keep me interested.
-AC
Why not make one of the multiple cores a GPU, then the speed at which it communicates with the CPU will be at clock speed.
Problem solved.
Of course Nvidia will need to come up with a CPU.
Cheers
* Carthago Delenda Est *
I don't think you understand the difference between GPUs and CPUs. The number of parallel processes that a modern GPU can run is massively more than what a modern multi-core CPU can handle. What you're talking about sounds like just mashing a CPU core and GPU core together on the same die. Which would be horrible for all kinds of reasons (heat, bus bottlenecks and yields!).
;)
Intel has already figured out that the vast majority of home users have finally caught on that they don't NEED more processing power. Intel knows they have to find some other way to keep people buying more in the future. How many home users need more than a C2D E4500? Will MS Word, a web browser and an email client change so much in the next 3-5 years that they will demand more horsepower than is available today?
Then again, you might need 32 CPU cores on a single die if you want to run that AT&T browser
Only Amiga made it possible! (Thanks to custom chips, not in spite of them.)
It doesn't seem likely that one generic item would be better at something than many specific ones. Sure, CPU+GPU would just be all in one chip, but why would that be better than many chips? Maybe if it had RAM inside as well and that enabled a faster FSB.
well, yeah, for sure. But I see that as only the first step. It's like the math-coprocessor step. My 32-core cpu has six graphics cores, four math cores, two HD video cores, an audio core, 3 physics, ten AI, and 6 general cores. But even that only lasts long enough to reach the point where mass production benefits exceed the specialized production benefits.
It'll also be the case that development will start to adjust back towards the cpu. Keep in mind, I don't think even one game exists now that is actually built to be dependent on even two cores. We're still dropping video frames as a preference. I await the day when other things get dropped. Imagine where AI gets dumber on a slower machine. Or sound FX are reduced. Or any number of other code paths are eliminated. Hey, no one's even reducing the video quality for 10 of 20 frames. All things that become possible with poly-core machines. Obviously raytracing takes that concept even further.
The trickle-down of core programming for many cores -- heh -- is the leaps-and-bounds concept that moves industries.
Either way, I'm back to my tried-and-true statement that what brought about the computer world in the first place was the concept of shared resources -- which includes the cpu. The same thing will happen again. Because the alternative is ridiculous. Do you want to play a game on your GPU for graphics, your Xonar for sound, your physX for physics, your AIntelligence for AI (I made that up), and have your cpu do nothing but handle keyboard input? That's just silly. And it reminds me of the days when music was produced differently than sound fx. A computer is not a whole bunch of individual components in one box. It's about the box that can do anything. And when it can't, it gets a little expansion card. That expansion card usually handles some external component, or does something particularly unusual.
A GPU doesn't handle anything external, and certainly not something unusual. Every machine, always, at every moment, produces high-quality display elements. It's silly to make that a separate component.
Also, look at the prices. A decent CPU is $100 - $300, and a decent GPU is $150 - $400. There's a lot of money there when combined into a $250 - $700 device. It'll also be great to spend more on my CPU than on my hard drives. What a concept.
It is quite ridiculous, because if you can put 8 cores on a single die, then you can put a lot more multiprocessors on it than a current GPU already has. And these GPUs are very scalable, and the software that runs on them is very simple, so you need simpler threads.
And this is what happens. Current GPUs can run 512 threads in parallel. Suppose you have 8 cores with Hyperthreading: you could run, squeezing everything, 16 threads tops. And there isn't any 8-core chip for sale, is there?
If you have Eight or X Cores, Couldn't one or two (or X-1) be dedicated to run MESA (or a newer, better, software GL implementation)? IIRC, SGI's linux/NT workstation 350's had their graphics tied into system RAM (which you could dedicate huge amounts of RAM for), and they worked fine.
Nvidia makes SIMD (single instruction, multiple data) multicore processors while Intel, AMD and the other players make MIMD (multiple instructions, multiple data) multicore processors. These two architectures are incompatible, requiring different programming models. The former uses a fine grain approach to parallelism while the latter is coarse-grained. This makes for an extremely complex programming environment, something that is sure to negatively affect productivity. The idea that the industry must somehow resign itself to an uneasy marriage between the two approaches is nonsense. Logic dictates that universality should be the main goal of multicore research. The market is crying for a super fast, fine-grain and easy to program, MIMD multicore architecture that can handle any kind of parallel computing task. Neither Nvidia, Intel, AMD or the others even come close to delivering what the market wants. And as we all know, what the market wants, the market will get. So my point is that Nvidia should not rest on its laurels because their technology is bound to become obsolete as soon as someone figures out how to make the right multicore processor and kicks everybody's ass in the process. Read Nightmare on Core Street for a good analysis of where the changing multicore landscape is going. In the meantime, I advise everybody in the multicore business to thread carefully. Big money is in the balance. And I mean, BIG MONEY.
I think the interviewer wasn't asking the right questions. His answer was for why you can't replace a GPU with an N-core CPU, not why you wouldn't put a GPU on the same die with your CPUs. I think his answers in general imply that it's more likely that people will want GPU cores that aren't attached to graphics output at all in the future, in addition to the usual hardware that connects to a monitor. I wouldn't be surprised if it became common to have a processor chip with 4 CPU cores and 2 GPU cores, and also have a graphics card with another GPU or 2 in addition to video output.
He is right that having a 16-core CPU won't do a number of common tasks efficiently, compared to a single massively-SIMD core.
I was under the impression that optimal bus design used to be different, but that was sort of going away with the move to multi-core designs.
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
The five year window might not be in the cards, but I've got two words for you: ray tracing.
Pretty much the only way to continue Moore's Law that I can see is via additional cores. If you had 128 cores, you would no longer care about polygons. Polygons = approximations for ray tracing. Nvidia = polygons.
Supreme Commander is the game that requires 2 cores (well, ok you can drop the frame rate, polygon levels and other fidelity settings of course. Nobody would ever release a game that couldn't be played on a single core machine)(not yet at least).
I think, considering the diminishing returns from adding cores, that adding specialised units on die would make sense. Look at how good a GPU version of folding@home is, and think how that kind of specialised processing could be farmed off to a specialised core. Not necessarily for graphics as I think Nvidia will continue to sell better and better graphics cards.
If the die had the co-processor on it, and CPU extensions to support it, then compiler writers would use it and some processing tasks could fly along.
And the reason why they wouldn't put these things into the existing CPU cores is probably complexity. A dedicated core must be easier to design and develop than bloating existing ones with added features and extensions.
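The diminishing returns from adding general-purpose cores mentioned above are usually explained with Amdahl's law. Here is a small Python sketch; the 50% parallel fraction is purely an assumed figure for illustration, not a measurement of any real workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Assumed workload: half the run time can be spread across cores.
for n in (2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.5, n), 2))
# 2 1.33
# 4 1.6
# 8 1.78
# 16 1.88
```

With that assumed workload, going from 8 to 16 cores buys about a tenth of a speedup step, which is the sort of case where a specialised unit for the slow part looks more attractive than more of the same cores.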
I think a better question is "Why wouldn't we have a separate multi-core GPU along with the multi-core CPU?" While I agree that nVidia is obviously going to protect its own best interests, I don't see the GPU/CPU separation going away completely. Obviously there will be combination-core boards in the future for lower-end graphics, but the demand on GPU cycles is only going to increase as desktops/games/apps get better. However, one of the huge reasons that video cards are a productive industry is that there are plenty of high-end graphical demands out there, from hardcore gamers to Autocad applications. Ever seen the number of cycles/graphical processing power it takes to run a digital 911 map? Unbelievable!
;-)
Seriously, if there is anything that history has taught us, it's that there's room for the integrated (low-end) and dedicated (high-end) graphics at the same time, as they serve different niches.
Oh, and never get involved in a land war in Asia
I think it's fairly clear that GPUs will stick around until we either have so much processing power and bandwidth we can't figure out what to do with it all, at which point it makes more sense to use the CPU(s) for everything, or until we have three-dimensional reconfigurable logic (optical?) that we can make into big grids of whatever we want. A computer that was just one big giant FPGA with some voltage converters and switches on it would make whatever kind of cores (and buses!) it needed on demand. Since we're not Buck Rogers and this ain't the 24th century, GPUs will probably be here for a while.
The real question becomes how all these cores will be connected to one another. The processes are getting finer all the time but clock rates are rising only gradually as are word lengths, and it seems highly likely that basically all computers will go multicore before we experience another quantum leap in performance that makes uniprocessor systems powerful again. So then, why not have two CPU cores and a GPU core on the same die?
I would assume that he's guessing that the integrated systems will continue to be mid-range at best, and that those systems will continue to have only one core. I disagree on both counts, especially since mid-range systems with crappy onboard graphics are here around $500-600 today.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
I've actually been suggesting to my friends for a while, that you'll end up with about four or five different major vendors of computers, each similar to what Apple is today, selling whole systems.
Imagine Microsoft buying Intel, AMD buying RedHat, NVidia using Ubuntu(or whatever) and IBM launching OS/3 on Powerchips, and Apple.
If the Document formats are set (ISO) then why not?
There will be those few that continue to mod their cars, but for the most part, things will be mostly sealed and only a qualified mechanic er technician would ever need to crack the case.
I suspect that in the next 15 years or so, this is what you're gonna end up seeing.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Modern real-time ray tracers can get respectable performance without doing any sort of GPU-hybrid trickery, or requiring any hardware other than a fast CPU. For instance, try out the Arauna demo. (Dedicated ray-tracing hardware would be nice, but I'm not aware of any hardware implementation that has significantly outperformed a well-optimized CPU ray tracer. With the resources of a major chip manufacturer I don't doubt it could be done, though.) Arauna and OpenRT and the like might still be a little too slow to run a modern game at high resolution, but they're getting there fast.
This is just plain ignorant. Naive, O(n^2) radiosity may take that long, or path tracing with a lot of samples per pixel, but a decent photon mapping algorithm shouldn't be anywhere near that slow to produce a rendering quality acceptable for games. Maybe "hundreds of seconds" might be a more plausible number. (Or less, if you're willing to accept a less accurate approximation.) Metropolis Light Transport is another algorithm, but I don't have a good notion of how fast it is.
Every time I walk out to my car I see raytracing.
Actually, you don't. Raytracing is a mathematical model that attempts to simulate light behavior in reality. And, as is true for most simulations, it is a gross simplification of reality. The mathematical model used for approaching realism is irrelevant, just so long as the result is closer to the perceived goal.
And, of course, we are assuming that modeling visual reality is the perceived goal, which it is not in many cases.
It would be sad if his comments about AMD folding did pan out. It would have been wonderful to have a CPU and GPU chip communicating by HT.
Hmmm. That's interesting.
You're right. Perhaps the CPU and the GPU are too different to play nicely on the same die.
A little simpler then. If CPU processing power does continue to increase exponentially (regardless of need) then one clever way to speed up a processor may be to introduce specialized processing cores. The differences might be small at first. Maybe some cores could be optimized for 64bit applications while others are still backwards compatible with 32bit. (No. I have no idea what sort of logistical nightmare this would be. )
Actually, you don't.
What, you're one of these heretics who doesn't realize that we're in an elaborate computer simulation?
I think you will see nVidia licensing their IP to other chip companies, because like it or not, the push is always going to be toward cost reduction, power reduction, smaller form factors, and so on. This is true at all performance levels. For low-end systems, the multi-core CPUs may eat their lunch. The only thing that saves them is that graphics and video are data pigs, so the issue is more managing high bandwidth data flow than overall horsepower. They will still be around, but they may go the way of SGI.
BitWorksMusic.com -- odd tunes for odd times
MESA is slow, and main RAM access is slow. More general purpose cores isn't the solution. That said, you could build onto the same silicon wafer one or more GPU cores with access to on chip graphics ram (which could double as extended cache or ram specifically for double/triple buffering).
"Intel has already figured out that the vast majority of home users have finally caught on that they don't NEED more processing power."
I think the real big issue is that there are no killer apps yet (apps so convenient to one's life that they require more processing power).
I think there are a lot of killer apps out there simply waiting for processing power to make its move, the next big move IMHO is in AUTOMATING the OS, automating programming, and the creation of AI's that do what people can't.
I've been experimenting with automatic content generation for games and whatnot, and over time these same principles will spread into other areas, I doubt I'll see it in my lifetime but smart-systems are coming.
As many pixels as they can possibly throw at me, that's how many.
There are people who are perfectly happy with resolutions like 1024x768, good for them! Me, I was running that rez in the 486 days, and gaming it in the late 90's with Voodoo2 and the first GeForce.
The fact that GPUs have scaled faster and larger than CPUs is proof to me that GPGPU is a good idea. I have a beefy PC, and the bulk of what I do involves image processing. If the GPU can do it 10 times faster for less money, that's an epic win and I say bring it on!
-Billco, Fnarg.com
Eventually processors will have hundreds of cores, and will be the polar opposite of today's world of processor design.
I'm not saying that they should stick everything on one chip anytime soon, just that there is actually a limit somewhere in the medium term future where you start spending improvements somewhere other than raw performance. For casual users, that's really soon(because dpi isn't that important 3 or 4 feet away from your face, most people's eyes don't have the resolution for it to matter).
Nerd rage is the funniest rage.
Indeed. Procedural content is the future of gaming. That is so because the primary bottleneck for GPUs has always been memory bandwidth and will always be memory bandwidth unless a fundamentally different (goodbye Mr Texture Bitmap) methodology is advanced. Cache misses got you down? No problem. Nextgen games won't be missing the cache on texture reads because nextgen games won't have textures. They will have shaders which sample a procedural function on a per-texel basis.
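As a rough illustration of sampling a procedural function per texel instead of reading a bitmap, here is a small Python sketch. A real shader would run on the GPU in a shading language; the checkerboard function, the `render` helper and the 8-squares default are all just assumptions made for the example.

```python
import math

def checker(u, v, squares=8):
    """Procedural checkerboard: returns 0.0 or 1.0 for coords in [0, 1)."""
    return float((math.floor(u * squares) + math.floor(v * squares)) % 2)

def render(width, height):
    """Evaluate the function once per texel; no bitmap is ever read."""
    return [
        [checker((x + 0.5) / width, (y + 0.5) / height) for x in range(width)]
        for y in range(height)
    ]

image = render(16, 16)
print(image[0][0], image[0][2], image[2][2])  # 0.0 1.0 0.0
```

The trade is arithmetic for memory traffic: every texel costs a few operations, but nothing has to be fetched from texture memory.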
"His name was James Damore."
ICUP
My inner child needed release. Sorry.
I think by 2012 or 2020 or so, it would be far more likely that all code will be compiled to an abstract representation like LLVM. With a JIT engine that will continuously analyse your code, refactoring into the longest execution pipeline it can manage, examine each step of that pipeline and assign each step to the single threaded CPU style or stream processing GPU style core that seems most appropriate.
I don't think this will be done at a raw hardware level. I imagine the optimisation process will be far too complex, and be continually improved by researchers. I can't imagine being able to implement anything like this in silicon.
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
Do it yourself, because no one else will do it yourself. [beta blockade 10-17 Feb]
Central Unit Processor - Complex Homogeneous Integrated Component Kernal System. ... so does mine.
The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.
Well, there is a LOT of computing power in those shader units, much more than in an Intel CPU, for example,
but using it is the main problem, as they can't quite execute x86 code.
So how about using a specialized dynamic-recompilation CPU that does all the conversion/branch prediction/out-of-order execution/loop unrolling work in software, but with a highly specialized opcode set to do the job instead of a generic CPU?
Even if you lose half of the raw power of the units, you will still have much more processing power than the current CPUs on the market.
"truthfully only real application for the gpu/cpu hybrid would be in laptop use where they can get away with using lower end gpu chips"
These kind of comments scare me, is everyone new, or just not paying attention.
PCI Express 16x has amazing headroom even for the most hardcore gaming today, especially when utilizing SLI/Crossfire configurations.
As for this ONLY BEING for LOW END? Did you ever read the PCI/AGP/PCI Express specifications?
Just because RAM sharing was ONLY used in low end on board GPUs doesn't mean that if system RAM is fast enough (or managed properly) that it can't be used in gaming.
This is where a normal Vista plug would come in, as people don't realize this is ONE OF THE GOOD things of the Vista WDDM (besides being the first OS driver model providing GPU preemptive multitasking). Vista also intelligently manages and virtualizes VRAM to system RAM, and uses AGP/PCI Express techniques so that the memory space is seen by all applications as available VRAM.
Vista then uses the new memory prioritizers for VRAM operations as well and will swap low performance needs textures between system and VRAM transparently. This not only works, but with the AGP/PCI Express concepts is fast enough that it gives games more VRAM room for better texture quality at the same performance.
This is EXACTLY how Vista can take a HIGH PERFORMANCE DEDICATED Video Card with 128mb or 256mb of VRAM and allow applications to use textures that exceed the VRAM of the card transparently with no performance loss. In addition, because of the multi-core/multi-GPU tasking nature of Vista, SEVERAL 3D applications can run at once expecting full control of the Video card and VRAM at the same time, and not even realize Vista is making this happen.
The old 'shared system' RAM concepts were SLOW, but mainly because the low end GPUs paired with the mainboards providing this were SLOW.
Vista does this even with new NVidia 9800s and gives it higher texture quality and performance over XP, so this is NO LONGER a low end concept.
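For anyone who wants the prioritized swapping idea spelled out, here is a minimal sketch in Python. It is NOT how WDDM is actually implemented; the class names, the sizes and the "evict the lowest priority texture first" policy are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Texture:
    name: str
    size_mb: int
    priority: int  # hypothetical scale: higher = more performance-critical


@dataclass
class VramManager:
    """Toy residency manager (illustrative only, not the WDDM design)."""
    capacity_mb: int
    resident: dict = field(default_factory=dict)   # textures currently in VRAM
    paged_out: dict = field(default_factory=dict)  # textures parked in system RAM

    def used_mb(self) -> int:
        return sum(t.size_mb for t in self.resident.values())

    def make_resident(self, tex: Texture) -> None:
        if tex.name in self.resident:
            return
        if tex.size_mb > self.capacity_mb:
            raise ValueError("texture is larger than total VRAM")
        self.paged_out.pop(tex.name, None)
        # Evict the least important resident textures until the new one fits.
        while self.used_mb() + tex.size_mb > self.capacity_mb:
            victim = min(self.resident.values(), key=lambda t: t.priority)
            del self.resident[victim.name]
            self.paged_out[victim.name] = victim
        self.resident[tex.name] = tex


# A 256MB card asked to hold 320MB worth of textures.
mgr = VramManager(capacity_mb=256)
mgr.make_resident(Texture("sky", 128, priority=1))
mgr.make_resident(Texture("terrain", 128, priority=5))
mgr.make_resident(Texture("player", 64, priority=9))
print(sorted(mgr.resident), sorted(mgr.paged_out))
```

The application never sees an out-of-memory failure here; the low priority texture just moves to system RAM. The sketch says nothing about how much a real swap costs.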
So I ask again, does anyone even pay attention to what is being done in the GPU/OS realm, or do you just ignore crap because MS created it? If so, no wonder the Linux compositors and even the OS X compositor and driver technologies are half a decade behind Vista, and people don't even 'get it' until the market notices this and Linux and OS X are screwed and trying to catch up... (This is such a sad pattern, as Linux could have been a 'leader' instead of a copying follower of OS X and NT technologies, which is all that seems to happen beyond basic kernel optimizations.)
CPU-based GPUs will not work as well as long as they have to use the main system RAM; also, heat will limit their power. NVIDIA should start working on an HTX video card, so you can put the video card on the CPU bus, but it is on a card so you can put RAM and big heat sinks on it.
I agree that GPU/CPU will need to be integrated at a lower level than current technologies, but not in the near future, as even PCI Express 2.0 doesn't provide a benefit yet.
However, don't discount System and VRAM becoming a unified concept. This has already happened with Vista, and is one of the things that gave NVidia and ATI such trouble with the WDDM, as OS handled VRAM virtualization was not something either company was used to coding for in high performance instances. (Also the preemptive multitasking GPU nature of Vista is another shiny new hurdle they had to clear.)
Vista (now with current drivers) shows that the WDDM and using memory prioritization even for Gaming and System/VRAM virtualization works, and works well. This allows users with low VRAM amounts to crank up texture sizes that exceed the Video card's RAM, without performance loss.
Additionally as multi-3D application concepts become more standard, system RAM virtualization will be as necessary as HD or other virtualization/paging techniques are in other aspects of computing.
Not a lot of users run more than one game on the screen at once, but with new application UI concepts like WPF.NET in Vista that inherently does 3D, this will become common in all OSes. Even running a game in a Window on Vista while Aero is active is using the multi-3D paradigm, and with the shared texture nature of the Vista compositor it can do this with no FPS loss, even with more than one 3D game on screen active at a time (and even Exposed or Flip3D'd on the screen, something that would bring other compositors or video subsystem/driver models to a crawl in other consumer OSes).
Microsoft somehow realized this (actually the XBOX team realized this, and pushed to get these features to be a core of WDDM in Vista.)
I'm surprised others don't get this, and don't realize that besides the media/SlashDot Vista slam, Vista has done this right, and does it well, putting it several years ahead of OS X, Linux, etc.
If people think NVidia or ATI are looking to the OS model of *nix, or OSX for future directions, they are fooling themselves. Microsoft with Vista and especially Microsoft's XBox graphic researchers are mapping the road of the future. The XBox 360 team already has done this with Vista, and the VRAM virtualization concepts, and the unified shader push, and the next big thing will be an extension of this and from them.
"What you're talking about sounds like just mashing a CPU core and GPU core together on the same die. Which would be horrible for all kinds of reasons (heat, bus bottlenecks and yields!)."
One would hope that bus bottlenecks would not be a problem if the cores don't have to go off-chip to access a common bus. The bus could be a wider, dedicated video bus if kept on-chip, and would enjoy vastly reduced latency. It would use less overall power than having an external mobo chip coordinate the handshaking, not to mention the signals that have to be sent to a physical video card. The die itself would be larger than a 1 CPU die, but that's not a problem anymore.
The problem now is what the hell to do with all of the transistors we can cram onto a chip. At some point people won't notice a difference between a 16 core unit and a 32 core unit connected to the same memory. Might as well stick a GPU in there. Heck, throw in two for good measure. If there's a defect in one, you can disable that and still sell the part as the economy version. If that means 2-4 fewer CPUs per die then so be it.
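To put rough numbers on the "disable the bad one and sell the economy version" idea, here is a back-of-the-envelope sketch. The 1-in-5 defect chance per GPU block is a made-up figure, and treating the two blocks as failing independently is an assumption, not fab data.

```python
from fractions import Fraction

# Hypothetical: each on-die GPU block has a 1-in-5 chance of a killer defect,
# and the two blocks fail independently of each other.
p_bad = Fraction(1, 5)
p_good = 1 - p_bad

full_part = p_good * p_good      # both GPU blocks work
economy = 2 * p_good * p_bad     # exactly one works, the other gets fused off
scrap = p_bad * p_bad            # neither works

print("full:", full_part, "economy:", economy, "scrap:", scrap)
```

Under those assumed numbers, most of the dies that would otherwise miss the full spec can still ship as the cheaper part, which is the whole point of the redundancy.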
It's not all as horribly problematic as you make it sound. There are some significant savings that can be reaped by putting the two together. Right now they claim to be doing it as a way of producing a cheaper system with slightly slower video-card performance, but there are some opportunities to actually improve performance by doing this, and the benefit of lower power consumption for the overall system is definitely in the plus column.
Actually, if anybody can find the "right" way of doing this, it will probably be AMD/ATI... and then everybody else will just copy what they did, tweak it slightly and give it a different name.
You can't send a takedown notice to an already printed newspaper.
"computers are finally able to use any number of cores each to their maximum potential at the same time."
The main problem is that the software can't use an arbitrarily high number of cores, not the 'computers'. We could put out 64-core PCs (say 16 quad cores) but software just isn't written to take advantage of that level of parallelism.
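Amdahl's law is the usual way to quantify this. A quick sketch follows; the 50% parallel fraction is a hypothetical figure picked for illustration, not a measurement of any real program.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup when only part of the work can be spread over cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed workload: half of the run time can be parallelized.
for cores in (1, 4, 16, 64):
    print(cores, "cores ->", round(amdahl_speedup(0.5, cores), 2), "x")
```

With half the work stuck on one thread, the speedup can never reach 2x no matter how many cores you add, so going from 4 cores to 64 buys very little unless the software itself changes.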
Our friend from Nvidia mocks Intel for thinking GPUs are easy even as he makes statements like "making a core isn't all that hard". And it's true: making a core isn't all that hard. Unless it has to perform well, be proven compatible with all kinds of bizarre existing legacy, be cheap, be reliable, be on time... and in short satisfy all the other requirements on a CPU design team for a mainstream product. This guy is completely talking out of his ass on that front. That said, it will be interesting to see whether Larrabee proves to be an innovative new take on how to do graphics, or in fact a CPU architect's warped view of how to solve a graphics problem. Either is quite possible. I think the conjecture that a Larrabee die is going to cost $1000 to make is probably ridiculous... and even if he's talking about a sale price it's pushing it. I think the observation that CPU and GPU will probably remain pretty separate except in low cost implementations is probably right. At the flop level we're talking about, memory latency and bandwidth as well as sheer transistor count make it difficult to believe that putting the CPU and GPU logic in the same die makes sense. The dominant interactions are the memory-compute bandwidth, not the CPU-GPU logic bandwidth, so the current partitioning makes sense as far as I can see.
And then there were none.
"Vista also intelligently manages and virtualizes VRAM to system RAM"
It's still going to be slower than the real thing. Show me how fast Vista runs Crysis on a fast 256MB/512MB card compared to a fast 1GB card at high res with AA on.
And that virtual video RAM seems to mean that if you have 2GB of real RAM, Vista takes 1GB for the O/S, and 512-1GB for the vidcard and that leaves you with nothing much left over for the game.
As long as the O/S is still 32bit you'll also have the problem of only 4GB of easily addressable space. So no point installing 4GB of system RAM on your board. You might as well stick to 2 or 3GB system RAM and spend the extra money on a video card with real video RAM.
So it's a good idea to ignore that crap till 64 bit O/Ses become mainstream.
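The 4GB ceiling is just arithmetic on 32-bit addresses. A quick sketch; the 512 MiB video aperture and the 256 MiB for other device mappings are assumed numbers for illustration, since real systems vary.

```python
MIB = 2**20
GIB = 2**30

address_space = 2**32            # bytes reachable with a 32-bit address

# Hypothetical reservations carved out of that same 4 GiB space:
video_aperture = 512 * MIB       # assumed window for the video card's memory
other_devices = 256 * MIB        # assumed PCI/chipset mappings

usable_ram = address_space - video_aperture - other_devices

print(address_space // GIB, "GiB addressable")
print(usable_ram / GIB, "GiB left for system RAM")
```

Under those assumptions a 32-bit box with 4GB installed would see about 3.25 GiB of it, and a video card with more memory mapped in would push that lower.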
Lastly, I don't seem to have problems running multiple 3D apps at the same time on Win2K. So what's the big deal?
Yes, nVidia's binary drivers are available on linux and the recent ones include CUDA support (even the Beta 2.0 is available on Linux).
And in fact Linux is a much better environment for developing CUDA. (Ability to set up a headless server, numerous ways to interact with said server).
That's what I'm doing at work currently.
*BUT*
No, there are still no decent open source drivers for nVidia yet. Thanks to the lack of collaboration from nVidia, Project Nouveau has to go through the difficulties of reverse engineering everything and because of that colossal amount of work, still hasn't produced drivers that support all nvidia chips + 2D acceleration + 3D acceleration + GPGPU support.
So there's no CUDA for you on linux if you are using anything besides the few supported architectures, and even then you may have to go through some patching if your compiler *minor version* isn't supported yet (CUDA 1.0 doesn't support GCC 4.2.x out of the box).
Want to play with CUDA on a Sun Niagara II or on a Tilera's TileExpress64 based computer ? Sorry. CUDA only supports x86 and x86_64 architectures. No SPARC and no MIPS for you.
* On the other hand *
Competitors' GPUs have limitations too.
Although Brook is open source, the CAL technology used by ATI as a backend isn't yet, and you have to rely on the OpenGL back-end. And not all ATI radeons have good open-source drivers supporting enough 3D functionality to get good support on Brook. (r300 driver fails with latest GLSL-based brook. radeonhd driver doesn't have 3D acceleration yet).
But at least, ATI/AMD is (slowly) trying to help the development of open source drivers.
Haven't tried Brook on Intel GPU yet.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Truthfully, the only real application for the GPU/CPU hybrid would be in laptops, where they can get away with using lower-end GPU chips.
You only need so much power for 95% of users. And thanks to the introduction of PCIe, most desktop systems come with an x16 expansion port, even if the chipset has integrated graphics. Further, there's a push from ATI and Nvidia to support switching to the IGP and turning off the discrete chip when you're not playing games, which cuts down on power used when you're at the desktop. Isn't technology great?
And further, there's more out there in the land of IGPs than just crappy Intel. AMD's new 780 chipset actually packs a real HD 3450 core onboard (not cut-down at all), and Nvidia has a new IGP on the horizon that's even faster. Both of these will be more than enough power to run Aero Glass, and you can even run recent games at low settings.
Now, what's the difference between affordably integrating graphics in the chipset and affordably integrating it in the same die as the CPU? Maybe one process revision, tops. And that's the beauty of it: integrated graphics will improve in performance over the years just like performance improves in IGP and discrete chipsets now; it's only a matter of time before ALL users have PCs that come with the power of quad-Crossfire/SLI ultra setups.
Man is the animal that laughs.
And occasionally whores for Karma.
It's still going to be slower than the real thing. Show me how fast Vista runs Crysis on a fast 256MB/512MB card compared to a fast 1GB card at high res with AA on.
Of course more VRAM gives games more room and Vista more room, who said it didn't? AA isn't always the best example though, as most implementations use selective AA, instead of full image sub rendering that requires large chunks of RAM.
(PS Crysis isn't a full DX10 game. When the game says DX10 only, then you will see the performance benefits of DX10, for things like 16X Full Image AA at resolutions higher than 1920x1200.)
And that virtual video RAM seems to mean that if you have 2GB of real RAM, Vista takes 1GB for the O/S, and 512-1GB for the vidcard and that leaves you with nothing much left over for the game.
Vista takes 1GB for the OS? Are you high or just painfully misinformed? Vista manages RAM between the OS/Applications/VRAM very well, and if the game is requesting more system RAM, Vista gives it up, no questions asked. You act like Vista is choking applications to do what it does. Wrong.
Also, Vista doesn't carve off a 1GB chunk for the OS, and when running games virtually anything non-essential is paged out if necessary as the game/application requests RAM. Go read up on NT memory management, then read the Vista kernel additions on memory prioritization and additional scheduling.
As long as the O/S is still 32bit you'll also have the problem of only 4GB of easily addressable space. So no point installing 4GB of system RAM on your board. You might as well stick to 2 or 3GB system RAM and spend the extra money on a video card with real video RAM.
Again, who SAID not to buy a video card with the most VRAM you can? This is NOT THE POINT, and NOT THE SUBJECT HERE. Vista's ability to virtualize VRAM is like OSes virtualizing System RAM to a pagefile. It is nice to have if needed, as the applications don't fail with a lack of memory. The only difference here is that the performance cost of Vista's VRAM virtualization (if used) is very small, as it writes directly to the GPU via AGP/PCI Express concepts.
So it's a good idea to ignore that crap till 64 bit O/Ses become mainstream.
Um, most everyone I know runs Vista x64. It has more driver support than Windows XP 32bit, and is a virtual carbon copy of x32 Vista unlike x64 XP where several features were never added to the x64 version.
And this list of people includes everyone from large clients deploying Vista x64 on desktops to any serious gamer or 'tech minded' home user.
Do you not realize that there are more copies of Vista x64 running than all versions of Linux and OS X combined? (This isn't even touching x32 Vista.) So I would make a good argument that x64 Vista is a mainstream OS. Also unlike OS X that still has hybrid 32bit aspects, Vista x64 is a ground up 64bit OS, with all aspects of the WinXX subsystem being 64bit as well.
Also, if you happen to google Vista x64, it easily outperforms even XP 32bit and is the OS of choice for gamers, even when running 32bit applications: the 64bit OS can use the 64bit registers, address space, and driver memory transfers, so it outperforms the 32bit version of Vista even with 32bit applications.
Lastly, I don't seem to have problems running multiple 3D apps at the same time on Win2K. So what's the big deal?
Really? Ok, open 2, 3, or even 5 3D applications in a Window, so they are running side by side concurrently. Make sure a couple are hard hitting games, heck even Crysis. How well does that work for you? And use your choice of 3D technology, DirectX or OpenGL... Heck even set 2 of the 3D applications to 50% transparency using the GDI+ layers introduced in Win2K. How is performance now?
On any OS prior to Vista, 3D applications will starve for GPU time and GPU RAM. PERIOD. OpenGL TRIES to cooperatively multi-task, but it is all up to the applications to yield RAM and resources. DirectX also tries to yield in a cooperat