Twilight of the GPU — an Interview With Tim Sweeney
cecom writes to share that Tim Sweeney, co-founder of Epic Games and the main brain behind the Unreal engine, recently sat down at NVIDIA's NVISION con to share his thoughts on the rise and (what he says is) the impending fall of the GPU: "...a fall that he maintains will also sound the death knell for graphics APIs like Microsoft's DirectX and the venerable, SGI-authored OpenGL. Game engine writers will, Sweeney explains, be faced with a C compiler, a blank text editor, and a stifling array of possibilities for bending a new generation of general-purpose, data-parallel hardware toward the task of putting pixels on a screen."
I just browsed the article and it looks like what he's saying is that as GPUs become more like highly parallel CPUs we will begin to see APIs go away in favor of writing compiled code for the GPU itself. For example, if I want to generate an explosion, I could write some native GPU code for the explosion, and let the GPU execute this program directly... rather than being limited to the API's capabilities.
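To make the explosion example a bit more concrete, here is a minimal sketch in plain Python of the data-parallel style being described: one small "kernel" function applied independently to every particle. On real hardware this would be native GPU code rather than Python. The names `Particle`, `step_particle` and `step_all`, as well as the gravity constant and time step, are made up for illustration and do not come from the article.

```python
from dataclasses import dataclass

GRAVITY = -9.8  # assumed constant (m/s^2), purely illustrative


@dataclass(frozen=True)
class Particle:
    x: float
    y: float
    vx: float
    vy: float


def step_particle(p: Particle, dt: float) -> Particle:
    """Kernel: advance one particle by dt.

    It reads and writes only its own particle, so every call could
    run on a different hardware thread with no coordination.
    """
    vy = p.vy + GRAVITY * dt
    return Particle(p.x + p.vx * dt, p.y + vy * dt, p.vx, vy)


def step_all(particles, dt):
    """Stand-in for a kernel launch: map the kernel over all particles."""
    return [step_particle(p, dt) for p in particles]


# Two pieces of debris flying apart from the origin.
debris = [Particle(0.0, 0.0, 1.0, 2.0), Particle(0.0, 0.0, -1.0, 2.0)]
debris = step_all(debris, 0.5)
```

The point of the sketch is only that the per-particle function is ordinary code the developer writes, not a fixed call offered by a graphics API.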
So essentially, we will go back to game developers needing to make hardware-specific hacks for their games... some games having better support for some cards, etc.
APIs are there for a reason... let's keep 'em and just make them better.
Sometimes the best solution is to stop wasting time looking for an easy solution.
For the last decade or so, it seems like the rendering side was abstracted away into either DirectX or OpenGL, but if the author is correct, those abstractions are no longer going to be a requirement.
While I don't know a lot more about the various other rendering techniques that the article mentions, it seems like there might be an opportunity emerging to develop those engines and license them to the game companies.
I suspect that game companies won't want to get into the graphics rendering engine design field themselves, but there's real possibility for a whole new set of companies to emerge to compete in providing new frameworks for 3D graphics.
As soon as you start coding for a specific GPU you're going to be treating PCs like consoles. I don't care to have to buy multiple graphics cards to play various games.
I got the impression that they're expecting C++ compilers for all the GPUs, eventually, so then they'd simply have rendering libraries for each GPU. I also got the impression that they'd be waiting until most gamers had one of the compatible GPUs. Let's face it, most gamers usually don't buy the cheapest graphics card and now the two major players, ATI and nVidia have GPUs that are easily accessible. It won't be too long until you can't buy a video card from them that doesn't support CUDA and whatever the ATI equivalent is.
I think it's a pretty safe gamble for video game developers, certainly the 1st Person 3D game developers.
If we are going back to the "old" days...
Why can't we skip all this OS nonsense, and just boot the game directly?
After all that will make sure that you get the MOST out of your computer.
See Ivan Sutherland's Wheel of Reincarnation. The idea is that CPUs get faster and graphics move there; then busses get faster and graphics moves to dedicated hardware; rinse and repeat. http://www.anvari.org/fortune/Miscellaneous_Collections/56341_cycle-of-reincarnation-coined-by-ivan-sutherland-ca.html
--- Often in error; never in doubt!
I think the point of the article is that computing paradigms are merging. You won't have a CPU and a GPU. You'll have one thing that looks like both. In other words, you'll have a multicore, parallel, vector machine.
And that, absolutely, positively, will happen. Larrabee, or something like it, is the future. If you hold AMD stock, sell now, because Fusion doesn't sound anything like Larrabee and is going to seem positively draconian by the time it comes out.
In some ways, these new processors will look like a Cray YMP on your desktop. It's a rough analogy but suitable for illustration. Of course there are all sorts of differences in the way the memory systems will work and that's a huge part of the performance equation.
It seems to me that Tim puts a bit too much faith in compilers. He talks about language extensions but only in CUDA-like terms of "where things will run." A compiler needs a lot of information to be able to vectorize. The user often has to provide that information in a language like C because of its loose typing, aliasing and side-effect rules.
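A small illustration of the aliasing problem, as a sketch in plain Python rather than C: the same loop is written once element by element and once in "vector" form (load everything, then store everything). The function names are hypothetical. When the input and output buffers are distinct the two agree; when they are the same buffer they do not, which is why a compiler cannot vectorize such a loop unless it is told the pointers do not overlap (in C99, for instance, via `restrict`).

```python
def add_shifted_scalar(dst, src, n):
    """Scalar loop: process one element at a time, in order."""
    for i in range(n):
        dst[i + 1] = src[i] + 1


def add_shifted_vector(dst, src, n):
    """'Vectorized' form: read all inputs first, then write all outputs."""
    loaded = [src[i] for i in range(n)]  # one wide load
    for i in range(n):                   # one wide add and store
        dst[i + 1] = loaded[i] + 1


# Case 1: separate buffers. Both versions produce the same result.
a = [0, 0, 0, 0]
out1, out2 = [0] * 5, [0] * 5
add_shifted_scalar(out1, a, 4)
add_shifted_vector(out2, a, 4)
no_alias_same = (out1 == out2)

# Case 2: dst and src are the same buffer (aliasing). Results differ,
# because the scalar loop reads values it wrote on earlier iterations.
buf1, buf2 = [0] * 5, [0] * 5
add_shifted_scalar(buf1, buf1, 4)
add_shifted_vector(buf2, buf2, 4)
alias_same = (buf1 == buf2)
```

Here `buf1` ends up as a running count while `buf2` does not, so blindly vectorizing the aliased case would change the program's behavior.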
My prediction is that some APIs will go away, but many of the low-level ones will stay because it's often faster to call into a hand-coded library than rely on the compiler to have enough information to automatically optimize the code. Eventually compilers will start pattern-matching to these APIs. Higher-level APIs will be developed to save developer time, not CPU time. They will exist almost purely for code reuse purposes.
I disagree with Tim that hardware vendors will differentiate on performance. At least, in the way he's thinking. It won't be hardware gadgets, vector length or number of pipes that matter. It's going to be the compiler, programming environment and libraries. To the extent that the hardware supports those in its ISA, hardware will matter. But the bulk of the muscles of the chip won't matter so much as their placement and utility (by the compiler). The inflection point is leading to a world where software is king.
I disagree.
Direct hardware programming has always been the best in terms of performance. However, it is the worst in terms of compatibility. If you're programming consoles, this is just fine. If you're programming for PCs, not so much.
It will never go back to programming for specific pieces of graphical hardware. I'd say that each vendor MIGHT make a major chipset, and that those chipsets would be coded for, and everything else gets API'd, but even this is unlikely. If a company had to have two or three sets of programmers for their graphics, each team for a different major chipset, we'd see more expensive games or prettier games with crappier gameplay.
Even the OpenGL/DirectX split takes a heavy toll on programming resources for game developers.
Job? I don't have time to get a job! Who will sit around and bitch about being broke and unemployed then?
Except that you, like everyone else, read "CPU" in the article to mean the Intel/AMD CPU instead of thinking of it as current-gen GPUs that are almost capable of massively parallel execution of general-purpose code. With the advent of shaders, more processing could be offloaded to GPUs, and more will be over the next couple of GPU generations. So his idea is that we'll see fewer OpenGL/DirectX-specific API calls and everything being done in CUDA/shaders. That way the folks who write graphics engines aren't limited to the current SGI implementation (here's a set of vectors describing my object -- draw it) and we'll see different rendering engines based on ray tracing, for example (or whatever other methods the engine writers want to use).
This isn't anything new here -- he's basically saying what Intel has already said... You'll see less OpenGL/DirectX and more CUDA/Shader based implementations for rendering engines.
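For anyone wondering what "a rendering engine based on ray tracing" looks like without any draw-this-object API, here is a toy sketch in plain Python. It is not how any real engine does it; the scene (one sphere), the camera setup and the function names `hit_sphere` and `render` are all assumptions for illustration. Each pixel independently fires a ray and tests it against the sphere, which is exactly the kind of per-pixel work that maps onto data-parallel hardware.

```python
import math


def hit_sphere(origin, direction, center, radius):
    """Return the ray parameter t of the nearest hit in front of the
    origin, or None. Assumes the origin is outside the sphere."""
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c  # discriminant of the quadratic in t
    if disc < 0:
        return None             # ray misses the sphere
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t > 0 else None


def render(width, height):
    """Cast one ray per pixel and return the image as rows of text."""
    center, radius = (0.0, 0.0, -3.0), 1.0  # assumed scene: one sphere
    rows = []
    for j in range(height):
        row = ""
        for i in range(width):
            # Map the pixel centre to [-1, 1] on an image plane at z = -1.
            x = 2.0 * (i + 0.5) / width - 1.0
            y = 1.0 - 2.0 * (j + 0.5) / height
            t = hit_sphere((0.0, 0.0, 0.0), (x, y, -1.0), center, radius)
            row += "#" if t is not None else "."
        rows.append(row)
    return rows


image = render(16, 8)
```

Printing the rows of `image` shows a blob of `#` characters in the middle of a field of dots. No triangle list is ever handed to a driver; the "pipeline" is just the loop.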
"Not sure why Tim Sweeney gets so much flack, he is the lead developer for a pretty popular 3d rendering engine..."
Just because he's a good programmer doesn't mean his statements about other things will be true; each statement must be taken individually.
Tim Sweeney said 10 years or so ago that the GPU would be integrated into the CPU. It hasn't happened.
Not only that, the bandwidth requirements are off the charts for modern GPU computing. Sometimes I wonder if these programmers are even aware of wtf it is they are saying. I know lots of programmers who know dick all about the relationships in hardware. Tim Sweeney borders on being one of those types of programmers. It's like he's so focused on development he's not seeing the forest for the trees.
Also, game engines are many-man projects; Tim Sweeney would be just one single dude on a team, nothing notable IMHO.
So wait, your integrated graphics is *slower* than a low end discrete graphics card that's now 6 generations (or ~6 years) behind... and you find that acceptable?
At the current pace, it'll take them a decade to match a high end card from today with an integrated card, and video cards aren't standing still in the meantime.
You can point at the current integrated market for video cards, but that's not terribly interesting due to the fact that a) current integrated video cards barely do more than send signals to monitor(s), and (as a corollary to a) b) current integrated video cards aren't very fast.
Wake me up when integrated video cards are faster than discrete video cards.
Game! - Where the stick is mightier than the sword!
"Take a 1999 interview with Gamespy, for instance, in which he lays out the future timeline for the development of 3D game rendering that has turned out to be remarkably prescient in hindsight:
2006-7: CPU's become so fast and powerful that 3D hardware will be only marginally beneficial for rendering, relative to the limits of the human visual system, therefore 3D chips will likely be deemed a waste of silicon (and more expensive bus plumbing), so the world will transition back to software-driven rendering."
Nuff said.
"Ok so you state that memory bandwidth requirements for GPUs are off the charts. Where do you propose to get more memory bandwidth than on the CPU itself? Seems to me if you want memory bandwidth there is no better place to be than on the cpu die..."
Again you're missing the point: the "jack of all trades, master of none" problem, not to mention the space requirements. A GPU's complexity is nothing like the old-style co-processor units that were integrated into the core. They require ridiculous amounts of cutting-edge RAM to get that kind of performance, and they need a lot of RAM to output the results of those calculations.
I don't see CPUs integrating 512MB to 2GB of RAM in the near-term future given heat and die size considerations, and we haven't even touched the extremely low bandwidth between modern CPUs and main memory in PCs (which is much, much less than a modern GPU's).
The GPU will play its part for as long as is necessary. I don't rule out that perhaps one day it will be technically feasible, but it is nowhere near that day; it's at least a decade or more away.
We've seen this time and time again: processors go through evolutions of integrating and separating. We went from mainframes to PCs and, with the net, 'back to mainframes', but notice how each device plays its role; each one didn't totally obsolete the other, they have just become more specialized at their tasks.
"Only a research scientist would need that!"
Only a research scientist does need that. Meanwhile, uninformed consumers are being suckered into buying way more than they need to check email and type their documents.
Or, say, play Quake.
The nice thing about functionality moving into the CPU standard is that it opens up that functionality for a lot more applications than originally intended. FPUs may have mainly been for scientific work early on, but because it's essentially ubiquitous, now a vast array of programs do floating point computation... probably including your word processor.
The main reason I might have my doubts about this is that graphics performance has been advancing faster than CPU performance for a while. If that trend continues, I'm not sure people would want to be tied down to a single unit for a while. People on the cutting edge replace their CPUs a lot less often than their GPUs. In the long run, it's probably inevitable.
Not if AMD can help it. Obviously this is not their path or they would not have purchased ATI. They sell two products, one of them is enormously expensive, the other is reasonable. I can't see why any graphics card manufacturer would give up all that profit!?
Also consider that GPU upgrades are much more frequent than CPU upgrades. I don't think the dollars favor integration of the two. I don't think there is enough competition to force it either.
The only company that may achieve this is VIA. For completely different reasons.
Developers will use [future coprocessor cards] just like they use CPU's at the moment.
Which means they'll have to both be x86, or both be ARM, or both be some other architecture that NV and ATI can agree on. goodluckwiththat.
I want to hear what John Carmack thinks about this.
Does he agree/disagree and why?
I always like seeing two giants in the industry debate on high level topics. It gives some insight into trends... and I just plain dig gaming anyway...
Most people aren't thought about after they're gone. "I wonder where Rob got the plutonium" is better than most get.
He does make a good point, and it's a throwback to the old days before GPUs existed. Look back to the days of Doom or Duke3D: you had to hand-code an entire rendering engine for the game. No APIs or 3D hardware were available, only a single CPU, a frame buffer and a compiler. With current and upcoming technology like multi-core and stream processors, there will be plenty of new options for developers to take advantage of. IBM's Cell, Intel's Terascale and Larrabee are three of those technologies, but I still highly doubt the GPU is done for yet.
3D rendering isn't the only thing going on in games or other programs. You have physics, which is used in both games and simulators. We also have AI, which in games is still severely lacking. 3D sound is still simply a matter of how far the player is from an object, attenuating the sound as necessary. So those areas could definitely use those multi-core CPUs or stream processors.
We can't forget about really exciting stuff like real-time ray tracing, which is well suited for multi-core and stream processing, but it's a ways off. I don't think the GPU will disappear in the next five years, but I do see it evolving to adapt to new rendering technologies. Instead of a very discrete GPU we will have a very fast accelerator chip that can do more general work. It will not only handle the 3D rendering pipe but also lend itself to tasks like video processing, DSP, audio processing and other compute-intensive tasks like physics. It is already happening with Nvidia's CUDA and ATI's Stream SDK. We still need a general-purpose CPU to manage the OS and I/O, but like the IBM Cell it could very well move on die. Multi-core x86 CPUs are not the future. Instead we need one or two cores to manage resources and run legacy code. Then you let the stream processors (for lack of a better term) do all the compute-intensive dirty work. Hell, we could even get rid of x86 and go with a more efficient and compact CPU like ARM or something else entirely.
Funny, as I am typing this I realize I am pretty much describing the IBM Cell processor and Intel Larrabee. But still I doubt developers will be the ones holding the ball for interfacing with this new hardware. Hardware vendors and developers will (or rather should) come together and standardize a new set of APIs and tools to deal with the new tech. If developers have a breakthrough that is better than ray tracing, they will definitely have the hardware to do it. Welcome to the future.
Yes, it's much better to be told "you're not allowed to do that" than "you are not being allowed to be doing that".
In other news...
A man whose company makes its money writing game engines says, "APIs are going to go away. It's going to be very, very hard to build a game engine in the future when you can't rely on the APIs anymore. So everyone'll have to switch to the few companies that build game engines instead. Like mine. I recommend you start now and save yourself the headache."
Hmm, I detect no bias whatsoever.
Well, almost as little as when nVidia tells the world that they have seen the future and it's in GPGPUs replacing CPUs. Amazing how everyone has seen the future, it supports their business model and the rest of us can save ourselves a lot of pain if we jump on what pays them well.
Call me stupid, but from what I saw from Larrabee it centers around a new specialized very wide vector unit to do most of the work. So much for any plain old C compiler.
Doesn't matter. There are basically two big points here:
1) The special-purpose GPU will morph into a more generalized co-processor for handling all sorts of massively parallel stuff.
This is just bleeding obvious, and I can prove that, because it is already happening. As more and more of it becomes programmable, it only makes sense that the built-in rendering microcode is replaced with libraries that you may or may not choose to hack yourself.
2) Once the generalization is done, it will make sense to merge the two processors. Might sound weird, but we are already doing multicore designs instead of separate CPUs, which would be the logical choice if density were such an issue. Turns out the benefits outweigh the cost of advanced cooling.
You are right, though, that it will not be a small "nice extra" tacked onto the CPU. It will be a very large part of the CPU, and of its total working capacity.
sudo ergo sum
A modern GPU, such as the current flagship GeForce GTX 280, is a relatively specialised piece of hardware. It has optimised support for things such as matrix operations, which are used a lot in graphics, and yet has no real support for integers, which aren't. It has a very fast and entirely dedicated memory bus capable of pushing 140GB/s at maximum for processing geometry and textures. It's highly parallel, having two hundred and forty stream processors to process the millions of separate shader runs which make up a single frame of a high-resolution 3D scene as quickly as possible. It is also a very complex piece of technology, having 1400 million transistors compared to a Core 2 Extreme QX9650's 820 million transistors.
There's no way that a current CPU could replace that in anything other than a very slow emulation mode even if fully dedicated to the task. If you do want to see the difference, download either nVidia's FX Composer or ATI's RenderMonkey, which are both shader authoring tools. These will allow you to load a shader and preview it using your 3D card and then drop down to software rendering. Watch as a smoothly-animated complex shader suddenly grinds to a halt, often taking literally seconds to render a single frame. It's not a pretty sight.
In the future, with CPUs getting more and more cores and graphics programming getting more generic to support different algorithms without needing to work against the silicon's basic design, it's likely we'll see them being more suitable to the rendering task, as the article covers. But for now, trying to substitute the CPU for the GPU would be like entering a bulldozer into a Formula 1 race on the basis that it has a large engine.
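A back-of-the-envelope version of the "millions of shader runs" point, as a rough sketch in Python. The 240 stream processors come from the figures above and the QX9650 is a quad-core part; the resolution, the frame rate and the simplification of one pixel-shader run per pixel are assumptions. It also ignores clock speed, SIMD width and memory bandwidth entirely, so treat it as an illustration of scale only.

```python
WIDTH, HEIGHT = 1920, 1200  # assumed display resolution
FPS = 60                    # assumed target frame rate
GPU_UNITS = 240             # stream processors on the GTX 280 (from the post)
CPU_CORES = 4               # Core 2 Extreme QX9650 is a quad-core CPU

# Assumption: exactly one shader run per pixel per frame (no overdraw,
# no vertex work, no multi-pass effects).
runs_per_frame = WIDTH * HEIGHT
runs_per_second = runs_per_frame * FPS


def runs_per_unit(units):
    """Shader runs each execution unit must finish every second."""
    return runs_per_second // units


gpu_load = runs_per_unit(GPU_UNITS)
cpu_load = runs_per_unit(CPU_CORES)
load_ratio = cpu_load // gpu_load
```

Under these assumptions each CPU core would have to get through 60 times as many shader runs per second as each stream processor, before counting anything else the CPU has to do.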
404 Not Found: No such file or resource as '.sig'
Umm... what? Why the namecalling?
There's CUDA and FireStream which are programming the shaders w/o going through DirectX/OpenGL.
As far as D3D11 -- I think the only similar support there is compute shaders, and I'm unsure whether they will apply. There's also the Apple OpenCL initiative, which aims to accomplish a similar thing. AFAIK, none of the GPGPU bare-to-the-metal APIs allow you to render to a texture, so I think it might not be possible to accomplish a pure Stream based rendering engine (yet).
In any case, I think his original point stands -- that the monolithic SGI based APIs are going to still be used (OpenGL/DirectX), but the hardware under the covers will be more exposed to allow programmers to be more creative with how they want to utilize the highly parallel processing that the new chips provide. Thus allowing programmers to do things in software that were previously done with dedicated hardware.
In embedded system-on-a-chip stuff you're talking about putting a few components on a chip. The feature count is WAY smaller than a CPU (or a GPU). That can be done fairly cost effectively because the CPU makers have already led the way. Even then, many system-on-a-chip products are really system-in-a-package, where different bits of silicon are put together inside a single package. Here's a conference paper that talks about it: http://www.acreo.se/upload/Publications/Proceedings/OE00/00-COLLANDER.pdf
Merging a modern CPU/GPU is a whole different ball game. The various bits of the CPU have been integrated because that's the only way we could keep making them faster. The parts that are actually on the same piece of silicon with the CPU are there because they need high bandwidth to the CPU. The GPU does not, at present.
Why do we have 8 processor computers made up of two quad core chips? Because it's not yet practical to put 8 cores on one chip. We will, but we don't now. CPUs and GPUs might merge, but they'll be low power, slower versions of both. Nvidia and AMD won't be putting their latest and greatest GPU on a chip with the fastest processor out there.
Why wasn't the FPU integrated with the CPU right away? Because it didn't make sense. Then, as clock rates went up, the FPU had to be integrated to keep up.
The GPU and the CPU don't talk to each other that way. The GPU needs MEMORY bandwidth. It doesn't really care what's in the CPU registers.
I don't think this guy ever says that the GPU and CPU are going to merge. What he does think is that the GPU is going to go away, to be replaced by a general purpose vector/stream/dsp type processor. When or if that happens, THAT kind of coprocessor would probably be worth integrating into the CPU. In fact, it's already been done with Cell.
But a GPU? There's no reason to integrate a high power GPU with a high power CPU, and lots of reasons not to.
GPUs are also very expensive to make due to their large die size.
In fact die size may be the single biggest thing that SHOULD prevent the combination of the GPU and CPU. As the mm^2 of the die grows, the cost grows even faster.
In fact the biggest difference (other than a few tweaks to correct errors) between the disaster Radeon HD2k series and the vastly superior Radeon HD3k series was a die shrink - they're all R600 series units. It went from a total bomb (HD2k series) to a very competitive card - the die shrink increased yields, decreased cost per unit, decreased voltage required and heat output. The HD2900 was 420mm^2 while the HD3800 was 192mm^2.
Simply shrinking from 80nm to 55nm turned a non-profitable card into a profitable one. Their new flagship product - the HD4870 based on the R700 core is also 55nm but has 50% more transistors than the HD3800. The differences in the core architecture resulted in more than 2x the performance for only a 50% increase in transistors. Saving their butts from the disaster that was the 80nm R600 by shrinking the R600 to 55nm gave them the time to develop the R700 without being buried by their competition.
If die size is that important and can be THAT expensive I doubt die size issues will EVER allow them to combine the CPU and GPU.
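To put rough numbers on "the cost grows even faster", here is a sketch in Python using the 420mm^2 and 192mm^2 die sizes quoted above. Everything else is an assumption: the wafer size, the wafer cost (arbitrary units), the defect density and the choice of a simple Poisson yield model are placeholders, not real foundry data, and edge loss on the wafer is ignored.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumed wafer size
WAFER_COST = 5000.0        # hypothetical cost per wafer, arbitrary units
DEFECTS_PER_MM2 = 0.005    # hypothetical defect density (0.5 per cm^2)


def cost_per_good_die(die_area_mm2):
    """Cost of one working die under a simple Poisson yield model."""
    wafer_area = math.pi * (WAFER_DIAMETER_MM / 2) ** 2
    dies = wafer_area / die_area_mm2  # ignores edge loss
    # Poisson model: probability that a die has zero defects.
    yield_fraction = math.exp(-DEFECTS_PER_MM2 * die_area_mm2)
    return WAFER_COST / (dies * yield_fraction)


big = cost_per_good_die(420.0)    # HD2900-sized die
small = cost_per_good_die(192.0)  # HD3800-sized die

area_ratio = 420.0 / 192.0
cost_ratio = big / small
```

With these made-up parameters the larger die has about 2.2x the area but comes out at roughly 6.8x the cost per good die, because fewer dies fit on the wafer and a larger share of them catch a defect. The exact ratio depends entirely on the assumed defect density.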
If you cannot keep politics out of your moderation remove yourself from the Mod Lottery.. NOW!