Twilight of the GPU — an Interview With Tim Sweeney

Posted by ScuttleMonkey on Monday September 15, 2008 @10:31AM from the steady-march-of-progress dept.

cecom writes to share that Tim Sweeney, co-founder of Epic Games and the main brain behind the Unreal engine, recently sat down at NVIDIA's NVISION con to share his thoughts on the rise and (what he says is) the impending fall of the GPU: "...a fall that he maintains will also sound the death knell for graphics APIs like Microsoft's DirectX and the venerable, SGI-authored OpenGL. Game engine writers will, Sweeney explains, be faced with a C compiler, a blank text editor, and a stifling array of possibilities for bending a new generation of general-purpose, data-parallel hardware toward the task of putting pixels on a screen."
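A minimal sketch of what "a C compiler and a blank text editor" can mean for putting pixels on a screen: a triangle filled entirely in ordinary C using the half-space (edge-function) test, with no DirectX or OpenGL involved. The framebuffer size and the names `vec2`, `edge` and `fill_triangle` are hypothetical. This is a serial illustration, not Epic's or Sweeney's renderer. Because each pixel's test is independent of every other pixel's, the inner loops are the kind of work that could be spread across data-parallel hardware.

```c
#include <stdint.h>

/* Illustrative framebuffer dimensions (hypothetical). */
#define FB_W 8
#define FB_H 8

typedef struct { float x, y; } vec2;

/* Edge function: twice the signed area of triangle (a, b, p). Its sign says
   which side of the line a->b the point p lies on. */
static float edge(vec2 a, vec2 b, vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

/* Fill a triangle into fb by testing each pixel center against all three
   edges. Vertices must be ordered so that interior points give non-negative
   edge values. Returns the number of pixels written. */
static int fill_triangle(uint32_t fb[FB_H][FB_W], vec2 v0, vec2 v1, vec2 v2,
                         uint32_t color) {
    int written = 0;
    for (int y = 0; y < FB_H; y++) {
        for (int x = 0; x < FB_W; x++) {
            vec2 p = { x + 0.5f, y + 0.5f };  /* sample at the pixel center */
            if (edge(v0, v1, p) >= 0.0f && edge(v1, v2, p) >= 0.0f &&
                edge(v2, v0, p) >= 0.0f) {
                fb[y][x] = color;
                written++;
            }
        }
    }
    return written;
}
```

Called with vertices (0,0), (8,0), (0,8) on a zeroed 8x8 buffer, it writes the 36 pixels whose centers fall inside the triangle.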

16 of 286 comments

For once ... by Qbertino · 2008-09-15 10:37 · Score: 4, Informative

For once I'm reading an 'xyz is going to die' article that doesn't sound like utter rubbish. Could it be that, for once, the person stating this actually knows what he's talking about?
My last custom realtime GPU was a GeForce Ti4200. I'm now using a Mac Mini with GT950. Mind you, Blender *is* quite a bit slower on the 950, even though it runs with twice the sysclock, but I'm not really missing the old GeForce. I too think it highly plausible that the GPU and the CPU will merge within the next few years.

--
We suffer more in our imagination than in reality. - Seneca
1. Re:For once ... by juiceboxfan · 2008-09-15 11:18 · Score: 2, Informative
  
  > No, that can't be it. Know why? Because...why would you put more processing and thus more heat in one place that already has problems with that?
  Cost. It's cheaper to build one chip with one die and install it on the motherboard than it is to build two chips/dies on two boards (not to mention bypass caps, fans, RAM, etc.). With combined functions, the one chip may produce less heat than the two together, although it is all in one spot.
  
  > And why install an overkill graphics processing unit inside the processor if most people won't use it anyway?
  Maybe two CPU grades? One with full graphics capabilities and the other with basic graphics (they could be the same die: one passes all graphics tests at full speed, the other doesn't).
  
  > And where would the VGA/DVI output go if there's no graphics card? If you put it somewhere else then why move the graphics processor further away from the outputs?
  You are already going to put a (relatively) long cable from the connector to the monitor, so what's a few more inches?
2. Re:For once ... by Miamicanes · 2008-09-15 12:26 · Score: 5, Informative
  
  > Or how modems used to have their own signal processors? But now most use the CPU.
  Welllll... I'd say the move to HSP modems took place more because the ascent of DSL and cable internet relegated modems to the status of "nice to have if I happen to need it once in a blue moon to send a fax or dial up in the middle of nowhere at some point over the next 2-4 years." Remember all those articles 3-5 years ago about how host signal processing absolutely DESTROYS CPU performance because it demands constant attention from the CPU, and the software overhead of having to keep stopping to service the modem caused the computer to run at least 20-30% slower? Well, not much has changed, except that with a multi-core CPU it now kills the performance of just ONE core instead of bringing the whole computer to its knees. But even with multicore CPUs, I can guarantee that if modems were still the primary way people got online, there would definitely be a thriving market for "performance" modems that offloaded at LEAST the signal-processing functions to a real DSP (like the Lucent "semi-Winmodems", which actually gave users the best of both worlds: they offloaded the stuff that really dragged the CPU down to their own DSP, but left things like compression and error correction, which could be handled in discrete batches, to the CPU, which could do them faster than even dedicated hardware could).
  There's another thing to remember about discrete chips... in the early 3dfx days, the mainstream CPU makers (Intel, AMD, and Cyrix) had ZERO interest in giving even the slightest attention to 3D graphics. Unless you're IBM (who wasn't interested in 3D, either), building CPUs is probably way beyond your company's capabilities. HOWEVER, designing a 3D graphics chip with the complexity of the first ones used by 3dfx IS within the capabilities of a well-funded design company with the connections to get it manufactured. It doesn't even need a fab with the capabilities of one owned by Intel, AMD, etc. So discrete 3D cards were an elegant way to sidestep the CPU makers' deadweight lack of interest by shifting the work to a chip that smaller companies could design and build. Now that "the big guys" have turned their attention to it, the smaller players don't have a prayer (hence the merger mania among CPU/mobo chipset makers and graphics chip makers).
  The same observation can be made regarding cache and memory controllers. In the First Pentium Era, volume manufacturers like Compaq (and their comrades-in-arms, Intel & AMD) regarded cache as a luxury the unwashed masses could live without, even if leaving it out only saved $5 and cut effective performance in half. Hey, consumers only look at that "MHz" number, anyway... Fortunately, performance-oriented mobo makers were able to take matters into their own hands, once again doing an end run around the CPU vendors' sloth by putting cache directly on the mobo. Once CPU makers decided cache mattered and put lots of it on-die, the marginal benefit of putting more, relatively expensive tertiary cache on the motherboard diminished. As for memory controllers, they got moved into the CPU because it was the only way to reliably achieve increased memory bandwidth (designing a 32-bit parallel interface for ANYTHING that has to run at 400+ MHz and communicate across traces on a circuit board is a hardcore engineering challenge; serial is cheaper to implement and can be faster overall than a simpler parallel solution, but there's a point where you can't shove the bits any faster, and the only way to increase bandwidth is to go parallel). It's not a coincidence that PCI Express video cards communicate 16 bits at a time, but even the fastest Fibre Channel disk or network interface is happy with a single bit.
  The sad irony, though, is that 5 years from now, games will probably have graphics about as good as you can get from the best and most expensive SLI solutions money can buy today... but overall performance will probably be less consistent (i.e., if Windows decides that it might be a good time to reorganize its temp directory while y
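The "16 bits at a time" remark can be made concrete: a PCI Express x16 slot stripes data across 16 independent serial lanes, so usable bandwidth is roughly per-lane rate x lane count x line-code efficiency. A small sketch using the PCIe 1.x figures (2.5 GT/s per lane, 8b/10b encoding); it ignores packet and protocol overhead, and `link_mb_per_s` is a hypothetical helper name.

```c
/* Approximate usable bandwidth of a multi-lane serial link, in MB/s per
   direction. Ignores packet headers and other protocol overhead.
     transfers_per_sec      : raw rate of one lane (2.5e9 for PCIe 1.x)
     lanes                  : lane count (16 for an x16 graphics slot)
     payload_bits/line_bits : line-code efficiency (8 and 10 for 8b/10b) */
static double link_mb_per_s(double transfers_per_sec, int lanes,
                            int payload_bits, int line_bits) {
    double data_bits_per_lane = transfers_per_sec * payload_bits / line_bits;
    return data_bits_per_lane * lanes / 8.0 / 1e6;  /* bits -> bytes -> MB */
}
```

For PCIe 1.x this gives 250 MB/s per lane per direction, and 4000 MB/s for an x16 slot.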
3. Re:For once ... by Anonymous Coward · 2008-09-15 16:31 · Score: 2, Informative
  
  > Apparently the author doesn't know much about computers.
  From Tim Sweeney's Wikipedia page:
  
  > Tim Sweeney is a computer game programmer and the founder of Epic Games, and is best known for his work on ZZT and the Unreal engine.
  > Sweeney established Epic as a shareware company while he was a student majoring in mechanical engineering at the University of Maryland.
  From your Wikipedia page:
  
  > No content.
  
  Hmmmmmmm.... A mechanical engineer versus a guy named after a noodle. I wonder who knows more about computers...?
  (Seriously, mods, the parent got modded Insightful? The parent seems to know nothing about electrical engineering. Mods: if you know nothing about a subject, don't mod posts about that subject!!!)
4. Re:For once ... by ceoyoyo · 2008-09-15 16:33 · Score: 2, Informative
  
  It's very slightly (pennies) cheaper to put one chip on the motherboard rather than two. It's MUCH more expensive to merge two big CPU/GPU type chips into one. Manufacturing flaws become more common fast with bigger chips.
  If we could break CPUs into pieces now we'd do it, both for that and heat reasons. We can't because all the parts currently located in a CPU need to talk to each other very fast. The GPU is something that usually doesn't need to talk to the CPU much. So it's separate.
5. Re:For once ... by Dahamma · 2008-09-15 16:47 · Score: 2, Informative
  
  I'm sorry to be blunt, but this post is almost entirely inaccurate! Informative? Jeesh.
  1) There is no GT950 - Mac Minis use the Intel GMA 950.
  2) The GMA 950 has NOTHING to do with merging the CPU and GPU - it merges the motherboard chipset with the GPU.
  3) The GMA 950 is the same old special-purpose GPU concept as anything from NVIDIA or ATI (er, AMD) - just slower and using system memory instead of dedicated RAM.
  The article is talking about rendering graphics on a high-performance, parallelized general purpose processor, not a crappy GPU embedded on the motherboard. Think a future generation of the Sony Cell architecture, not the Intel GMA!
6. Re:For once ... by juiceboxfan · 2008-09-15 22:41 · Score: 5, Informative
  
  > It's very slightly (pennies) cheaper to put one chip on the motherboard rather than two. It's MUCH more expensive to merge two big CPU/GPU type chips into one. Manufacturing flaws become more common fast with bigger chips.
  I don't think your estimate is correct for packaging and placing a single chip vs. two chips, but in high-volume manufacturing even pennies make all the difference. What about the cost of the second fan and other infrastructure for the GPU? There is also the issue of real estate: two chips take up more room than one, so your etch routing becomes more of a challenge, requiring smaller etch geometries and resulting in a more expensive PCB.
  
  > If we could break CPUs into pieces now we'd do it, both for that and heat reasons. We can't because all the parts currently located in a CPU need to talk to each other very fast. The GPU is something that usually doesn't need to talk to the CPU much. So it's separate.
  No, it will always be cheaper to integrate at the chip level. Look at system prices: dual core is cheaper than dual CPU (if you can even find one these days) of similar performance. The trend in embedded systems (even more cost-sensitive than PCs) is to integrate everything into a single package (SoC). As others have pointed out, cache and the FPU were once separate chips from the CPU; they are now all integrated into one package.
Wrong summary by Anonymous Coward · 2008-09-15 10:38 · Score: 5, Informative

He talks about the impending fall of the fixed-function GPU.
Re:Obvious by jacquesm · 2008-09-15 10:51 · Score: 3, Informative

GPUs aren't really parallel in that sense, they are parallel in the SIMD sense.
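A rough illustration of "parallel in the SIMD sense": one operation applied in lockstep to every lane of a fixed-width group. This is plain C used as an execution model; `LANES` and `madd_lanes` are hypothetical, and whether a compiler turns the inner loop into actual vector instructions is up to the compiler.

```c
#include <stddef.h>

#define LANES 4  /* illustrative SIMD width (hypothetical) */

/* out[i] = a[i] * b[i] + c, processed one LANES-wide group at a time.
   Within a group every lane performs the same multiply-add; the last group
   is partial when n is not a multiple of LANES. */
static void madd_lanes(float *out, const float *a, const float *b, float c,
                       size_t n) {
    for (size_t i = 0; i < n; i += LANES) {
        for (size_t lane = 0; lane < LANES && i + lane < n; lane++) {
            out[i + lane] = a[i + lane] * b[i + lane] + c;
        }
    }
}
```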

--
MP3 Search Engine
Re:I hope not! by Anonymous Coward · 2008-09-15 11:16 · Score: 5, Informative

> In such a world you won't need APIs because you'll have libraries that you can include in the compile process.

A library you include in the compile process is an implementation of an API.

> APIs reduce code bulk at the cost of reduced code speed, don't they?

No.
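The point that a library is an implementation of an API, and that an API need not cost speed, can be sketched in C: the prototype is the interface, the definition is one implementation, and because the definition is visible as `static inline`, the compiler may inline the call, leaving no call overhead. `clamp_channel` is a hypothetical example function; in practice the prototype would sit in a header.

```c
/* The API: a declaration callers compile against (normally in a header). */
static inline int clamp_channel(int value);

/* One implementation of that API. Because the body is visible to the
   compiler, a call can be inlined, leaving no function-call overhead. */
static inline int clamp_channel(int value) {
    /* Clamp a colour channel to the 0..255 range. */
    if (value < 0) return 0;
    if (value > 255) return 255;
    return value;
}
```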
Re:Obvious by Hortensia+Patel · 2008-09-15 11:59 · Score: 5, Informative

> GPUs aren't really parallel in that [traditional multithreaded] sense, they are parallel in the SIMD sense.
Actually, they're somewhere in between. Some current hardware can reallocate individual processors between fragment and vertex processing depending on the current workload profile. Even at the level of an individual processor lots of "threads" may be running simultaneously; this is to hide latency when a shader program blocks on memory (texture or framebuffer) access.
If you look at NV's descriptions of their 8xx-series drivers, they talk about *hundreds* of threads in flight at any given time. These aren't threads in the classical sense - there's no preemption, for a start - but they're much, much more advanced than SIMD-style "apply this instruction to all these values" parallelism.
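Why so many threads? A back-of-the-envelope sketch, assuming a simplified model that is not NVIDIA's documented scheduler: each thread runs a short burst of arithmetic, then stalls on a memory access, and the processor switches to any ready thread. The cycle counts are illustrative and `threads_to_hide_latency` is a hypothetical name.

```c
/* Simplified model: each thread computes for compute_cycles (must be > 0),
   then waits latency_cycles for a memory access before it can run again.
   A core that always switches to a ready thread never idles once
   threads * compute_cycles >= compute_cycles + latency_cycles. */
static unsigned threads_to_hide_latency(unsigned compute_cycles,
                                        unsigned latency_cycles) {
    unsigned period = compute_cycles + latency_cycles;      /* one thread's loop */
    return (period + compute_cycles - 1) / compute_cycles;  /* ceiling divide */
}
```

With 10 cycles of arithmetic between fetches and a 400-cycle memory latency, the model needs 41 threads per processor before it stops idling; across many processors, that adds up to hundreds of threads in flight.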
You are quite wrong by Anonymous Coward · 2008-09-15 12:12 · Score: 3, Informative

You are very, very wrong. The history of computer hardware has been one where extra functionality is moved from the CPU for speed, folded back in a few years later for efficiency, and farmed out to an add-on card for speed some time later...
See http://catb.org/jargon/html/W/wheel-of-reincarnation.html for details.
1. Re:You are quite wrong by sricetx · 2008-09-15 16:18 · Score: 2, Informative
  
  The last PC I had that had a slot for external cache was a 486. This was around 1994, and even then COAST modules (http://en.wikipedia.org/wiki/Cache_on_a_stick) were a little difficult to find -- it's not like you could just walk into the local Futureshop and pick one up.
  
  As far as I'm concerned, it will be good riddance to video cards if that functionality moves to the CPU, especially if Intel and AMD continue their recent trend of developing open drivers for their chips. That would be unlike other companies in the market, who only release binary-blob drivers and deny serious problems with their current generation of laptop graphics chips (http://www.theinquirer.net/gb/inquirer/news/2008/07/09/nvidia-g84-g86-bad).
2. Re:You are quite wrong by Goaway · 2008-09-16 01:03 · Score: 2, Informative
  
  > and what would be an example for this?
  How about, say, specialized graphics hardware being implemented in separate chips for home computers, and then later being discarded in favor of using faster CPUs to do the rendering instead?
  Like what happened in the 80s and 90s.
Re:I hope not! by jd · 2008-09-15 17:18 · Score: 2, Informative

At the deep RISC level, they probably wouldn't be. In fact, they'd certainly not be, or you'd simply have an SMP cluster with some emulation on top. If you're going for the migrating code, you'd need binary compatibility at the emulated mode (think Transmeta or IBM's DAISY project) but the underlying specialization would give you the improvement over a homogeneous cluster. If you're going for the totally heterogeneous design - basically the Cell approach but on a far, far larger scale - you need endian compatibility and bus protocol compatibility but nothing more.
This Cell-like approach gives the greatest room for innovation but also imposes the greatest development costs and greatest purchase costs. It also makes ABI backwards compatibility extremely hard or impossible, so you'd end up with a proliferation of builds of the same code for any binary packages (including all closed-source) and a far more complex build and optimization process for all source packages (Gentoo users beware). It also makes bus design far more complex, as the more specialized the decentralized processor units (DPUs) get, the more synchronization headaches you will get.
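The endian-compatibility requirement mentioned above means that data crossing between units with different byte orders has to be swapped at the boundary. A minimal C sketch; `bswap32` and `host_is_little_endian` are hypothetical helper names (many compilers and operating systems ship their own byte-swap intrinsics).

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word, e.g. when a little-endian unit
   and a big-endian unit exchange data over a shared bus. */
static uint32_t bswap32(uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Returns 1 if the machine running this code stores the least significant
   byte first, 0 otherwise. */
static int host_is_little_endian(void) {
    uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```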
A DPU cluster should logically give the best performance, for the same reason pure RISC outperforms pure CISC - fewer overheads, tighter logic and also (in consequence) more real-estate for optimizations, parallelizations, cache and other goodies. Distances would be greater between processing units, which will have an impact, but so long as the mean gain across the DPUs exceeded the mean loss due to extra distance and extra communication layers, you'd gain overall. This means a DPU computer cannot be flat beyond a certain scale. As SMP clusters cannot exceed 16 processors due to locking issues with shared resources, DPU computers cannot exceed 16 DPUs for a single resource and would probably avoid sharing resources if at all physically possible. This means a DPU computer must be heavy on duplicate resources. But for duplication to beat the deadlocking issue, bus bandwidth needs to be extremely high and bus latency needs to be almost non-existent.
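One standard way to see why contention on shared resources limits how far a flat design scales is Amdahl's law (swapped in here; the post doesn't name it): if a fraction s of the work is serialized, for example time spent holding a lock, the speedup on n processors is 1 / (s + (1 - s)/n). The law describes diminishing returns rather than a hard cap at any particular processor count; `amdahl_speedup` is a hypothetical name and the 5% figure below is illustrative.

```c
/* Amdahl's law: speedup on n processors when a fraction `serial` (0..1) of
   the work must run one processor at a time, e.g. while holding a lock on a
   shared resource. */
static double amdahl_speedup(double serial, unsigned n) {
    return 1.0 / (serial + (1.0 - serial) / n);
}
```

With 5% of the work serialized, 16 processors give about a 9.1x speedup and 64 give about 15.4x, against a ceiling of 20x no matter how many are added.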
Cell processors are much too basic to run into these sorts of problems, but if you wanted to scale the concept up by, oh, an order of magnitude and beat the design limitations in the Cell processor, you'd need to be spending serious time and money. I expect further "specialist" *PUs to be developed for some time, but the truly RISC, truly distributed DPU is unlikely to exist outside of theory or maybe a research lab or two for at least a decade and I don't expect DPU home machines for at least 30+ years.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:What about audio? by vux984 · 2008-09-15 18:57 · Score: 2, Informative

> How long has audio been around? Have you ever seen an audio chip integrated into the CPU? Most of them are done by onboard chips, not on the CPU.
They've moved from being standalone cards to being predominantly integrated into the mainboard and using the cpu for processing... rather like HSP modems, really.