Slashdot Mirror


Twilight of the GPU — an Interview With Tim Sweeney

cecom writes to share that Tim Sweeney, co-founder of Epic Games and the main brain behind the Unreal engine, recently sat down at NVIDIA's NVISION con to share his thoughts on the rise and (what he says is) the impending fall of the GPU: "...a fall that he maintains will also sound the death knell for graphics APIs like Microsoft's DirectX and the venerable, SGI-authored OpenGL. Game engine writers will, Sweeney explains, be faced with a C compiler, a blank text editor, and a stifling array of possibilities for bending a new generation of general-purpose, data-parallel hardware toward the task of putting pixels on a screen."

6 of 286 comments

  1. For once ... by Qbertino · · Score: 4, Informative

    For once I'm reading an 'xyz is going to die' article that doesn't sound like utter rubbish. Could it be that, for once, the one stating this actually knows what he's talking about?

    My last custom realtime GPU was a GeForce Ti4200. I'm now using a Mac Mini with a GMA 950. Mind you, Blender *is* quite a bit slower on the 950, even though it runs with twice the sysclock, but I'm not really missing the old GeForce. I too think it highly plausible that the GPU and the CPU merge within the next few years.

    --
    We suffer more in our imagination than in reality. - Seneca
    1. Re:For once ... by Miamicanes · · Score: 5, Informative

      > Or how modems used to have their own signal processors? But now most use the CPU.

      Welllll... I'd say the move to HSP modems took place more because the ascent of DSL and cable internet relegated modems to the status of "nice to have if I happen to need it once in a blue moon to send a fax or dial up in the middle of nowhere at some point over the next 2-4 years." Remember all those articles 3-5 years ago about how host signal processing absolutely DESTROYS CPU performance because it demands constant attention from the CPU, and the software overhead of having to keep stopping to service the modem caused the computer to run at least 20-30% slower? Well, not much has changed, except now with a multi-core CPU it can kill the performance of just ONE core instead of bringing the whole computer to its knees. But even with multicore CPUs, I can guarantee that if modems were still the primary way people got online, there would definitely be a thriving market for "performance" modems that offloaded at LEAST the signal-processing functions to a real DSP (like the Lucent "semi-Winmodems", which actually gave users the best of both worlds: offloading the stuff that really dragged the CPU down to its own DSP, but leaving things like compression and error-correction, which could be handled in discrete batches faster than even dedicated hardware could achieve, to the host CPU).

      There's another thing to remember about discrete chips... in the early 3dfx days, the mainstream CPU makers (Intel, AMD, and Cyrix) had ZERO interest in giving even the slightest attention to 3D graphics. Unless you're IBM (who wasn't interested in 3D, either), building CPUs is probably way beyond your company's capabilities. HOWEVER, designing a 3D graphics chip with the complexity of the first ones used by 3dfx IS within the capabilities of a well-funded design company with the connections to get it manufactured. It doesn't even need a fab with the capabilities of one owned by Intel, AMD, etc. So discrete 3D cards were an elegant way to sidestep the deadweight lack of interest on the CPU side by shifting the work to a chip that smaller companies could design and build. Now that "the big guys" have turned their attention to it, the smaller players don't have a prayer (ergo, the merger mania among CPU/mobo chipset makers and graphics chip makers).

      The same observation can be made regarding cache and memory controllers. In the First Pentium Era, volume manufacturers like Compaq (and their comrades in arms, Intel & AMD) regarded cache as a luxury the unwashed masses could live without, even if leaving it out only saved $5 and cut the effective performance in half. Hey, consumers only look at that "MHz" number, anyway... Fortunately, performance-oriented mobo makers were able to take matters into their own hands, and once again do an end run around the CPU vendors' sloth and put cache directly on the mobo. Once CPU makers decided cache mattered, and put lots of it on-die, the marginal benefit of putting more, relatively expensive tertiary cache on the motherboard diminished. As for memory controllers, they got moved into the CPU because it was the only way to reliably achieve increased memory bandwidth (designing a 32-bit parallel interface for ANYTHING that has to run at 400+ MHz and communicate across traces on a circuit board is a hardcore engineering challenge). Serial is cheaper to implement and can be faster overall than a simpler parallel solution, but there's a point where you can't shove the bits any faster, and the only way to increase bandwidth is to go parallel. It's not a coincidence that PCI Express video cards communicate 16 bits at a time, but even the fastest fibre-channel disk or network interface is happy with a single bit.

      The sad irony, though, is that 5 years from now, games will probably have graphics about as good as you can get from the best and most expensive SLI solutions money can buy today... but overall performance will probably be less consistent (ie, if Windows decides that it might be a good time to reorganize its temp directory while y

    2. Re:For once ... by juiceboxfan · · Score: 5, Informative

      > It's very slightly (pennies) cheaper to put one chip on the motherboard rather than two. It's MUCH more expensive to merge two big CPU/GPU type chips into one. Manufacturing flaws become more common fast with bigger chips.

      I don't think your estimate is correct for packaging and placing a single chip vs. two chips, but in high-volume manufacturing even pennies make all the difference. What about the cost of the second fan and other infrastructure for the GPU? There is also the issue of real estate: two chips take up more room than one, so your etch routing becomes more of a challenge, requiring smaller etch geometries and resulting in a more expensive PCB.

      > If we could break CPUs into pieces now we'd do it, both for that and heat reasons. We can't because all the parts currently located in a CPU need to talk to each other very fast. The GPU is something that usually doesn't need to talk to the CPU much. So it's separate.

      No, it will always be cheaper to integrate at the chip level. Look at system prices: dual core is cheaper than dual CPU (if you can even find one these days) of similar performance. The trend in embedded systems (even more cost sensitive than PCs) is to integrate everything into a single package (SoC). As others have pointed out, cache and the FPU were once separate chips from the CPU; they are now all integrated into one package.
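
      A rough sketch of the yield side of this exchange, assuming a simple Poisson defect model (yield = exp(-area x defect density)). The die areas and defect density below are made-up illustrative numbers, not figures from the thread, and the model counts only silicon: it ignores the packaging, board space, and cooling costs raised in the reply, which push the other way.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to be defect-free under a Poisson model."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_die(area_cm2: float, defects_per_cm2: float) -> float:
    """Silicon area consumed per working die: die area divided by yield."""
    return area_cm2 / poisson_yield(area_cm2, defects_per_cm2)


# Hypothetical values, chosen only to illustrate the shape of the curve.
DEFECT_DENSITY = 0.5  # defects per cm^2 (assumed)
CPU_AREA = 1.0        # cm^2 (assumed)
GPU_AREA = 2.0        # cm^2 (assumed)

# Two separate dies: a defect kills only the die it lands on.
separate = (silicon_per_good_die(CPU_AREA, DEFECT_DENSITY)
            + silicon_per_good_die(GPU_AREA, DEFECT_DENSITY))

# One merged die: a defect anywhere kills the whole part.
merged = silicon_per_good_die(CPU_AREA + GPU_AREA, DEFECT_DENSITY)

print(f"separate dies: {separate:.2f} cm^2 of silicon per working pair")
print(f"merged die:    {merged:.2f} cm^2 of silicon per working part")
```

      With these assumed numbers the merged die uses nearly twice the silicon per working part; whether that outweighs the board-level savings is exactly what the two posters disagree about.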

  2. Wrong summary by Anonymous Coward · · Score: 5, Informative

    He talks about the impending fall of the fixed function GPU.

  3. Re:I hope not! by Anonymous Coward · · Score: 5, Informative

    > In such a world you won't need APIs because you'll have libraries that you can include in the compile process.

    A library you include in the compile process is an implementation of an API.

    > APIs reduce code bulk at the cost of reduced code speed, don't they?

    No.
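
    One way to picture the first point, as a minimal sketch: the API is the contract callers program against, and a library is code that fulfils it. The names Rasterizer, SoftwareRasterizer, and draw_horizontal_line are hypothetical, invented for this illustration.

```python
from typing import Protocol


class Rasterizer(Protocol):
    """The API: a contract naming what callers are allowed to ask for."""

    def draw_pixel(self, x: int, y: int, color: int) -> None: ...


class SoftwareRasterizer:
    """A library: one concrete implementation of that contract."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.framebuffer = [0] * (width * height)

    def draw_pixel(self, x: int, y: int, color: int) -> None:
        # Row-major framebuffer: one integer color value per pixel.
        self.framebuffer[y * self.width + x] = color


def draw_horizontal_line(target: Rasterizer, y: int, x0: int, x1: int,
                         color: int) -> None:
    """Caller code: written against the API, not against any one library."""
    for x in range(x0, x1 + 1):
        target.draw_pixel(x, y, color)


fb = SoftwareRasterizer(width=4, height=2)
draw_horizontal_line(fb, y=1, x0=0, x1=3, color=0xFFFFFF)
```

    Swapping in a different implementation changes nothing in draw_horizontal_line, which is the sense in which linking a library does not make the API go away.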

  4. Re:Obvious by Hortensia+Patel · · Score: 5, Informative

    > gpu's aren't really parallel in that [traditional multithreaded] sense, they are parallel in the SIMD sense.

    Actually, they're somewhere in between. Some current hardware can reallocate individual processors between fragment and vertex processing depending on the current workload profile. Even at the level of an individual processor lots of "threads" may be running simultaneously; this is to hide latency when a shader program blocks on memory (texture or framebuffer) access.

    If you look at NV's descriptions of their 8xx-series drivers, they talk about *hundreds* of threads in flight at any given time. These aren't threads in the classical sense - there's no preemption, for a start - but they're much, much more advanced than SIMD-style "apply this instruction to all these values" parallelism.
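
    A toy model of the latency hiding described above, assuming each thread alternates a fixed burst of ALU work with a fixed memory stall and the processor switches to another ready thread at no cost. The cycle counts are hypothetical, and this is not a description of NVIDIA's actual scheduler.

```python
def alu_utilization(threads_in_flight: int, compute_cycles: int,
                    stall_cycles: int) -> float:
    """Fraction of cycles the ALU is busy in this simplified model.

    Each thread needs the ALU for compute_cycles, then waits stall_cycles
    on memory. While one thread waits, the others can use the ALU.
    """
    period = compute_cycles + stall_cycles
    return min(1.0, threads_in_flight * compute_cycles / period)


def threads_to_hide_latency(compute_cycles: int, stall_cycles: int) -> int:
    """Smallest thread count that keeps the ALU fully busy in this model."""
    period = compute_cycles + stall_cycles
    return -(-period // compute_cycles)  # integer ceiling division


# Hypothetical numbers: 4 cycles of math per 200-cycle texture fetch.
COMPUTE, STALL = 4, 200

for n in (1, 8, 32, threads_to_hide_latency(COMPUTE, STALL)):
    print(f"{n:3d} threads in flight: ALU busy "
          f"{alu_utilization(n, COMPUTE, STALL):.0%} of the time")
```

    Under these assumptions a single thread leaves the ALU almost entirely idle, and it takes dozens of threads per processor to cover one memory stall, which is consistent with hundreds being in flight across a whole chip.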