Slashdot Mirror


GPU Gems 2

Martin Ecker writes "Following up on last year's first installment of the "GPU Gems" book series, NVIDIA has recently finished work on the second book in the series titled GPU Gems 2 - Programming Techniques for High-Performance Graphics and General-Purpose Computation, published by Addison-Wesley. Just like the first book, GPU Gems 2 is a collection of articles by various authors from game development companies, academia, and tool developers on advanced techniques for programming graphics processing units (or GPUs for short). It is aimed at intermediate to advanced graphics developers that are familiar with the most common graphics APIs - in particular OpenGL and Direct3D - and high-level shading languages, such as GLSL, HLSL, or Cg. The reader should also be proficient in C++. As with GPU Gems, GPU Gems 2 is not for beginners. For professional graphics and game developers, however, it is an excellent collection of interesting techniques, tips, and tricks." Read on for Ecker's review.

GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation
author: Matt Pharr, Randima Fernando (Editors)
pages: 814
publisher: Addison-Wesley Publishing
rating: 9
reviewer: Martin Ecker
ISBN: 0321335597
summary: The second installment in NVIDIA's GPU Gems series shines with more "gems" on real-time graphics and general-purpose computation on GPUs.

The book is divided into six parts, each dealing with a different aspect of GPU programming. Compared to the first book, more emphasis is put on the quickly evolving area of general-purpose computation on GPUs, an area that is commonly known as GPGPU (General Purpose GPU; for more information see http://www.gpgpu.org). To my knowledge, this is the first book to contain so much information related to this relatively new field. In particular, three of the six parts of the book are about GPGPU and its applications. The first three parts, however, are about real-time computer graphics.

The first part of the book contains 8 chapters on photo-realistic rendering that mostly deal with how to efficiently render a large number of objects in a scene, which is a necessity for rendering convincing natural effects, such as grass or trees. Two chapters in this part of the book discuss geometry instancing and segment buffering, two techniques to render a large number of instances of the same object, and another chapter focuses on using occlusion queries to implement coherent hierarchical occlusion culling, which is also useful in scenes with high depth complexity.

Other interesting topics in this part of the book include adaptive tessellation of surfaces on the GPU, displacement mapping - an extension to the popular parallax mapping used in some current games - that makes it possible to render realistic bumps on a simple quad, and terrain rendering with geometry clipmaps. Geometry clipmaps can be used to render large terrains almost completely on the GPU. They were first introduced in a SIGGRAPH 2004 paper by Frank Losasso and Hugues Hoppe, and the algorithm is discussed in detail by Arul Asirvatham and Hoppe himself in chapter two of this book. This technique will most likely find wide application in next generation games.

Part two of the book, consisting of 11 chapters, deals with shading and lighting. I found chapter 9 by Oles Shishkovtsov on deferred shading in the soon-to-be-released computer game S.T.A.L.K.E.R. quite interesting. The game features a full deferred shading renderer, which is probably a first for a commercial game. In his chapter, Oles describes some of the tricks used and some of the pitfalls encountered while developing that renderer. Also highly interesting is Gary King's chapter on computing irradiance environment maps in real-time on the GPU. These dynamically created irradiance maps can be used to approximate global illumination in dynamic scenes.
Furthermore, this part of the book has chapters on rendering atmospheric scattering, implementing bidirectional texture functions on the GPU, dynamic ambient occlusion, water rendering, and using shadow mapping with percentage-closer filtering to achieve soft shadows.

The third part of the book consists of 9 chapters on high-quality rendering. Most chapters in this part deal with implementing high-quality filtering in fragment shaders. For example, there is an interesting chapter on filtered line rendering and another chapter on cubic texture filtering. In chapter 23 NVIDIA also provides interesting insights into the inner workings of their Nalu demo, which was the release demo for the GeForce 6 series that displays an animated mermaid underwater. In particular, the chapter describes the techniques used to implement the mermaid's hair. Finally, Simon Green of NVIDIA presents his GPU-only implementation of improved Perlin Noise.

Whereas the first three parts of the book cover techniques for real-time computer graphics, the three final parts deal exclusively with GPGPU. Since GPUs nowadays offer a high level of programmability and because of their widespread use in commodity PCs, GPUs can be used as a cost-efficient processor for general computation in addition to, and in parallel with, the CPU.

Efficient usage of the GPU for general computation, so that conventional CPU implementations can be outperformed, requires special care when mapping algorithms to the highly parallel architecture of the GPU pipeline. Part four of the book mostly deals with exactly this and represents an introduction to the fantastic field of GPGPU. The 8 chapters of this part first describe the general streaming architecture of GPUs, with one chapter going into the details of the architecture of the GeForce 6 series in particular, and then move on to show how to map conventional CPU data structures and algorithms to the GPU. For example, textures can be regarded as the GPU equivalent to CPU data arrays. There is also a chapter on how to implement flow-control idioms on the GPU and a chapter on optimizing GPU programs.
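
The remark that textures play the role of arrays can be made concrete with a small sketch. Texture dimensions are limited, so a long 1D array is commonly laid out row by row in a 2D texture and addressed by translating the linear index into (x, y) texel coordinates. The Python below only emulates that address translation on the CPU; the names `TEX_WIDTH`, `index_to_texel`, `texel_to_index`, `pack_into_texture`, and `fetch` are hypothetical and are not taken from the book.

```python
# CPU-side emulation of storing a 1D array in a 2D "texture".
# All names and the texture width are illustrative assumptions.
TEX_WIDTH = 4  # assumed texture width in texels


def index_to_texel(i, width=TEX_WIDTH):
    """Translate a linear array index into (x, y) texel coordinates."""
    return (i % width, i // width)


def texel_to_index(x, y, width=TEX_WIDTH):
    """Inverse mapping: (x, y) texel coordinates back to a linear index."""
    return y * width + x


def pack_into_texture(values, width=TEX_WIDTH, fill=0.0):
    """Lay out a 1D list row by row in a 2D grid, padding the last row."""
    height = (len(values) + width - 1) // width
    texture = [[fill] * width for _ in range(height)]
    for i, value in enumerate(values):
        x, y = index_to_texel(i, width)
        texture[y][x] = value
    return texture


def fetch(texture, i, width=TEX_WIDTH):
    """Read element i of the original array, like a texture lookup."""
    x, y = index_to_texel(i, width)
    return texture[y][x]
```

For example, `pack_into_texture([float(i) for i in range(10)])` yields a 4x3 grid whose last two texels are padding, and `fetch(texture, 6)` reads the texel at x = 2, y = 1.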

The 6 chapters of part five of the book are on image-oriented computing and describe a number of GPGPU algorithms for performing global illumination computations, for example by using radiosity, on the GPU. There is also a chapter on doing computer vision on the GPU, which I found to be quite exciting. Because of its high parallelism, the GPU can be used to do the tedious tasks of edge detection and marker recognition required in computer vision in a very efficient manner, thus freeing the CPU to do other tasks in the meantime. James Fung, the author of this chapter, is also involved in an open source project called OpenVIDIA (see http://openvidia.sourceforge.net) that is all about GPU-accelerated computer vision. The final chapter in this part of the book explains how to perform conservative rasterization, which is important for some GPGPU algorithms to achieve accurate results.
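
Edge detection maps well to the GPU because every output pixel depends only on a small neighborhood of input pixels, so all pixels can be processed independently. The following is a minimal CPU sketch of that idea using the well-known Sobel operator; it is not the method from the chapter or from OpenVIDIA, and the function names are hypothetical. `sobel_kernel` stands in for what a fragment program would compute for a single pixel.

```python
# Sobel edge detection written as an independent per-pixel "kernel".
# Illustrative sketch only; function names are assumptions.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sample(image, x, y):
    """Clamp-to-edge read, similar to a clamped texture fetch."""
    height, width = len(image), len(image[0])
    return image[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]


def sobel_kernel(image, x, y):
    """Gradient magnitude at one pixel; reads only its 3x3 neighborhood."""
    gx = gy = 0.0
    for j in range(3):
        for i in range(3):
            p = sample(image, x + i - 1, y + j - 1)
            gx += SOBEL_X[j][i] * p
            gy += SOBEL_Y[j][i] * p
    return (gx * gx + gy * gy) ** 0.5


def detect_edges(image):
    """Run the kernel for every pixel; each call is independent of the others."""
    return [[sobel_kernel(image, x, y) for x in range(len(image[0]))]
            for y in range(len(image))]
```

Because no call to `sobel_kernel` depends on the result of another, the two loops in `detect_edges` could run in any order or all at once, which is the property a GPU exploits.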

The final part of the book has 6 chapters that present GPGPU techniques to perform a variety of simulation and numerical algorithms on the GPU. One chapter shows how to map linear algebra operations onto the GPU and develops a GPU framework to solve systems of linear equations. In other chapters the GPU is used for protein structure prediction, options pricing, flow simulation, and medical image reconstruction. These chapters show good examples of how the GPU can be used for non-graphics-related tasks. Furthermore, Peter Kipfer and Rüdiger Westermann present algorithms for efficient sorting on the GPU. Since sorting is such an important building block of many higher-level algorithms, it is important to have an efficient implementation for GPGPU algorithms.
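
Sorting on the GPU usually relies on sorting networks, because their sequence of compare-and-swap operations is fixed in advance and does not depend on the data. As a sketch of that idea, the Python below emulates bitonic merge sort, one widely used sorting network, on the CPU. It is a generic textbook formulation and makes no claim to match the implementation presented in the chapter; `bitonic_sort` is a hypothetical name.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # compare distance within this merge stage
            # One "pass": each pair (i, i ^ j) is disjoint from the others,
            # so a GPU could process all pairs of this pass in parallel.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data
```

For n elements the network needs on the order of log2(n) squared passes, each touching every element, which is more total work than a CPU quicksort but consists entirely of independent operations.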

The book contains many illustrations and diagrams that visualize the results of certain techniques or explain the presented algorithms in more detail. All images in the book are in color, which is definitely advantageous for a graphics book. In my opinion, the excellent quality and quantity of images and illustrations are among the strongest points of this book compared to other graphics books.

The book also comes with a CD-ROM with supplemental material, videos, and demo applications for some chapters. Most of the applications include the full source code, which makes it easy to experiment with the techniques presented in the book. Note that most of the applications run on Windows only and many of them require a shader model 3.0 graphics card, such as a GeForce 6600 or 6800. The latest ATI cards, such as the X800, are not sufficient for running some demos because they only support shader model 2.0.

I highly recommend this book to any professional working as a graphics or game developer. It is a valuable addition to my library of graphics books and I will come back to a number of articles in the near future. The focus on GPGPU in the second half of the book is a welcome addition and we can expect to see more and more non-graphics-related applications make use of the processing power in today's GPUs.

Martin Ecker has been involved in real-time graphics programming for more than 10 years and works as a game developer for arcade games. In his rare spare time he works on a graphics-related open source project called XEngine (http://xengine.sourceforge.net). You can purchase GPU Gems 2 - Programming Techniques for High-Performance Graphics and General-Purpose Computation from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

10 of 70 comments

  1. Audio processing using the GPU by Anonymous Coward · · Score: 5, Interesting

    One of the uses for the GPU that looks promising is audio processing. There was an article about one company developing VSTs that use the GPU. It will be interesting to see how people will utilize their graphics cards in the future.

    1. Re:Audio processing using the GPU by dustman · · Score: 5, Interesting

      It will be interesting to see how people will utilize their graphics cards in the future.

      My prediction: In the future, people will use their graphics cards for graphics :)

      In the very near term, GPUs have so much more vector processing power, that people say "hmm, we could use the GPU for x"... But, the GPU is *not* a general purpose processor, and doing general "programming" for it isn't going to be a good use of it.

      GPUs are very powerful, so if you could bring a product/app to market within the next couple of years, you would have a performance advantage. But only for a little while.

      In the long term: The Cell processor has, on each CPU, 8 vector units. According to a quick google search, each one is a 32 GFLOPS vector processor... So, that's sort of like 256 GFLOPS of processing power available. A current NVidia GPU is more than 32, but a lot less than 100 GFLOPS. You don't really have "256 GFLOPS" available for any given task, because the processors are independent... You might be able to use one of the Cell vector units as a replacement for your "graphics card", but it won't be as special purpose as your graphics card is, so each GFLOP will get you less in terms of actual graphics goodness... And, they're independent, and although graphics is very parallelizable, no real-world problem comes even close to being 100% parallelizable. Witness the dismal results of SLI rendering.
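
The arithmetic in this paragraph can be made explicit. Eight units at 32 GFLOPS each give a 256 GFLOPS peak, but Amdahl's law limits the achievable speedup whenever part of a task cannot be parallelized. The sketch below uses the figures quoted above; the parallel fractions are illustrative assumptions, not measurements of Cell or of any GPU, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction, units):
    """Amdahl's law: overall speedup when only part of a task is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / units)


UNITS = 8                 # vector units per Cell, as quoted in the comment
GFLOPS_PER_UNIT = 32      # per-unit figure, as quoted in the comment
peak = UNITS * GFLOPS_PER_UNIT  # theoretical peak across all units

if __name__ == "__main__":
    print(f"theoretical peak: {peak} GFLOPS")
    # The fractions below are assumed values chosen for illustration.
    for p in (0.5, 0.9, 0.99, 1.0):
        speedup = amdahl_speedup(p, UNITS)
        print(f"parallel fraction {p:.2f}: {speedup:.2f}x of at most {UNITS}x")
```

Only a task that is entirely parallel reaches the full factor of 8; with a parallel fraction of 0.9 the speedup already drops below 5.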

      Cell looks awesome, but when we can hold the real thing in our hands, I wouldn't be surprised if it didn't live up to its hype, (but I wouldn't be surprised if it did, either). My prediction for Cell is: the processor exists and is everything they say it is, which is just a PowerPC-based CPU with 8 attached decently powerful vector units... the whole concept of "automatic distributed computing" is probably not worth paying attention to.

      OK, this long post turned into an ad for Cell I think, but I am very psyched about it.

      A summary: You can use your graphics card for non-graphics processing, and you will have a performance advantage while doing it for a little while. In the long term, Cell or something like it will be the main CPU in your computer, and it will be much better for non-graphics-based vector processing than a GPU, and better for some kinds of graphics processing (raytracing, non-realtime, etc), but not for the current "standard" (raster-based graphics).

  2. Re:GLIDE by TheRealMindChild · · Score: 2, Interesting

    Hey, don't KNOCK GLIDE.

    From a developer standpoint, I enjoyed GLIDE a lot more than OpenGL, and multitudes more than Direct3D. I believe it didn't stick, simply because, unlike the other two, it didn't progress much, and Creative made sure it wasn't a general hardware API.

    --

    "When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
  3. How inclusive is the book? by moz25 · · Score: 3, Interesting

    The book seems like a must-have. One question though: to what extent does it apply to other manufacturers' GPUs too? I'm not entirely comfortable with it being written by one specific manufacturer if I'm looking for information applicable to all/most potential users.

  4. Quartz Composer on MacOS 10.4 by samkass · · Score: 3, Interesting

    One of the most interesting aspects of GPU programming that I've been playing with recently is Quartz Composer released as part of MacOS X 10.4's dev tools (included on the install DVD.)

    It's a visual programming environment that lets you hook together "patches" that create, control, and present audio and video. You can include GL-slang kernels as well. Also, since MacOS X 10.4's Core Image will recompile GL shader language into Altivec code if the GPU isn't up to the task, it adds a lot of flexibility as to when you're okay using the shader language. You can synchronize audio effects with real-time video effects, and hook up iSights, still images, MIDI sound, audio input, mix them all together on the CPU and GPU, and present some stunning effects. I'm certainly going to be checking this book out to see if it helps with this sort of endeavor.

    I don't want to Slashdot anyone's site, since most people working on it are just publishing their creations on personal blogs, but a few google searches can turn up some really fantastic visual effects people have created in the couple weeks it's been out.

    Here is Apple's intro to the subject:
    http://developer.apple.com/documentation/GraphicsImaging/Conceptual/QuartzComposer/qc_intro/chapter_1_section_1.html
    and, specifically,
    http://developer.apple.com/documentation/GraphicsImaging/Conceptual/CoreImaging/ci_plugins/chapter_4_section_3.html

    --
    E pluribus unum
  5. FFT's on GPU's? by viva_fourier · · Score: 3, Interesting

    The very last chapter "Medical Image Reconstruction with the FFT" was really the only one that had caught my eye -- anyone out there know of any projects involving processing loads of FFT's on a GPU such as in image restoration? Just curious...

    --
    and now back to the fallout shelter...
    1. Re:FFT's on GPU's? by daVinci1980 · · Score: 3, Interesting

      There's a shortcut to a sample for this over at NVIDIA's developer website. I looked into this briefly, and I was pretty impressed with the performance... The GPU always outperforms the CPU, and in most cases the improvement is a factor of 1.5-2.0.

      Not as high as I would expect, but I expect this will increase over time.

      --
      I currently have no clever signature witticism to add here.
  6. Re:Slight clarification by The Optimizer · · Score: 3, Interesting
    Deferred shading isn't really dependent on the version of hardware you are using. It's more a question of whether it provides value.

    Sort of. You're right about the value proposition, but regarding hardware, here is an example:

    One of the most important things needed during the shading pass is to obtain the fragment's position in the space you are working in. This usually means getting its X, Y, and Z values and transforming them into a space such as view space. On PC video cards, you can't read Z info from the Z-buffer (really would be nice), so you have to store it off somewhere. On a DX9 card such as a Radeon 9600 you usually write out the Z to a separate render target using 32F format. On a DX8 card such as a Geforce 3, there is no support for float render targets, in fact nothing beyond 8888 and x555/x565 formats is supported. A lot more work is needed to get the fragment's position, and you have fewer pixel shader ops to do it in.
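
To illustrate the extra work on hardware limited to 8888 render targets: one common workaround, assumed here since the comment does not name a method, is to quantize the depth value and spread it across several 8-bit channels, then reassemble it in the shading pass. The Python below emulates that packing on the CPU; `pack_depth` and `unpack_depth` are hypothetical names.

```python
# Emulates storing a depth value in three 8-bit channels of an 8888 target.
# The 24-bit layout is an assumed, illustrative choice.
DEPTH_BITS = 24
DEPTH_SCALE = 1 << DEPTH_BITS


def pack_depth(depth):
    """Encode a depth in [0, 1] as three 8-bit channel values (r, g, b)."""
    quantized = min(int(depth * DEPTH_SCALE), DEPTH_SCALE - 1)
    return ((quantized >> 16) & 0xFF, (quantized >> 8) & 0xFF, quantized & 0xFF)


def unpack_depth(rgb):
    """Reassemble the quantized depth from its three channels."""
    r, g, b = rgb
    return ((r << 16) | (g << 8) | b) / float(DEPTH_SCALE)
```

With 24 bits the round trip loses less than 2^-24 of the depth range, at the cost of extra instructions in both the attribute pass and every lighting pass.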

    You are not entirely right in saying that Deferred shading saves computations by eliminating overdraw, and yes fast-Z rejection on subsequent passes is a big help. Yes, overdraw is saved, but your lighting costs are now directly turned into fill, and it is possible to have situations that are worse than forward (traditional) shading. The biggest benefit of Deferred shading IMHO is that you separate out attribute rendering from lighting. Other benefits include complete per-pixel lighting and normal mapping for the scene, and the visual consistency across the entire scene.

    As an example, if you are going to render a character using forward shading, and there are several point lights that might impact the character, at each frame you need to determine which lights can impact the character and select a shader (or multi-pass) that supports all N number (and types) of lights.

    With deferred shading, you just render out the character's attributes (albedo, position, normal, specular info, whatever) to buffers (DX9 multi-render targets are a big help). Then, in the lighting passes, you project the volume affected by the light into 2d screen space and render a lighting shader into that volume. The shader looks at each fragment and calculates the light influence on that fragment and accumulates it (separately for diffuse and specular usually). Repeat for each light in the scene. The upshot is that the amount of fill/pixel shader needed is dependent upon the projected area affected by the light. The possible downside is that many of the fragments in that area may be in front of or behind the light volume, in which case the lighting calculation is performed on a pixel that the light doesn't impact (it just accumulates zero light).
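
The two phases described above can be sketched on the CPU. This is a simplified illustration under stated assumptions: diffuse lighting only, point lights with a linear falloff to an assumed radius, and a loop over every pixel instead of only the light's projected screen area. The G-buffer layout and all function names are hypothetical.

```python
# Minimal CPU sketch of deferred shading: attributes first, lighting later.
# Layout, falloff model, and names are illustrative assumptions.

def make_gbuffer_pixel(albedo, position, normal):
    """Attribute pass output for one pixel (no lighting computed yet)."""
    return {"albedo": albedo, "position": position, "normal": normal}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def light_contribution(pixel, light):
    """Diffuse term of one point light with linear falloff to its radius."""
    to_light = tuple(l - p for l, p in zip(light["position"], pixel["position"]))
    distance = dot(to_light, to_light) ** 0.5
    if distance == 0.0 or distance >= light["radius"]:
        return 0.0  # outside the light volume: accumulates zero light
    direction = tuple(c / distance for c in to_light)
    n_dot_l = max(dot(pixel["normal"], direction), 0.0)
    return light["intensity"] * n_dot_l * (1.0 - distance / light["radius"])


def lighting_pass(gbuffer, lights):
    """Accumulate every light per pixel, then modulate by albedo."""
    accumulated = [0.0] * len(gbuffer)
    for light in lights:                     # one lighting pass per light
        for i, pixel in enumerate(gbuffer):  # a real renderer visits only
            accumulated[i] += light_contribution(pixel, light)  # the light's area
    return [tuple(a * c for c in pixel["albedo"])
            for a, pixel in zip(accumulated, gbuffer)]
```

Adding a light adds one more iteration of the outer loop and never requires choosing a different shader per object, which is the separation of attribute rendering from lighting mentioned above.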

    For large numbers of small, dynamic lights, deferred can be a huge win. For static lighting it could also be a win. It all really depends on what the game is doing. There are other aspects of Deferred that are different or more difficult such as alpha-blending, but a full discussion is outside the scope of this forum.
  7. Re:GLIDE by Anonymous Coward · · Score: 3, Interesting

    Perhaps he confused Glide (3dfx) with Rendition's Speedy3d (later "RRedline") API. Rendition had a relationship with Creative for the production of boards for their first chip, the Verite 1000. It was Rendition that pioneered the "ISV Evangelist" concept for the PC 3D industry, but 3dfx caught on fast to doing that. It was required, because with no common API back then (somehow nobody considered OpenGL an option right at the beginning), the only way to get games to run on your chip was to get games companies to write directly to your proprietary API.

    Other fun Rendition facts: the Verite 1000 was back 6 months before 3dfx's first chip was, but it was only a prototype, and it had to be spun for production because the VGA was badly broken, along with some other things. 3dfx's approach of 3d only, no 2d, no VGA helped them early on, but hurt them later, when they fell behind to nvidia. It didn't help 3dfx that nvidia became MS's best friend and promised to create a d3d-compliant chip, while 3dfx held vigorously on to Glide. There were many other reasons, too, including being unable to finish chips on any sort of schedule.

    Also, Rendition did not make Z buffering a first-class performance option, because at the time, John Carmack said it was absolutely unnecessary, and that Quake would not use Z buffering. Oops! Rendition also did not make bilinear filtering or ARGB color interpolation, or alpha blending a top performance mode, while 3dfx did. Of course, 3dfx dedicated one entire ASIC to texture mapping, and another entire ASIC to color rendering, alpha blending, and Z buffering. Rendition thought that 3dfx's approach would never fly, since it needed twice the memory (memory for texture, and memory for color/z) in a non-unified memory architecture. Then the price of RAM fell through the floor, and Rendition was in a world of hurt with a sub-3d-performance part.

    (I worked at both Rendition and 3dfx).

  8. Coprocessors and "array processors" -- feh! by Latent Heat · · Score: 2, Interesting
    Back in the day of 8086, 286, and even 386, I did stuff with audio processing on a DSP coprocessor board, machine language programming and all. It was good for its day, but the DSP coprocessor board became obsolete, the relentless march of 486, Pentium I, and Pentium II went on, and it just didn't pay to fiddle like that, especially when there was good code optimization with VC++.

    I guess we are stalled with a minor improvement with the Pentium 3, and a step backward per clockrate with the Pentium 4 and limited availability of compilers that know how to optimize for it. With the pending multi cores per chip, I might play around with balancing DSP tasks between processors and threads, but I don't think I am going to muck around with some proprietary GPU or DSP -- life is too short and the "time to market" window is too narrow.