GPU Gems 2 · Slashdot Mirror

Martin Ecker writes "Following up on last year's first installment of the "GPU Gems" book series, NVIDIA has recently finished work on the second book in the series titled GPU Gems 2 - Programming Techniques for High-Performance Graphics and General-Purpose Computation, published by Addison-Wesley. Just like the first book, GPU Gems 2 is a collection of articles by various authors from game development companies, academia, and tool developers on advanced techniques for programming graphics processing units (or GPUs for short). It is aimed at intermediate to advanced graphics developers that are familiar with the most common graphics APIs - in particular OpenGL and Direct3D - and high-level shading languages, such as GLSL, HLSL, or Cg. The reader should also be proficient in C++. As with GPU Gems, GPU Gems 2 is not for beginners. For professional graphics and game developers, however, it is an excellent collection of interesting techniques, tips, and tricks." Read on for Ecker's review.

GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation
author: Matt Pharr, Randima Fernando (Editors)
pages: 814
publisher: Addison-Wesley Publishing
rating: 9
reviewer: Martin Ecker
ISBN: 0321335597
summary: The second installment in NVIDIA's GPU Gems series shines with more "gems" on real-time graphics and general-purpose computation on GPUs.

The book is divided into six parts, each dealing with a different aspect of GPU programming. Compared to the first book, more emphasis is put on the quickly evolving area of general-purpose computation on GPUs, an area that is commonly known as GPGPU (General Purpose GPU; for more information see http://www.gpgpu.org). To my knowledge, this is the first book to contain so much information related to this relatively new field. In particular, three of the six parts of the book are about GPGPU and its applications. The first three parts, however, are about real-time computer graphics.

The first part of the book contains 8 chapters on photo-realistic rendering that mostly deal with how to efficiently render a large number of objects in a scene, which is a necessity for rendering convincing natural effects, such as grass or trees. Two chapters in this part of the book discuss geometry instancing and segment buffering, two techniques to render a large number of instances of the same object, and another chapter focuses on using occlusion queries to implement coherent hierarchical occlusion culling, which is also useful in scenes with high depth complexity.

Other interesting topics in this part of the book include adaptive tessellation of surfaces on the GPU, displacement mapping - an extension to the popular parallax mapping used in some current games - that makes it possible to render realistic bumps on a simple quad, and terrain rendering with geometry clipmaps. Geometry clipmaps can be used to render large terrains almost completely on the GPU. They were first introduced in a SIGGRAPH 2004 paper by Frank Losasso and Hugues Hoppe, and the algorithm is discussed in detail by Arul Asirvatham and Hoppe himself in chapter two of this book. This technique will most likely find wide application in next generation games.

Part two of the book consisting of 11 chapters deals with shading and lighting. I found chapter 9 by Oles Shishkovtsov on deferred shading in the soon-to-be-released computer game S.T.A.L.K.E.R. quite interesting. The game features a full deferred shading renderer, which is probably a first for a commercial game. In his chapter Oles describes some of the tricks used and some of the pitfalls encountered while developing that renderer. Also highly interesting is Gary King's chapter on computing irradiance environment maps in real-time on the GPU. These dynamically created irradiance maps can be used to approximate global illumination in dynamic scenes.
Furthermore, this part of the book has chapters on rendering atmospheric scattering, implementing bidirectional texture functions on the GPU, dynamic ambient occlusion, water rendering, and using shadow mapping with percentage-closer filtering to achieve soft shadows.
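
To make the last item concrete, here is a minimal CPU-side sketch of percentage-closer filtering in Python. It is not the chapter's shader: the function name `pcf_shadow`, the toy shadow map, and the bias value are illustrative assumptions. The point it shows is that PCF runs the depth comparison for each neighboring texel first and then averages the binary results, rather than averaging the stored depths.

```python
def pcf_shadow(shadow_map, x, y, fragment_depth, radius=1, bias=0.005):
    """Return the lit fraction in [0, 1] for a fragment projected to texel (x, y)."""
    height = len(shadow_map)
    width = len(shadow_map[0])
    lit = 0
    taps = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Clamp to the edge so border texels still get a full kernel.
            sx = min(max(x + dx, 0), width - 1)
            sy = min(max(y + dy, 0), height - 1)
            # Compare first, then average: this is what separates PCF
            # from simply blurring the depth values.
            if fragment_depth - bias <= shadow_map[sy][sx]:
                lit += 1
            taps += 1
    return lit / taps


# Toy 4x4 shadow map (hypothetical data): an occluder at depth 0.3 covers
# the two left columns, the two right columns see the far plane at 1.0.
shadow_map = [[0.3, 0.3, 1.0, 1.0] for _ in range(4)]

# A receiver at depth 0.8 lies behind the occluder on the left.
fully_shadowed = pcf_shadow(shadow_map, 0, 1, 0.8)
penumbra = pcf_shadow(shadow_map, 1, 1, 0.8)
fully_lit = pcf_shadow(shadow_map, 3, 1, 0.8)
```

Fragments near the occluder's silhouette receive a fractional value, which is what produces the soft shadow edge; a larger `radius` widens that transition at the cost of more texture reads.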

The third part of the book consists of 9 chapters on high-quality rendering. Most chapters in this part deal with implementing high-quality filtering in fragment shaders. For example, there is an interesting chapter on filtered line rendering and another chapter on cubic texture filtering. In chapter 23 NVIDIA also provides interesting insights into the inner workings of their Nalu demo, which was the release demo for the GeForce 6 series that displays an animated mermaid underwater. In particular, the chapter describes the techniques used to implement the mermaid's hair. Finally, Simon Green of NVIDIA presents his GPU-only implementation of improved Perlin Noise.

Whereas the first three parts of the book cover techniques for real-time computer graphics, the three final parts deal exclusively with GPGPU. Since GPUs nowadays offer a high level of programmability and because of their widespread use in commodity PCs, GPUs can be used as a cost-efficient processor for general computation in addition to, and in parallel with, the CPU.

Efficient usage of the GPU for general computation, so that conventional CPU implementations can be outperformed, requires special care when mapping algorithms to the highly parallel architecture of the GPU pipeline. Part four of the book mostly deals with exactly this and represents an introduction to the fantastic field of GPGPU. The 8 chapters of this part first describe the general streaming architecture of GPUs, with one chapter going into the details of the architecture of the GeForce 6 series in particular, and then move on to show how to map conventional CPU data structures and algorithms to the GPU. For example, textures can be regarded as the GPU equivalent to CPU data arrays. There is also a chapter on how to implement flow-control idioms on the GPU and a chapter on optimizing GPU programs.
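
The idea of treating a texture as the GPU counterpart of an array can be sketched on the CPU as follows. This is a simplified illustration in Python, not code from the book; the helper names (`index_to_texel`, `pack_into_texture`, `run_kernel`) and the row-major layout are assumptions chosen for the example. A 1D array is laid out in a 2D grid of texels, and a "kernel" is applied to each texel independently, much as a fragment program runs once per output pixel.

```python
def index_to_texel(i, width):
    """Map a 1D array index to (x, y) texel coordinates, row-major."""
    return i % width, i // width


def texel_to_index(x, y, width):
    """Inverse mapping: (x, y) texel coordinates back to the 1D index."""
    return y * width + x


def pack_into_texture(values, width, fill=0.0):
    """Lay a 1D list out as rows of a 2D 'texture', padding the last row."""
    height = (len(values) + width - 1) // width
    padded = list(values) + [fill] * (height * width - len(values))
    return [padded[r * width:(r + 1) * width] for r in range(height)]


def run_kernel(kernel, texture):
    """Apply `kernel` to every texel independently and return a new texture.

    No texel's result depends on another texel's result, which is the
    property that lets a GPU process all of them in parallel.
    """
    return [[kernel(texel) for texel in row] for row in texture]


data = [1.0, 2.0, 3.0, 4.0, 5.0]
tex = pack_into_texture(data, width=4)        # 2 rows of 4 texels
squared = run_kernel(lambda v: v * v, tex)    # one "pass" over the data
```

Writing to a separate output texture instead of updating in place mirrors the usual GPGPU constraint that a pass reads from one texture and renders into another.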

The 6 chapters of part five of the book are on image-oriented computing and describe a number of GPGPU algorithms for performing global illumination computations, for example by using radiosity, on the GPU. There is also a chapter on doing computer vision on the GPU, which I found to be quite exciting. Because of its high parallelism, the GPU can be used to do the tedious tasks of edge detection and marker recognition required in computer vision in a very efficient manner, thus freeing the CPU to do other tasks in the meantime. James Fung, the author of this chapter, is also involved in an open source project called OpenVIDIA (see http://openvidia.sourceforge.net) that is all about GPU-accelerated computer vision. The final chapter in this part of the book explains how to perform conservative rasterization, which is important for some GPGPU algorithms to achieve accurate results.
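
To illustrate why edge detection suits the GPU, here is a small Python sketch of a Sobel filter. The Sobel operator is used here only as a familiar example of an edge detector; the review does not say which operators the chapter or OpenVIDIA use, and the function names and test image are made up for this sketch. Each output pixel depends only on a 3x3 neighborhood of the input, so every pixel can be computed independently.

```python
def sobel_magnitude(image, x, y):
    """Gradient magnitude at pixel (x, y) using the 3x3 Sobel operator."""
    gx_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    gy_kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    height, width = len(image), len(image[0])
    gx = 0.0
    gy = 0.0
    for ky in range(3):
        for kx in range(3):
            # Clamp coordinates at the border, like clamp-to-edge texturing.
            sx = min(max(x + kx - 1, 0), width - 1)
            sy = min(max(y + ky - 1, 0), height - 1)
            gx += gx_kernel[ky][kx] * image[sy][sx]
            gy += gy_kernel[ky][kx] * image[sy][sx]
    return (gx * gx + gy * gy) ** 0.5


def detect_edges(image):
    """Run the per-pixel filter over the whole image (one 'pass')."""
    return [[sobel_magnitude(image, x, y) for x in range(len(image[0]))]
            for y in range(len(image))]


# Hypothetical 4x4 grayscale image with a vertical step edge in the middle.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
edges = detect_edges(image)
```

On a GPU the body of `sobel_magnitude` would correspond to a fragment program and the nested loop in `detect_edges` to the rasterization of a full-screen quad.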

The final part of the book has 6 chapters that present GPGPU techniques to perform a variety of simulation and numerical algorithms on the GPU. One chapter shows how to map linear algebra operations onto the GPU and develops a GPU framework to solve systems of linear equations. In other chapters the GPU is used for protein structure prediction, options pricing, flow simulation, and medical image reconstruction. These chapters show good examples of how the GPU can be used for non-graphics-related tasks. Furthermore, Peter Kipfer and Rüdiger Westermann present algorithms for efficient sorting on the GPU. Since sorting is such an important building block of many higher-level algorithms, it is important to have an efficient implementation for GPGPU algorithms.
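
As an illustration of what sorting looks like when every element must be processed independently, here is a Python sketch of a bitonic sorting network, a common choice for data-parallel hardware. It is offered as a generic example and is not claimed to match the algorithm in the Kipfer and Westermann chapter; the function name and the power-of-two restriction are assumptions of this sketch. In each pass, output element `i` reads itself and one partner and keeps either the smaller or the larger value, which maps naturally to a fragment program that reads two texels and writes one.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    n = len(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    data = list(values)
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:        # distance between compared elements
            out = list(data)
            for i in range(n):            # independent per-element work
                partner = i ^ j
                is_lower = (i & j) == 0   # i is the lower index of its pair
                ascending = (i & k) == 0  # direction of this block
                if is_lower == ascending:
                    out[i] = min(data[i], data[partner])
                else:
                    out[i] = max(data[i], data[partner])
            data = out       # the result feeds the next pass
            j //= 2
        k *= 2
    return data


result = bitonic_sort([7, 3, 5, 1, 8, 2, 6, 4])
```

The sequence of comparisons is fixed in advance and does not depend on the data, so no pass needs data-dependent branching across elements; the price is O(n log² n) comparisons instead of O(n log n).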

The book contains many illustrations and diagrams that visualize the results of certain techniques or explain the presented algorithms in more detail. All images in the book are in color, which is definitely advantageous for a graphics book. In my opinion, the excellent quality and the sheer quantity of images and illustrations are among the strongest points of this book compared to other graphics books.

The book also comes with a CD-ROM with supplemental material, videos, and demo applications for some chapters. Most of the applications include the full source code, which makes it easy to experiment with the techniques presented in the book. Note that most of the applications run on Windows only and many of them require a shader model 3.0 graphics card, such as a GeForce 6600 or 6800. The latest ATI cards, such as the X800, are not sufficient for running some demos because they only support shader model 2.0.

I highly recommend this book to any professional working as a graphics or game developer. It is a valuable addition to my library of graphics books and I will come back to a number of articles in the near future. The focus on GPGPU in the second half of the book is a welcome addition and we can expect to see more and more non-graphics-related applications make use of the processing power in today's GPUs.

Martin Ecker has been involved in real-time graphics programming for more than 10 years and works as a game developer for arcade games. In his rare spare time he works on a graphics-related open source project called XEngine (http://xengine.sourceforge.net). You can purchase GPU Gems 2 - Programming Techniques for High-Performance Graphics and General-Purpose Computation from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

7 of 70 comments

Re:I love dead links by japaget · 2005-05-12 09:41 · Score: 4, Informative

Go to http://books.slashdot.org/article.pl?sid=04/06/01/1120217 for the review of the earlier book.
Slight clarification by The+Optimizer · 2005-05-12 09:56 · Score: 4, Informative

I found chapter 9 by Oles Shishkovtsov on deferred shading in the soon-to-be-released computer game S.T.A.L.K.E.R. quite interesting. The game features a full deferred shading renderer, which is probably a first for a commercial game.

The first commercial game that I know of to use a full deferred shading engine was Shrek for the Xbox, which was released in Fall 2001.
I also worked on an unannounced PC game in 2003 that had a fully deferred shading (lighting) renderer. Alas, that title was cancelled.

Deferred shading on the PC is not very practical on pre-shader model 2.0 hardware, though possible I'm sure. The Xbox allows direct access to the register combiners, exposing more than 2x the fragment processing power of DX8 / Shader 1-1.3 on the PC.
1. Re:Slight clarification by daVinci1980 · 2005-05-12 11:18 · Score: 4, Informative
  
  Deferred shading isn't really dependent on the version of hardware you are using. It's more a question of whether it provides value.
  
  The entire reason behind doing deferred shading (in games) is that lighting computations per-pixel--especially when you start using HDR--are extraordinarily expensive. Deferring the shading of these fragments until you are sure they will be visible saves the GPU a ton of work.
  
  To briefly explain what deferred shading is (for those who aren't graphics programmers)... Deferred shading is a technique where you lay down the entire scene in two passes. The first pass renders the entire scene but with all color writes disabled. This allows modern cards to draw data at about 2x the normal rate (plus you can generally avoid any shading--except those that do depth replacement--which provides an additional speed increase to this pass). The second pass is then rendered using the full / normal technique. The benefit is that since the depth buffer has already been filled, the staggering amount of graphics hardware devoted to rejecting fragments that are hidden is used to full advantage. This allows for some serious speed increases, especially if the shading of visible surfaces is very complex.
  
  The downside of this technique (and maybe what the parent was trying to get at) is that if your shaders are not particularly complex, then this technique is really not much of a win... In fact it can be slower than standard one-pass solutions in that respect.
  
  --
  I currently have no clever signature witticism to add here.
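
The two-pass scheme described in the comment above (often called a depth pre-pass; deferred shading in the stricter sense additionally stores surface attributes in a G-buffer and lights them in a later pass) can be sketched with a toy fragment counter in Python. This is a simplified model, not real rendering code: the fragment list, the function names, and the back-to-front draw order are assumptions chosen to show the worst case for a single pass.

```python
def single_pass(fragments, num_pixels):
    """Shade every fragment that passes the depth test when it is drawn."""
    depth = [float("inf")] * num_pixels
    shaded = 0
    for pixel, z in fragments:
        if z < depth[pixel]:
            depth[pixel] = z
            shaded += 1  # expensive shader runs, may be overwritten later
    return shaded


def two_pass(fragments, num_pixels):
    """Fill the depth buffer first, then shade only the visible fragments."""
    depth = [float("inf")] * num_pixels
    for pixel, z in fragments:       # pass 1: depth only, color writes off
        if z < depth[pixel]:
            depth[pixel] = z
    shaded = 0
    for pixel, z in fragments:       # pass 2: depth test against final depth
        if z <= depth[pixel]:
            shaded += 1              # only the nearest fragment survives
    return shaded


# Hypothetical scene: 4 pixels covered by 3 layers, drawn back to front.
fragments = [(p, z) for z in (0.9, 0.5, 0.1) for p in range(4)]
cost_single = single_pass(fragments, 4)
cost_two = two_pass(fragments, 4)
```

In this toy case the single pass runs the shader 12 times and the two-pass version 4 times, at the price of submitting the geometry twice. That trade-off is the one the comment points out: with cheap shaders the extra geometry pass can cost more than it saves.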
Re:Audio processesing using the GPU by dtemplar · 2005-05-12 10:04 · Score: 3, Informative

Some audio cards already use DSP chips developed for graphics processing (notably the UAD-1 DSP card), but the main problem with using GPUs sitting on a video card to process audio is that the video card doesn't have any way of sending the processed data back to the rest of the system at a useful rate. So you can process audio at 40+ GFLOPS, but you can't send it back to your sequencer software and/or audio card. If/once video card manufacturers hack some sort of interface to send data back to the system, then things are going to get pretty interesting for sure.
Re:Audio processesing using the GPU by Rufus211 · 2005-05-12 10:26 · Score: 2, Informative

That's one of the main advantages of the new PCI-Express cards. AGP was built for video cards, so it is very lopsided. PCI-Express was built as a general purpose high-speed interconnect and as such has balanced bandwidth to and from the (video) card. Also, a returning audio stream, even if you're doing stereo or surround sound, isn't all that bandwidth intensive.
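
A rough back-of-the-envelope check of the bandwidth claim, written as a small Python calculation. The audio formats are illustrative choices, and the link figure assumes first-generation PCI Express at roughly 250 MB/s per lane in each direction; neither number comes from the comment itself.

```python
def audio_bytes_per_second(sample_rate_hz, channels, bytes_per_sample):
    """Raw (uncompressed PCM) data rate of an audio stream."""
    return sample_rate_hz * channels * bytes_per_sample


# Illustrative formats: CD-quality stereo and 24-bit 96 kHz 5.1 surround.
cd_stereo = audio_bytes_per_second(44_100, 2, 2)
surround = audio_bytes_per_second(96_000, 6, 3)

# Assumption: PCI Express 1.x, about 250 MB/s per lane per direction.
PCIE1_BYTES_PER_LANE = 250_000_000
pcie_x16_one_way = 16 * PCIE1_BYTES_PER_LANE


def fraction_of_link(stream_bytes_per_second, link_bytes_per_second):
    """Share of the link's one-way bandwidth a stream would occupy."""
    return stream_bytes_per_second / link_bytes_per_second


surround_share = fraction_of_link(surround, pcie_x16_one_way)
```

Under these assumptions a single surround stream needs about 1.7 MB/s, well under a tenth of a percent of a x16 link in one direction, so raw bandwidth is unlikely to be the limiting factor for a handful of streams. Latency and driver readback overhead are separate questions that this arithmetic does not address.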
Re:I love dead links by carnivore302 · 2005-05-12 20:39 · Score: 2, Informative

I'll give you another link, to The Cg Tutorial: The Definitive Guide to Programmable Real-Time Graphics, which can be seen as an introductory level book on graphics programming. I especially liked the chapter on bump mapping, where this concept is extremely well explained. A drawback is that it should only be considered an introduction. It sounds like the GPU Gems book goes more in-depth.

--
Please login to access my lawn
Re:LAME GPUs by Anonymous Coward · 2005-05-13 01:52 · Score: 1, Informative

Others have gone down this path - i.e., let's use the GPU for non-graphics processing. The big problem is getting the data to and from the GPU processor's memory. If you're gonna feed it with audio stuff, for instance, you gotta get the data in and out. This turned out to be the bottleneck, unfortunately.

If the GPU had high-speed direct access to main memory - ironically, the kind of architecture that those crappy low-end integrated graphics subsystems use - then this problem would go away. Currently the idea, while attractive in theory, doesn't work very well because of the I/O problem of getting a large amount of data in and out of GPU memory. Obviously, this is not a problem when the GPU is processing graphics data, because the model data (i.e., the vertices etc. being rendered) changes relatively slowly with respect to the actual rendered output, but for other potential non-graphics processing, e.g., audio, this would not be the case.