Slashdot Mirror


AGP Texture Download Problem Revealed

EconolineCrush writes "The latest high-end graphics cards are capable of rendering games at 1600x1200 in 32-bit color at jaw-dropping frame rates, but that might be all they're good for. For all their gaming prowess, all of these cards have horrific AGP download speeds that realize only 1/100th of their theoretical peak. This article lays it all out, testing video cards from ATI, Matrox, and NVIDIA, and clearly illustrates just how bad the problem is. While these cards have no problems rendering images to your screen, you're out of luck if you want to capture those images with any kind of reasonable frame rate via the AGP bus."

16 of 265 comments

  1. Um, this is a surprise? by Yarn · · Score: 4, Informative

    I'd certainly expect the AGP bus to be used asymmetrically; how often do you want to do high-speed data capture with a card that's primarily output?

    The only situation I can see where you'd want more than PCI bandwidth returning would be for uncompressed HDTV capture, and there are better ways to do that (grab the raw broadcast stream, for example).

    --
    -Yarn - Rio Karma: Excellent
    1. Re:Um, this is a surprise? by psavo · · Score: 2, Informative

      Nitpicking: AGP is not a bus. It's the Accelerated Graphics Port. See the article at anand for more info.

      --
      fucktard is a tenderhearted description
  2. Software issue? by larien · · Score: 5, Informative
    From the article, the author reckons this is a software (driver) issue rather than a hardware issue. I also note the test rig ran Windows, but how does Linux shape up? Is it better or worse?

    In any event, there's another issue he doesn't really touch upon; while he mentions that a single frame at 1600x1200@32bit colour is 7.5MB, he ignores the fact that a 30fps movie would require (30*7.5)=225MB per second uncompressed; you either have to have that much disk bandwidth or have enough CPU grunt to compress that on the fly. I guess a dedicated MPEG encoder card could help, but your average box is going to have trouble keeping up with on-screen gibs, rocket trails and blood splatters and encoding video.
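The arithmetic above can be checked with a short sketch. The function names are hypothetical helpers, not anything from the article, and the result depends on whether "MB" means 10^6 or 2^20 bytes; the two readings bracket the 7.5MB and 225MB per second figures quoted in the comment.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def stream_rate(width, height, bits_per_pixel, fps):
    """Uncompressed video data rate in bytes per second."""
    return frame_bytes(width, height, bits_per_pixel) * fps


frame = frame_bytes(1600, 1200, 32)
rate = stream_rate(1600, 1200, 32, 30)

# Report both decimal megabytes (10**6) and binary mebibytes (2**20).
print(f"frame: {frame / 10**6:.2f} MB, {frame / 2**20:.2f} MiB")
print(f"30 fps: {rate / 10**6:.1f} MB/s, {rate / 2**20:.1f} MiB/s")
```

Either way the sustained rate is well above 200MB per second, which is the commenter's point about disk bandwidth and on-the-fly compression.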

    1. Re:Software issue? by Anonymous Coward · · Score: 1, Informative

      1600x1200 is a little extreme. But a while back I had to make a video presentation for work. The first thing I tried was connecting a DV cam to the s-video output, but the compression made the text unreadable. So I used software that captured an AVI and then compressed it to an MPEG with high quality settings. Everything was beautiful, except that the video was 5 frames per second. The screen resolution was 800x600x16. At 30 fps that is just over 27 megs a second, which the SCSI U160 disk could easily keep up with. So there are obvious benefits to correcting the problem even if some machines won't be able to capture their Quake match.

  3. Re:Hmm. by MagPulse · · Score: 4, Informative

    This would affect everyone in a different way though. TV stations and production sets, even public access TV, along with low budget movies, would be able to use their PCs with a Radeon 9700 or NV30 card to produce their content. They could not only reproduce many of the effects from movies like Toy Story (notably excluding ray tracing), but do it in real-time for instant feedback, meaning much much faster production cycles. This has the potential to make a big impact.

  4. Professional GFX processing by i_am_nitrogen · · Score: 3, Informative

    Way back when I was working on libfbx, we (the two main libfbx developers) learned of a 48-bit framebuffer developed by SGI. It's used mainly to render special FX for Hollywood. After several composited layers with various effects on an 8-bit per channel system, you can really start to notice the quantization artifacts. Moving to 12- or 16-bits per color channel (depending on whether there's an alpha channel) makes a huge improvement. I've never heard of any 16 byte per pixel (128bit) image format. It'd probably be something like 16-bits per channel RGBA (64), plus 32-bit depth buffer (96), plus 16-bit stencil and select(pick) buffers (128).
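The quantization effect described above can be illustrated with a toy example. This is only a sketch under stated assumptions: the two-step "darken by half, then brighten back" pipeline and the helper names are invented for illustration, and real compositing chains are more varied. It counts how many of the 256 displayable grey levels survive when the intermediate result is stored at 8 or at 16 bits per channel.

```python
def scale(value, num, den, max_value):
    """Multiply by num/den with round-half-up, clamped to [0, max_value]."""
    return min(max_value, (value * num + den // 2) // den)


def surviving_levels(bits):
    """Distinct 8-bit output levels after a darken/brighten round trip.

    Only bits=8 and bits=16 are handled, because the promotion step
    assumes the working range is an exact multiple of 255.
    """
    max_value = (1 << bits) - 1
    step = max_value // 255                # 1 for 8-bit, 257 for 16-bit
    levels = set()
    for v in range(256):
        x = v * step                       # promote input to working depth
        x = scale(x, 1, 2, max_value)      # effect 1: darken to 50%
        x = scale(x, 2, 1, max_value)      # effect 2: brighten back
        levels.add((x + step // 2) // step)  # reduce to 8 bits for display
    return len(levels)


print(surviving_levels(8))   # 129
print(surviving_levels(16))  # 256
```

With an 8-bit intermediate, roughly half the levels are lost after a single round trip, which shows up as banding; with a 16-bit intermediate all 256 survive.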

  5. Re:128 bit colour? by Space+cowboy · · Score: 3, Informative

    Once, definitely. Twice, probably. Thrice, perhaps.

    You typically composite and re-composite layer after layer to create decent effects, it's not a one-shot thing. Certainly professional video runs at ~48bit for film work.

    Simon

    --
    Physicists get Hadrons!
  6. Re:128 bit colour? by tomstdenis · · Score: 2, Informative

    Flamebait much?

    First off, there is no such thing as 32-bit color. It's 24-bit color with either a padding octet or an alpha channel.

    Second, 256 levels is enough that, given a good monitor, you can make do quite well.

    Third, flamebait much?

    Tom

    --
    Someday, I'll have a real sig.
  7. Re:128 bit colour? by fingal · · Score: 4, Informative
    If you want to display a gradient from say, dark blue to light blue, you have quite a few shades of blue to choose from. More than 1024, that's for sure, especially in 32 bit color. But your monitor can only display 1024 vertical lines, each being a different shade. (Depending on your resolution, blah, blah, blah.)

    Hmmm. Close but still not quite right. Think of the colour space as a cube with RGB as the three axes of the cube. In 32bit colour you have 8 bits per colour plane, giving you a cube that is 256 x 256 x 256. Any gradient from any point on the cube to any other point on the cube is going to be a maximum of 443 (if my maths is freaked out - distance from two opposite corners of the cube). Plus some messing about with the various quantisation that this line will pass through gives you definite banding on all but the lowest resolution displays...

    --

    The only Good System is a Sound System

  8. Yes, but... by Anonymous Coward · · Score: 5, Informative
    a) texture imposters (realtime adaptive billboarding)

    That's what render-to-texture is for, you don't need to read data back to the CPU.

    b) split world/image-space occlusion culling.

    This wouldn't be too useful for realtime graphics anyways, because of the way the 3D graphics pipeline works. The CPU can already be processing data a few frames ahead of what the GPU is currently working on. If you read back data from the card every frame, you have to wait for the GPU to finish rendering the current frame before you can start work on the next one.
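A toy timing model makes the stall concrete. The per-frame costs are made-up numbers and both functions are hypothetical helpers, not measurements of any real driver; the model assumes constant CPU and GPU times per frame and enough buffering for the CPU to run ahead.

```python
def pipelined_time(frames, cpu_ms, gpu_ms):
    """CPU prepares frame N+1 while the GPU renders frame N."""
    return cpu_ms + gpu_ms + (frames - 1) * max(cpu_ms, gpu_ms)


def readback_time(frames, cpu_ms, gpu_ms, readback_ms):
    """CPU waits for the GPU to finish and for the readback, every frame."""
    return frames * (cpu_ms + gpu_ms + readback_ms)


# Hypothetical costs: 10 ms CPU, 10 ms GPU, 5 ms readback per frame.
print(pipelined_time(100, 10, 10))    # 1010
print(readback_time(100, 10, 10, 5))  # 2500
```

In this model the readback itself is only part of the cost; most of the loss comes from serialising work that would otherwise overlap.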

    1. Re:Yes, but... by Mike+Connell · · Score: 5, Informative

      That's what render-to-texture is for, you don't need to read data back to the CPU.

      That is true for simple versions, but with methods moving towards image based rendering you often have to pull the data back anyway. Then you can process the textures to produce better imposters - not necessarily just billboards

      Re: occlusion culling. People are using these methods today for realtime graphics (for example combinations of Greene's HZB, or HOMs) even with the low readback speed. UNC's Gigawalk software is one published example (Google for it). Getting Z or alpha channel information back is the biggest hit, so these methods would be even more efficient and so more widely applicable with faster transfers. When you're rendering N million triangles per frame (UNC quote 82 million) you have to do this stuff to get realtime rendering.

      So it is used for realtime graphics today - although mainly for heavy duty applications not games.

      HTH

    2. Re:Yes, but... by Mike+Connell · · Score: 3, Informative

      Oops, forgot to point out one more thing too: HP and NVidia have both implemented OpenGL extensions to address the issue of getting Z occlusion information back (NVidia's is layered on top of the HP extension iirc). This isn't useful for reading back the framebuffer fast, but helps when doing realtime occlusion culling.

  9. I disagree. by Anonymous Coward · · Score: 1, Informative

    Let's say Pixar starts using 3D chips to accelerate their rendering. They will be doing one of two things:

    1) High quality rendering - It takes one hour to render a frame, so the download time is negligible.

    2) Realtime previewing - Why would you want to download each frame to the CPU if all you want is a preview?

  10. Not a new problem, not just 3d by Anonymous Coward · · Score: 1, Informative

    This has been an issue for quite some time. Raster once put reading from the card at being 1/10th the speed of writing to it. This is the reason we have very little "fake transparency" going on right now. Those methods read the frame buffer and then composite upon the necessary region. With this method transparency can neither be fast nor update in real-time.

    The solution is to take this into account when designing the compositing model, which Apple has done and Keith Packard and co are doing with Xrender and its offshoots.

    macros

  11. Faster readback has been requested for years by cyranose · · Score: 3, Informative

    I've been doing real-time 3D graphics for 10 years and read-back speeds have been the biggest problem for doing many advanced algorithms. We have asked the companies to improve this many times. The problem as I see it: Quake and other benchmark apps don't rely on readback.
    Here are a few other important but non-Quake techniques that are driven by readback speeds. I'll go into more detail on the first for illustration purposes.
    High-quality real-time occlusion culling -- many techniques render the scene quickly by using a unique color tag per object or polygon and then read back the framebuffer to figure out everything that was visible (and how many pixels for each) for a final high-quality pass. If HW drivers would even just implement the standard glHistogram functions (which essentially compress the framebuffer before readback), this would become practical. NVidia adds their NVOcclusion extension, but it's limited in how many objects at a time you can test, it's very asynchronous, and it requires depth sorting on the CPU to make it most useful. The render-color technique does not. Yet HW makers are spending lots of money adding custom HW to do z-occlusion when a simple driver-based software technique may be easier.
    Dynamic Reflection Maps -- for simple, reflective surfaces -- Requires background rendering from multiple POVs (generally six 90 degree views) and caching these. Even if you can cache a small set of maps in AGP memory, you want fast async readback if you have a large fairly static scene and you're roaming around.
    Real-time radiosity -- similar to above, but needs more CPU processing of the returned images and possibly depth maps (reading back the depth buffer is often even more expensive than the color).
    Real-time ray tracing -- the better quality approaches need fast readback to store intermediate results (due to recursion, etc.). With floating point framebuffers and good vertex/pixel shaders, ray-tracing becomes possible, but not yet practical. I believe /. may even have run a link to one of these techniques a while back.
    So there's a lot more to this issue than just making movies of your games. Faster, better graphics would be possible. So why isn't this a priority?
    ------------ cyranose@realityprime.com
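The CPU side of the colour-tag occlusion technique in the comment above might look roughly like this. It is a minimal sketch, assuming a tightly packed 8-bit RGB readback buffer and a 24-bit object ID packed into the colour; the function names and the tiny fake framebuffer are hypothetical, and the readback call itself is left out.

```python
from collections import Counter


def id_to_rgb(object_id):
    """Pack a 24-bit object ID into an (r, g, b) colour tag."""
    return (object_id >> 16) & 0xFF, (object_id >> 8) & 0xFF, object_id & 0xFF


def visible_pixel_counts(rgb_bytes):
    """Count visible pixels per object ID in a packed RGB buffer."""
    counts = Counter()
    for i in range(0, len(rgb_bytes), 3):
        r, g, b = rgb_bytes[i:i + 3]
        counts[(r << 16) | (g << 8) | b] += 1
    return counts


# Fake 4x2 framebuffer: object 7 covers five pixels, object 70000 three.
pixels = [7, 7, 7, 70000, 7, 7, 70000, 70000]
framebuffer = bytes(c for oid in pixels for c in id_to_rgb(oid))

counts = visible_pixel_counts(framebuffer)
print(sorted(counts.items()))  # [(7, 5), (70000, 3)]
```

Any object whose ID never appears in the buffer was fully occluded and can be skipped in the final high-quality pass. The loop touches every pixel, which is why the comment wants the hardware to reduce the buffer before readback.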

  12. Re:128 bit colour? by Alsee · · Score: 3, Informative

    Any gradient from any point on the cube to any other point on the cube is going to be a maximum of 443 (if my maths is freaked out - distance from two opposite corners of the cube)

    The distance between opposite corners is about 443, but the diagonal distance between color points is 1.732, so you still have 256 points in the gradient.

    Think about it this way, the gradient from (0,0,0) to (255,255,255) passes through (1,1,1), (2,2,2), etc. Exactly 256 points.
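The count can be checked with a quick sketch. It assumes 8 bits per channel and a 1024-pixel-wide gradient as in the thread; `gradient_colors` is a hypothetical helper, and `math.dist` needs Python 3.8 or later.

```python
import math


def gradient_colors(start, end, pixels):
    """Quantised 8-bit RGB colours along a linear gradient, one per pixel."""
    colors = []
    for i in range(pixels):
        # Integer round-half-up interpolation avoids floating-point fuzz.
        colors.append(tuple(
            (2 * (a * (pixels - 1 - i) + b * i) + (pixels - 1))
            // (2 * (pixels - 1))
            for a, b in zip(start, end)
        ))
    return colors


black, white = (0, 0, 0), (255, 255, 255)
ramp = gradient_colors(black, white, 1024)

print(len(set(ramp)))                     # 256 distinct colours
print(round(math.dist(black, white), 2))  # 441.67, i.e. 255 * sqrt(3)
```

So a black-to-white ramp across 1024 pixels gets only 256 distinct colours, about four pixels per band, however long the diagonal is in Euclidean terms.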

    --
    - - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.