AGP Texture Download Problem Revealed
EconolineCrush writes "The latest high-end graphics cards are capable of rendering games at 1600x1200 in 32-bit color at jaw-dropping frame rates, but that might be all they're good for. For all their gaming prowess, all of these cards have horrific AGP download speeds that realize only 1/100th of their theoretical peak. This article lays it all out, testing video cards from ATI, Matrox, and NVIDIA, and clearly illustrates just how bad the problem is. While these cards have no problems rendering images to your screen, you're out of luck if you want to capture those images with any kind of reasonable frame rate via the AGP bus."
I'd certainly expect the AGP bus to be used asymmetrically; how often do you want to do high-speed data capture with a card that's primarily for output?
The only situation I can see where you'd want more than PCI bandwidth coming back would be uncompressed HDTV capture, and there are better ways to do that (grabbing the raw broadcast stream, for example).
-Yarn - Rio Karma: Excellent
In any event, there's another issue he doesn't really touch upon; while he mentions that a single frame at 1600x1200@32bit colour is 7.5MB, he ignores the fact that a 30fps movie would require (30*7.5)=225MB per second uncompressed; you either have to have that much disk bandwidth or have enough CPU grunt to compress that on the fly. I guess a dedicated MPEG encoder card could help, but your average box is going to have trouble keeping up with on-screen gibs, rocket trails and blood splatters and encoding video.
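A quick back-of-the-envelope check of those numbers, as a rough sketch only. It assumes 4 bytes per pixel for 32-bit colour and no padding; the function names are made up for illustration. The exact frame size comes out at 7,680,000 bytes (roughly 7.3 MiB), so the 7.5MB figure is a round number and the real stream is a shade over 225MB per second.

```python
def frame_bytes(width, height, bytes_per_pixel=4):
    """Size of one uncompressed frame in bytes (assumes no row padding)."""
    return width * height * bytes_per_pixel


def stream_bytes_per_second(width, height, fps, bytes_per_pixel=4):
    """Uncompressed bandwidth needed to capture every frame."""
    return frame_bytes(width, height, bytes_per_pixel) * fps


size = frame_bytes(1600, 1200)
rate = stream_bytes_per_second(1600, 1200, fps=30)

print(f"one frame : {size} bytes ({size / 2**20:.2f} MiB)")
print(f"30 fps    : {rate} bytes/s ({rate / 2**20:.1f} MiB/s)")
```

Either way you count megabytes, that is well beyond what a single disk or an on-the-fly software encoder of this era would comfortably sustain.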
This would affect everyone in a different way though. TV stations and production sets, even public access TV, along with low budget movies, would be able to use their PCs with a Radeon 9700 or NV30 card to produce their content. They could not only reproduce many of the effects from movies like Toy Story (notably excluding ray tracing), but do it in real-time for instant feedback, meaning much much faster production cycles. This has the potential to make a big impact.
Way back when I was working on libfbx, we (the two main libfbx developers) learned of a 48-bit framebuffer developed by SGI. It's used mainly to render special FX for Hollywood. After several composited layers with various effects on an 8-bit per channel system, you can really start to notice the quantization artifacts. Moving to 12- or 16-bits per color channel (depending on whether there's an alpha channel) makes a huge improvement. I've never heard of any 16 byte per pixel (128bit) image format. It'd probably be something like 16-bits per channel RGBA (64), plus 32-bit depth buffer (96), plus 16-bit stencil and select(pick) buffers (128).
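To make the quantization point concrete, here is a toy sketch (not anything from libfbx or SGI). It assumes a compositing step can be modelled as "darken by half, then brighten by two" with integer truncation, which is a deliberate simplification of what real effects passes do. The same ramp is pushed through that step in 8-bit and in 16-bit per channel, then the 16-bit result is converted back to 8-bit for display.

```python
def halve_then_double(levels, passes=3):
    """Apply a darken/brighten pair repeatedly, truncating to integers.

    Halving then doubling never exceeds the input, so no clamping is needed.
    """
    out = list(levels)
    for _ in range(passes):
        out = [v // 2 for v in out]  # darken: the low bit is lost here
        out = [v * 2 for v in out]   # brighten: the lost bit does not come back
    return out


ramp8 = list(range(256))            # full 8-bit gradient
ramp16 = [v * 257 for v in ramp8]   # same gradient spread over 0..65535

result8 = halve_then_double(ramp8)
result16 = halve_then_double(ramp16)
back_to_8 = [(v + 128) // 257 for v in result16]  # round back to 8-bit

print("distinct levels, 8-bit pipeline :", len(set(result8)))
print("distinct levels, 16-bit pipeline:", len(set(back_to_8)))
```

In this toy case the 8-bit pipeline keeps only half of its levels, while the 16-bit pipeline hands back the original ramp intact once it is converted for display.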
A solution to the problem with music today
Once, definitely. Twice, probably. Thrice, perhaps.
You typically composite and re-composite layer after layer to create decent effects; it's not a one-shot thing. Certainly professional video runs at ~48-bit for film work.
Simon
Physicists get Hadrons!
flamebait much?
First off, there is no such thing as 32-bit color. It's 24-bit color with either a padding octet or an alpha channel.
Second, 256 levels is enough that, given a good monitor, you can make do quite well.
Third, flamebait much?
Tom
Someday, I'll have a real sig.
Hmmm. Close but still not quite right. Think of the colour space as a cube with RGB as the three axes of the cube. In 32bit colour you have 8 bits per colour plane, giving you a cube that is 256 x 256 x 256. Any gradient from any point on the cube to any other point on the cube is going to be a maximum of 443 (if my maths is freaked out - distance from two opposite corners of the cube). Plus some messing about with the various quantisation that this line will pass through gives you definite banding on all but the lowest resolution displays...
The only Good System is a Sound System
That's what render-to-texture is for; you don't need to read data back to the CPU.
b) split world/image-space occlusion culling.
This wouldn't be too useful for realtime graphics anyways, because of the way the 3D graphics pipeline works. The CPU can already be processing data a few frames ahead of what the GPU is currently working on. If you read back data from the card every frame, you have to wait for the GPU to finish rendering the current frame before you can start work on the next one.
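A very crude timing model of that stall, purely as an illustration. The millisecond figures are invented, and it assumes the CPU and GPU overlap perfectly when there is no readback and not at all when there is one; real drivers sit somewhere in between.

```python
def frame_time_pipelined(cpu_ms, gpu_ms):
    """CPU works ahead while the GPU renders, so the slower stage sets the pace."""
    return max(cpu_ms, gpu_ms)


def frame_time_with_readback(cpu_ms, gpu_ms, readback_ms):
    """CPU must wait for the GPU to finish, then for the transfer, before continuing."""
    return cpu_ms + gpu_ms + readback_ms


# Hypothetical per-frame costs in milliseconds.
cpu_ms, gpu_ms, readback_ms = 10, 12, 8

overlapped = frame_time_pipelined(cpu_ms, gpu_ms)
stalled = frame_time_with_readback(cpu_ms, gpu_ms, readback_ms)

print(f"pipelined     : {overlapped} ms/frame ({1000 / overlapped:.0f} fps)")
print(f"with readback : {stalled} ms/frame ({1000 / stalled:.0f} fps)")
```

Under these assumptions the cost is not just the transfer itself; the overlap between CPU and GPU work is lost as well.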
Let's say Pixar starts using 3D chips to accelerate their rendering. They will be doing one of two things:
1) High quality rendering - It takes one hour to render a frame, so the download time is negligible.
2) Realtime previewing - Why would you want to download each frame to the CPU if all you want is a preview?
This has been an issue for quite some time. Raster once put reading from the card at 1/10th the speed of writing to it. This is the reason we have very little "fake transparency" going on right now. Those methods read the frame buffer and then composite upon the necessary region. With this method, transparency can be neither fast nor updated in real time.
The solution is to take this into account when designing the compositing model, which Apple has done and which Keith Packard and co are doing with Xrender and its offshoots.
macros
I've been doing real-time 3D graphics for 10 years and read-back speeds have been the biggest problem for doing many advanced algorithms. We have asked the companies to improve this many times. The problem as I see it: Quake and other benchmark apps don't rely on readback. /. may even have run a link to one of these techniques a while back.
Here are a few other important but non-Quake techniques that are driven by readback speeds. I'll go into more detail on the first for illustration purposes.
High-quality real-time occlusion culling -- many techniques render the scene quickly by using a unique color tag per object or polygon and then read back the framebuffer to figure out everything that was visible (and how many pixels for each) for a final high-quality pass. If HW drivers would even just implement the standard glHistogram functions (which essentially compress the framebuffer before readback), this would become practical. NVidia adds their NVOcclusion extension, but it's limited in how many objects at a time you can test, it's very asynchronous, and it requires depth sorting on the CPU to make it most useful. The render-color technique does not. Yet HW makers are spending lots of money adding custom HW to do z-occlusion when a simple driver-based software technique may be easier.
Dynamic Reflection Maps -- for simple, reflective surfaces -- Requires background rendering from multiple POVs (generally six 90 degree views) and caching these. Even if you can cache a small set of maps in AGP memory, you want fast async readback if you have a large fairly static scene and you're roaming around.
Real-time radiosity -- similar to above, but needs more CPU processing of the returned images and possibly depth maps (reading back the depth buffer is often even more expensive than the color).
Real-time ray tracing -- the better quality approaches need fast readback to store intermediate results (due to recursion, etc.). With floating point framebuffers and good vertex/pixel shaders, ray-tracing becomes possible, but not yet practical. I believe
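For the color-tag occlusion technique in the first item above, the CPU side after readback amounts to a histogram over the returned pixels. A minimal sketch, assuming the framebuffer has already been read back as rows of integer object IDs and that ID 0 means background; the buffer contents and the function name are made up for illustration, and no real graphics API is involved.

```python
from collections import Counter


def visible_pixel_counts(id_buffer, background_id=0):
    """Count how many pixels each object ID covers in a read-back ID buffer."""
    counts = Counter(
        tag for row in id_buffer for tag in row if tag != background_id
    )
    return dict(counts)


# Hypothetical 4x4 readback: objects 1 and 2 are visible, object 3 is fully hidden.
id_buffer = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 1],
    [0, 0, 0, 0],
]
scene_objects = [1, 2, 3]

counts = visible_pixel_counts(id_buffer)
to_draw = [obj for obj in scene_objects if counts.get(obj, 0) > 0]

print("pixel coverage:", counts)
print("objects for the high-quality pass:", to_draw)
```

The counting is cheap; the expensive part is getting `id_buffer` off the card every frame, which is why a histogram computed on the card before readback would help.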
So there's a lot more to this issue than just making movies of your games. Faster, better graphics would be possible. So why isn't this a priority?
------------ cyranose@realityprime.com
Any gradient from any point on the cube to any other point on the cube is going to be a maximum of 443 (if my maths is freaked out - distance from two opposite corners of the cube)
The distance between opposite corners is about 443, but the diagonal distance between color points is 1.732, so you still have 256 points in the gradient.
Think about it this way, the gradient from (0,0,0) to (255,255,255) passes through (1,1,1), (2,2,2), etc. Exactly 256 points.
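If anyone wants to check the numbers, a short sketch using 8 bits per channel as above. The corner-to-corner length comes out near 443 if you take the cube side as 256, and a little under 442 if you measure from (0,0,0) to (255,255,255); the count of colour points on the diagonal is 256 either way.

```python
import math

# Every representable colour on the grey diagonal of the RGB cube.
points = [(i, i, i) for i in range(256)]

# Distance between two neighbouring points on that diagonal.
step = math.dist(points[0], points[1])

# Corner-to-corner length, measured two ways.
diagonal_side_256 = math.sqrt(3) * 256
diagonal_0_to_255 = math.dist(points[0], points[-1])

print("points on the gradient:", len(points))
print(f"spacing between points: {step:.3f}")
print(f"diagonal (side 256)   : {diagonal_side_256:.1f}")
print(f"diagonal (0 to 255)   : {diagonal_0_to_255:.1f}")
```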
-
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.