AGP Texture Download Problem Revealed
EconolineCrush writes "The latest high-end graphics cards are capable of rendering games at 1600x1200 in 32-bit color at jaw-dropping frame rates, but that might be all they're good for. For all their gaming prowess, all of these cards have horrific AGP download speeds that realize only 1/100th of their theoretical peak. This article lays it all out, testing video cards from ATI, Matrox, and NVIDIA, and clearly illustrates just how bad the problem is. While these cards have no problems rendering images to your screen, you're out of luck if you want to capture those images with any kind of reasonable frame rate via the AGP bus."
"However, no manufacturer has presently made this aspect of driver performance a priority." .. Will it justify the cost
Why should they? Was anybody complaining until now? The well won't come to the horse; the horse has to go to the well to drink.
So unless a large number of people want it, nobody wants to mess around with a perfectly working driver.
And it is not a piece of cake. Having the card record its own renderings in software would be a bitch. The best way would be to provide an access point on the bus itself, though that would play havoc with the board timings and noise issues.
In the end it will all come down to
The article says that once the images are rendered out to the display, they are simply discarded. Sure, for any sort of video capture or whatnot, that sucks. However, the article does not attempt to answer why video card manufacturers do this, or whether any cards do take advantage of the AGP 4x bandwidth. My guess is cost. If all AGP video cards provided video feedback into the bus, you'd probably be looking at a non-consumer-level product. And you know what? All I do IS use my GeForce to play video games. If dumping the frames after they are rendered keeps the cost of my card down, I'm probably happier for it. Quite simply: does this matter for the average consumer?
If I'm reading this article right, they're claiming that it also hinders normal screen captures.
That would mean software like VNC would perform much better if the drivers were updated the way these guys are demanding. (Wouldn't it?)
That'd be fantastic!
Education is the silver bullet.
There are actually some good reasons to be able to do this apart from just taking screenshots. I ran these tests (sad but true) over 4 years ago while finishing grad school, and the results (readback speed is very bad) were much the same.
Two reasons for wanting to grab the framebuffer (or parts of it) are for
a) texture imposters (realtime adaptive billboarding) and
b) split world/image-space occlusion culling.
With faster readback, both these techniques would probably be used more in "normal" software (i.e. games).
$0.02
Tales from behind the Lagom Curtain
If you're doing multi-pass rendering, it might be extremely convenient to capture the results back to main memory, especially if the board doesn't have enough texture memory to support all of your temporary buffers.
And boards are starting to ship with 128-bit IEEE floating point buffers.
Essentially, you're right - a human can't tell the difference beyond 24-bit on a given image. But if 100 images were composited together (very likely, to support something like RenderMan-style rendering in hardware), 24 bits is nowhere near enough - you'd get all sorts of accumulation error.
Education is the silver bullet.
I had to switch an application from a screaming PC to a chunky old SGI (which we now use as a stool) because of this problem. We eventually found an expensive graphics card that could keep up. I think it was called Wildcat something or other. We were getting free Quadro 3's at the time, which we really wanted to use, but they just had horrible memory read rates. The NVIDIA guy told us it was an unoptimized path, done in software with no hardware support or something. Like maybe they were reading a pixel at a time.
...that I have ever read. Either that, or I am missing something here. The idea that graphics subsystems have 'bandwidth to burn' is kind of ironic, given that every graphics chip is ultimately held back in performance by the amount of bandwidth available to it, especially when using high-quality options like anti-aliasing.
The main focus of the article is actually a very niche segment. The idea of transferring rendered images back over the AGP bus for TV, film, etc. is a joke. Rendering at high quality takes a huge amount of bandwidth (i.e. textures and geometry). As someone else pointed out, transferring back high-res images would take up over 200MB per second, and that's a quarter of your AGP bandwidth! And that is without taking into account contention and timing issues in uploading and downloading, which would mean that you simply couldn't realise the full potential of the bandwidth without a lot of other (expensive?) hardware. The simple fact is that for production uses, you would be *far* better off taking a stream of data from the DVI connector and storing that for later use.
Screen capture for business use is a reasonable point. However, when does that require 3D rendering to be taking place? There should be no contention and no reason why the AGP bus couldn't be utilised fully, although would the graphics companies make enough out of this to justify the effort? As for internet streaming, how many people have access to bandwidth fast enough for high-quality, full-screen video streaming? Enough said...
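For what it's worth, the "quarter of your AGP bandwidth" figure is easy to sanity-check. A rough sketch, assuming 1600x1200 at 32-bit colour (the resolution from the summary), a 30 fps capture rate (my assumption, nobody in the thread states one) and the nominal AGP 4x peak of about 1066 MB/s. The function name is just illustrative.

```python
def readback_mb_per_s(width, height, bytes_per_pixel, fps):
    """Bandwidth needed to pull every rendered frame back, in MB/s
    (1 MB = 10**6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6


AGP_4X_PEAK_MB_S = 1066.0   # nominal peak, an idealised figure

# Assumed capture settings: 1600x1200, 4 bytes per pixel, 30 frames/s.
needed = readback_mb_per_s(1600, 1200, 4, 30)
share = needed / AGP_4X_PEAK_MB_S

print(f"readback needs {needed:.1f} MB/s")
print(f"that is {share:.0%} of the AGP 4x peak")
```

Under those assumptions the readback stream comes out a bit over 200 MB/s and somewhat under a quarter of the peak, so the claim is in the right ballpark.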
Our ray intersection algorithm implemented on the GPU (an "old" Radeon 8500) was able to intersect 114M rays per second. This was loads faster than the best CPU implementation, which could handle between 20 and 40 million intersections per second.
But when we tried to implement a ray tracer based on this, and an efficient one that didn't intersect every ray with every triangle, the readback rate killed us. Our execution times slowed down to the low end of the fastest CPU implementations.
And the readback delay seems to be completely due to the drivers, which apparently still use the old PCI-bus code. If the drivers could use the full potential of the AGP bus, our ray tracer could approach twice the speed of the best CPU ray tracers.
I can think of several reasons:
- The company hasn't released the game yet, but wants to release a video of gameplay to the public. Current methods would require implementing a "save game as it goes" and then a "replay, in offline rendering mode at a steady frame rate, and record results" pass. Or, you could save it at reduced quality if you had video out on your computer and video in on another computer.. but that's just ridiculous, imo.
- Likewise, you have the game, and your friend hasn't purchased it yet, and lives too far away to just take a glance at it..
- You're having a graphical glitch in a game with your particular card that can't be easily illustrated with screenshots. Think how much easier it would be to just send a video clip than having to send a half-dozen screenshots and a wordy explanation, where they still might not believe you.
- You have a Radeon 9700, he has a GeForce2. You want to show him how different Doom III looks on your card, as opposed to his card, in real time.
Etc..
Very few people use their typical desktop video cards for actual video production or anything related to it because the hardware up until now was simply unable to handle that sort of load. Now we have these cards that are the beginning of a new era of computer-generated visuals. The article is saying that they can do quite a bit more than they can do now if someone would just write some better drivers for them.
Now, streaming real-time rendering images over the internet? Maybe not fullscreen stuff right now because of a multitude of hampering factors on affordable internet bandwidth which I won't name for clarity's sake, but for the limiting factor to be the internet itself and not the graphics card is still a significant step.
This would definitely be very beneficial to low-budget game developers and movie directors. We could very well see the return of the shareware boom (remember the early-mid 90's?) because of this.
Sure, only a small portion of the people who'd buy the cards would use these features that the article talks about, but they'd be people who didn't have that capability before. Whenever this happens in any medium/artform/what-have-you, there is a tendency for a lot of experimental stuff to appear. I think we have some very interesting times ahead of us if someone gets these drivers written.
Just another freak in the freak kingdom.
Well, those are great examples, but I think you have to draw the line somewhere. There are so many neat things you can do when you can read back data quickly, but is it really worth the trouble?
Now that cards have arbitrary dependent texture reads, doing warps for IBR right in the hardware is a real possibility. Also, the latest 3D cards can push upwards of 100 million triangles, enough to render that 82 million triangle scene in realtime (assuming some basic LOD and occlusion culling).
Read backs are going to become more and more irrelevant in the future. Try looking at the Moore's law doubling times for GPU speed vs. bus bandwidth. Since AGP was introduced, the speed has doubled 3 times, with AGP 8x just becoming available now. On the other hand, it only takes 3D hardware a year and a half to achieve the same speed jump. As time passes, the size of the bus compared to the amount of data being processed by the GPU will only become smaller.
I think treating the graphics bus as a one-way street is inevitable, so we might as well accept it and learn to take advantage of it.
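The doubling-time argument above can be put into numbers. This is only a sketch of the poster's own claim: three bus doublings since AGP appeared, versus the same 8x jump in a year and a half for 3D hardware. The five-year span since AGP's introduction is my assumption, and `growth` is a made-up helper.

```python
def growth(years, doubling_time):
    """Speed-up factor after `years` if speed doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)


YEARS_SINCE_AGP = 5.0                 # assumption, not stated in the thread

bus_doubling = YEARS_SINCE_AGP / 3    # three doublings over that span
gpu_doubling = 1.5 / 3                # three doublings in a year and a half

bus_gain = growth(YEARS_SINCE_AGP, bus_doubling)
gpu_gain = growth(YEARS_SINCE_AGP, gpu_doubling)

print(f"bus bandwidth grew about {bus_gain:.0f}x")
print(f"GPU speed grew about {gpu_gain:.0f}x")
print(f"gap between them: about {gpu_gain / bus_gain:.0f}x")
```

If the claimed rates held, the GPU would have pulled ahead of the bus by about two orders of magnitude over that span, which is the point being made about readbacks mattering less over time.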
If you read the AGP spec, which was written by Intel, you will note that it is based on the PCI 2.0 spec. The PCI 2.0 spec is for a 32-bit, 33 MHz symmetric bus, which gives you a max transfer rate of 132 MB per second. The AGP spec is for an asymmetric bus, 33 MHz read and 66+ MHz write. But writes were optimized at the expense of reads, since Intel was pushing video with NO onboard texture memory, and who would want to read back the image in real-time anyway, right?!?
Yes, I am sure that drivers do have some effect, but the AGP spec is the first bottleneck. On an OpenGL newsgroup it was reported last year that a person tested two identical video cards, the only difference being that one was AGP and the other was PCI. The read performance for the PCI version was several times faster than the AGP version.
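The 132 MB per second figure is just bus width times clock. A small sketch of that arithmetic, using the 33 MHz and 66 MHz clocks quoted in this post. `peak_mb_per_s` is a made-up helper, and these are theoretical peaks that a real bus does not sustain.

```python
def peak_mb_per_s(bus_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak transfer rate in MB/s (1 MB = 10**6 bytes)."""
    return bus_bits / 8 * clock_mhz * transfers_per_clock


pci_peak = peak_mb_per_s(32, 33)      # 32-bit bus at 33 MHz
at_66_mhz = peak_mb_per_s(32, 66)     # same width at the 66 MHz clock

print("32-bit @ 33 MHz:", pci_peak, "MB/s")
print("32-bit @ 66 MHz:", at_66_mhz, "MB/s")
```

Doubling the clock doubles the peak, so any direction that really does run at the slower clock tops out at half the rate of the faster one before drivers even enter the picture.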
Of course, some video cards are also to blame because of the frame buffer format they use, but that is another story...
Actually, my scenario is more like:
I use my expensive GFX card to render shots for my incredibly innovative but poorly funded sci-fi flick. I want to grab each frame in perfect detail so it can be post-processed. The easiest and cheapest way to do this is to have the renderer save each frame as it's computed. Real-time is not an issue, just like it's not an issue with a raytracer or whatever.
It had better become feasible if companies are going to want render farms based on the nv30/40/whatever. Having two separate machines per renderer would be pretty... dodgy.