Slashdot Mirror


AGP Texture Download Problem Revealed

EconolineCrush writes "The latest high-end graphics cards are capable of rendering games at 1600x1200 in 32-bit color at jaw-dropping frame rates, but that might be all they're good for. For all their gaming prowess, all of these cards have horrific AGP download speeds that realize only 1/100th of their theoretical peak. This article lays it all out, testing video cards from ATI, Matrox, and NVIDIA, and clearly illustrates just how bad the problem is. While these cards have no problems rendering images to your screen, you're out of luck if you want to capture those images with any kind of reasonable frame rate via the AGP bus."

20 of 265 comments

  1. Um, this is a surprise? by Yarn · · Score: 4, Informative

    I'd certainly expect the AGP bus to be used asymmetrically, how often do you want to do high speed data capture with a card that's primarily output?

    The only situation I can see where you'd want more than PCI bandwidth returning would be for uncompressed HDTV capture, and there are better ways to do that (grab the raw broadcast stream for example)

    --
    -Yarn - Rio Karma: Excellent
    1. Re:Um, this is a surprise? by Mike+Connell · · Score: 5, Interesting

      There are actually some good reasons to be able to do this apart from just taking screenshots. I did (sad but true) these tests over 4 years ago finishing grad school, and the results (read back speed is very bad) were much the same.

      Two reasons for wanting to grab the framebuffer (or parts of it) are for

      a) texture imposters (realtime adaptive billboarding) and
      b) split world/image-space occlusion culling.

      With faster readback, both these techniques would probably be used more in "normal" software (ie games).
      0.02

  2. Software issue? by larien · · Score: 5, Informative
    From the article, the author reckons this is a software (driver) issue rather than a hardware issue. I also note the test rig ran Windows, but how does Linux shape up? Is it better or worse?

    In any event, there's another issue he doesn't really touch upon; while he mentions that a single frame at 1600x1200@32bit colour is 7.5MB, he ignores the fact that a 30fps movie would require (30*7.5)=225MB per second uncompressed; you either have to have that much disk bandwidth or have enough CPU grunt to compress that on the fly. I guess a dedicated MPEG encoder card could help, but your average box is going to have trouble keeping up with on-screen gibs, rocket trails and blood splatters and encoding video.
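
    For what it's worth, the arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it uses decimal megabytes, so it lands slightly above the rounded 7.5MB and 225MB/s figures quoted above.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def stream_rate_mb(width, height, bits_per_pixel, fps):
    """Uncompressed video data rate in decimal megabytes per second."""
    return frame_bytes(width, height, bits_per_pixel) * fps / 1_000_000


# One 1600x1200 frame at 32 bits per pixel.
print(frame_bytes(1600, 1200, 32))         # 7680000 bytes, roughly 7.5MB
# The same frame captured 30 times per second.
print(stream_rate_mb(1600, 1200, 32, 30))  # 230.4 MB/s
```

    Either way you count it, that is well beyond what a typical single disk can sustain, which is the point being made.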

  3. Imagine That by mosch · · Score: 5, Insightful
    Wow, what a surprise. Video cards being built on ultra-thin margins are only being designed for the use that 99.99% of the population wants to use them for. You'd think with their huge 4% and 5% profits they'd add in lots of features that only a very few people want, just in case!

    In summary, who the fuck cares?

    1. Re:Imagine That by epine · · Score: 4, Insightful


      This is exactly the attitude that creates endless headaches mapping good concepts onto workable implementations, and results in systems becoming so convoluted by the time they work properly they are nearly impossible to maintain.

      The principle of least surprise dictates that random orders of magnitude should not be sacrificed in your fundamental primitives.

      It seems to me that if I spend $300 on my CPU and $600 on my GPU that I might want to be able to fetch back what the GPU creates. What kind of idiot puts their most powerful processor at the end of a one way street?

      There are endless reasons that could come up why this feature might need to be exploited. Just because you can't come up with them doesn't mean they don't exist. You are talking about 99.9 percent of your own creativity, which I assure you is a far sight less than the sum total of the creativity out there looking for cool new things to do.

      It does make sense to consider cost/benefit here. The first observation here is that we are talking about a baseline primitive (texture returned to system memory), and that we are looking to recover a rough factor of ten, not a rough factor of 10 percent.

      In the video card industry, things are designed to hit the 90 percent point. These days the GPU industry rivals the CPU industry in dollar value. I simply can't believe the graphics card companies can't afford to have someone sit down and crank this up to 50% bus utilization. I suspect they could do this without even scratching their head.

      I've had to use many primitives over the years designed by this guy or his second cousin. If he only knew how much of the pain he experiences as a computer user is the result of good people bending over backwards to deal with unsuspected, arbitrary constraints when they could have been polishing the product interface instead. But some people have no imagination for these things.

  4. It's not the cards by tmark · · Score: 5, Insightful

    all of these cards have horrific AGP download speeds that realize only 1/100th of their theoretical peak...you're out of luck if you want to capture those images with any kind of reasonable frame rate via the AGP bus."

    As the quoted article clearly indicates, the problem lies with the drivers and not, as the original poster intimates, with the cards.

    And the underlying reason is immediately understandable: after years of AGP cards and years of no one really complaining about or raising this issue (except, now, developers of video-editing software who could benefit), it seems clear that there isn't much demand for this kind of performance. In the (near?) future there might be, but why should these companies spend money working on driver performance in areas like this when customers really only care about how well Quake will run?

    When people are willing to pay for these features is when companies will pay to build the requisite drivers. And that is how it should be.

    1. Re:It's not the cards by zenyu · · Score: 4, Interesting

      I had to switch an application from a screaming PC to a chunky old SGI we now use for a stool because of this problem. We eventually found an expensive graphics card that could keep up. I think it was called Wildcat something or other. We were getting free Quadro 3's at the time which we really wanted to use, but they just had horrible memory read rates. The nVidia guy told us it was an unoptimized path, using software with no hardware support or something. Like maybe they were reading a pixel at a time or something.

  5. Huh... by Viking+Coder · · Score: 4, Interesting

    If I'm reading this article right, they're claiming that it also hinders normal screen captures.

    That would mean that software like VNC would have much higher performance, if the drivers were updated, the way these guys are demanding. (Wouldn't it?)

    That'd be fantastic!

    --
    Education is the silver bullet.
  6. Re:Hmm. by MagPulse · · Score: 4, Informative

    This would affect everyone in a different way though. TV stations and production sets, even public access TV, along with low budget movies, would be able to use their PCs with a Radeon 9700 or NV30 card to produce their content. They could not only reproduce many of the effects from movies like Toy Story (notably excluding ray tracing), but do it in real-time for instant feedback, meaning much much faster production cycles. This has the potential to make a big impact.

  7. Might this be intentional? by seldolivaw · · Score: 4, Insightful

    I know nothing about anything, obviously, but I can see that game designers might think it nice to be able to send stuff to your screen but for you to be unable to send it to storage somewhere.

    This *is meant to be* a dumb question. Mod me down if I'm wrong; it's only Karma.

  8. Re:128 bit colour? by Viking+Coder · · Score: 5, Interesting

    If you're doing multi-pass rendering, it might be extremely convenient to capture the results back to main memory. Especially if the board doesn't have enough texture memory to support all of your temporary buffers.

    And boards are starting to ship with 128-bit IEEE floating point buffers.

    Essentially, you're right - a human can't tell the difference beyond 24-bit on a given image. But if 100 images were composited together (very likely, to support something like RenderMan-style rendering in hardware), 24 bits is nowhere near enough - you'd get all sorts of accumulation error.
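
    The accumulation error is easy to see in a toy model. The sketch below is an assumption-laden illustration, not how any real board composites: it sums 100 identical faint layers into a single channel, once rounding to 8-bit levels after every pass and once in floating point (Python floats stand in for the float buffer). The layer value of 0.0015 is made up.

```python
def composite(layers, levels=None):
    """Sum layer contributions into one channel (0.0 to 1.0 scale).

    If levels is given, the running total is rounded to that many
    integer steps after every pass, like a fixed-point framebuffer.
    """
    total = 0.0
    for value in layers:
        total += value
        if levels is not None:
            total = round(total * levels) / levels
    return total


faint_layers = [0.0015] * 100          # hypothetical per-pass contribution

print(composite(faint_layers, 255))    # 8 bits per channel: stays at 0.0
print(composite(faint_layers))         # floating point: about 0.15
```

    Each pass adds less than half of one 8-bit step, so the rounded buffer never moves off zero, while the float total ends up near 15% brightness.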

    --
    Education is the silver bullet.
  9. Is it me, or is the author smoking crack? by JackAsh · · Score: 4, Insightful

    A couple of salient points come to mind when reading this article:

    1) Recording games/presentations/etc. The reason why we don't do it is because if the system was capable of generating it real time in the first place, it's far less space intensive to record the parameters of the animation than the output. i.e. It's cheaper to say "Daemia fires rocket at these coordinates" than record an MPEG of said rocket shot. AND, as hardware gets better, your recording does too.

    Which leads me to point 2:

    2) Since it's cheaper to capture realtime animation by capturing parameters, the only use of the capture function would be NON-realtime applications - i.e. getting your Geforce5TiUltraPro to render an extremely complex scene with incredible realism at 1 fps. That's not a typo. If we have 10MB/s back-into-the-PC bandwidth and each super high resolution shot takes 10MB on average, we have a wonderful solution working at 1 fps. Spend the fill rates on 600 passes for each pixel or something like that. Imagine the quality of the scenes! Capture the damn things and be glad you're not rendering at 1 frame per hour like they were 5 years ago.

    Repeat after me - if you're rendering for posterity you don't need real time... That'll come eventually.

    -JackAsh

  10. One of the worst technical articles.... by grahamtriggs · · Score: 5, Interesting

    ...that I have ever read. Either that, or I am missing something here... The idea that graphics subsystems have 'bandwidth to burn' is kind of ironic, given that every graphics chip is ultimately held back in performance by the amount of bandwidth available to it - especially when using high quality options like anti-aliasing. The main focus of the article is actually a very niche segment... the idea of transferring back rendered images over the AGP bus for TV / film / etc. is a joke... Rendering at high quality takes a huge amount of bandwidth (ie. textures and geometry)... as someone else pointed out, transferring back high-res images would take up over 200MB per second - that's a quarter of your AGP bandwidth! And that's without taking into account contention and timing issues in uploading/downloading, which would mean that you simply couldn't realise the full potential of the bandwidth without a lot of other (expensive?) hardware... The simple fact is that for production uses, you would be *far* better off taking a stream of data from the DVI connector, and storing that for later use... Screen capture for business use is a reasonable point - however when does that require 3d rendering to be taking place? There should be no contention and no reason why the AGP bus couldn't be utilised fully - although would the graphics companies make enough out of this to justify the effort? As for internet streaming - how many people have access to bandwidth fast enough for high quality, full screen video streaming? Enough said...

  11. Re:128 bit colour? by fingal · · Score: 4, Informative
    If you want to display a gradient from say, dark blue to light blue, you have quite a few shades of blue to choose from. More than 1024, that's for sure, especially in 32 bit color. But your monitor can only display 1024 vertical lines, each being a different shade. (Depending on your resolution, blah, blah, blah.)

    Hmmm. Close but still not quite right. Think of the colour space as a cube with RGB as the three axes of the cube. In 32bit colour you have 8 bits per colour plane, giving you a cube that is 256 x 256 x 256. Any gradient from any point on the cube to any other point on the cube is going to be a maximum of 443 steps long (if my maths isn't freaked out - that's the distance between two opposite corners of the cube). Plus some messing about with the various quantisation that this line will pass through gives you definite banding on all but the lowest resolution displays...
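
    A rough way to check the numbers, as a sketch only: the helper names below are invented, and the second function simply rounds a linear blend per channel, which is one plausible way to quantise a gradient rather than what any particular card does.

```python
import math


def cube_diagonal(side):
    """Length of the corner-to-corner diagonal of a colour cube."""
    return math.sqrt(3) * side


def distinct_colours(start, end, pixels):
    """Count distinct 8-bit RGB values in a linear gradient across pixels."""
    seen = set()
    for i in range(pixels):
        t = i / (pixels - 1)
        seen.add(tuple(round(a + (b - a) * t) for a, b in zip(start, end)))
    return len(seen)


print(cube_diagonal(256))                               # about 443.4
# Black to pure blue across 1024 pixels: only the blue channel moves.
print(distinct_colours((0, 0, 0), (0, 0, 255), 1024))   # 256
```

    With only 256 distinct values spread over 1024 pixels, each shade covers about four pixels, which is where the visible bands come from.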

    --

    The only Good System is a Sound System

  12. Ray Tracing on the GPU by eeeeaagh · · Score: 5, Interesting
    We just ran into this problem when implementing a ray tracer using the GPU that will be presented soon at the upcoming Graphics Hardware Workshop.

    Our ray intersection algorithm implemented on the GPU (an "old" Radeon 8500) was able to intersect 114M rays per second. This was loads faster than the best CPU implementation, which could handle between 20M and 40M intersections per second.

    But when we tried to implement a ray tracer based on this, and an efficient one that didn't intersect every ray with every triangle, the readback rate killed us. Our execution times slowed down to the low end of the fastest CPU implementations.

    And the readback delay seems to be completely due to the drivers, which apparently still use the old PCI-bus code. If the drivers could use the full potential of the AGP bus, our ray tracer could approach twice the speed of the best CPU ray tracers.

  13. Yes, but... by Anonymous Coward · · Score: 5, Informative
    a) texture imposters (realtime adaptive billboarding)

    That's what render-to-texture is for, you don't need to read data back to the CPU.

    b) split world/image-space occlusion culling.

    This wouldn't be too useful for realtime graphics anyways, because of the way the 3D graphics pipeline works. The CPU can already be processing data a few frames ahead of what the GPU is currently working on. If you read back data from the card every frame, you have to wait for the GPU to finish rendering the current frame before you can start work on the next one.
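
    A very crude timing model shows why the stall hurts. Everything here is hypothetical: the millisecond figures are invented, and real drivers buffer and overlap work in more complicated ways. The only idea being illustrated is overlap versus serialisation.

```python
def frame_time_pipelined(cpu_ms, gpu_ms):
    """CPU prepares later frames while the GPU renders, so the work overlaps
    and the steady-state frame time is set by the slower of the two."""
    return max(cpu_ms, gpu_ms)


def frame_time_with_readback(cpu_ms, gpu_ms, readback_ms):
    """CPU needs this frame's pixels before starting the next frame, so it
    waits for the GPU to finish and for the copy back: nothing overlaps."""
    return cpu_ms + gpu_ms + readback_ms


cpu_ms, gpu_ms, readback_ms = 10.0, 12.0, 5.0   # made-up per-frame costs

print(1000 / frame_time_pipelined(cpu_ms, gpu_ms))                   # ~83 fps
print(1000 / frame_time_with_readback(cpu_ms, gpu_ms, readback_ms))  # ~37 fps
```

    Note that even a readback cost of zero would leave the second case slower, because the wait itself removes the overlap.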

    1. Re:Yes, but... by Mike+Connell · · Score: 5, Informative

      That's what render-to-texture is for, you don't need to read data back to the CPU.

      That is true for simple versions, but with methods moving towards image based rendering you often have to pull the data back anyway. Then you can process the textures to produce better imposters - not necessarily just billboards.

      Re: occlusion culling. People are using these methods today for realtime graphics (for example combinations of Greene's HZB, or HOMs) even with the low readback speed. UNC's Gigawalk software is one published example (Google for it). Getting Z or alpha channel information back is the biggest hit, so these methods would be even more efficient and so more widely applicable with faster transfers. When you're rendering N million triangles per frame (UNC quote 82Million) you have to do this stuff to get realtime rendering.

      So it is used for realtime graphics today - although mainly for heavy duty applications not games.

      HTH

  14. Perhaps... by ColGraff · · Score: 5, Insightful

    "What kind of idiot puts their most powerful processor at the end of a one way street?"

    Maybe they're the kind of idiots who know most people just want the best possible OUTPUT for gaming, and so don't want to add any overhead in card performance - or even additional design time - that isn't related to gaming performance. You know, the idiots who make cards that get award after award from gaming companies, then write near-perfect drivers, port those drivers to Linux, and let you overclock the card to your heart's content. Those sort of idiots. My, they're idiotic.

    Nobody says, "buy a geforce 4 ti, make the next toy story." No, it's advertised as a gaming card, and that's what it's designed to do. If you want to do high-end video rendering things, perhaps a gaming card isn't the best choice.

    --
    I'm the stranger...posting to /.
    1. Re:Perhaps... by gspeare · · Score: 4, Funny

      Hey, I just realized that my high-end printing device has absolutely no hardware provision for reverse-direction printing! If I want to take the high quality document I just printed and put it back into electronic form, I have to spend hundreds of dollars* for a completely separate "scanning" device! What a ripoff!

      Really, as soon as the market for this sort of capture starts to grow, someone will have a hardware solution. The first ones will be cheesy: a connector into a separate PCI capture card, for example; but eventually a more reasonable method will become standard design.

      To me, this is just the free market in action, working (more or less) as it should be.

      * I know how much scanners cost. Think hyperbole. :)

  15. This is old news; Intel AGP spec was short sighted by PhilFrisbie · · Score: 4, Interesting
    This has been discussed many times on various news groups. Here is my 'Readers Digest' version:

    If you read the AGP spec, which was written by Intel, you will note that it is based on the PCI 2.0 spec. The PCI 2.0 spec is for a 32 bit, 33 MHz symmetric bus which gives you a max transfer rate of 132 MB per second. The AGP spec is for an asymmetric bus, 33 MHz read and 66+ MHz write. But writes were optimized at the expense of reads, since Intel was pushing video with NO onboard texture memory, and who would want to read back the image in real-time anyway, right?!?
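
    The peak figures fall straight out of bus width times clock. A small sketch, using the rounded 33 and 66 MHz clocks from above; the function name is made up, and the AGP 1x and 4x lines assume the usual 32-bit, 66 MHz bus with one or four transfers per clock.

```python
def peak_mb_per_s(width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak transfer rate in MB/s for a parallel bus."""
    return width_bits // 8 * clock_mhz * transfers_per_clock


print(peak_mb_per_s(32, 33))       # PCI:    132 MB/s
print(peak_mb_per_s(32, 66))       # AGP 1x: 264 MB/s
print(peak_mb_per_s(32, 66, 4))    # AGP 4x: 1056 MB/s
```

    Against a peak of around 1000 MB/s, the "1/100th" in the story summary works out to roughly 10 MB/s of actual readback.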

    Yes, I am sure that drivers do have some effect, but the AGP spec is the first bottleneck. On an OpenGL news group it was reported last year that a person tested two identical video cards, the only difference being one was AGP and the other was PCI. The read performance for the PCI version was several times faster than the AGP version.

    Of course, some video cards are also to blame because of the frame buffer format they use, but that is another story...