AGP Texture Download Problem Revealed
EconolineCrush writes "The latest high-end graphics cards are capable of rendering games at 1600x1200 in 32-bit color at jaw-dropping frame rates, but that might be all they're good for. For all their gaming prowess, all of these cards have horrific AGP download speeds that realize only 1/100th of their theoretical peak. This article lays it all out, testing video cards from ATI, Matrox, and NVIDIA, and clearly illustrates just how bad the problem is. While these cards have no problems rendering images to your screen, you're out of luck if you want to capture those images with any kind of reasonable frame rate via the AGP bus."
I'd certainly expect the AGP bus to be used asymmetrically. How often do you want to do high-speed data capture with a card that's primarily for output?
The only situation I can see where you'd want more than PCI bandwidth coming back would be uncompressed HDTV capture, and there are better ways to do that (grabbing the raw broadcast stream, for example).
-Yarn - Rio Karma: Excellent
Maybe you should have read the article? The point is that the slow transfer rate from the card TO the PC's RAM means that capturing video (or recording a gaming session for playback later) is severely hampered.
To be honest though, most people buy a GF4 to play games, not capture video.
Code, Hardware, stuff like that.
In any event, there's another issue he doesn't really touch upon; while he mentions that a single frame at 1600x1200@32bit colour is 7.5MB, he ignores the fact that a 30fps movie would require (30*7.5)=225MB per second uncompressed; you either have to have that much disk bandwidth or have enough CPU grunt to compress that on the fly. I guess a dedicated MPEG encoder card could help, but your average box is going to have trouble keeping up with on-screen gibs, rocket trails and blood splatters and encoding video.
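For anyone who wants to redo the arithmetic in the comment above, here is a minimal sketch. The function names are made up for illustration. Note that 1600x1200 at 32 bits comes to 7,680,000 bytes, which is 7500 KiB; the "7.5MB" figure appears to mix decimal and binary units, so the exact rate depends on which megabyte you mean.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def stream_rate(width, height, bits_per_pixel, fps):
    """Uncompressed video data rate in bytes per second."""
    return frame_bytes(width, height, bits_per_pixel) * fps


frame = frame_bytes(1600, 1200, 32)
rate = stream_rate(1600, 1200, 32, 30)
print(f"one frame: {frame} bytes ({frame / 2**20:.2f} MiB)")
print(f"30 fps:    {rate / 1e6:.1f} MB/s ({rate / 2**20:.1f} MiB/s)")
```

Either way you count it, the result is well over 200 MB every second before compression, which is the point being made.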
"However, no manufacturer has presently made this aspect of driver performance a priority." .. Will it justify the cost
Why should they? Was anybody complaining until now? The well won't come to the horse; the horse has to go to the well to drink water.
So unless a large number of people want it nobody wants to mess around with a perfectly working driver.
And it is not a piece of cake. Recording its own renderings the software way would be a bitch; the best way would be to provide an access point on the bus itself, though that would play havoc with the board timings and noise issues.
In the end it will all come down to
My Aurora : http://www.youtube.com/watch?v=o91ZsGwJYyg
FB : https://www.facebook.com/TanveersPhotography
In summary, who the fuck cares?
I work with data much, much larger than 128 MB. If the board had 2 GB of memory, I'd use it.
Not everyone is using their video card to play Quake. =) (Although, I do that, too.)
Education is the silver bullet.
"all of these cards have horrific AGP download speeds that realize only 1/100th of their theoretical peak...you're out of luck if you want to capture those images with any kind of reasonable frame rate via the AGP bus."
As the quoted article clearly indicates, the problem lies with the drivers and not with the cards, contrary to what the original poster implies.
And the underlying reason is immediately understandable: after years of AGP cards and years of no one really raising this issue (except, now, developers of video-editing software who could benefit), it seems clear that there isn't much demand for this kind of performance. In the (near?) future there might be, but why should these companies spend money working on driver performance in areas like this when customers really only care about how well Quake will run?
When people are willing to pay for these features is when companies will pay to build the requisite drivers. And that is how it should be.
The article explains that once the images are rendered out to the display, they are simply discarded. Sure, for any sort of video capture or whatnot, that sucks. However, the article does not attempt to answer why video card manufacturers do this, or whether there are any cards that do take advantage of the AGP 4x bandwidth. My guess is cost. If all AGP video cards provided video feedback into the bus, you're probably looking at a non-consumer level product. And you know what? All I do IS use my GeForce to play video games. If dumping the frames after they are rendered keeps the cost of my card down, I'm probably happier for it. Quite simply: Does this matter for the average consumer?
If I'm reading this article right, they're claiming that it also hinders normal screen captures.
That would mean that software like VNC would have much higher performance, if the drivers were updated, the way these guys are demanding. (Wouldn't it?)
That'd be fantastic!
Education is the silver bullet.
Huh? Why on earth would they want 128-bit colour? AFAIK the human eye can't tell the difference beyond 24-bit.
Yes, the human eye can't go beyond that, but any decent processor can. And the image should be processed after being grabbed from the screen, for example DivX-encoded or something.
If you don't know why scanners grab images at more than 8 bits/channel, then..
fucktard is a tenderhearted description
Using floating-point luminosity values eliminates a variety of clipping artifacts which otherwise appear close to light sources.
Tarsnap: Online backups for the truly paranoid
This would affect everyone in a different way though. TV stations and production sets, even public access TV, along with low budget movies, would be able to use their PCs with a Radeon 9700 or NV30 card to produce their content. They could not only reproduce many of the effects from movies like Toy Story (notably excluding ray tracing), but do it in real-time for instant feedback, meaning much much faster production cycles. This has the potential to make a big impact.
Not that this has anything to do with the article in question, but fast ram on the video card is essential if you're going to play games in hi-res.
The AGP bus can't supply data / textures fast enough to a modern GPU/VPU. Both the bus and the main memory are way too slow. Some business PCs use shared video and main memory. It works OK for most 2D apps, and will even allow you to play DVDs or streamed video. For games, forget it.
- Ost
---- Sig. gone.
I had the same reaction, so I checked it out. Apparently 128-bit internal processing is useful when doing many stages of texturing and effects, because while 8 bits per color is typically fine for humans, some of that resolution is invariably lost during processing.
However, there's NO REASON I can tell why you'd actually want to grab 128-bit color rendered frames! They could be dithered to 24 or 32 bit without losing anything visible.
I know nothing about anything, obviously, but I can see that game designers might think it nice to be able to send stuff to your screen but for you to be unable to send it to storage somewhere.
This *is meant to be* a dumb question. Mod me down if I'm wrong; it's only Karma.
If you're doing multi-pass rendering, it might be extremely convenient to capture the results back to main memory. Especially if the board doesn't have enough texture memory to support all of your temporary buffers.
And boards are starting to ship with 128-bit IEEE floating point buffers.
Essentially, you're right - a human can't tell the difference beyond 24-bit on a given image. But if 100 images were composited together (very likely, to support something like RenderMan-style rendering in hardware), 24 bits is nowhere near enough - you'd get all sorts of accumulation error.
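To make the accumulation error concrete, here is a toy sketch (not how any real compositor is written; the function names and the 0.4 figure are made up). It adds 100 faint passes to a single pixel, once rounding to an 8-bit integer after every pass and once keeping a float until the end.

```python
def composite_8bit(layers):
    """Add layers one at a time, rounding to an integer 0..255 after every pass."""
    total = 0
    for value in layers:
        total = min(255, int(total + value + 0.5))  # quantize after each pass
    return total


def composite_float(layers):
    """Accumulate in floating point and quantize once at the end."""
    return min(255, int(sum(layers) + 0.5))


layers = [0.4] * 100  # 100 passes, each worth 0.4 of one 8-bit step
print("8-bit buffer:", composite_8bit(layers))
print("float buffer:", composite_float(layers))
```

In the 8-bit case every pass rounds back down to zero, so the contribution of all 100 passes vanishes, while the float buffer ends up at 40. Real pipelines fail less dramatically, but the mechanism is the same.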
Education is the silver bullet.
Way back when I was working on libfbx, we (the two main libfbx developers) learned of a 48-bit framebuffer developed by SGI. It's used mainly to render special FX for Hollywood. After several composited layers with various effects on an 8-bit per channel system, you can really start to notice the quantization artifacts. Moving to 12- or 16-bits per color channel (depending on whether there's an alpha channel) makes a huge improvement. I've never heard of any 16 byte per pixel (128bit) image format. It'd probably be something like 16-bits per channel RGBA (64), plus 32-bit depth buffer (96), plus 16-bit stencil and select(pick) buffers (128).
A solution to the problem with music today
I wouldn't use one of these cards to capture video though. I can't see why most people would, actually. The Matrox cards might be an exception. Quadro is a CAD/CAM card. These are just consumer grade cards. They buffer and write video directly to the hard disk. Real video editing hardware works differently, but even they often have several gigs of onboard RAM.
So really, I guess that I meant to say that I fail to see the relevance of the article. It is kind of silly, actually, to even want to record real-time game footage with this hardware. Just pipe the video output to a real capture card on another machine. Problem solved.
If I'm reading the article correctly, they're claiming that you can barely get 30 frames per second, full-screen. If you want to do a diff, and send the delta, you potentially need to be able to capture the full screen to do it. If you can only capture at 30 frames per second, you are LOCKED at 30 frames per second, even if you try to compress the output, and send only deltas.
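A rough sketch of why delta encoding doesn't get around the readback limit: to find out which parts of the screen changed, the differ still needs the whole current frame in main memory. The tile-based scheme and the names below are assumptions for illustration, not how VNC or any particular product does it.

```python
def changed_tiles(prev, curr, tile_bytes):
    """Return indices of fixed-size tiles whose bytes differ between two frames."""
    assert len(prev) == len(curr)
    return [
        i // tile_bytes
        for i in range(0, len(curr), tile_bytes)
        if prev[i:i + tile_bytes] != curr[i:i + tile_bytes]
    ]


prev = bytes(64)        # toy 64-byte "framebuffer", all zero
curr = bytearray(prev)
curr[20] = 255          # a single byte changes between frames
print(changed_tiles(prev, bytes(curr), 16))
```

Only one tile needs to go over the network, but both `prev` and `curr` had to be read back in full first, so the capture rate still caps the update rate.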
Education is the silver bullet.
A couple of salient points come to mind when reading this article:
1) Recording games/presentations/etc. The reason why we don't do it is because if the system was capable of generating it real time in the first place, it's far less space intensive to record the parameters of the animation than the output. i.e. It's cheaper to say "Daemia fires rocket at these coordinates" than record an MPEG of said rocket shot. AND, as hardware gets better, your recording does too.
Which leads me to point 2:
2) Since it's cheaper to capture realtime animation by capturing parameters, the only use of the capture function would be NON-realtime applications - i.e. getting your Geforce5TiUltraPro to render an extremely complex scene with incredible realism at 1 fps. That's not a typo. If we have 10MB/s back-into-the-PC bandwidth and each super high resolution shot takes 10MB on average, we have a wonderful solution working at 1 fps. Spend the fill rates on 600 passes for each pixel or something like that. Imagine the quality of the scenes! Capture the damn things and be glad you're not rendering at 1 frame per hour like they were 5 years ago.
Repeat after me - if you're rendering for posterity you don't need real time... That'll come eventually.
-JackAsh
The day i see a gradient on my computer screen without visible "banding" is the day we have reached a high enough color depth...32-bits is simply not enough.
Last time i checked, my eye was a human one.
You think the **AA would ever allow this, the ability to make a perfect digital copy of whatever is displayed on your screen? Now your monitor will have to be disabled every time a copyrighted work is displayed on your screen.
A stunning example of stating the obvious.
The hardcore 3D gamer market is small enough; I can't see manufacturers busting their humps to serve an even smaller one.
...that I have ever read. Either that, or I am missing something here... The idea that graphics subsystems have 'bandwidth to burn' is kind of ironic, given that every graphics chip is ultimately held back in performance by the amount of bandwidth available to it - especially when using high quality options like anti-aliasing. The main focus of the article is actually a very niche segment... the idea of transferring rendered images back over the AGP bus for TV / film / etc. is a joke... Rendering at high quality takes a huge amount of bandwidth (i.e. textures and geometry)... as someone else pointed out, transferring back high-res images would take up over 200MB per second - that's a quarter of your AGP bandwidth! And without taking into account contention and timing issues in uploading/downloading, that would mean that you simply couldn't realise the full potential of the bandwidth without a lot of other (expensive?) hardware... The simple fact is that for production uses, you would be *far* better off taking a stream of data from the DVI connector, and storing that for later use... Screen capture for business use is a reasonable point - however, when does that require 3D rendering to be taking place? There should be no contention and no reason why the AGP bus couldn't be utilised fully - although would the graphics companies make enough out of this to justify the effort? As for internet streaming - how many people have access to bandwidth fast enough for high quality, full screen video streaming? Enough said...
Yeah, I know it's fun to bash Microsoft and hint that your OSOS (Open Source Operating System) of choice would do better, but the drivers in question here are not Microsoft drivers. They're vendor-supplied drivers which would probably use 90% common code and have 99% of the same problems on any OS.
Slashdot - News for Herds. Stuff that Splatters.
Once, definitely. Twice, probably. Thrice, perhaps.
You typically composite and re-composite layer after layer to create decent effects, it's not a one-shot thing. Certainly professional video runs at ~48bit for film work.
Simon
Physicists get Hadrons!
Yeah, personally I'm a fan of 1 bit color :)
Beware: In C++, your friends can see your privates!
When you record video it is normally compressed by hardware or a DSP. They are compressed for a damn good reason.
Uncompressed, say just 1600x1200x24bit, is about 6MB per frame. At, say, 70 frames/sec, that is about 420MB a second to store to disk.
So what exactly are you going to do with that much data? If you had 512MB of RAM you could hold 1 second's worth.
Forget a hard disk, even a 3 disk raid doesn't have that sustained IO rate.
flamebait much?
First off, there is no such thing as 32-bit color. It's 24-bit color with either a padding octet or an alpha channel.
Second, 256 levels is enough that, given a good monitor, you can make do quite well.
Third, flamebait much?
Tom
Someday, I'll have a real sig.
Um. You get banding because of pixelation, not because of a lack of colors to choose from. Maybe it would help if you knew what you were talking about?
If you want to display a gradient from say, dark blue to light blue, you have quite a few shades of blue to choose from. More than 1024, that's for sure, especially in 32 bit color. But your monitor can only display 1024 vertical lines, each being a different shade. (Depending on your resolution, blah, blah, blah.)
Therefore, you get banding. Go ahead, use 64 or 128 bit color. It'll help, in the 'it won't help at all' sense.
However, linux's open source nature at least gives people a chance to tweak the system to provide that advantage if it isn't there already; it may cause some interesting developments in linux graphics.
Not 30 frames a second. 8 frames a second assuming you don't use a larger resolution.
You might not use your 3D graphics card to capture video, but if you wanted to edit video, it means you can't use your Matrox to hardware-accelerate 3D wipes/transitions/color transformations. Which is a bit of a shame really. In the frustration stakes, it's equivalent to having one sitting in your machine but having to play Doom3 through a software renderer.
Of course, if you're editing video on the cheap, you probably go for something slightly more dedicated like the Matrox RT2500 anyway, which is not that much more expensive.
How about the obvious for video production... since going out isn't a problem... why not just hook up a recording device (could be digital media) to the video out port of the video card.
Does this really have to be over-engineered?
Skiers and Riders -- http://www.snowjournal.com
Hmmm. Close but still not quite right. Think of the colour space as a cube with RGB as the three axes of the cube. In 32bit colour you have 8 bits per colour plane, giving you a cube that is 256 x 256 x 256. Any gradient from any point on the cube to any other point on the cube is going to be a maximum of 443 (if my maths is freaked out - distance from two opposite corners of the cube). Plus some messing about with the various quantisation that this line will pass through gives you definite banding on all but the lowest resolution displays...
The only Good System is a Sound System
Our ray intersection algorithm implemented on the GPU (an "old" Radeon 8500) was able to intersect 114M rays per second. This was loads faster than the best CPU implementation, which could handle between 20M and 40M intersections per second.
But when we tried to implement a ray tracer based on this, and an efficient one that didn't intersect every ray with every triangle, the readback rate killed us. Our execution times slowed down to the low end of the fastest CPU implementations.
And the readback delay seems to be completely due to the drivers, which apparently still use the old PCI-bus code. If the drivers could use the full potential of the AGP bus, our ray tracer could approach twice the speed of the best CPU ray tracers.
If the drivers are truly the only issue and not the hardware, wouldn't this be a great opportunity for the XF86 guys and whoever writes the particular tdfx modules to optimize Linux first?
"No Mr. Vallenti sir you don't understand we have to use Linux. It's the only game out there for our CG budget. Windows can't do RAM write back with decent FPSes, and commodity GPU's are 20 times cheaper..."
Wouldn't that suck for them... at least it would be amusing.
Novel theory: Modern Man evolved from psychopath
I can think of several reasons:
- The company hasn't released the game yet, but wants to release a video of gameplay to the public. Current methods would require implementing a "save game as it goes" and then a "replay, in offline rendering mode at a steady frame rate, and record results" pass. Or, you could save it at reduced quality if you had video out on your computer and video in on another computer.. but that's just ridiculous, imo.
- Likewise, you have the game, and your friend hasn't purchased it yet, and lives too far away to just take a glance at it..
- You're having a graphical glitch in a game with your particular card that can't be easily illustrated with screenshots. Think how much easier it would be to just send a video clip than having to send a half-dozen screenshots and a wordy explanation, where they still might not believe you.
- You have a Radeon9700, he has a Geforce2. You want to show him how different Doom III looks on your card, as opposed to his card, in real time.
Etc..
Darn you to heck for making me try to think in 3d. ;p
Yes, I'm pretty sure you're more or less right on the 443, though I would have expressed it as ~400, due to the fact that I don't like niggling with triangles.
The thing is, you get more shades of blue than just the 443. As 255 RGB values, shades of red can be
255 0 0
255 1 0
255 1 1
255 0 1
255 2 0
255 2 1
255 2 2
255 0 2
255 1 2
(I say red now because I put the 255's first, and don't want to write it again.)
And so on. Each resulting in a different shade of blue.
*I think* anyway. We're wandering off the pier of stuff I know, into the stuff I think I might be able to figure out.
So, I think you'd get more than 443, and have more blue than monitor lines, still.
That's what render-to-texture is for, you don't need to read data back to the CPU.
b) split world/image-space occlusion culling.
This wouldn't be too useful for realtime graphics anyways, because of the way the 3D graphics pipeline works. The CPU can already be processing data a few frames ahead of what the GPU is currently working on. If you read back data from the card every frame, you have to wait for the GPU to finish rendering the current frame before you can start work on the next one.
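The stall described above can be put into a very simplified timing model. The numbers and function names are hypothetical, and real drivers buffer and overlap work in messier ways, so treat this as a back-of-the-envelope sketch only.

```python
def frame_time_pipelined(cpu_ms, gpu_ms):
    """CPU prepares frame N+1 while the GPU draws frame N: the slower stage sets the pace."""
    return max(cpu_ms, gpu_ms)


def frame_time_with_readback(cpu_ms, gpu_ms, readback_ms):
    """CPU must wait for the GPU to finish and for the readback before starting the next frame."""
    return cpu_ms + gpu_ms + readback_ms


cpu, gpu, readback = 5.0, 10.0, 2.0  # made-up per-frame costs in milliseconds
print(1000 / frame_time_pipelined(cpu, gpu), "fps pipelined")
print(1000 / frame_time_with_readback(cpu, gpu, readback), "fps with a readback every frame")
```

With these made-up costs, even a cheap 2 ms readback drops the frame rate from 100 fps to under 60, because the CPU and GPU no longer overlap.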
Very few people use their typical desktop video cards for actual video production or anything related to it because the hardware up until now was simply unable to handle that sort of load. Now we have these cards that are the beginning of a new era of computer-generated visuals. The article is saying that they can do quite a bit more than they can do now if someone would just write some better drivers for them.
Now, streaming real-time rendering images over the internet? Maybe not fullscreen stuff right now because of a multitude of hampering factors on affordable internet bandwidth which I won't name for clarity's sake, but for the limiting factor to be the internet itself and not the graphics card is still a significant step.
This would definitely be very beneficial to low-budget game developers and movie directors. We could very well see the return of the shareware boom (remember the early-mid 90's?) because of this.
Sure, only a small portion of the people who'd buy the cards would use these features that the article talks about, but they'd be people that didn't have that capability before. Whenever this happens in any medium/artform/what-have-you, there is the tendency for a lot of experimental stuff to appear. I think we have some very interesting times ahead of us if someone gets these drivers written.
Just another freak in the freak kingdom.
Also remember that the figure of 443 is the theoretical maximum number that can be achieved. Most interpolations will be from two points in the colour cube that are much closer together and will therefore result in correspondingly worse artifacts.
The only Good System is a Sound System
"What kind of idiot puts their most powerful processor at the end of a one way street?"
Maybe they're the kind of idiots who know most people just want the best possible OUTPUT for gaming possible, and so don't want to add any overhead in card performance - or even additional design time - that isn't related to gaming performance. You know, the idiots who make cards that get award after award from gaming companies, then write near-perfect drivers, port those drivers to linux, and let you overclock the card to your heart's content. Those sort of idiots. My, they're idiotic.
Nobody says, "buy a geforce 4 ti, make the next toy story." No, it's advertised as a gaming card, and that's what it's designed to do. If you want to do high-end video rendering things, perhaps a gaming card isn't the best choice.
I'm the stranger...posting to
First, Matrox and 3dLabs are both shipping products that do 10r-10g-10b-2alpha color.
Second, the poster wants to do more than "make do". You can also make do with 16 colors. And no, 256 levels is not enough if you're compositing many images together, or if your data has a high dynamic range (which would require more gamma range than 256 levels are capable of providing without serious banding).
Third, pot. Kettle. Black.
Education is the silver bullet.
And I tend to agree that it's a software issue.
NVIDIA says that if you ask for the contents of the framebuffer in a call to glReadPixels and you ask for it in the same pixel format it's stored in, you won't be really disappointed. If, however, you ask for that same region of the framebuffer in another format, you're screwed. (So, if your framebuffer is 8-8-8-8 RGBA, and you ask for luminance or 10-10-10-2 or something else odd, you aren't going to be pleased with the performance.)
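Here is a toy model of why the requested format matters. This is plain Python, not driver code and not the OpenGL API; `read_pixels` and the format names are made up. The luminance rule used (clamped sum of R, G and B) is my understanding of how OpenGL defines the conversion, so treat it as an assumption.

```python
def read_pixels(framebuffer, fmt):
    """Toy readback from an RGBA8 framebuffer stored as 4 bytes per pixel."""
    if fmt == "RGBA8":
        # Requested layout matches storage: one bulk copy, no per-pixel work.
        return bytes(framebuffer)
    if fmt == "LUMINANCE8":
        # Layout differs: every pixel has to be converted individually in software.
        out = bytearray()
        for i in range(0, len(framebuffer), 4):
            r, g, b = framebuffer[i], framebuffer[i + 1], framebuffer[i + 2]
            out.append(min(255, r + g + b))
        return bytes(out)
    raise ValueError(f"unsupported format: {fmt}")


fb = bytes([10, 20, 30, 255, 200, 100, 50, 255])  # two RGBA pixels
print(read_pixels(fb, "RGBA8"))
print(read_pixels(fb, "LUMINANCE8"))
```

The matching-format path is a single copy that hardware can do in bulk, while the mismatched path touches every pixel on the CPU, which is where the time goes.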
This isn't by the way, just a render-movies-on-your-PC issue. Lots of scientific computing, visualization, etc., applications render with OpenGL and then grab the framebuffer to store a result. This throughput issue is significant considering that for many applications, what was an enormous data set 10 years ago is now not such a big data set. Like another poster said, this issue is one of the ones that still ties people to SGI.
While 99% of your other concerns might be dealt with, there are still lingering problems like this one that keep some people from moving to commodity hardware.
Outside of a dog, a book is a man's best friend. Inside a dog, its too dark to read.
Another benefit, besides accuracy for multi-pass rendering with tens or hundreds of passes, is that it allows for high dynamic range rendering. 128 bits is enough to encode candlelight and daylight in the same floating point number. So the game engine can just "count up photons" as Carmack says in his recent speech, and once the 128-bit passes are done, the final pass samples it down to 32-bit for presentation on the monitor. This allows the downsampling to take advantage of any information available on the monitor's gamma curve - what the actual displayed intensity is for a given value. It also lets programmers give up one level of fudging and simply do physically correct lighting calculations, since they can leave the presentation issues to the final downsample.
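A minimal sketch of that final downsample for a single channel, assuming a simple exposure scale and a plain power-law gamma of 2.2. Both are stand-ins for illustration; real engines use fancier tone-mapping curves.

```python
def to_display(linear, exposure=1.0, gamma=2.2):
    """Map a linear floating-point intensity to an 8-bit display value."""
    scaled = min(1.0, max(0.0, linear * exposure))   # clamp into displayable range
    return int(255 * scaled ** (1.0 / gamma) + 0.5)  # gamma-encode, then round


# The same float buffer can hold a dim candle and bright daylight;
# only the final pass decides how each is shown.
for intensity in (0.01, 1.0, 1000.0):
    print(intensity, "->", to_display(intensity, exposure=0.001), to_display(intensity, exposure=1.0))
```

Changing `exposure` re-presents the same accumulated values without re-rendering anything, which is the "leave the presentation issues to the final downsample" idea.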
I've been doing real-time 3D graphics for 10 years and read-back speeds have been the biggest problem for doing many advanced algorithms. We have asked the companies to improve this many times. The problem as I see it: Quake and other benchmark apps don't rely on readback. /. may even have run a link to one of these techniques a while back.
Here are a few other important but non-Quake techniques that are driven by readback speeds. I'll go into more detail on the first for illustration purposes.
High-quality real-time occlusion culling -- many techniques render the scene quickly by using a unique color tag per object or polygon and then read back the framebuffer to figure out everything that was visible (and how many pixels for each) for a final high-quality pass. If HW drivers would even just implement the standard glHistogram functions (which essentially compress the framebuffer before readback), this would become practical. NVidia adds their NVOcclusion extension, but it's limited in how many objects at a time you can test, it's very asynchronous, and it requires depth sorting on the CPU to make it most useful. The render-color technique does not. Yet HW makers are spending lots of money adding custom HW to do z-occlusion when a simple driver-based software technique may be easier.
Dynamic Reflection Maps -- for simple, reflective surfaces -- Requires background rendering from multiple POVs (generally six 90 degree views) and caching these. Even if you can cache a small set of maps in AGP memory, you want fast async readback if you have a large fairly static scene and you're roaming around.
Real-time radiosity -- similar to above, but needs more CPU processing of the returned images and possibly depth maps (reading back the depth buffer is often even more expensive than the color).
Real-time ray tracing -- the better quality approaches need fast readback to store intermediate results (due to recursion, etc..). With floating point framebuffers and good vertex/pixel shaders, ray-tracing becomes possible, but not yet practical. I believe
So there's a lot more to this issue than just making movies of your games. Faster, better graphics would be possible. So why isn't this a priority?
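For the curious, the color-tag occlusion technique from the list above boils down to something like this sketch. The framebuffer here is a fake Python list standing in for the pixels you would read back, and all names are made up.

```python
from collections import Counter


def id_to_rgb(obj_id):
    """Pack an object id (0 .. 2**24 - 1) into a unique flat RGB color."""
    return ((obj_id >> 16) & 0xFF, (obj_id >> 8) & 0xFF, obj_id & 0xFF)


def rgb_to_id(rgb):
    """Recover the object id from a pixel read back from the framebuffer."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b


def visible_objects(framebuffer, background_id=0):
    """Count visible pixels per object id, ignoring the background color."""
    counts = Counter(rgb_to_id(px) for px in framebuffer)
    counts.pop(background_id, None)
    return dict(counts)


# Fake read-back: 5 background pixels, 3 pixels of object 7, 2 of object 70000.
fb = [id_to_rgb(0)] * 5 + [id_to_rgb(7)] * 3 + [id_to_rgb(70000)] * 2
print(visible_objects(fb))
```

The counting is trivial; the expensive step is getting `fb` off the card every frame, which is exactly the readback being complained about.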
------------ cyranose@realityprime.com
The article claims that the drivers, not the HW, are causing the performance problem. Based on my conversations with a premier graphics programmer and some x86 experts, I don't believe that it is this simple. In particular, note that XFree86 2D, which uses its own drivers, also has pathetic readback rates.
I barely understand the technical details, but it seems like there are some serious misfeatures in the way that the AGP bus interacts with CPUs and caches on both Intel and AMD during readback; it is going to be hard for card vendors to fix this problem (even if they decide to care). It may be that a new bus and/or new CPU glue will be needed for high-readback-rate applications.
My card will output the same image to its VGA & TV outputs at the same time.
Surely simply connecting the S-video output to a VCR while playing Quake through the monitor should do the trick.
Capturing what you do in the average FPS would be silly, but what if you're doing 3D rendering with your graphics card? What you propose would be like ripping CDs by plugging a CD player into your soundcard's line-in jack. What the article envisions would be more like ripping CDs with EAC...you eliminate the digital-to-analog-to-digital conversion.
20 January 2017: the End of an Error.
in the article it shows how they benchmarked consumer cards like geforce4 and radeon 8500. they also benchmarked some "entry level" high end cards like the quadro4 750 gxl, parhelia, and radeon 9700. i am not sure about the radeons, but i do know that the quadro4 is a different chip than the gf4, not just a card with extra features turned on.
all of the cards had the same problem.
you probably shouldn't have read this.
Nvidia writes their own Linux Driver. I'm using it, and it works great.
Yes, but are you downloading textures/frames from the card to main memory?
The issue here is whether it is possible to use the programmable GPU to render frames for use in animation projects. The various bandwidth problems appear to be associated with drivers optimized for immediate display.
With an open source driver, the few individuals running linux based rendering farms could, theoretically, relieve the CPU of some of its load. With closed source drivers, you will have to rely on nVidia optimizing their drivers for this kind of minority application.
OpenGL supports reading back the screen buffer mostly so that the OpenGL validation suite can check the rendering accuracy. For that, it doesn't have to be efficient. And if you read back in some format other than the actual structure of the framebuffer, every pixel gets converted in software and performance will be awful.
This article reads like it was written by an overclocker, not a graphics developer.
The nascent art of machinima, which involves using 3D game engines to make desktop movies, could benefit from a practical way to record game output faster. (It would also be nice to export directly to .AVI format for editing in Premiere or Avid, but that's another wishlist.)
Are you really saying that instead of simply fixing the software drivers, you should get a second high end computer capable of capturing video at real time rates? Man are you that stupid or just trolling badly?
Any gradient from any point on the cube to any other point on the cube is going to be a maximum of 443 (if my maths is freaked out - distance from two opposite corners of the cube)
The distance between opposite corners is about 443, but the diagonal distance between color points is 1.732, so you still have 256 points in the gradient.
Think about it this way, the gradient from (0,0,0) to (255,255,255) passes through (1,1,1), (2,2,2), etc. Exactly 256 points.
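You can check the 256-point claim by brute force. This sketch interpolates between two 8-bit colors across a row of pixels and counts the distinct colors that result. The 1600-pixel width and the blue endpoints are just example values.

```python
def gradient(start, end, pixels):
    """Linearly interpolate between two 8-bit RGB colors across a row of pixels."""
    row = []
    for i in range(pixels):
        t = i / (pixels - 1)
        row.append(tuple(int(a + (b - a) * t + 0.5) for a, b in zip(start, end)))
    return row


def distinct_steps(start, end, pixels):
    """Number of different colors that actually appear in the gradient."""
    return len(set(gradient(start, end, pixels)))


width = 1600
black_to_white = distinct_steps((0, 0, 0), (255, 255, 255), width)
dark_to_light_blue = distinct_steps((0, 0, 64), (0, 0, 255), width)
print(black_to_white, "colors, about", width / black_to_white, "pixels per band")
print(dark_to_light_blue, "colors, about", round(width / dark_to_light_blue, 2), "pixels per band")
```

So across a 1600-pixel-wide screen, even the full black-to-white ramp repeats each color for about six pixels, and a narrower blue ramp has wider bands still.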
-
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
Also, having the ability to render faster means that you can do it faster than real-time. If you are working to a deadline in a TV news studio, that might be a real advantage (think late-breaking news where a story has to be put together during a commercial break).
science is a religion
If you read the AGP spec, which was written by Intel, you will note that it is based on the PCI 2.0 spec. The PCI 2.0 spec is for a 32 bit, 33 MHz symmetric bus which gives you a max transfer rate of 132 MB per second. The AGP spec is for an asymmetric bus, 33 MHz read and 66+ MHz write. But writes were optimized at the expense of reads, since Intel was pushing video with NO onboard texture memory, and who would want to read back the image in real-time anyway, right?!?
Yes, I am sure that drivers do have some effect, but the AGP spec is the first bottleneck. On an OpenGL newsgroup it was reported last year that a person tested two identical video cards, the only difference being one was AGP and the other was PCI. The read performance for the PCI version was several times faster than the AGP version.
Of course, some video cards are also to blame because of the frame buffer format they use, but that is another story...
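The 132 MB/s figure in the comment above is just bus width times clock. A quick sketch, using rounded nominal clocks; the AGP 4x line uses the commonly quoted 66 MHz with four transfers per clock, which is my addition and not something stated in the comment.

```python
def peak_mb_per_s(bus_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak transfer rate in MB/s (decimal megabytes)."""
    return bus_bits / 8 * clock_mhz * transfers_per_clock


pci = peak_mb_per_s(32, 33)        # 32-bit bus at 33 MHz
agp_4x = peak_mb_per_s(32, 66, 4)  # assumed nominal AGP 4x figures
print("PCI:   ", pci, "MB/s")
print("AGP 4x:", agp_4x, "MB/s")
print("1% of AGP 4x:", agp_4x / 100, "MB/s")
```

These are theoretical peaks; protocol overhead means sustained rates are lower on any real bus.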
Follow my reasoning here. I've heard from other articles at /. that Alan Cox (or one of the big name advocates) couldn't think of a reason that would justify NVIDIA open-sourcing their drivers. There would be no profit for them to do so.
But if they had, the drivers would have been updated to scratch whoever's itch needed to be scratched. In this case the bandwidth from card to Memory.
One of the benefits of open source is that even seldom used features are enhanced, so that when there is suddenly a demand for them, the features are already in place.
It seems the lesson here is that proper captures from video RAM are slow. Yeah, it'd be nice to change that. But how many people really care? Given how long it took anyone to notice, I can't help but think that very very few people really care - and with good reason. Unless you're into making rendered movies, it's irrelevant.
Build stuff. Stuff that walks, stuff that rolls, whatever.
I asked nVidia at SIGGRAPH why image readback is so slow. They said, no motherboard they know of (not even their own) supports AGP Writes back to the system memory. Without that, you're limited to PCI bandwidth at best, far less than what the AGP spec allows.
However, we're not even seeing that. Results are showing 1% of what is possible. It's certainly a hardware issue, but there may be a lot of room to improve from the software side, too.
Why would anyone engrave "Elbereth"?
First, 10-r, 10-g, 10-b is pretty valuable to some people. I agree that 2-bit alpha is pretty miserable - but some people don't need to alpha blend. *shrug* I was just illustrating that there are color schemes in shipping products today, that use more than 24 bits for rgb.
Second, for those people that DO need to blend, they often need to blend 100s of images. You don't need to get out to 1000s of images to see these effects. Just because current standards for MPEG and JPEG don't allow more, that doesn't mean it's useless. And I'm talking more about generating PRman (RenderMan)-style graphics. One approach is to render many, many passes - decomposing the math down into 100s (1000s) of images. It adds up to visual artifacts, very quickly, unless you have extended bit depths.
Third, saying the first poster was posting flamebait - I was saying that what you were doing was a case of "the pot calling the kettle black." I was accusing you of posting flamebait. =)
Education is the silver bullet.
Why is it that a much more expensive Quadro card gives equally slow results? I've run a very similar test on an SGI 320 (shared-memory design) and it only gives 18.9 MB/s.
Anyone reading this with a Wildcat 6000-series? What does that bench at?
Why would anyone engrave "Elbereth"?
the kind of idiots who know most people just want the best possible OUTPUT for gaming, and so don't want to add any overhead in card performance - or even additional design time - that isn't related to gaming performance. You know, the idiots who make cards that get award after award from gaming companies, then write near-perfect drivers,
here it comes...
port those drivers to linux
Bingo!
The only problem is in the driver. Hardware's up to the job.
The driver has been ported to Linux.
So fix it!
Closed source? Reverse engineer it.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
I spent most of the summer working on AGP driver bugs, so let me clarify a few things.
AGP was designed by Intel as an ad hoc solution to combat the problem of transferring large textures to a graphics card over the PCI bus. It's an extension to PCI, essentially, allowing fast, pipelined, ONE-WAY transfers. That should be repeated. AGP is PCI, with a different connector, and a bunch of extra pins and logic for pipelined transfers from system memory to the card. In fact, without "fast writes" enabled, CPU -> graphics card writes are plain PCI; only transfers requested BY THE CARD are accelerated.
There is nothing new about this. It's in the spec.
It is NOT meant to be a two-way bus. It was never designed for offloading cinematic rendering to the card, for later recovery. AGP came out around 1997, before NVIDIA or ATI had shaders in hardware. PC rendering was nowhere near photorealistic at the time; that was the domain of software raytracers. Without AGP, video cards seriously hog the PCI bus with their texture streaming. That is ALL that AGP fixes.
The real solution is to come up with a new bus. I tend to like unified memory architecture designs, but they have disadvantages as well. The real trouble is getting the PC industry to agree on anything; if ATI came up with a new bus standard, for instance, I doubt NVIDIA or Matrox would adopt it, not wishing to appear to submit to their competitor.
-John
Someone build a bloody box with a DVI input and a gigabit ethernet port on it. Connect DVI out of video card to DVI input on our magic box, gigabit Ethernet on the box to gigabit ethernet on the PC. As each frame is generated, capture it and spew it back to the PC over the ethernet, then ask the custom software on the PC (via a packet from the magic box) to put the next frame over the DVI.
Lather, rinse, repeat.
Won't be cheap, but someone could almost certainly whip one up with a Xilinx FPGA. I know they make one with a built-in TMDS receiver, which is what you'd need to decode the DVI signal.
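A rough feasibility check for the box described above, as a Python sketch. The 1600x1200 resolution is borrowed from the article summary and 3 bytes per pixel assumes 24-bit colour over DVI; both are assumptions, as are the function names. It uses the raw gigabit line rate and ignores Ethernet and protocol overhead, so real throughput would be lower.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def max_fps(width, height, bytes_per_pixel, link_bits_per_s):
    """Upper bound on frames per second that fit through the link."""
    link_bytes_per_s = link_bits_per_s / 8
    return link_bytes_per_s / frame_bytes(width, height, bytes_per_pixel)


# Assumed capture format: 1600x1200, 24-bit colour, gigabit Ethernet.
size = frame_bytes(1600, 1200, 3)
fps = max_fps(1600, 1200, 3, 1_000_000_000)
```

Under those assumptions a frame is 5,760,000 bytes and the link tops out at about 21.7 frames per second. That is fine for the frame-at-a-time handshake described above, since the PC only puts out the next frame once the previous one has been captured.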
"AGP Texture Download Problem Revealed"
"AGP Texture Download Problem" implies that there's a problem downloading textures via AGP from main memory. But it's not about texture transfers at all, it's about transfers of rendered frames back to the system (in the opposite direction).
Hey, 'Taco... You're the high point of the /. editing staff; your readership is depending on you to drag the other editors up the bell curve kicking and screaming by your example. Don't give up now. =)
I'm not surprised at this - when you spend your effort optimising for
output, dragging that final image back up to the input is kinda like
running up a downward moving escalator...you *can* do it - but you
probably shouldn't.
It seems to me that if you are rendering movies with this technology,
you are either a small operation who can probably afford to wait (say)
10x longer than realtime to do it - or you are some big production house
who can afford to do better.
In those cases, why not simply stick a frame-grabber onto the digital output?
Heck you can even get around the 8 bits-per-component problem by using a
fragment shader to render the high order bits to red and the middle bits
to green and the low order bits to blue - then do three passes to render
the Red component of your image at 24 bits per pixel, then the green, then
the blue.
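The bit-splitting idea can be sketched on the CPU side in Python. This is only the packing arithmetic, not shader code: it assumes each colour component is held as a 24-bit unsigned integer, and the function names are hypothetical.

```python
def split_24bit(value):
    """Split a 24-bit value into (high, middle, low) bytes.

    In the scheme described above, the high byte would be written to red,
    the middle byte to green and the low byte to blue.
    """
    if not 0 <= value < (1 << 24):
        raise ValueError("value must fit in 24 bits")
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def join_24bit(red, green, blue):
    """Rebuild the 24-bit value from the three 8-bit channels read back."""
    return (red << 16) | (green << 8) | blue


# Round trip for one component value.
channels = split_24bit(0x123456)
restored = join_24bit(*channels)
```

Each of the three passes produces one such packed image, and the capture side applies the join step per pixel to recover the full-precision red, green and blue components.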
Using the downstream performance to your advantage is the way to go.
The title of this article (which talks about "Texture Download") is most
confusing because that's a term usually used to describe the process of
taking a texture map out of the CPU and stuffing it into the graphics
card's texture memory.
This is more like "Screen Dump Upload".
www.sjbaker.org
Well, that's what I thought, and I think what I said. If you're doing a gradient, I don't know why you'd want the shortest distance across the RGB cube, unless you enjoyed banding. ;p
I don't know what you are looking for in the 2-dimensional display. A single simple gradient between 2 colors is a line. The greyscale gradient was the simplest arbitrary example. I could equally have used a gradient between yellow (255,255,0) and blue (0,0,255) and it works to the same 256 steps.
The earlier post had suggested that opposite corners would be 443 steps. I was explaining it's not. It's a distance of 443, but still 256 steps.
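To make the distance-versus-steps point concrete, here is a small Python sketch. It counts the integer colour points lying exactly on the straight line between two RGB values, which is the gcd of the per-channel differences plus one, and separately computes the Euclidean length of that line. The function names are made up for illustration.

```python
import math
from functools import reduce


def euclidean_distance(a, b):
    """Straight-line length between two RGB points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def points_on_gradient(a, b):
    """Number of integer RGB points exactly on the segment from a to b."""
    deltas = [abs(x - y) for x, y in zip(a, b)]
    return reduce(math.gcd, deltas) + 1


black, white = (0, 0, 0), (255, 255, 255)
yellow, blue = (255, 255, 0), (0, 0, 255)

grey_steps = points_on_gradient(black, white)
yellow_blue_steps = points_on_gradient(yellow, blue)
diagonal = euclidean_distance(black, white)
```

Both gradients give 256 points. The diagonal length for a 0 to 255 cube works out to about 441.7, in the same ballpark as the 443 quoted above, and the spacing between neighbouring points is the 1.732 (square root of 3) mentioned earlier.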
-
- - You can't take something off the Internet! That's like trying to take pee out of a swimming pool.
I wrote a benchmark last night that did DirectDraw and OpenGL pixelblock transfers, both ways across the AGP bus. Now, I wouldn't call my results totally rigorous (there are various versions of drivers, no Win9x machines, a couple WinXP & the rest are Win2k), but I ran many of them multiple times, on a selection of machines/cards, & got pretty consistent numbers each time. Also, the DirectDraw readback numbers agreed fairly closely with the Studio Magic Direct3D results.
(Write denotes system to gfx card, Read denotes gfx card to system)
A few things struck me:
- OpenGL does WAY faster readbacks, especially on nVidia hardware.
- OpenGL is faster for writes too, on nVidia, but a lot slower on ATI
- ATI seem to optimise more for DirectX
- The SGI's unified memory architecture does help, though not as much as I would have expected.
- Matrox's OpenGL drivers sucked big time.
- These numbers would look better in one of Damage's graphs.
Anyway, I'm convinced that there are no particular hardware problems involved, other than perhaps readback being limited to PCI66 speeds. I have no idea why DirectX readbacks are so much slower - can it really be that every single company just hasn't bothered to optimise this path, even though they have for OpenGL? Or is there something within DirectX itself that's holding them all back?
Why would anyone engrave "Elbereth"?