Facts and Fiction of GPU-Based H.264 Encoding
notthatwillsmith writes "We've all heard a lot of big promises about how general-purpose GPU computing can greatly accelerate common tasks that are slow on the CPU — like H.264 video encoding. Maximum PC compared the GPU-accelerated Badaboom app to Handbrake, a popular CPU-based encoder. After testing a variety of workloads ranging from archival-quality DVD rips to transcodes suitable for play on the iPhone, Maximum PC found that while Badaboom is significantly faster than X264-powered Handbrake in a few tests that require video resizing, it simply can't compare to the X264-powered Handbrake for archival-quality DVD backups."
Wouldn't archival-quality backups be actual MPEG instead of H.2 or whatever? I mean if you're archiving, why go lossy?
Is it just a badly-designed test?
To begin with, x264 blows Badaboom out of the water in terms of speed when similar settings are used. Badaboom appears to use the rough equivalent of --aq-mode 0 --subme 1 --scenecut -1 --no-cabac --partitions i4x4 --no-dct-decimate in terms of the x264 command line... it's no wonder it's "fast" when they compare it to x264 on far slower settings!
GPU encoders won't be able to compete with CPU encoders until they either get a lot faster (in which case they'll compete in the "high performance" market) or they get much better quality, since at sane settings x264 unsurprisingly blows Badaboom out of the water quality-wise, too. Until then, the product is not only completely proprietary but furthermore simply inferior, and they're going to have a very hard time marketing it.
Frist psot!
"the-alamo-doesn't-have-a-basement dept."
Um, what???
The CPU usage of the program when used with a good video card is 25% on my quad core machine, implying it is bound by a single CPU core right now (25% of four cores is one core fully loaded). That means if they can get the CPU overhead down, even a little bit, they will stand to get huge gains.
This is the most obvious and boring insight they could possibly offer... Everyone with the slightest interest knows this already.
The low quality of hardware-based video encoder cards is a very well-known fact, and those MPEG encoder cards are just ASICs on a PCI card, almost exactly the same hardware as your video card.
The point of offering up APIs for GPUs, and AMD's attempt to integrate the GPU ASIC with the CPU via HyperTransport, is aimed at improving things, however.
x264 does a good job because it's an open source project, with several skilled and interested individuals continually tweaking the code to improve quality and performance. Once hardware-based video encoding routines aren't hidden in closed-source firmware on a dedicated card, the same development effort can step up and improve HARDWARE encoding now, exactly as they have with software.
Not only can quality be significantly improved, you can expect performance to improve significantly as well, even with greater quality. The initial implementation of any codec is always relatively poor performing, and low quality, so this wouldn't even be an insightful observation if it was comparing x264 with any other software based encoder... The only difference is that a new software h.264/AVC encoder would be SLOWER than x264, as well as being much lower quality.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
To know how the next pixel should be compressed you must know the statistical likelihoods of the previous pixels. So compression is a really linear operation. You could have threads that work from each keyframe of the video independently but that still isn't ideal for graphics cards.
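A minimal sketch of the per-keyframe idea, in Python. The frame-type strings and the names split_into_gops and encode_gop are hypothetical, chosen only for illustration; a real encoder decides frame types itself. Each chunk starts at a keyframe ("I"), so it depends on nothing before it and can be handed to its own worker.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_gops(frame_types):
    """Group frame indices into chunks that each start at a keyframe ('I')."""
    gops = []
    for index, kind in enumerate(frame_types):
        if kind == "I" or not gops:
            gops.append([])      # a keyframe opens a new independent chunk
        gops[-1].append(index)
    return gops

def encode_gop(gop):
    """Stand-in for real encoding: frames inside a chunk are visited in order,
    because each one depends on the frames before it."""
    return [f"frame{index}" for index in gop]

# Hypothetical frame-type sequence: I = keyframe, P = predicted frame.
frame_types = ["I", "P", "P", "I", "P", "I", "P", "P", "P"]
gops = split_into_gops(frame_types)

# Chunks are independent of each other, so they can run on separate workers.
with ThreadPoolExecutor(max_workers=3) as pool:
    encoded = list(pool.map(encode_gop, gops))
```

This example yields only three units of parallel work, which is fine for a few CPU cores but far short of the thousands of threads a GPU wants to keep busy.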
From the CUDA guide:
"Every instruction issue time, the SIMT unit selects a warp that is ready to execute and issues the next instruction to the active threads of the warp. A warp executes one common instruction at a time, so full efficiency is realized when all 32 threads of a warp agree on their execution path."
So if you have code that isn't SIMD-able, you are really only using 1/32 of the available threads per unit of branching code.
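A toy Python model of that 1/32 figure. It is not CUDA and not a description of how the hardware is built; it only counts issue slots under the assumption that a warp runs each distinct execution path one after another, with only the threads on that path active. WARP_SIZE comes from the quoted guide; the function name and path labels are made up.

```python
WARP_SIZE = 32  # threads per warp, per the quoted CUDA guide

def warp_utilization(paths):
    """Fraction of thread slots doing useful work when a warp executes
    each distinct path serially (toy model, one instruction per path)."""
    assert len(paths) == WARP_SIZE
    distinct_paths = set(paths)
    issued_slots = len(distinct_paths) * WARP_SIZE  # every pass occupies the whole warp
    useful_slots = len(paths)                        # each thread is active in exactly one pass
    return useful_slots / issued_slots

uniform = warp_utilization(["same"] * WARP_SIZE)         # all threads agree
two_way = warp_utilization(["if"] * 16 + ["else"] * 16)  # one branch, split evenly
worst = warp_utilization(list(range(WARP_SIZE)))         # every thread on its own path
```

In this model a fully agreeing warp scores 1.0, an even two-way branch scores 0.5, and 32 threads on 32 different paths score 1/32.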
Comparing a GPU, an SIMD (single instruction, multiple data) vector processor, to a CPU, a superscalar sequential processor, is like comparing apples and oranges. Sure, they are both fruits but they don't taste the same. Using the term 'general-purpose' to describe a GPU is pushing the limits of what a GPU is. Certainly, it can run general-purpose programs, but it is much faster at running what it was designed to run: data-parallel applications. A GPU does not have to have a fast clock because it makes up for it by doing a lot of operations in parallel.
A CPU, OTOH, can have a very fast clock but even if it has a superscalar architecture, it cannot come close to the performance of a GPU on data-parallel apps.
It is obvious that neither the GPU nor the CPU is a universal processor and, IMO, that is an unforgivable sin. Having both of these types of processors in the same machine is asking for trouble. They require two incompatible programming models. Programming such a beast is like pulling teeth with a crowbar. Only a few are good at it and that is not good for the industry. What is needed is a fast vector processor that can run in MIMD (multiple instruction, multiple data) mode. This way, it would have no trouble running general-purpose apps just as fast as data-parallel apps. The problem is that such a processor would need a radically different programming model, one that is specifically designed for fine-grain MIMD processing. None of the current programming tools would work with it. Still, that's the future of parallel computing. There is no getting around this.
Heralding the Impending Death of the CPU
It will take at least another 18 months before GPU encoding becomes seamless and the ideal solution for most users.
Intel is working on its own GPU, and I am sure that they will exploit multimedia handling capabilities (video/Photoshop) as one of the selling points of that GPU.
I've said it before, and I'll say it again. Slysoft's Clone DVD Mobile rocks! Just point to your ripped DVD files, choose your profile (PSP, iPod, PS3, etc.), resolution, and quality. Let it bake in the oven for a few hours and it's done.
And yes, I purchased the software after the trial period because I love it so much.
Basically, it's a GUI interface to the mencoder application that's freely available.
Life is not for the lazy.
I have been working on h.264 video codecs for a large organization for 4 years. This is not a valid comparison; it does not tell you only about GPU vs. CPU, because the differences between the code bases are enormous. Many parts of an h.264 encoder can be parallelized - motion estimation, transform, intra prediction/decision. Others can't - CABAC because of the algorithm, the deblocking filter because the standard is broken. If you take an optimized encoder and offload just the easy parts like motion estimation to a GPU you can get about 30-50% speed improvement. If you did a really good job and/or your original code wasn't very good you might get a 2x speed gain with the same encoder. I doubt you'll ever see much more than that though, with a general interface like CUDA or DX/OpenGL.
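A back-of-the-envelope check of those numbers using Amdahl's law, sketched in Python. The offloaded fractions and the GPU speedup factors below are illustrative assumptions, not measurements from any encoder.

```python
def overall_speedup(offloaded_fraction, offload_speedup):
    """Amdahl's law: the part left on the CPU still takes its full time."""
    remaining = 1.0 - offloaded_fraction
    return 1.0 / (remaining + offloaded_fraction / offload_speedup)

# Assumed: motion estimation is 30% of encode time and runs 10x faster on the GPU.
modest = overall_speedup(0.30, 10.0)

# Assumed: half the encoder is offloaded; even an infinitely fast GPU only doubles speed.
ceiling = overall_speedup(0.50, float("inf"))
```

Under these assumptions the modest case comes out at roughly 1.37x, inside the 30-50% range, and reaching 2x requires half of the total work to cost nothing at all.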
They're not encoding video. They're transcoding it. They're starting from one compressed representation and outputting another compressed representation. (Now, with twice the artifacts!)
The good test for this is football. The players, ball, and field are all moving in different directions. If the motion compensation gets that right, it's doing a very good job.
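For anyone wondering what "getting the motion right" means in practice, here is a toy block-matching search in Python. It uses the sum of absolute differences (SAD) as the match cost and an exhaustive search; real encoders use much smarter searches, and the tiny frames, block size, and function names here are made up for illustration.

```python
def sad(prev, cur, px, py, cx, cy, size):
    """Sum of absolute differences between a block in prev and a block in cur."""
    return sum(
        abs(prev[py + y][px + x] - cur[cy + y][cx + x])
        for y in range(size)
        for x in range(size)
    )

def best_motion_vector(prev, cur, cx, cy, size, search):
    """Exhaustively search prev for the block that best matches cur's block."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = cx + dx, cy + dy
            if 0 <= px <= width - size and 0 <= py <= height - size:
                cost = sad(prev, cur, px, py, cx, cy, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best

def blank(width, height):
    """Return an all-black frame as a list of rows."""
    return [[0] * width for _ in range(height)]

# A bright 2x2 "ball" moves two pixels right and one pixel down between frames.
prev_frame, cur_frame = blank(8, 8), blank(8, 8)
for y in range(2):
    for x in range(2):
        prev_frame[2 + y][1 + x] = 255
        cur_frame[3 + y][3 + x] = 255

cost, dx, dy = best_motion_vector(prev_frame, cur_frame, cx=3, cy=3, size=2, search=3)
```

The vector points from the block in the current frame back to where it came from in the previous one, so the ball that moved right and down is found at dx = -2, dy = -1 with a cost of 0. In a football shot every block needs its own vector like this, and each block's search is independent of the others.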
They say that they are comparing apples to apples? Not in my humble opinion: closed- and open-source products are like apples and oranges. First of all, the open-source product is usually not intentionally crippled to satisfy MPAA restrictions. Secondly, with the closed-source app you cannot depend on problems being fixed at short notice.
Tell me when I can get a PCI card with one or more Cell co-processors to do the heavy lifting.
Did anyone catch what GPU/graphics card they used? The article mentions they used a Q6600 ($185) as their test CPU but it makes no mention of which GPU they ran with.
Did they run this on a 9800GT? 8800GT? 8600?
To make this a fair comparison they should be running the test on a system with a quadcore and the lowest end GPU for the CPU test. Then run the same comparison on a low end Intel CPU (same price as that low end GPU from above) and a GPU priced about the same as their Q6600.
This would fit better with comparing what NVIDIA's been claiming with their optimized PC campaign.
Although this is a thread about GPU h.264 ENcoding, here is a program that uses the GPU for h.264 DEcoding: http://mpc-hc.sourceforge.net/