Facts and Fiction of GPU-Based H.264 Encoding
notthatwillsmith writes "We've all heard a lot of big promises about how general-purpose GPU computing can greatly accelerate common tasks that are slow on the CPU — like H.264 video encoding. Maximum PC compared the GPU-accelerated Badaboom app to Handbrake, a popular CPU-based encoder. After testing a variety of workloads ranging from archival-quality DVD rips to transcodes suitable for play on the iPhone, Maximum PC found that while Badaboom is significantly faster than X264-powered Handbrake in a few tests that require video resizing, it simply can't compare to the X264-powered Handbrake for archival-quality DVD backups."
To begin with, x264 blows Badaboom out of the water in terms of speed when similar settings are used. Badaboom appears to use the rough equivalent of --aq-mode 0 --subme 1 --scenecut -1 --no-cabac --partitions i4x4 --no-dct-decimate in terms of the x264 command line... it's no wonder it's "fast" when they compare it to x264 on far slower settings!
GPU encoders won't be able to compete with CPU encoders until they either get a lot faster (in which case they'll compete in the "high performance" market) or they get much better quality, since at sane settings x264 unsurprisingly blows Badaboom out of the water quality-wise, too. Until then, the product is not only completely proprietary but furthermore simply inferior, and they're going to have a very hard time marketing it.
All MPEG formats (including H.264) are lossy; if you want lossless, use HuffYUV, Lagarith, or FFV1 (or one of a countless variety of similar proprietary formats, such as Sheer YUV). Of course, this will give far larger file sizes, for obvious reasons.
And it seems that I made a slight oversight here too: when --qp 0 is set in x264 (qpprime_y_zero_transform_bypass_flag in the standard), H.264 can indeed be a lossless format, making it the only MPEG video format with a lossless mode.
I was referring to the idea of encoding a lossy format in another lossy format, resulting in further losses. Not necessarily just the loss of the original lossless-to-lossy. Sorry if I was unclear.
Seriously, why encode twice? And why rate performance on how fast you can lose bits?
Since Badaboom is a baseline-only encoder, I would guess one of its main markets would be backing up movies in a format that can be played by iPods or similar devices.
You may have a point, or you might not. It depends on the definition of "archival", and your specific purpose for doing so. I imagine most historians who deal with digital data would scoff at your conflating the terms used to describe their work with some home user who just wants to back up their DVDs...
There's certainly going to be loss when encoding from MPEG-2 DVDs to H.264. But considering how ridiculously large DVD video is for the relatively small amount of data it contains, I'd say a tiny drop in quality is generally acceptable in exchange for near-as-high-quality backups of your DVDs in (e.g.) 1/10th the space.
Don't quote me on that, though, it's just a hypothetical example. I just recently finished explaining, here, why H.264 isn't all that much more effective than MPEG-2 where indistinguishable/high-quality (rather than just "watchable") is desired: http://slashdot.org/comments.pl?sid=956141&cid=24940379
In fact, you could probably re-compress a DVD with MPEG-2 (instead of H.264) and get equivalent quality at almost equally low data-rates, simply because the DVD producer's MPEG-2 encoders are terrible, and the settings they use (GOP size, fixed resolution/black borders, high frequency noise, etc.) waste a LOT of the bitrate on things which really don't improve visual quality.
And to be a bit pedantic... H.264 is, in fact, "MPEG". It's MPEG-4 AVC (Part 10), while DVDs use MPEG-2.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
H.264/AVC includes lossless compression as well as lossy. The same is true for the wavelet based "snow" codec. Still, I'd recommend FFV1 for best compression, as long as you don't need the video to be playable by all the standard H.264 decoders out there.
This test is about reencoding from a DVD to H.264/AVC. If you want lossless quality, you need only copy the MPEG-2 stream... Reencoding to a lossless format will dramatically increase the file size, without any quality improvement.
To know how the next pixel should be compressed you must know the statistical likelihoods of the previous pixels. So compression is an inherently serial operation. You could have threads that work from each keyframe of the video independently, but that still isn't ideal for graphics cards.
From the CUDA guide-
"Every instruction issue time, the SIMT unit selects a warp that is ready to execute and issues the next instruction to the active threads of the warp. A warp executes one common instruction at a time, so full efficiency is realized when all 32 threads of a warp agree on their execution path."
So if you have code that isn't SIMD-able, you are really only using 1/32 of the available threads per unit of branching code.
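To put a number on the 1/32 figure, here is a rough sketch in Python. It assumes a deliberately simplified model that is not taken from the CUDA guide: every distinct branch path in a warp is run as a separate serialized pass, and every path costs the same. The function names are made up for illustration.

```python
WARP_SIZE = 32  # threads per warp, as in the CUDA guide quote above


def warp_passes(paths):
    """Serialized passes needed: one per distinct branch path taken in the warp.

    Simplifying assumption: each path costs the same and paths never reconverge early.
    """
    return len(set(paths))


def simt_efficiency(paths):
    """Fraction of thread slots doing useful work under the model above."""
    if len(paths) != WARP_SIZE:
        raise ValueError("expected one path id per thread in the warp")
    # Each thread does one unit of useful work, but the warp occupies
    # WARP_SIZE slots for every pass, so efficiency is 1 / passes.
    return 1.0 / warp_passes(paths)


uniform = [0] * WARP_SIZE                        # all threads agree
two_way = [i % 2 for i in range(WARP_SIZE)]      # a simple if/else split
fully_divergent = list(range(WARP_SIZE))         # every thread goes its own way

print(simt_efficiency(uniform))          # all 32 threads busy
print(simt_efficiency(two_way))          # half the slots wasted
print(simt_efficiency(fully_divergent))  # the 1/32 case
```

Real hardware is messier (paths differ in length and threads reconverge), so treat this as an upper-level intuition rather than a performance model.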
HuffYUV supports RGB and YUY2, and the ffmpeg extension (FFVHUFF) supports YV12 too.
They're not encoding video. They're transcoding it. They're starting from one compressed representation and outputting another compressed representation. (Now, with twice the artifacts!)
The good test for this is football. The players, ball, and field are all moving in different directions. If the motion compensation gets that right, it's doing a very good job.
You seem to not understand the difference (or that there is a difference) between multi-threaded programming, and SIMD data processing.
The former requires dividing a single application up into independent parts (threads), where no one part needs to wait for the output of another, yet all are doing important processing, that comes together in the end to accomplish something the user wanted.
The latter simply requires recognizing which algorithms you are performing repeatedly and sequentially in a program (like video processing) and using the appropriate instructions to send those commands to the SIMD unit in the CPU (or, potentially, the GPU, as I was hypothesizing in my last comment) so that they will be performed much more quickly than if each operation were executed one after another.
The former (threading) is a complex problem, which isn't remotely as earth-shattering as the crazy tech press would have you believe. It's just biting the CPU manufacturers in the ass, because that's the only way they know how to make more money, and the rest of the world isn't coming along as quickly as they'd like. But most importantly, it isn't useful for, nor relevant to GPUs.
I don't know what your source is, but MPEG-2 can't even APPROACH MPEG-4 AVC quality at the same bitrate (at low bitrate), and MPEG-4 AVC can produce a much more compact file for a specified quality (such as DVD quality or better). On the other hand, MPEG-4 is much more recent, and takes an order of magnitude more processing power to encode and decode. MPEG-4 uses much improved intraframe compression, variable-size macroblocks, and more advanced descriptions of block motion. Even if we drop the issue of MPEG-2 support for B-frames and limits on P/B frames per GOP (limited by the MPEG-2 profiles, which could be ignored), MPEG-4 is much more efficient at removing redundant information. Finally, MPEG-4 adds more advanced entropy coding for the final lossless compression of coefficients, etc., after lossy compression is performed -- the CAVLC coding is an improvement on MPEG-2's standard variable-length coding. CABAC's arithmetic coding is even more efficient than CAVLC.
MPEG-4/AVC was intended to deliver comparable quality to MPEG-2 at half of the bitrate, and certainly succeeds at low bitrates. At higher bitrates (near-perfect picture quality), you certainly would have been right about the Advanced Simple Profile for MPEG-4 (used in Divx, Xvid, etc), but AVC should still be more efficient.
Incidentally, the MPEG-2 profile allowed in DVDs was picked to ease the work of the decoding hardware (savings on cost for consumers), at the cost of compactness. The fixed resolutions, bit rate limitations (both max and min bitrates), and GOP limits make it much easier to create a compatible hardware decoder. Yes, they can sometimes significantly decrease compression, but they made early DVD players marketable. Within these significant limitations, the studio-grade encoding software and technicians are PHENOMENAL at delivering maximum quality. If you're used to consumer-grade MPEG-2 encoding, something like the pro version of Cinema Craft Encoder is a revelation (an expensive one though -- nearly $2K). See if you can sniff out a trial or demo, and compare the output quality to Premiere.
This is done by having an extremely simple open-source wrapper which is statically linked to x264; the raw frames to be encoded are passed to it over a pipe by the main program. This completely bypasses the limitations of the GPL without violating the spirit of it, since anyone who wants to can still read the source code of the wrapper, modify it, and recompile it as necessary and still use it with the main application.
Moreover, that is exactly how proprietary software is supposed to interact with GPL software. See Mere Aggregation, especially the last paragraph:
By contrast, pipes, sockets and command-line arguments are communication mechanisms normally used between two separate programs. So when they are used for communication, the modules normally are separate programs.
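For anyone wondering what "passing raw frames over a pipe" looks like in practice, here is a minimal sketch in Python. It is not the actual wrapper being discussed: the child process is a stand-in that only counts the bytes it receives, and the frame layout (planar 4:2:0, 1.5 bytes per pixel) and all names are assumptions for illustration.

```python
import subprocess
import sys

# Stand-in for the separate encoder wrapper program (hypothetical).
# It only reads raw bytes from stdin and reports how many it got.
CHILD_CODE = "import sys; data = sys.stdin.buffer.read(); print(len(data))"


def make_frame(width, height, value):
    """Build one dummy raw frame, assuming planar 4:2:0 (1.5 bytes per pixel)."""
    return bytes([value]) * (width * height * 3 // 2)


def send_frames(frames):
    """Start the child as its own process and feed it frames through a pipe."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    # communicate() writes everything to the child's stdin, closes the pipe,
    # and waits for the child to exit.
    out, _ = proc.communicate(b"".join(frames))
    return int(out.decode().strip())


frames = [make_frame(16, 16, v) for v in (0, 128, 255)]
received = send_frames(frames)
print(received)  # bytes the child process saw on its stdin
```

The point is only that the two sides share nothing except a byte stream: the main program never links against the child's code.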
Yes, Core2 seems to have much better SSE units than the AMD chips, but this only really manifests itself when running code optimized to use SSE... And that's usually hand optimized assembly, as compilers aren't generally good at generating SSE code yet.
John the Ripper SSE2 mode on a Core2 is 2-3 times faster than the generic compile...
John the Ripper SSE2 mode on an AMD (tested on a quad-core Phenom and dual-core Opterons) is slightly slower than the generic compile with gcc 4.3 and -O3.
The Core2 beats a similarly clocked Phenom by a significant margin on the SSE2 code (2.3GHz cores, 2200k vs 1600k per core), but the AMD is considerably quicker running the gcc-compiled code.
The big question is, how much of the code you run is optimised for the SSE units found in modern processors, and how much of it uses it at all? How much of the precompiled software you download is compiled to run on a 386, and thus makes no use whatsoever of modern processor features?
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
But not if it's only running on a single core. Then it'd obviously max out at 25% on a quad core machine, provided he's got nothing else running.
You mean, like these?
http://ffmpeg.mplayerhq.hu/shame.html
I happened to look at ConvertXtoDVD the other day. While ffmpeg itself is licensed under the LGPL, ConvertXtoDVD also appears to use both libpostproc and libswscale which are both GPL. The ffmpeg licensing page states, "If those parts get used the GPL applies to all of FFmpeg."
I don't see any LICENSE.txt file nor any mention of the GPL or the LGPL in the version of the product I downloaded. Running strings against the binaries looking for things like "public" doesn't bring it up either.
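If you don't have the strings tool handy, the same kind of check can be roughed out in a few lines of Python. This is only a sketch of the idea: the function names and the sample bytes are made up, and a miss proves nothing about packed or compressed binaries.

```python
import re


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII at least min_len long, roughly like `strings`."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def find_license_mentions(data, keywords=("public", "gpl")):
    """Keep only the strings that mention one of the keywords, ignoring case."""
    return [
        s for s in printable_strings(data)
        if any(k in s.lower() for k in keywords)
    ]


# Made-up bytes standing in for a binary; a real check would read the file:
#   data = open(path_to_binary, "rb").read()
sample = b"\x00\x01GNU General Public License\x00\xff\x02abc\x00libswscale\x00"

print(printable_strings(sample))
print(find_license_mentions(sample))
```

Matching case-insensitively matters here, since a case-sensitive search for "public" would skip right past "Public License".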
Yours is the kind of response that I hate getting the most. You obviously didn't bother to read my post all the way through, AND most certainly didn't follow the link I provided where I explained everything in detail...
Yet you spend time on a lengthy, indignant reply, where you proceed to waste both your time and mine with questions I've already answered in depth. It only makes it sadder to know that your pointless rant got modded up. Anyhow, I'm going to skip the points you could already have read the answer to, and just cover the others.
It's a nice addition, but the improvement is rather small... Less than 10% in the best case. Not ground breaking.
Of course it is, but only minimally, as I've said.
Not really. The GOP size could be up to 137 frames IIRC (almost 10x the standard 15-18 size), and compatible with all MPEG-2 hardware. That's the minimum required for IEEE-1180 compatibility, so all decoder chips can manage at least that much without problems.
More (wider than 16/9) aspect ratios could have been included to avoid needing black bars on every single DVD.
Utterly WRONG...
Every time you see edge-noise encoded on a DVD, you're seeing sheer human stupidity in action. Every time you see black bars that don't fall on a macroblock (16 pixel) boundary, you're seeing a HUGE waste of bits.
These problems are universal to damn near ALL DVDs, even though it's absolutely trivial to avoid. The "technicians" involved have NO IDEA what they are doing, and waste a huge amount of bits, and significantly lower quality, because of it. It's just incredibly fortunate DVDs have such a huge amount of bandwidth that these idiotic mistakes can be covered up by increasing the bitrate further. Of course, if they try to squeeze a film onto a single-layer DVD for cost, or include a lot of extras, then the video starts looking pretty lousy because of it.
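The macroblock point is easy to check with a bit of arithmetic. Here is a small Python sketch; the 58-line bar height is an illustrative number I picked, not a measurement from any disc, and the function names are hypothetical.

```python
MACROBLOCK = 16  # macroblock size in pixels, as mentioned above


def rows_straddling(bar_height):
    """Rows of black bar that spill into a macroblock shared with picture content."""
    return bar_height % MACROBLOCK


def is_aligned(bar_height):
    """True if the bar ends exactly on a macroblock boundary."""
    return rows_straddling(bar_height) == 0


def nearest_aligned(bar_height):
    """Round the bar height to the nearest multiple of the macroblock size."""
    return (bar_height + MACROBLOCK // 2) // MACROBLOCK * MACROBLOCK


bar = 58  # illustrative letterbox bar height in lines (assumption)
print(is_aligned(bar))        # does the bar edge sit on a boundary?
print(rows_straddling(bar))   # lines of bar sharing a macroblock row with picture
print(nearest_aligned(bar))   # bar height that would avoid the shared row
```

When the bar edge falls inside a macroblock row, that whole row contains a hard black-to-picture edge the encoder has to spend bits on, which is the waste being complained about.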