The Wretched State of GPU Transcoding
MrSeb writes "This story began as an investigation into why Cyberlink's Media Espresso software produced video files of wildly varying quality and size depending on which GPU was used for the task. It then expanded into a comparison of several alternate solutions. Our goal was to find a program that would encode at a reasonably high quality level (~1GB per hour was the target) and require a minimal level of expertise from the user. The conclusion, after weeks of work and going blind staring at enlarged images, is that the state of 'consumer' GPU transcoding is still a long, long way from prime time use. In short, it's simply not worth using the GPU to accelerate your video transcodes; it's much better to simply use Handbrake, which uses your CPU."
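For a sense of scale, the "~1GB per hour" target can be turned into an average bitrate. This is only a back-of-the-envelope sketch: it assumes a decimal gigabyte (10^9 bytes) and treats the whole file (video, audio, and container overhead) as one stream. The function name is made up for illustration.

```python
def avg_bitrate_kbps(size_bytes: float, duration_s: float) -> float:
    """Average bitrate of a file in kilobits per second (1 kbit = 1000 bits)."""
    return size_bytes * 8 / duration_s / 1000


# Assumption: "1GB" means 10**9 bytes and "per hour" means 3600 seconds.
target_kbps = avg_bitrate_kbps(1e9, 3600)
print(f"1 GB/hour is roughly {target_kbps:.0f} kbps for the whole file")
```

Under those assumptions the target works out to a little over 2 Mbps total, so the video stream alone gets somewhat less than that.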
I've heard from a lot of sources that the quality of output from various GPU accelerated video encoding schemes almost invariably lacks when compared to an established, known good CPU based video encoding scheme. When the GPU encoders can match quality, will they still be fast? Are they just cheating now? What gives?
The GPU isn't meant to do everything. If it were, there wouldn't be a CPU. Considering the hatred that was poured on Quicksync here, and that Quicksync still produces better quality transcodes than GPUs while being substantially faster, I don't think we'll be seeing the end of CPU transcoding anytime soon.
AntiFA: An abbreviation for Anti First Amendment.
...since the results of OpenCL code are static across GPUs rather than being an arbitrary output.
There is a screwed up graph on page two where they use the same graphic twice, and the caption describes aspects of the one that is missing. I really wanted to see the comparison too. You would think in an article of that size and scope someone would be responsible for checking layout as well as copy. It is no wonder we are losing to China. Their English may be worse, but their work ethic and attention to detail are possibly better.
Silence is a state of mime.
I think that the real benefit of GPUs for transcoding will be seen once people start making new as-yet unimagined encoding schemes that are designed to do data parallel tasks that wouldn't even be considered on a traditional CPU.
Here's a link to the article in 1 page.
Because it was a review of the actual GPU encoders themselves, not of the various frontends to those GPU encoders.
The real problem is a lack of a common API for encoding regardless of GPU/CPU, which leads to vendor-specific implementations with varying degrees of quality. The most efficient way to pretty much do anything is a dedicated HW block (from both perf and power point of view), so there is no question that there is value in encoding using dedicated hardware, but the software has to catch up.
Help! I am a self-aware entity trapped in an abstract function!
that encoders inexplicably insist on codecs and wrappers that predate the millennium? The problem with transcoding is that it exists at all. Strongarm the holdout encoders into using h264 or mp4v with mp4 wrappers, and transcoding will be like... well, like anything no one does anymore.
The Admin and the Engineer
Getting varied quality from different cards means you're doing something VERY wrong.
Maybe it means you're good at programming one GPU but you're not as good at programming the other. Or if another person did the code for the other GPU, maybe the other person doesn't code as well as you do.
But if all these chips have different instruction sets and APIs, it sounds kinda like saying, "If your program runs slower on iOS than it does on Android, you're doing something very wrong." Maybe. The point is that things were supposed to be getting easier, but apparently they're not.
Breakfast served all day!
So basically the article says GPU rendering is bad, but QuickSync is good enough for prime time.
Duh. QS is made to do a very specific task (encoding/decoding video) and it can do it super fast at decent quality rates. There's always the tradeoff of quality vs. encoding time. With QS, I can rip an entire 50GB Blu-Ray in 12 minutes to a 1080p MKV @ 8000kbps. It takes about 16 hours doing the same task with a normal x264 encoder such as Handbrake even though the quality is a little bit better. Is it worth waiting around 16 hours for me? Nope.
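The numbers in that comment are easy to sanity-check. The sketch below uses only the figures quoted above (12 minutes versus 16 hours, 8000 kbps output, a 50GB disc); the 2-hour movie length is an assumption added for illustration, and the function names are hypothetical.

```python
def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """How many times faster the fast encode is than the slow one."""
    return slow_seconds / fast_seconds


def output_size_gb(bitrate_kbps: float, duration_s: float) -> float:
    """Size in decimal GB of a stream at a constant average bitrate."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1e9


qs_seconds = 12 * 60          # 12 minutes with Quick Sync (from the comment)
x264_seconds = 16 * 60 * 60   # 16 hours with x264 (from the comment)

print(f"speedup: {speedup(x264_seconds, qs_seconds):.0f}x")
# Assumption: a 2-hour movie; the comment does not give the running time.
print(f"size at 8000 kbps: {output_size_gb(8000, 2 * 3600):.1f} GB")
# Reading 50 GB (decimal) in 12 minutes implies this sustained input rate:
print(f"input rate: {50e9 / qs_seconds / 1e6:.0f} MB/s")
```

At those figures the hardware path is about 80 times faster, which is the tradeoff the comment is pointing at.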
With enough bitrate, anything looks good. The key is to just bump up the bitrate in MediaCoder when using QuickSync for encoding to something very high.
That's why video professionals and TV stations rely on hardware-based transcoding, and these solutions tend to be expensive. There should be many systems that encode H264 videos really fast, something like this: http://www.blackmagic-design.com/products/teranex/
As the author:
Because 3000-word articles with PNGs at ~300K per large image and 100K per preview image aren't fun reading in a single go. There's ~1.5MB of imagery just on the third page. Pages 3-8 have about the same, and that's with the images only loaded as thumbnails.
If you've got a fast net connection, you won't care. If you don't have a fast net connection, loading 16MB of images at once isn't a lot of fun.
Visual quality comparisons are one area where you can't use low-quality JPGs. A 9-page article at ET is a real rarity, it's not something we do because we want to spam ads.
Pity the Handbrake devs are dickwads.
1. It's not funny.
2. They make an excellent bit of software that I have been using for free for years. Unless you helped them out you can't complain.
3. The guys creating Handbrake and the guys making video encoders are not the same people, so your rant is misdirected.
4. I mailed them two suggestions for improvements, and both got implemented. Now this may be because my suggestions were the kind of things that were (a) genuine improvements and (b) interesting for the developer and therefore would have been implemented anyway, but in my experience they are responsive to the right kind of suggestions.
I use DVDFab to rip DVDs using my GPU, and it positively flies. Most 2 hr movies take around 10 minutes to convert to H.264. It doesn't support VBR, but outside of that I've never had trouble with it. The resulting video quality is quite good as well (except with files that need deinterlacing, but that's always a problem). I think the person who wrote the articles just didn't try the right programs.
The problem is that the processor (GPU in this case) shouldn't make a difference as to the results of the calculations. Sure, a shittier GPU is going to have a shittier picture when forced to run at a certain framerate beyond its capabilities, but when used as a processor for a process that isn't time constricted, it should just take longer. Instead, feeding the same input into one brand of GPU is giving different results when it is run on a different GPU.
Simple reason: Because DVD Fab never came up. I Googled several variations on the term and asked Nvidia, Intel, and AMD for their own recommendations as far as products were concerned. Cyberlink and Arcsoft were recommended by multiple sources. Badaboom, I knew about and was familiar with. Xilisoft and MediaCoder were added as a result of additional research.
I never came across DVD Fab. That's not a judgment on its quality or output.
I set out to test presets. Specifically, I set out to test the presets of software packages which are sold on the purported *strength* of those presets. I say so in the first paragraph:
"Our goal was to find a program that would encode at a reasonably high quality level (~1GB per hour was the target) and require a minimal level of expertise from the user."
That's why MediaCoder results weren't included.
The entire article came about because Cyberlink's iPhone 4S preset yielded files that were 1.4GB if I used CPU encoding or a GTX 580, and 188MB if I used Quick Sync. That disparity is what I noticed when I went to check encode quality for the initial IVB review.
Can you build custom profiles in CME and create outputs that avoid these problems? You can -- though some options aren't available. That, however, is not the point. If I'm going to build my own custom profiles, I can download a copy of MediaCoder for free and do it with a more powerful piece of software that offers a huge number of options.
I did a review of "Software that claims to automate the GPU encode process." I did not do a review of "Can Cyberlink MediaEspresso EVER create a decent image?" Given what I set out to evaluate, my ability to tweak profiles to achieve a satisfactory result is not a valid criterion for my conclusions.
The review and summary are giving mixed signals then, as I had the same reaction to the article.
If this is a review of the encoders and not the front ends, then why is Handbrake specifically pointed out for ease of use?
Handbrake is only a front end to an encoder that can easily give similar or vastly worse results if you don't know how to use it.
If tyranny and oppression come to this land, it will be in the guise of fighting a foreign enemy. - James Madison
That's a distinction that the average user doesn't make. At the end of the day, I don't care if the front-end secretly passes the video to a collection of manatees who perform FFT calculations using colored balls they pick out of a pit. The criterion was a piece of software with easy-to-use presets that produces decent-quality video after I push "Ok."
If Program X does that, and Program Y doesn't, then Program X wins. The reason *why* is interesting and pertinent, but the question wasn't "Why do two different front-ends give different results using the same encoder?"
Mine was a reply to "Because it was a review of the actual GPU encoders themselves, not of the various frontends to those GPU encoders." which you and I both seem to disagree with.
That said, I do believe that there are better GPU assisted applications than those tested, such as DVDFab mentioned above.
I'd be very interested to see how it compares using this methodology, but testing every available application could become a full time job.
I have no affiliation with DVDFab, but it comes to mind as a decent encoder well before any of the ones tested.
That while the x264 guys aren't wrong to want to keep working on a software encoder that is tweakable, there is nothing wrong with a fixed function hardware encoder for some tasks. Sometimes, speed is what you want and "good enough" is, well, good enough.
Like at work I edit instructional videos for our website (I work at a university) using Vegas. I use its internal H.264 encoders, which can be accelerated using the GPU. They are quite zippy, I can generally get a realtime or better encode, even when there is a decent amount of shit going on in the video that needs to be processed (remember that Vegas isn't for video conversion, I'm doing editing, effects, that kind of thing).
Now the result is not up to x264 quality, per bit. I could get better quality by mucking around setting up an avisynth frameserver and having x264 do the encoding using some tweaked settings for high quality. However it would be much slower.
Not worth it. I'll just encode a reasonably high bitrate video. It is getting fed to Youtube anyhow, so there's a limit to how good it is going to look. The faster hardware assisted encode speeds are worth it.
If I was mastering a Blu-ray? Ya I might do the final encode to go off to fabrication with x264 (actually more likely an expensive commercial solution that can generate mastering compliant bitstreams). Spend the extra time to get the quality as high as possible because of all the other work and because it could actually be noticeable.
There is room for both approaches.
Please see Elemental Technologies GPU-accelerated H.264 transcodes.
Just FYI, I had lots of problems on DVDFab using Intel-based GPU acceleration, such as temporal misalignment frames and lots of juddering; the video seemed to speed up and slow down around 2-3x per second. I ended up leaving DVDFab entirely and switching to Handbrake to take advantage of the queue features. DVDFab has some nice features for breaking the encryption on DVDs though, so may be worth keeping around for that now that Handbrake has removed support for it.
Help I am stuck in a signature factory!
Use OpenCL and not the H.264-specific APIs the vendor provides? Yes, GPU vendors cheat, I've seen pictures. Now, how about x264 supporting OpenCL?
I found it pretty terrible too. After uninstalling it, I came across Avidemux which is much easier (for me) to use. I've been using that since.
http://fixounet.free.fr/avidemux/
Surprise, surprise, I have the feeling that most of you haven't actually read the article. The article is not arguing that GPUs are inherently flawed. Also, the article is not an NVIDIA-vs-AMD competition. Rather, the author tests software on each platform. It's the software that is bad, not the GPUs themselves. For instance, the NVIDIA GPU does quite well with Arcsoft and Xilisoft; this wouldn't be possible if GPUs were somehow broken for transcoding. After all, as others have pointed out here, floating point support is actually quite good on modern GPUs.
Still, poor software shouldn't come too much as a surprise. While CUDA and OpenCL certainly make GPU-based computing easier, it is still a relatively new technology that only a few programmers know how to use efficiently. I'm also not sure that the market pressure is there yet from consumers for efficient GPU-based applications (how many of them actually know what a GPU is?).
CUDA was released, supported by NVIDIA GPUs, in early 2007. The first OpenCL specification was not released until late 2008 (OpenCL has not been around for 4 years, as you claim). As for which is more popular, I'm afraid that you have this backwards too. The dominant market force for GPU computing is supercomputing. How many of the top 5 supercomputers use AMD GPUs? Zero. How many use NVIDIA GPUs? Three. And they're all using CUDA because it's more feature rich: it can do fancy things like direct memory copies between InfiniBand interconnects and GPU memory.
FYI: OpenCL on NVIDIA is implemented on top of CUDA, so you're still using CUDA if you're using OpenCL on NVIDIA.
After waiting and trying and waiting and trying and waiting and trying... finally conversion to 6GB mkv with full DTS works reliably. I converted my library of 600+ blu rays over the last few weeks.
Using the GPU I get about 70fps, and I've watched about 15 of the movies without noticing any problems at all.
I flat out gave up with trying to support my fricking PS3.
I do not own 600 blu rays. That was supposed to be 200+.
Around here running that 24x7 would cost ~ $200. You'd need to run it for several years to pay for the cost of a new system.
Sure! Send me one and I'll test it. :)
I'm not sure which year you are discussing, but the article refers to how the available options stand TODAY, not as they stood in 2008 or 2007, when not even direct GPU transcoding was available in a functional form. If you have a 4-year-old HD4xxx series GPU, you can run OpenCL 1.0/1.1 software on it. Period. I don't see the point of you mentioning supercomputer clusters running CUDA in this discussion. Are these clusters available to us for transcoding video on our GPUs? Not likely. Take a look at how much OpenCL software is available compared to how much CUDA software is available, and you will see which "camp" is the popular one. Hint: it's not CUDA.
CABAC doesn't scale well in massively threaded environments, that is true. However, there are ways to avoid the issues involved, and this really isn't the issue either. It's not the CABAC so much as the bit stream writing, for the most part. CABAC scales fine if you parallelize it across slices. Of course, no modern encoders make use of multiple slices per field/frame, so it's more a question of whether latency is an issue. You can run parallel CABAC encoders by buffering frames.
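To make the "parallelize across slices" idea concrete, here is a toy sketch. It is not CABAC and not an H.264 encoder: zlib stands in for the entropy coder purely because it ships with Python, and the frame is a synthetic byte buffer. The only point it illustrates is that each slice is coded with no shared state, so slices can be encoded concurrently and decoded independently. All names are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_into_slices(frame: bytes, num_slices: int) -> list[bytes]:
    """Cut a frame buffer into at most num_slices contiguous chunks."""
    size = max(1, -(-len(frame) // num_slices))  # ceiling division
    return [frame[i:i + size] for i in range(0, len(frame), size)]


def encode_frame(frame: bytes, num_slices: int) -> list[bytes]:
    """Entropy-code every slice independently, one worker per slice."""
    slices = split_into_slices(frame, num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        # zlib.compress is a stand-in for a real entropy coder (assumption).
        return list(pool.map(zlib.compress, slices))


def decode_frame(encoded_slices: list[bytes]) -> bytes:
    """Decode each slice on its own and stitch the frame back together."""
    return b"".join(zlib.decompress(s) for s in encoded_slices)


# Synthetic 64x64 single-channel "frame".
frame = bytes((x * 7 + y) % 256 for y in range(64) for x in range(64))
encoded = encode_frame(frame, 4)
print(len(encoded), "slices,", sum(len(s) for s in encoded), "bytes total")
```

The cost of this independence is that each slice starts from a fresh coder state, which generally hurts compression a little compared with coding the frame as a single slice.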
The real problem, especially when dealing with an NVidia vs. ATI issue, is that while floating point performance on these two GPUs rocks, the NVidia chips have piss poor support for bit-level operations on internal registers (shift, rotate, etc.), which makes reading and writing bit streams utterly painful at best. The CABAC code obviously takes a pretty severe hit from this. A solution to this problem is a single shared table across parallel threads for all 8 bit position states. Though, this will likely still suck since there will be huge numbers of mutexes on the table for the lookup and the table is just too large to duplicate for each core. But on the NVidia, binary manipulation operations are seriously lacking, where ATI has had those sorted out for a while. This is also why doing hash brute force cracking on an NVidia appears much slower than on an ATI.
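For readers wondering what "reading and writing bit streams" involves, the sketch below is a minimal MSB-first bit writer. It is a generic illustration, not code from any encoder, and the class name is made up. Every field written costs a shift, an OR, and a mask, which is why hardware that handles those operations slowly would struggle with this part of the job.

```python
class BitWriter:
    """Packs variable-width fields into bytes, most significant bit first."""

    def __init__(self) -> None:
        self._out = bytearray()
        self._acc = 0      # bits not yet flushed to the output
        self._nbits = 0    # how many bits are currently held in _acc

    def write_bits(self, value: int, nbits: int) -> None:
        if nbits < 0 or value < 0 or value >> nbits:
            raise ValueError("value does not fit in nbits")
        # Shift the pending bits left and append the new field.
        self._acc = (self._acc << nbits) | value
        self._nbits += nbits
        # Flush every complete byte, highest bits first.
        while self._nbits >= 8:
            self._nbits -= 8
            self._out.append((self._acc >> self._nbits) & 0xFF)
        # Keep only the bits that have not been flushed yet.
        self._acc &= (1 << self._nbits) - 1

    def getvalue(self) -> bytes:
        """Return the stream, zero-padding the final partial byte."""
        out = bytes(self._out)
        if self._nbits:
            out += bytes([(self._acc << (8 - self._nbits)) & 0xFF])
        return out


w = BitWriter()
w.write_bits(0b101, 3)
w.write_bits(0b11111, 5)
w.write_bits(0b1, 1)
print(w.getvalue().hex())
```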
I personally use NVidia for games and ATI for computing.
If you've got a fast net connection, you won't care. If you don't have a fast net connection, loading 16MB of images at once isn't a lot of fun.
Speaking of which, can anybody recommend a software package that cleanly implements that "load images upon scrolling near the active viewport" that I see on some sites? It seems like a nice way to do things.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)