The Wretched State of GPU Transcoding
MrSeb writes: "This story began as an investigation into why Cyberlink's Media Espresso software produced video files of wildly varying quality and size depending on which GPU was used for the task. It then expanded into a comparison of several alternate solutions. Our goal was to find a program that would encode at a reasonably high quality level (~1GB per hour was the target) and require a minimal level of expertise from the user. The conclusion, after weeks of work and going blind staring at enlarged images, is that the state of 'consumer' GPU transcoding is still a long, long way from prime time use. In short, it's simply not worth using the GPU to accelerate your video transcodes; it's much better to simply use Handbrake, which uses your CPU."
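Encoder settings panels usually ask for a bitrate, not a size per hour, so it helps to translate the ~1GB-per-hour target. Below is a minimal sketch of that conversion. The summary doesn't say whether "GB" means 10^9 or 2^30 bytes, so both are shown, and the result is the total budget that audio and container overhead also have to share.

```python
def average_bitrate_mbps(size_bytes: float, duration_s: float) -> float:
    """Average total bitrate (audio + video + overhead) in megabits per second."""
    return size_bytes * 8 / duration_s / 1_000_000

# The article's target: roughly 1 GB per hour of video.
decimal_gb = average_bitrate_mbps(1_000_000_000, 3600)  # 1 GB = 10^9 bytes
binary_gib = average_bitrate_mbps(2**30, 3600)          # 1 GiB = 2^30 bytes
print(f"{decimal_gb:.2f} Mbps (GB), {binary_gib:.2f} Mbps (GiB)")
```

Either way, the target works out to a bit over 2 Mbps on average.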
I've heard from a lot of sources that the output quality of various GPU-accelerated video encoding schemes almost invariably falls short of an established, known-good CPU-based encoding scheme. When the GPU encoders can match quality, will they still be fast? Are they just cheating now? What gives?
The GPU isn't meant to do everything. If it were, there wouldn't be a CPU. Considering the hatred that was poured on Quicksync here, and that Quicksync still produces better-quality transcodes than GPUs while being substantially faster, I don't think we'll be seeing the end of CPU transcoding anytime soon.
...since the results of OpenCL code are consistent across GPUs rather than arbitrary.
that Cyberlink's software is pretty damn shitty.
I've done a little bit of playing around with GPU encoding myself, and it's not really hard to write something for the GPU that is faster than your CPU at identical quality. Getting varied quality from different cards means you're doing something VERY wrong.
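One way to check whether two cards really produce "identical quality" is to compare their decoded frames numerically against a reference. PSNR is a common, if imperfect, metric for that. This is a minimal sketch over flat lists of 8-bit luma samples; the sample values and the "card A/B" outputs are hypothetical.

```python
import math

def psnr(reference: list[int], decoded: list[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # bit-identical output
    return 10 * math.log10(max_value ** 2 / mse)

ref = [16, 32, 64, 128, 200, 235]
card_a = [16, 32, 64, 128, 200, 235]  # hypothetical: identical to the reference
card_b = [18, 30, 66, 125, 203, 233]  # hypothetical: small per-sample errors
print(psnr(ref, card_a), round(psnr(ref, card_b), 2))
```

A deterministic implementation running the same kernels should land on the infinite (identical) case across cards; large PSNR gaps between cards point at different code paths or settings, not at the hardware.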
There is a screwed-up graph on page two where they use the same graphic twice, and the caption describes aspects of the one that is missing. I really wanted to see that comparison, too. You would think that in an article of that size and scope someone would be responsible for checking layout as well as copy. It is no wonder we are losing to China. Their English may be worse, but their work ethic and attention to detail are possibly better.
I think the real benefit of GPUs for transcoding will be seen once people start designing new, as-yet-unimagined encoding schemes built around data-parallel tasks that wouldn't even be considered on a traditional CPU.
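Motion estimation is one existing encoder stage that already shows the kind of work that maps well onto data-parallel hardware: every candidate offset can be scored without knowing any other candidate's result. This is a simplified full-search sketch over a tiny window, not the "as-yet unimagined" schemes the comment anticipates. The thread pool only illustrates the independence (CPython's GIL means no real speedup); on a GPU, each candidate would typically be its own thread.

```python
from concurrent.futures import ThreadPoolExecutor

Block = list[list[int]]  # rows of 8-bit luma samples

def sad(block: Block, ref: Block, dx: int, dy: int) -> int:
    """Sum of absolute differences between `block` and the window of `ref` at (dx, dy)."""
    return sum(
        abs(block[y][x] - ref[y + dy][x + dx])
        for y in range(len(block))
        for x in range(len(block[0]))
    )

def best_offset(block: Block, ref: Block) -> tuple[int, int]:
    """Full search: score every offset where `block` fits inside `ref`."""
    h, w = len(block), len(block[0])
    candidates = [
        (dx, dy)
        for dy in range(len(ref) - h + 1)
        for dx in range(len(ref[0]) - w + 1)
    ]
    # No candidate's score depends on any other's, so all can be computed at once.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda c: sad(block, ref, c[0], c[1]), candidates))
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]

ref = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 110],
    [120, 130, 140, 150],
]
block = [[90, 100], [130, 140]]  # copied from ref at dx=1, dy=2
print(best_offset(block, ref))  # (1, 2)
```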
It sounded like an interesting read. However, I didn't get past the summary. Why would you split it into 9 pages?
Here's a link to the article on one page.
I'm confused as to how a review of user-friendly transcoding applications that utilize GPUs doesn't include DVDFab? DVDFab is user-friendly and supports CUDA, DXVA, Intel Quick Sync, and software (CPU) encoding, which supports the CoreAVC codec. DVDFab is available for Windows and Mac OS X. Perhaps it wasn't selected because there isn't a Linux version...
Using CUDA with DVDFab and 2-pass encoding, I get consistently excellent results, and a high-quality encode of a Blu-ray (for backup purposes) takes between 90 and 120 minutes. 1-pass encoding is faster. These results have been consistent.
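The quality advantage of 2-pass encoding comes from the first pass measuring how hard each part of the video is to compress, so that the second pass can spend bits where they matter while still hitting a target size. The sketch below is a deliberately simplified model of that idea: proportional allocation from hypothetical first-pass complexity scores. Real rate control (x264's, for example) is considerably more involved, and this is not how DVDFab specifically does it.

```python
def two_pass_allocation(complexities: list[float], target_bits: int) -> list[int]:
    """Second pass: split a fixed bit budget across segments in proportion to
    the complexity each segment showed during the first pass."""
    total = sum(complexities)
    if total <= 0:
        raise ValueError("complexities must sum to a positive value")
    shares = [int(target_bits * c / total) for c in complexities]
    shares[-1] += target_bits - sum(shares)  # hand the rounding remainder to the last segment
    return shares

# Hypothetical first-pass scores: a talky scene, an action scene, a title card, a chase.
print(two_pass_allocation([1.0, 3.0, 1.0, 5.0], 1000))
```

A 1-pass encoder has to guess these proportions as it goes, which is why it is faster but less reliable at hitting a size with consistent quality.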
The real problem is the lack of a common encoding API that works regardless of GPU/CPU, which leads to vendor-specific implementations with varying degrees of quality. The most efficient way to do pretty much anything is a dedicated HW block (from both a performance and a power point of view), so there is no question that encoding with dedicated hardware has value, but the software has to catch up.
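A minimal sketch of what such a common API could look like: the application targets one interface with one settings object, and the CPU path and each vendor's hardware block plug in as backends. Every name here (`Encoder`, `EncodeSettings`, `pick_encoder`, the two backends) is hypothetical, and the `encode` bodies are placeholders rather than real codec calls.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass(frozen=True)
class EncodeSettings:
    codec: str          # e.g. "h264"
    bitrate_kbps: int

class Encoder(ABC):
    """Hypothetical vendor-neutral encoder interface."""
    name: str

    @abstractmethod
    def is_available(self) -> bool: ...

    @abstractmethod
    def encode(self, frames: list[bytes], settings: EncodeSettings) -> bytes: ...

class SoftwareEncoder(Encoder):
    name = "cpu"

    def is_available(self) -> bool:
        return True  # the CPU path is always present

    def encode(self, frames: list[bytes], settings: EncodeSettings) -> bytes:
        # Placeholder: a real backend would call a software codec here.
        return b"".join(frames)

class HardwareEncoder(Encoder):
    name = "hw"

    def __init__(self, present: bool) -> None:
        self.present = present

    def is_available(self) -> bool:
        return self.present

    def encode(self, frames: list[bytes], settings: EncodeSettings) -> bytes:
        # Placeholder: a real backend would drive a fixed-function block here.
        return b"".join(frames)

def pick_encoder(backends: list[Encoder]) -> Encoder:
    """Return the first available backend, in order of preference."""
    for backend in backends:
        if backend.is_available():
            return backend
    raise RuntimeError("no encoder backend available")

chosen = pick_encoder([HardwareEncoder(present=False), SoftwareEncoder()])
print(chosen.name, chosen.encode([b"a", b"b"], EncodeSettings("h264", 8000)))
```

Because every backend receives the same `EncodeSettings`, a quality difference between outputs can be pinned on the backend rather than on mismatched presets.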
that encoders inexplicably insist on codecs and wrappers that predate the millennium? The problem with transcoding is that it exists at all. Strong-arm the holdout encoders into using H.264 or mp4v with MP4 wrappers, and transcoding will be like... well, like anything no one does anymore.
So basically the article says GPU rendering is bad, but QuickSync is good enough for prime time.
Duh. QS is made to do a very specific task (encoding/decoding video), and it can do it super fast at decent quality. There's always a tradeoff between quality and encoding time. With QS, I can rip an entire 50GB Blu-ray to a 1080p MKV at 8000 kbps in 12 minutes. The same task takes about 16 hours with a normal x264-based encoder such as Handbrake, and the quality is only a little bit better. Is it worth waiting around 16 hours for me? Nope.
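The numbers in that comment can be put side by side with a little arithmetic. The comment doesn't give the film's running time, so this sketch assumes two hours, and it counts only the 8000 kbps video stream (audio and container overhead come on top).

```python
def file_size_gb(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate stream size in GB (10^9 bytes) at a constant average bitrate."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1e9

two_hours = 2 * 3600       # assumed running time; not stated in the comment
size = file_size_gb(8000, two_hours)
speedup = (16 * 60) / 12   # 16 hours (x264) vs 12 minutes (Quick Sync)
print(f"{size:.1f} GB at 8000 kbps, Quick Sync ~{speedup:.0f}x faster")
```

Under that assumption, the Quick Sync rip is a roughly 7 GB video stream produced about 80 times faster than the x264 encode, which is the tradeoff the commenter is weighing.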
With enough bitrate, anything looks good. The key is to bump the bitrate in MediaCoder up to something very high when encoding with QuickSync.
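"Enough bitrate" can be made a bit more concrete with bits per pixel: the average number of bits the encoder gets to spend on each pixel of each frame. This is a rough, codec-agnostic sketch; the bitrates and the 1080p24 format are example values, and no particular threshold for "looks good" is claimed, since that varies with the encoder and the content.

```python
def bits_per_pixel(bitrate_kbps: float, width: int, height: int, fps: float) -> float:
    """Average video bits spent per pixel per frame."""
    return bitrate_kbps * 1000 / (width * height * fps)

for kbps in (2000, 8000, 20000):
    print(kbps, "kbps ->", round(bits_per_pixel(kbps, 1920, 1080, 24), 3), "bpp")
```

A less efficient encoder can often close the visual gap simply by being handed more bits per pixel, at the cost of bigger files.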
That's why video professionals and TV stations rely on hardware-based transcoding, and these solutions tend to be expensive. There are probably many systems that encode H.264 video really fast, something like this: http://www.blackmagic-design.com/products/teranex/
After all, it is a GRAPHICS processing unit, and it's designed for a very specific subsection of computing... known as GRAPHICS PROCESSING.
If I wanted some jack-of-all-trades computation, I'd use something a little more common, like a CPU...
Transcoding? Hey! That's GRAPHICS PROCESSING, isn't it? Gosh, I hope my GPU can do some of that for me!
Pity the Handbrake devs are dickwads.
1. It's not funny.
2. They make an excellent bit of software that I have been using for free for years. Unless you helped them out, you can't complain.
3. The guys creating Handbrake and the guys making video encoders are not the same people, so your rant is misdirected.
4. I mailed them two suggestions for improvements, and both got implemented. Now this may be because my suggestions were the kind of things that were (a) genuine improvements and (b) interesting for the developer and therefore would have been implemented anyway, but in my experience they are responsive to the right kind of suggestions.
I use DVDFab to rip DVDs using my GPU, and it positively flies. Most 2-hour movies take around 10 minutes to convert to H.264. It doesn't support VBR, but apart from that I've never had trouble with it. The resulting video quality is quite good as well (except with files that need deinterlacing, but that's always a problem). I think the person who wrote the article just didn't try the right programs.
I set out to test presets. Specifically, I set out to test the presets of software packages which are sold on the purported *strength* of those presets. I say so in the first paragraph:
"Our goal was to find a program that would encode at a reasonably high quality level (~1GB per hour was the target) and require a minimal level of expertise from the user."
That's why MediaCoder results weren't included.
The entire article came about because Cyberlink's iPhone 4S preset yielded files that were 1.4GB if I used CPU encoding or a GTX 580, and 188MB if I used Quick Sync. That disparity is what I noticed when I went to check encode quality for the initial IVB review.
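The size of that disparity is worth spelling out. The ratio between the two files holds no matter how long the test clip was; the implied average bitrates do depend on the clip length, which isn't given, so the 10-minute figure below is purely hypothetical.

```python
def implied_bitrate_mbps(size_mb: float, duration_s: float) -> float:
    """Average bitrate implied by a file of `size_mb` (10^6 bytes) lasting `duration_s`."""
    return size_mb * 8 / duration_s

cpu_or_gtx580_mb = 1400  # 1.4 GB with CPU encoding or a GTX 580
quick_sync_mb = 188      # same preset with Quick Sync

ratio = cpu_or_gtx580_mb / quick_sync_mb
clip_s = 10 * 60  # hypothetical clip length; not stated in the comment
print(f"ratio {ratio:.1f}x")
for label, mb in (("CPU/GTX 580", cpu_or_gtx580_mb), ("Quick Sync", quick_sync_mb)):
    print(label, round(implied_bitrate_mbps(mb, clip_s), 2), "Mbps")
```

One preset producing average bitrates roughly 7.4 times apart means the "same" preset is effectively a different encode on each backend, so comparing the outputs' quality says more about the preset handling than about the hardware.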
Can you build custom profiles in CME and create outputs that avoid these problems? You can, though some options aren't available. That, however, is not the point. If I'm going to build my own custom profiles, I can download a copy of MediaCoder for free and do it with a more powerful piece of software that offers a huge number of options.
I did a review of "Software that claims to automate the GPU encode process." I did not do a review of "Can Cyberlink MediaEspresso EVER create a decent image?" Given what I set out to evaluate, my ability to tweak profiles to achieve a satisfactory result is not a valid criterion for my conclusions.
That while the x264 guys aren't wrong to want to keep working on a software encoder that is tweakable, there is nothing wrong with a fixed-function hardware encoder for some tasks. Sometimes, speed is what you want and "good enough" is, well, good enough.
For example, at work I edit instructional videos for our website (I work at a university) using Vegas. I use its internal H.264 encoders, which can be accelerated using the GPU. They are quite zippy; I can generally get a real-time or better encode, even when there is a decent amount of shit going on in the video that needs to be processed (remember that Vegas isn't for video conversion; I'm doing editing, effects, that kind of thing).
Now the result is not up to x264 quality, per bit. I could get better quality by mucking around setting up an avisynth frameserver and having x264 do the encoding using some tweaked settings for high quality. However it would be much slower.
Not worth it. I'll just encode a reasonably high-bitrate video. It is getting fed to YouTube anyhow, so there's a limit to how good it is going to look. The faster hardware-assisted encode speeds are worth it.
If I was mastering a Blu-ray? Yeah, I might do the final encode to go off to fabrication with x264 (actually, more likely an expensive commercial solution that can generate mastering-compliant bitstreams). I'd spend the extra time to get it as high-quality as possible, because of all the other work involved and because the difference could actually be noticeable.
There is room for both approaches.
Please see Elemental Technologies' GPU-accelerated H.264 transcodes.
Am I the only one who finds this software the most unintuitive tool ever created?
Except that you want it to do compression. As those encoders prove, it does the graphics just fine, you just don't get any compression out of it. And it is not called a "compression" processing unit.
Compression is about the worst thing you can do on a GPU since, for really good results, it ends up with massive data dependencies and is very difficult to parallelize.
The only thing worse is decompression (well, at least the lossless part), which is why it is handled by special-purpose hardware and not by the graphics part of the GPU at all.
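The data-dependency problem is easy to see in a toy adaptive entropy model. The sketch below is a simplified stand-in for CABAC-style context modeling (real CABAC uses a finite-state probability estimator and an arithmetic coder, not the counts used here): it computes the ideal code length of a bit sequence, and each step's probability depends on state updated by every earlier step.

```python
import math

def adaptive_code_length(bits: list[int]) -> float:
    """Ideal code length (in bits) of a binary sequence under a simple adaptive model.

    The probability used for symbol i comes from counts updated by symbols 0..i-1,
    so every iteration reads state written by the one before it.
    """
    zeros, ones = 1, 1  # Laplace-style starting counts
    total = 0.0
    for b in bits:
        p = (ones if b else zeros) / (zeros + ones)
        total += -math.log2(p)
        if b:
            ones += 1
        else:
            zeros += 1
    return total

skewed = [0] * 90 + [1] * 10
print(round(adaptive_code_length(skewed), 1), "bits for", len(skewed), "symbols")
```

That loop-carried state is exactly what stops the work from being split naively across thousands of GPU threads: handing each thread its own chunk changes the model state each chunk starts from, and therefore the output.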
When you have a bunch of morons who cannot come up with a consistent naming scheme for their graphics cards, do you really think the quality will be good either?
I'd be surprised if the damn things (of the same version) even had the same transistor count, given a design that parallelizes every part of the processor that isn't required to run sequentially, all for the sake of saving money on what would normally be low yields.
Three days ago I seriously had to go by features to find my card in a list, because there were four cards with the same codenames and titles. Fuck ATI.
Use OpenCL and not the H.264-specific APIs the vendor provides? Yes, GPU vendors cheat; I've seen pictures. Now, how about x264 supporting OpenCL?
Surprise, surprise, I have the feeling that most of you haven't actually read the article. The article is not arguing that GPUs are inherently flawed. Also, the article is not an NVIDIA-vs-AMD competition. Rather, the author tests software on each platform. It's the software that is bad, not the GPUs themselves. For instance, the NVIDIA GPU does quite well with Arcsoft and Xilisoft; this wouldn't be possible if GPUs were somehow broken for transcoding. After all, as others have pointed out here, floating point support is actually quite good on modern GPUs.
Still, poor software shouldn't come as much of a surprise. While CUDA and OpenCL certainly make GPU-based computing easier, it is still a relatively new technology that only a few programmers know how to use efficiently. I'm also not sure the market pressure from consumers for efficient GPU-based applications is there yet (how many of them actually know what a GPU is?).
CUDA was released, supported by NVIDIA GPUs, in early 2007. The first OpenCL specification was not released until late 2008 (OpenCL has not been around for 4 years, as you claim). As for which is more popular, I'm afraid you have this backwards too. The dominant market force for GPU computing is supercomputing. How many of the top 5 supercomputers use AMD GPUs? Zero. How many use NVIDIA GPUs? Three. And they're all using CUDA because it's more feature-rich: it can do fancy things like direct memory copies between InfiniBand interconnects and GPU memory.
FYI: OpenCL on NVIDIA is implemented on top of CUDA, so you're still using CUDA if you're using OpenCL on NVIDIA.
Why not use gate arrays and logic devices?
http://www.altera.com/literature/wp/wp-brdcst0306.pdf
http://www.xilinx.com/support/documentation/topicaudiovideoimageprocess.htm
After waiting and trying and waiting and trying and waiting and trying... conversion to a 6GB MKV with full DTS finally works reliably. I converted my library of 600+ Blu-rays over the last few weeks.
Using the GPU I get about 70fps, and I've watched about 15 of the movies without noticing any problems at all.
I flat-out gave up on trying to support my fricking PS3.
I do not own 600 Blu-rays. That was supposed to be 200+.
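A quick sanity check of "70fps over a library of 200+ titles" under stated assumptions: a two-hour film at 24 fps (Blu-ray film is usually 23.976 fps, which barely changes the result) and a steady 70 fps encode throughout.

```python
def encode_hours(duration_min: float, source_fps: float, encode_fps: float) -> float:
    """Wall-clock hours to encode one title at a steady encode speed."""
    frames = duration_min * 60 * source_fps
    return frames / encode_fps / 3600

per_movie = encode_hours(120, 24, 70)  # assumed 2-hour, 24 fps film
library = 200 * per_movie              # the corrected library size
print(f"{per_movie * 60:.0f} min per movie, {library:.0f} h for 200 movies")
```

That comes to roughly 41 minutes per film and about six days of continuous encoding for 200 films, which is consistent with finishing "over the last few weeks".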
Around here, running that 24x7 would cost ~$200. You'd need to run it for several years to pay for the cost of a new system.
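The comment doesn't say how much power "that" draws, what the local electricity rate is, or whether ~$200 is per year, so its figure can't be reproduced here. This sketch only shows the shape of the reasoning, with every number (wattages, price per kWh, system price) hypothetical.

```python
def annual_energy_cost(watts: float, price_per_kwh: float, hours_per_day: float = 24) -> float:
    """Yearly electricity cost of a constant load."""
    kwh_per_year = watts * hours_per_day * 365 / 1000
    return kwh_per_year * price_per_kwh

def payback_years(new_system_cost: float, yearly_savings: float) -> float:
    """Years of savings needed to recover the purchase price."""
    return new_system_cost / yearly_savings

old_w, new_w, price = 200, 100, 0.12  # hypothetical watts and $/kWh
savings = annual_energy_cost(old_w, price) - annual_energy_cost(new_w, price)
print(round(savings, 2), "per year;", round(payback_years(800, savings), 1), "years to recover a hypothetical $800 system")
```

Even a system that halves the power draw only saves the difference in running cost, which is why the payback period stretches to several years.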
I'm not sure which year you're talking about, but the article describes how the available options stand TODAY, not as they stood in 2007 or 2008, when direct GPU transcoding wasn't even available in a functional form. If you have a four-year-old HD4xxx-series GPU, you can run OpenCL 1.0/1.1 software on it. Period. I don't see the point of mentioning supercomputer clusters running CUDA in this discussion. Are these clusters available to us for transcoding video on our GPUs? Not likely. Compare how much OpenCL software is available with how much CUDA software is available, and you will see which "camp" is the popular one. Hint: it's not CUDA.
CABAC doesn't scale well in massively threaded environments, that is true. However, there are ways to avoid the issues involved, and this really isn't the main problem anyway. For the most part, it's not the CABAC so much as the bitstream writing. CABAC scales fine if you parallelize it across slices. Of course, no modern encoders make use of multiple slices per field/frame, so it comes down to whether latency matters. You can run parallel CABAC encoders by buffering frames.
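The slice trade-off can be sketched with a toy model. Below, a count-based adaptive binary model stands in for CABAC (real CABAC uses finite-state probability estimation and arithmetic coding), and a thread pool stands in for parallel CABAC engines; CPython threads give no real speedup here, they only show that the slices are independent. Because each slice's model starts cold, coding in slices costs some extra bits compared with one continuous pass.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def slice_bits(symbols: list[int]) -> float:
    """Ideal code length of one slice; the adaptive model restarts for every slice."""
    counts = [1, 1]
    total = 0.0
    for s in symbols:
        total -= math.log2(counts[s] / (counts[0] + counts[1]))
        counts[s] += 1
    return total

def encode_in_slices(symbols: list[int], num_slices: int) -> float:
    """Split a frame into contiguous slices and code them independently in parallel."""
    size = math.ceil(len(symbols) / num_slices)
    slices = [symbols[i:i + size] for i in range(0, len(symbols), size)]
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(slice_bits, slices))

frame = ([0] * 9 + [1]) * 100  # toy frame: 1000 binary decisions, 10% ones
print(round(encode_in_slices(frame, 1), 1), "bits as one slice")
print(round(encode_in_slices(frame, 8), 1), "bits as 8 independent slices")
```

On this toy frame, the 8-slice version costs a few percent more bits. That is the price paid for letting the slices run in parallel, in exchange for the buffering and latency the comment mentions.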
The real problem, especially when dealing with an NVidia vs. ATI issue, is that while floating-point performance on both GPUs rocks, the NVidia chips have piss-poor support for shift/rotate and other bit-level operations on internal registers, which makes reading and writing bitstreams utterly painful at best. The CABAC code obviously takes a pretty severe hit from this. One solution to this problem is a single table, shared across parallel threads, for all 8 bit-position states. This will likely still suck, though, since there will be huge numbers of mutexes on the table for the lookup, and the table is too large to duplicate for each core. On NVidia, binary manipulation operations are seriously lacking, whereas ATI has had those sorted out for a while. This is also why brute-force hash cracking appears much slower on NVidia than on ATI.
I personally use NVidia for games and ATI for computing.
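Why bitstream writing leans so heavily on shift and mask operations: every syntax element has its own width and has to be packed at whatever bit offset the previous element left off. This is a minimal MSB-first bit writer sketch; Python shows the operations involved, not their performance, and the claims about specific NVidia and ATI hardware are the commenter's.

```python
class BitWriter:
    """Packs variable-width values MSB-first into bytes using shifts and masks."""

    def __init__(self) -> None:
        self.out = bytearray()
        self.acc = 0    # pending bits, right-aligned
        self.nbits = 0  # number of pending bits in acc

    def write(self, value: int, width: int) -> None:
        if value < 0 or value >= (1 << width):
            raise ValueError("value does not fit in width bits")
        self.acc = (self.acc << width) | value  # append the new bits
        self.nbits += width
        while self.nbits >= 8:                  # emit every complete byte
            self.nbits -= 8
            self.out.append((self.acc >> self.nbits) & 0xFF)
        self.acc &= (1 << self.nbits) - 1       # keep only the unflushed bits

    def flush(self) -> bytes:
        """Zero-pad the final partial byte and return the stream."""
        if self.nbits:
            self.out.append((self.acc << (8 - self.nbits)) & 0xFF)
            self.acc = self.nbits = 0
        return bytes(self.out)

w = BitWriter()
w.write(0b101, 3)
w.write(0b11110000, 8)
w.write(0b1, 1)
print(w.flush().hex())  # be10
```

Every `write` does a shift, an OR, and a mask per element, and an encoder issues these for every coded symbol, so weak bit-manipulation throughput shows up directly in this stage.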
Funny/relevant or not, the AC's complaint is largely accurate, in my opinion. For a support forum, the Handbrake forums are an incredibly hostile environment, with the devs often being the worst offenders. Yes, they've made a great piece of software, but I don't see why that excuses their rude behavior. I can't imagine the devs interact with people in the real world like that, and I don't see why they should interact with people like that on the Internet.