The Wretched State of GPU Transcoding

Posted by Soulskill on Tuesday May 8, 2012 @10:57AM from the things-that-should-work-better-in-2012 dept.

MrSeb writes "This story began as an investigation into why Cyberlink's Media Espresso software produced video files of wildly varying quality and size depending on which GPU was used for the task. It then expanded into a comparison of several alternate solutions. Our goal was to find a program that would encode at a reasonably high quality level (~1GB per hour was the target) and require a minimal level of expertise from the user. The conclusion, after weeks of work and going blind staring at enlarged images, is that the state of 'consumer' GPU transcoding is still a long, long way from prime time use. In short, it's simply not worth using the GPU to accelerate your video transcodes; it's much better to simply use Handbrake, which uses your CPU."
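To put the ~1GB-per-hour target in encoder terms, the file size per hour can be converted into an average bitrate. This is a minimal sketch. Whether "GB" means 10^9 or 2^30 bytes is an assumption, so both are computed, and the helper name `average_bitrate_mbps` is hypothetical.

```python
# Convert a target file size per unit of playback time into an average bitrate.
# Assumption: "GB" could mean 10**9 bytes (decimal) or 2**30 bytes (binary),
# so both interpretations are computed.

def average_bitrate_mbps(size_bytes: float, duration_s: float) -> float:
    """Average total bitrate (video + audio + container overhead) in Mbit/s."""
    return size_bytes * 8 / duration_s / 1_000_000

ONE_HOUR_S = 3600

decimal_gb = average_bitrate_mbps(10**9, ONE_HOUR_S)  # 1 GB  = 10^9 bytes
binary_gib = average_bitrate_mbps(2**30, ONE_HOUR_S)  # 1 GiB = 2^30 bytes

print(f"1 GB/hour  ~ {decimal_gb:.2f} Mbit/s")
print(f"1 GiB/hour ~ {binary_gib:.2f} Mbit/s")
```

Either way the target works out to roughly 2.2 to 2.4 Mbit/s for everything in the file, so the video stream itself gets somewhat less once audio is accounted for.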

8 of 158 comments

Re:And the moral of the Story is... by CajunArson · 2012-05-08 11:13 · Score: 5, Informative

The Quick Sync hardware is part of the IGP block, but it is specialized hardware geared specifically toward transcoding. In particular, it does not use the main GPU pipeline and shader hardware to do the transcoding.

--
AntiFA: An abbreviation for Anti First Amendment.
Re:Does anyone have editors anymore? by Dputiger · 2012-05-08 11:19 · Score: 5, Informative

I'm the author of the story, and that's an error that slipped past in formatting. I'm uploading the proper graph right after I hit "Reply" on this.
Single Page Version of Article by Anonymous Coward · 2012-05-08 11:38 · Score: 5, Informative

Here's a link to the article in 1 page.
Re:And the moral of the Story is... by billcopc · 2012-05-08 11:57 · Score: 5, Insightful

Well see, that's the thing. A GPU is better suited to some kinds of massively parallel tasks, like video encoding. After all, you're applying various matrix transforms to an image, with a bunch of funky floating point math to whittle all that transformed data down to its most significant/perceptible bits. GPUs are supposed to be really really good at this sort of thing.
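The "matrix transforms plus floating point math" step can be illustrated with the classic transform-then-quantize stage of DCT-based coding. This is a minimal sketch, assuming 8x8 blocks and a single uniform quantizer step (a hypothetical value of 16); it follows MPEG-2/JPEG-style floating-point DCT coding, whereas H.264 replaces this DCT with small integer transforms. The function names are illustrative only.

```python
import math

N = 8  # block size used by MPEG-2/JPEG-style DCT coding

def dct_1d(v: list[float]) -> list[float]:
    """Orthonormal 1-D DCT-II of a length-N sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        total = sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        out.append(scale * total)
    return out

def dct_2d(block: list[list[float]]) -> list[list[float]]:
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[y][x] for y in range(N)]) for x in range(N)]
    # cols[x][k] holds vertical frequency k for horizontal frequency x;
    # transpose so the result is indexed [vertical][horizontal].
    return [[cols[x][k] for x in range(N)] for k in range(N)]

def quantize(coeffs: list[list[float]], step: float) -> list[list[int]]:
    """Uniform quantization: the lossy step that throws away small coefficients."""
    return [[round(c / step) for c in row] for row in coeffs]

flat = [[128] * N for _ in range(N)]                   # uniform gray block
ramp = [[16 * x for x in range(N)] for _ in range(N)]  # horizontal gradient

q_flat = quantize(dct_2d(flat), 16)
q_ramp = quantize(dct_2d(ramp), 16)
```

The flat block collapses to a single nonzero coefficient and the ramp to a single row of them, which is the "whittling down" described above. Each 8x8 block is transformed independently of the others, which is the kind of uniform arithmetic GPUs are expected to handle well.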
My hunch is that the problems we're seeing are caused by two big issues:
1. Lack of standardization across GPU processing technologies: CUDA vs. OpenCL vs. Quick Sync, and a bunch of tag-alongs too. Each one was designed around a particular GPU architecture, so porting programs between them is non-trivial.
2. Lack of expertise in GPU programming. Let's be fair here: GPUs have a drastically different architecture from any PC or embedded platform we're used to programming. While I could follow the specs and write an MPEG or H.264 encoder in any high-level language in a fairly straightforward manner, I can't even begin to envision how I would convert that linear code into a massively parallel algorithm running on hundreds of dumbed-down shader processors. It's not at all like a conventional cluster, because shaders have very limited instruction sets and little memory, but extremely fast interconnects. We have a hard enough time making CPU encoders scale to 4 or 8 cores; this requires some serious out-of-the-box thinking to pull off.
Moving to a GPU virtually requires starting over from scratch. This is a set of constraints that are very foreign to the transcoding world, where the accepted trend was to use ever-increasing amounts of cheaply available CPU and memory, with extensively configurable code paths. The potential is there, but it will take time for the hardware, APIs and developer skills to converge. GPU transcoding should be seen as a novelty for now, just like CPU encoding was 15 years ago when ripping a DVD was extremely error-prone and time-consuming. If you want a quick, low quality transcode, the GPU is your friend. If you're expecting broadcast-quality encodes, you're gonna have to wait a few years for this niche to grow and mature.
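One encoder stage that does restructure naturally into "massively parallel" form is block-matching motion estimation. The sketch below is a plain full search using the sum of absolute differences (SAD), written as a map over independent candidate vectors followed by a min reduction. The frame contents, the 4x4 block size, and the search radius of 2 are hypothetical choices for illustration. Real encoders use larger blocks, sub-pixel refinement, and motion-vector prediction from neighboring blocks, and those dependencies are part of what makes a full GPU port hard.

```python
Frame = list[list[int]]

def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by the candidate motion vector (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )

def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int = 4, radius: int = 2):
    h, w = len(ref), len(ref[0])
    # Candidate vectors whose displaced block stays inside the reference frame
    candidates = [
        (dx, dy)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if 0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h
    ]
    # "Map" step: each cost reads only shared, read-only inputs, so every
    # candidate could in principle be evaluated by its own GPU thread.
    costs = [sad(cur, ref, bx, by, dx, dy, n) for dx, dy in candidates]
    # "Reduce" step: pick the cheapest vector (ties go to the smaller tuple).
    best_cost, best_mv = min(zip(costs, candidates))
    return best_mv, best_cost

# Hypothetical 12x12 luma frames: `cur` is `ref` shifted by (dx=2, dy=1).
SIZE = 12
ref = [[7 * x + 13 * y for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[min(y + 1, SIZE - 1)][min(x + 2, SIZE - 1)] for x in range(SIZE)] for y in range(SIZE)]

print(full_search(cur, ref, bx=4, by=4))  # ((2, 1), 0)
```

The search finds the shift with zero residual. The easy part to parallelize is the independent cost evaluations; the hard part in a real encoder is everything that depends on previously coded blocks.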

--
-Billco, Fnarg.com
Re:Lack of standards, quality. by cheesybagel · 2012-05-08 12:11 · Score: 5, Informative

Hint: Not all GPUs have IEEE FP compliant math. Often they break the standard, or do something else altogether just to improve performance.
Re:Lack of standards, quality. by Skarecrow77 · 2012-05-08 13:14 · Score: 5, Insightful

But, at least in this context, speed is nearly irrelevant, because it fails at the task at hand: producing high-quality video.
Who cares how fast it completes a task if it's failing? Nobody gives little Jimmy props when he finishes the hour-long test in 5 minutes but scores a 37% on it.
Re:Incompetent Author? by Dputiger · 2012-05-08 13:29 · Score: 5, Informative

I set out to test presets. Specifically, I set out to test the presets of software packages which are sold on the purported *strength* of those presets. I say so in the first paragraph:
"Our goal was to find a program that would encode at a reasonably high quality level (~1GB per hour was the target) and require a minimal level of expertise from the user."
That's why MediaCoder results weren't included.
The entire article came about because Cyberlink's iPhone 4S preset yielded files that were 1.4GB if I used CPU encoding or a GTX 580, and 188MB if I used Quick Sync. That disparity is what I noticed when I went to check encode quality for the initial IVB review.
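The size of that disparity is easy to quantify. This is a small sketch using the two sizes reported above, assuming decimal units (1.4GB = 1400MB). The clip's duration is not stated, so the durations in the loop are hypothetical; because both files encode the same clip, the average-bitrate ratio equals the file-size ratio whatever the duration.

```python
# Sizes reported in the comment; decimal units (1 GB = 1000 MB) are an assumption.
CPU_OR_GTX580_MB = 1400  # 1.4GB from CPU or GTX 580 encoding
QUICK_SYNC_MB = 188      # 188MB from Quick Sync

ratio = CPU_OR_GTX580_MB / QUICK_SYNC_MB

def avg_bitrate_mbps(size_mb: float, duration_s: float) -> float:
    """Average bitrate in Mbit/s for a file of size_mb megabytes (10**6 bytes)."""
    return size_mb * 8 / duration_s

# Same source clip, so the duration cancels out of the bitrate ratio.
for duration_s in (60, 600, 3600):  # hypothetical durations
    r = avg_bitrate_mbps(CPU_OR_GTX580_MB, duration_s) / avg_bitrate_mbps(QUICK_SYNC_MB, duration_s)
    assert abs(r - ratio) < 1e-9

print(f"Quick Sync output is {ratio:.2f}x smaller")
```

In other words, the same preset gave Quick Sync roughly one seventh of the bits it gave the CPU and GTX 580 paths.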
Can you build custom profiles in CME and create outputs that avoid these problems? You can -- though some options aren't available. That, however, is not the point. If I'm going to build my own custom profiles, I can download a copy of MediaCoder for free and do it with a more powerful piece of software that offers a huge number of options.
I did a review of "Software that claims to automate the GPU encode process." I did not do a review of "Can Cyberlink MediaEspresso EVER create a decent image?" Given what I set out to evaluate, my ability to tweak profiles to achieve a satisfactory result is not a valid criterion for my conclusions.
Re:Lack of standards, quality. by parlancex · 2012-05-08 14:55 · Score: 5, Informative

"Hint: Not all GPUs have IEEE FP compliant math. Often they break the standard, or do something else altogether just to improve performance."
I can't speak for ATI, but all FP32 math on Nvidia architectures has actually been IEEE compliant for many generations now, with these exceptions: NaN, +inf/-inf, and exception-handling cases; the hardware sin, cos, and log implementations; and code using the fused multiply-add instruction (though you can get around that last one by using special compiler intrinsics that prevent the fusing).
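The fused multiply-add exception is easy to see numerically: computing a*b + c with one rounding instead of two can change the result. This is a minimal sketch in Python, which has no float32 type, so float32 rounding is emulated through `struct`. The helper names are hypothetical, and the "fused" emulation is exact only because the chosen inputs make a*b + c exactly representable in binary64. In CUDA, for example, the `__fmul_rn()` and `__fadd_rn()` intrinsics are documented as never being merged into a multiply-add, and nvcc's `--fmad=false` option disables the contraction globally.

```python
import struct

def f32(x: float) -> float:
    """Round a Python float (IEEE binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def mul_add_separate(a: float, b: float, c: float) -> float:
    # Two roundings: the product is rounded to float32, then the sum is.
    return f32(f32(a * b) + c)

def mul_add_fused(a: float, b: float, c: float) -> float:
    # One rounding, as an FMA unit does. This emulation is exact only when
    # a*b + c is itself exactly representable in binary64, which holds for
    # the inputs below (the product of two float32 values always is).
    return f32(a * b + c)

a = f32(1 + 2**-12)   # exactly representable in float32
c = -f32(1 + 2**-11)

print(mul_add_separate(a, a, c))  # 0.0: the low bits of a*a were rounded away
print(mul_add_fused(a, a, c))     # 2**-24, about 5.96e-08
```

Here a*a = 1 + 2^-11 + 2^-24 exactly, which is a tie in float32 and rounds to 1 + 2^-11, so the separate multiply and add cancel to zero while the fused version keeps the 2^-24 remainder.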