Slashdot Mirror


Facts and Fiction of GPU-Based H.264 Encoding

notthatwillsmith writes "We've all heard a lot of big promises about how general-purpose GPU computing can greatly accelerate common tasks that are slow on the CPU — like H.264 video encoding. Maximum PC compared the GPU-accelerated Badaboom app to Handbrake, a popular CPU-based encoder. After testing a variety of workloads ranging from archival-quality DVD rips to transcodes suitable for play on the iPhone, Maximum PC found that while Badaboom is significantly faster than X264-powered Handbrake in a few tests that require video resizing, it simply can't compare to the X264-powered Handbrake for archival-quality DVD backups."

79 comments

  1. makes sense to me by perlchild · · Score: 1, Interesting

    Wouldn't archival-quality backups be actual MPEG instead of H.2 or whatever? I mean if you're archiving, why go lossy?
    Is it just a badly-designed test?

    1. Re:makes sense to me by Silverlancer · · Score: 4, Informative

      All MPEG formats (including H.264) are lossy; if you want lossless, use HuffYUV, Lagarith, or FFV1 (or one of a countless variety of similar proprietary formats, such as Sheer YUV). Of course, this will give far larger file sizes, for obvious reasons.

    2. Re:makes sense to me by Silverlancer · · Score: 5, Informative

      And it seems that I made a slight oversight here also: when --qp 0 is set in x264 (qpprime_y_zero_transform_bypass_flag in the standard), H.264 can indeed be a lossless format too, making it the only MPEG video format with a lossless mode.

    3. Re:makes sense to me by perlchild · · Score: 2, Informative

      I was referring to the idea of encoding a lossy format in another lossy format, resulting in further losses. Not necessarily just the loss of the original lossless-to-lossy. Sorry if I was unclear.

      Seriously, why encode twice? And why rate performance on how fast you can lose bits?

    4. Re:makes sense to me by Silverlancer · · Score: 2, Informative

      Since Badaboom is a baseline-only encoder, I would guess one of its main markets would be to backup movies in a format that can be played by iPods or similar.

    5. Re:makes sense to me by evilviper · · Score: 4, Informative

      Wouldn't archival-quality backups be actual MPEG instead of H.2 or whatever?

      You may have a point, or you might not. Depends on the definition of "archival", and your specific purpose for doing so. I imagine most historians who deal with digital data would scoff at your conflating the terms used to describe their work, with some home user who just wants to back-up their DVDs...

      There's certainly going to be loss, when encoding from MPEG-2 DVDs to H.264. But considering how ridiculously large DVD video is for the relatively small amount of data it contains, I'd say a tiny drop in quality is generally acceptable in exchange for reducing the storage space required for near-as-high-quality backups of your DVDs in (eg.) 1/10th the space.

      Don't quote me on that, though, it's just a hypothetical example. I just recently finished explaining, here, why H.264 isn't all that much more effective than MPEG-2 where indistinguishable/high-quality (rather than just "watchable") is desired: http://slashdot.org/comments.pl?sid=956141&cid=24940379
      In fact, you could probably re-compress a DVD with MPEG-2 (instead of H.264) and get equivalent quality at almost equally low data-rates, simply because the DVD producer's MPEG-2 encoders are terrible, and the settings they use (GOP size, fixed resolution/black borders, high frequency noise, etc.) waste a LOT of the bitrate on things which really don't improve visual quality.

      And to be a bit pedantic... H.264 is, in fact "MPEG". It's MPEG-4 AVC (Part10), while DVDs use MPEG-2.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    6. Re:makes sense to me by evilviper · · Score: 4, Informative

      All MPEG formats (including H.264) are lossy;

      H.264/AVC includes lossless compression as well as lossy. The same is true for the wavelet based "snow" codec. Still, I'd recommend FFV1 for best compression, as long as you don't need the video to be playable by all the standard H.264 decoders out there.

      if you want lossless, use HuffYUV, Lagarith, or FFV1 (or one of a countless variety of similar proprietary formats, such as Sheer YUV). Of course, this will give far larger file sizes, for obvious reasons.

      This test is about reencoding from a DVD to H.264/AVC. If you want lossless quality, you need only copy the MPEG-2 stream... Reencoding to a lossless format will dramatically increase the file size, without any quality improvement.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    7. Re:makes sense to me by Anonymous Coward · · Score: 0

      Watch out! At least HuffYUV is NOT lossless - if your data is RGB, 4:4:4 etc. per pixel chroma. Got burned with it myself.

    8. Re:makes sense to me by Silverlancer · · Score: 2, Informative

      HuffYUV supports RGB and YUY2, and the ffmpeg extension (FFVHUFF) supports YV12 too.

    9. Re:makes sense to me by Anonymous Coward · · Score: 0

      While I disagree about the H.264 vs MPEG2 part (I think codecs such as x264 are FAR better than *ANY* MPEG2 encoder out there), DVDs are a very wasteful medium. It's trivial to make a very high quality backup, in a fraction of the size.

      Most DVDs:
      -don't quite fill a DVD9 in the first place, and DVD5's aren't uncommon either
      -then often waste 1GB or so over multiple space-consuming audio tracks (DTS/AC3 5.1/AC3 2.0/Musicam/etc) -- sometimes in more than one language, sometimes also director's comments, etc
      -the menus are often encoded at a fairly high bitrate (I've seen plenty of menus using more than 500MB)
      -more often than not, some bitrate is wasted on encoding black bars (you only have so many options when you want to set the aspect ratio)

      Remove the menus and unnecessary audio tracks, crop the black bars, and you're already at like half the space. Now just use any decent codec/encoder with good settings, and you should be able to get great (nearly identical) results in even less space.

      In 9GB or so (DVD9 size), I expect *at least* 720p quality. Such huge sizes for 480p (or i...) is just ludicrous.

    10. Re:makes sense to me by Anonymous Coward · · Score: 5, Informative

      I don't know what your source is, but MPEG-2 can't even APPROACH MPEG-4 AVC quality at the same bitrate (at low bitrate), and MPEG-4 AVC can produce a much more compact file for a specified quality (such as DVD quality or better). On the other hand, MPEG-4 is much more recent, and takes an order of magnitude more processing power to encode and decode. MPEG-4 uses much improved intraframe compression, variable-size macroblocks, and more advanced descriptions of block motion. Even if we drop the issue of MPEG-2 support for B-frames and limits on P/B frames per GOP (limited by the MPEG-2 profiles, which could be ignored), MPEG-4 is much more efficient at removing redundant information. Finally, MPEG-4 adds more advanced entropy coding for the final lossless compression of coefficients, etc. after lossy compression is performed -- the CAVLC coding is an improvement on MPEG-2's standard variable-length coding. CABAC's arithmetic coding is even more efficient than CAVLC.
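
      To make the entropy-coding point concrete, here is a minimal sketch of why arithmetic coding can beat a variable-length code that must spend a whole number of bits per symbol. The two-symbol source and its probabilities are made up for illustration; this is not CAVLC or CABAC, and real codecs will not see this exact gap.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits per symbol: the floor any entropy coder can approach."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def vlc_bits(probs, code_lengths):
    """Average bits per symbol for a code that spends whole bits on each symbol."""
    return sum(p * n for p, n in zip(probs, code_lengths))

# Hypothetical, heavily skewed two-symbol source (think of a flag that is usually 0).
probs = [0.9, 0.1]
floor = entropy_bits(probs)      # what an ideal arithmetic coder approaches
vlc = vlc_bits(probs, [1, 1])    # the best prefix code here still needs 1 bit per symbol

print(f"entropy floor: {floor:.3f} bits/symbol")
print(f"whole-bit VLC: {vlc:.3f} bits/symbol")
```

      The gap is largest for very skewed symbols like this one and shrinks as the distribution flattens, so the saving over a whole stream depends on the content.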

      MPEG-4/AVC was intended to deliver comparable quality to MPEG-2 at half of the bitrate, and certainly succeeds at low bitrates. At higher bitrates (near-perfect picture quality), you certainly would have been right about the Advanced Simple Profile for MPEG-4 (used in Divx, Xvid, etc), but AVC should still be more efficient.

      Incidentally, the MPEG-2 profile allowed in DVDs was picked to ease the work of the decoding hardware (savings on cost for consumers), at the cost of compactness. The fixed resolutions, bit rate limitations (both max and min bitrates), and GOP limits make it much easier to create a compatible hardware decoder. Yes, they can sometimes significantly decrease compression, but they made early DVD players marketable. Within these significant limitations, the studio-grade encoding software and technicians are PHENOMENAL at delivering maximum quality. If you're used to consumer grade MPEG-2 encoding, something like the pro version of Cinema Craft Encoder is a revelation (an expensive one though -- nearly $2K). See if you can sniff up a trial or demo, and compare the output quality to premiere.

    11. Re:makes sense to me by Silverlancer · · Score: 2, Interesting

      In my experience HCEnc, a freeware encoder (not open-source though), tends to beat CCE quality-wise. Most of Doom9 seems to agree, though I don't think the differences were too dramatic.

    12. Re:makes sense to me by The+End+Of+Days · · Score: 1

      Oh no, not the scoffing of historians. That's almost as bad as the whispered derision of computer nerds.

    13. Re:makes sense to me by khellendros1984 · · Score: 3, Funny

      What are you talking about?!?! I do the audio equivalent all the time! I download 64kbps MP3 files, and reencode them to 320kbps to regain the quality! Screw information theory! I'm an audiophile, I can tell the difference!

      --
      It is pitch black. You are likely to be eaten by a grue.
    14. Re:makes sense to me by Anonymous Coward · · Score: 0

      To be fair, the small GOP and frequent I frames serves other purposes too. With DVD and broadcast it allows you to fast forward and rewind easily. Changing channels is also much faster because you will hit an I frame very quickly. Now if you don't need to skip around a video much, then not limiting the gap between I frames can significantly lower the bitrate of your video. Many movies have scenes that can be a minute long with no camera change, which generally means a minute with not needing an I frame.
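
      A rough sketch of that trade-off, assuming one I-frame per GOP and a single average size for every other frame. The frame sizes and the 250-frame GOP are hypothetical numbers picked only to show the arithmetic.

```python
def average_kbits_per_frame(gop_length, i_frame_kbits, other_frame_kbits):
    """One I-frame followed by (gop_length - 1) predicted frames, averaged."""
    total = i_frame_kbits + (gop_length - 1) * other_frame_kbits
    return total / gop_length

# Hypothetical sizes: an I-frame costing 5x as much as a predicted frame.
I_KBITS, P_KBITS = 400.0, 80.0

short_gop = average_kbits_per_frame(15, I_KBITS, P_KBITS)   # DVD-style GOP
long_gop = average_kbits_per_frame(250, I_KBITS, P_KBITS)   # long GOP, few seek points

savings = 1 - long_gop / short_gop
print(f"15-frame GOP:  {short_gop:.1f} kbit/frame")
print(f"250-frame GOP: {long_gop:.1f} kbit/frame")
print(f"bitrate saved: {savings:.1%}")
```

      With these made-up sizes the long GOP saves roughly a fifth of the bitrate, paid for with slower seeking and channel changes.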

    15. Re:makes sense to me by evilviper · · Score: 2, Informative

      Yours is the kind of response that I hate getting the most. You obviously didn't bother to read my post all the way through, AND most certainly didn't follow the link I provided where I explained everything in detail...

      Yet, you spend time on a lengthy, indignant reply, where you proceed to waste both your and my time, with questions I've already answered, in-depth. It only makes it more sad to know that your pointless rant got modded up. Anyhow, I'm going to skip those which you could already have read the answer to, and just cover the other points.

      arithmetic coding is even more efficient

      It's a nice addition, but the improvement is rather small... Less than 10% in the best case. Not ground breaking.

      but AVC should still be more efficient.

      Of course it is, but only minimally, as I've said.

      The fixed resolutions, bit rate limitations (both max and min bitrates), and GOP limits make it much easier to create a compatible hardware decoder.

      Not really. The GOP size could be up to 137 frames IIRC (almost 10x the standard 15-18 size), and compatible with all MPEG-2 hardware. That's the minimum required for IEEE-1180 compatibility, so all decoder chips can manage at least that much without problems.

      More (wider than 16/9) aspect ratios could have been included to avoid needing black bars on every single DVD.

      Within these significant limitations, the studio-grade encoding software and technicians are PHENOMENAL at delivering maximum quality.

      Utterly WRONG...

      Every time you see edge-noise encoded on a DVD, you're seeing sheer human stupidity in action. Every time you see black bars that don't fall on a macroblock (16 pixel) boundary, you're seeing a HUGE waste of bits.

      These problems are universal to damn near ALL DVDs, even though it's absolutely trivial to avoid. The "technicians" involved have NO IDEA what they are doing, and waste a huge amount of bits, and significantly lower quality, because of it. It's just incredibly fortunate DVDs have such a huge amount of bandwidth that these idiotic mistakes can be covered up by increasing the bitrate further. Of course, if they try to squeeze a film onto a single-layer DVD for cost, or include a lot of extras, then the video starts looking pretty lousy because of it.
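
      The macroblock-boundary point can be checked with a few lines. This is only a sketch: it assumes equal bars at the top and bottom of the frame, and the 60-pixel and 64-pixel bar heights are hypothetical.

```python
MB = 16  # macroblock height in pixels, as stated above

def straddling_rows(frame_height, bar_height):
    """Count macroblock rows that contain both black-bar and picture pixels."""
    top_edge = bar_height                    # first picture line
    bottom_edge = frame_height - bar_height  # first line of the bottom bar
    count = 0
    for start in range(0, frame_height, MB):
        end = min(start + MB, frame_height)
        for edge in (top_edge, bottom_edge):
            if start < edge < end:  # edge falls strictly inside this macroblock row
                count += 1
                break
    return count

print(straddling_rows(480, 60))  # bars not on a 16-pixel boundary
print(straddling_rows(480, 64))  # bars aligned to a 16-pixel boundary
```

      Every macroblock in a straddling row has to encode a hard black-to-picture edge, while aligned bars leave whole rows of plain black that cost next to nothing.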

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  2. It seems even this article has a few fictions. by Silverlancer · · Score: 5, Informative

    To begin with, x264 blows the water out of Badaboom in terms of speed when similar settings are used. Badaboom appears to use the rough equivalent of --aq-mode 0 --subme 1 --scenecut -1 --no-cabac --partitions i4x4 --no-dct-decimate in terms of x264 commandline... it's no wonder it's "fast" when they compare it to x264 on far slower settings!

    GPU encoders won't be able to compete with CPU encoders until they either get a lot faster (in which case they'll compete in the "high performance" market) or they get much better quality, since at sane settings x264 unsurprisingly blows Badaboom out of the water quality-wise, too. Until then, the product is not only completely proprietary but furthermore simply inferior, and they're going to have a very hard time marketing it.

    1. Re:It seems even this article has a few fictions. by evilviper · · Score: 4, Informative

      To begin with, x264 blows the water out of Badaboom in terms of speed when similar settings are used.

      If you'd RTFA, you'd see this disparity is repeatedly mentioned, and they attempted to make a fair comparison.

      In a direct comparison, using as close to the same visual quality settings as we could, Handbrake's circa February 2008 X264 codec actually beat the Elemental encoder by almost a minute. Image quality was roughly the same; we've included several stills below so you can directly compare the results.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    2. Re:It seems even this article has a few fictions. by Silverlancer · · Score: 2, Informative

      Yet that line brings up yet another problem--they're using the absolute latest software from Elemental, but are using a 7-month-old version of x264 that is lacking an enormous number of recent improvements. It's anything but a fair test.

    3. Re:It seems even this article has a few fictions. by DarkHorseman · · Score: 2, Funny

      --aq-mode 0 --subme 1 --scenecut -1 --no-cabac --partitions i4x4 --no-dct-decimate in terms of x264 commandline... it's no wonder it's "fast" when they compare it to x264 on far slower settings!

      Do I lose nerd points for this looking like spanish?

    4. Re:It seems even this article has a few fictions. by NorQue · · Score: 1, Funny

      It should actually look like arcane command line magic to you. Points deducted.

      This is what spanish looks like:

      "esto parece como español"

      ;)

    5. Re:It seems even this article has a few fictions. by Ed+Avis · · Score: 2, Funny

      As long as you can read some Spanish text and it looks to you like assembly language for some long-dead processor, you retain your nerd points.

      --
      -- Ed Avis ed@membled.com
    6. Re:It seems even this article has a few fictions. by Garganus · · Score: 1

      Speaking of the test before matching actual load vs. using defaults, am I the only one perturbed by the same movie being encoded twice by different engines but at the same fixed bitrate ...somehow coming out at different sizes? ...and by a couple hundred megs!
      constant unit time of media * (constant unit data / constant unit time) == inconsistent unit data

    7. Re:It seems even this article has a few fictions. by afidel · · Score: 1

      Eh, I know that for LAME unless you specify strict cbr you get an average bit rate that attempts to be close to what you specified. If one encoder tries to be below the target bitrate and the other attempts to provide better quality at the expense of larger file size I can see how they would diverge, even significantly.
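
      For scale, a quick sketch of the arithmetic. The 2-hour running time and both bitrates are hypothetical; the point is only how far the file size moves when an encoder's real average drifts from the number you typed in.

```python
def expected_size_mb(duration_s, bitrate_kbps):
    """File size in megabytes (10^6 bytes) for a stream averaging bitrate_kbps."""
    bits = duration_s * bitrate_kbps * 1000
    return bits / 8 / 1_000_000

duration = 2 * 60 * 60                      # hypothetical 2-hour movie, in seconds
nominal = expected_size_mb(duration, 1500)  # encoder that hits 1500 kbps exactly
drifted = expected_size_mb(duration, 1700)  # encoder that averages 1700 kbps instead

print(nominal, drifted, drifted - nominal)
```

      A drift of about 13% in the average is already a couple hundred megabytes over a full-length film.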

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    8. Re:It seems even this article has a few fictions. by DarkHorseman · · Score: 1

      Awlll 50 points from Hufflepuff :(

  3. I'll say it again: X264-powered Handbrake! by Anonymous Coward · · Score: 0

    Frist psot!

    "the-alamo-doesn't-have-a-basement dept."

    Um, what???

  4. Give it time - it is CPU bound right now by lee1026 · · Score: 1

    The CPU usage of the program when used with a good video card is 25% on my quad core machine, implying it is CPU bound right now. That means if they can get the CPU overhead down, even a little bit, they will stand to get huge gains.

    1. Re:Give it time - it is CPU bound right now by Enry · · Score: 2, Insightful

      Wait, what?

      If the CPU were running at 100%, then it would be CPU bound. Perhaps you meant to say it's GPU bound?

    2. Re:Give it time - it is CPU bound right now by Stumpeh · · Score: 2, Informative

      But not if it's only running on a single core. Then it'd obviously max out at 25% on a quad core machine, provided he's got nothing else running.

    3. Re:Give it time - it is CPU bound right now by Anonymous Coward · · Score: 0

      It is CPU bound -- he's running a quad-core machine, so a single process can only take 25%
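
      The 25% figure falls straight out of the arithmetic. A tiny sketch, assuming each busy thread fully saturates one core and nothing else is running:

```python
def overall_cpu_percent(busy_threads, cores):
    """Whole-machine CPU usage when busy_threads threads each saturate one core."""
    return 100.0 * min(busy_threads, cores) / cores

print(overall_cpu_percent(1, 4))  # one saturated thread on a quad core
print(overall_cpu_percent(4, 4))  # what a fully threaded encoder could reach
```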

  5. Obvious by evilviper · · Score: 4, Interesting

    This is the most obvious and boring insight they could possibly offer... Everyone with the slightest interest knows this already.

    The low quality of hardware-based video encoder cards is a very well-known fact, and those MPEG encoders cards are just ASICs on a PCI card, almost exactly the same hardware as your video card.

    The point of offering up APIs for GPUs, and AMD's attempt to integrate the GPU ASIC with the CPU via HyperTransport, is aimed at improving things, however.

    x264 does a good job because it's an open source project, with several skilled and interested individuals continually tweaking the code to improve quality and performance. Once hardware-based video encoding routines aren't hidden in closed-source firmware on a dedicated card, the same development effort can step up and improve HARDWARE encoding now, exactly as they have with software.

    Not only can quality be significantly improved, you can expect performance to improve significantly as well, even with greater quality. The initial implementation of any codec is always relatively poor performing, and low quality, so this wouldn't even be an insightful observation if it was comparing x264 with any other software based encoder... The only difference is that a new software h.264/AVC encoder would be SLOWER than x264, as well as being much lower quality.

    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  6. Compression isn't really parallel by SpazmodeusG · · Score: 2, Informative

    To know how the next pixel should be compressed you must know the statistical likelihoods of the previous pixels. So compression is a really linear operation. You could have threads that work from each keyframe of the video independently but that still isn't ideal for graphics cards.
    From the CUDA guide-
    "Every instruction issue time, the SIMT unit selects a warp that is ready to execute and issues the next instruction to the active threads of the warp. A warp executes one common instruction at a time, so full efficiency is realized when all 32 threads of a warp agree on their execution path."
    So if you have code that isn't SIMD-able you are really only using 1/32 available threads per unit of branching code.
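
    The 1/32 figure can be reproduced with a toy model. This is a simplification, not how the hardware is actually scheduled: it assumes every path costs the same number of instructions and that the warp runs each distinct path in turn with the other lanes masked off.

```python
from collections import Counter

WARP_SIZE = 32

def lane_utilization(path_per_thread):
    """Fraction of lane-slots doing useful work when a warp serializes its paths."""
    assert len(path_per_thread) == WARP_SIZE
    paths = Counter(path_per_thread)
    passes = len(paths)           # one serialized pass per distinct path
    useful = sum(paths.values())  # each thread is active in exactly one pass
    return useful / (WARP_SIZE * passes)

uniform = lane_utilization([0] * WARP_SIZE)                    # all threads agree
two_way = lane_utilization([i % 2 for i in range(WARP_SIZE)])  # simple if/else split
worst = lane_utilization(list(range(WARP_SIZE)))               # every thread on its own path

print(uniform, two_way, worst)
```

    Only the last case hits 1/32; a plain if/else split costs half the throughput in this model.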

    1. Re:Compression isn't really parallel by SeekerDarksteel · · Score: 4, Informative

      Uh...you space multiplex rather than time multiplex to parallelize encoding. Motion estimation, e.g., is quite parallelizable.

      --
      The laws of probability forbid it!
    2. Re:Compression isn't really parallel by Silverlancer · · Score: 1

      Most of the operations in video encoding are most definitely parallelizable on both a large and small scale, both in regards to frame-based threading, used by many encoders and decoders, but also especially in regards to SIMD (x264 has tens of thousands of lines of handwritten assembly).

    3. Re:Compression isn't really parallel by SpazmodeusG · · Score: 1

      Fair enough, it does indeed use SIMD instructions.
      One thing I notice though is that the SIMD instructions are used for modelling the data and creating statistical probabilities for what the next lot of data will be. Other aspects such as the arithmetic/variable length encoding are very linear.
      So it follows a loop
      {
      Get data block (linear)
      Model data (SIMD-able)
      Statistically Predict (SIMD-able)
      Entropy encode (linear)
      Write encoded data block (linear)
      } while( there's data )


      That entire loop must be run sequentially and whilst individual elements of that loop can be helped by using parallel processing the algorithm certainly isn't parallel in its entirety.
      So what is really needed for this type of encoding is a processor that can handle some SIMD for a short while and can quickly pass back the results of the SIMD operations to a processor that can handle linear operations very effectively.
      This to me is an example of something that is ideal for a mainstream CPU with some SIMD ability built-in rather than a GPU which wants everything to be SIMD-able.

    4. Re:Compression isn't really parallel by Silverlancer · · Score: 1

      That isn't really how video encoding works; the only "probabilities" are in the CABAC entropy encoder, which is handled via the method of arithmetic coding (which indeed isn't SIMD'd).

    5. Re:Compression isn't really parallel by philipgar · · Score: 3, Informative

      uh huh, tens of thousands of lines of asm....

      ~/x264-snapshot-20080812-2245/common/x86$ wc -l *.asm
         165 cabac-a.asm
          91 cpu-32.asm
          51 cpu-64.asm
         437 dct-32.asm
         223 dct-64.asm
         316 dct-a.asm
         874 deblock-a.asm
         659 mc-a2.asm
         933 mc-a.asm
         428 pixel-32.asm
        1615 pixel-a.asm
         600 predict-a.asm
         383 quant-a.asm
         968 sad-a.asm
         519 x86inc.asm
         124 x86util.asm
        8386 total

    6. Re:Compression isn't really parallel by Macman408 · · Score: 1

      So if you have code that isn't SIMD-able you are really only using 1/32 available threads per unit of branching code.

      In addition to what's already been said, there are other techniques that can be used when your code does in fact need to branch. For example, you can take BOTH paths, and then later pick the result from the path you want. This is common when you have lots of parallel hardware, whether made for you in a GPU, or in hardware you're designing yourself, like an ASIC or FPGA. So if you have

      if( A ) {
        Z = B + C;
      } else {
        Z = B - C;
      }

      then you have instructions (or hardware) that perform B+C, separate instructions (or additional hardware) that perform B-C, and then a special instruction (in a GPU) or a multiplexer (for an FPGA or ASIC) that selects one of those two results (based on A) and puts it in Z. (I just wanted to add another set of parentheses here because I didn't think there were enough, so you can ignore this side remark.)

      In this way, you do most of your computation in parallel (assuming your computation is a little bit more complex than just an addition/subtraction), although you end up doing twice as much work. Obviously, there's a limit to how many different paths you can compute this way before you lose the advantage.
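
      The same idea as a small runnable sketch. Python lists stand in for the parallel lanes here, so this only models the data flow (compute both, then select), not real SIMD or GPU execution; the function names are made up for the example.

```python
def select(mask, if_true, if_false):
    """Lane-wise select: the software analogue of a multiplexer."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]

def branchy(a, b, c):
    """Reference version: one data-dependent branch per element."""
    out = []
    for ai, bi, ci in zip(a, b, c):
        if ai:
            out.append(bi + ci)
        else:
            out.append(bi - ci)
    return out

def both_paths(a, b, c):
    """Compute B+C and B-C for every lane, then pick per lane based on A."""
    sums = [bi + ci for bi, ci in zip(b, c)]
    diffs = [bi - ci for bi, ci in zip(b, c)]
    return select(a, sums, diffs)

A = [True, False, True, False]
B = [10, 20, 30, 40]
C = [1, 2, 3, 4]

print(branchy(A, B, C))
print(both_paths(A, B, C))
```

      Both versions give the same answer; the second does twice the arithmetic but every lane follows the same instruction stream.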

    7. Re:Compression isn't really parallel by J_Darnley · · Score: 1

      Please upgrade, you are missing 433 lines.

    8. Re:Compression isn't really parallel by Silverlancer · · Score: 2, Informative

      x264 uses an abstraction method in order to lump enormous amounts of assembly into very small amounts of space. But when all the macros are expanded, it gets much, much, much larger. For example, almost all the SSE/MMX assembly is abstracted away into macros, so a few macros can be used to take a single generic function and expand it into SSE or MMX. Same with 32-bit vs 64-bit. When you expand it all fully, it is indeed tens of thousands of lines.

    9. Re:Compression isn't really parallel by nullchar · · Score: 1

      Yeah but those tens of thousands of lines aren't exactly hand-coded then are they? It appears the developers have only hand-coded the ~9k lines as listed above.

      Still, that is a fair amount of assembly code done by hand, relative to most modern programs written in 3rd and 4th generation languages (that might use only a handful of hand-coded assembly).

    10. Re:Compression isn't really parallel by Silverlancer · · Score: 1

      They were hand coded before we used the macros to abstract them (which involved deleting over half the code!). Of course, that's not all of it, since a significant amount of assembly has been written since the abstraction was done. Though you're also not counting the Altivec assembly, which is rather significant also.

  7. Apples and Oranges by Louis+Savain · · Score: 1, Interesting

    Comparing a GPU, an SIMD (single instruction, multiple data) vector processor, to a CPU, a superscalar sequential processor, is like comparing apples and oranges. Sure, they are both fruits but they don't taste the same. Using the term 'general-purpose' to describe a GPU is pushing the limits of what a GPU is. Certainly, it can run general-purpose programs, but it is much faster at running what it was designed to run: data-parallel applications. A GPU does not have to have a fast clock because it makes up for it by doing a lot of operations in parallel.

    A CPU, OTOH, can have a very fast clock but even if it has a superscalar architecture, it cannot come close to the performance of a GPU on data-parallel apps.

    It is obvious that neither the GPU nor the CPU are universal processors and, IMO, that is an unforgivable sin. Having both of these types of processors in the same machine is asking for trouble. They require two incompatible programming models. Programming such a beast is like pulling teeth with a crowbar. Only a few are good at it and that is not good for the industry. What is needed is a fast vector processor that can run in MIMD (multiple instruction, multiple data) mode. This way, it would have no trouble running general-purpose apps just as fast as data-parallel apps. The problem is that such a processor would need a radically different programming model, one that is specifically designed for fine-grain MIMD processing. None of the current programming tools would work with it. Still, that's the future of parallel computing. There is no getting around this.

    Heralding the Impending Death of the CPU

    1. Re:Apples and Oranges by evilviper · · Score: 1

      Comparing a GPU, an SIMD (single instruction, multiple data) vector processor, to a CPU, a superscalar sequential processor, is like comparing apples and oranges.

      To be fair, modern superscalar CPUs, particularly x86 (or x86-64), have extensively optimized SIMD units, in addition to their sequential/general purpose operations. The very reason Core2 outperformed its Opteron counterparts is because of much better SIMD performance. That generally means SSE instructions, but there are other options as well. And you can be absolutely sure x264 heavily utilizes those SSE instructions, in addition to every other feature of the CPU it can.

      Having both of these types of processors in the same machine is asking for trouble. They require two incompatible programming models.

      Integrating SIMD instructions into CPUs seems to have gone off without a hitch, and no world-ending upheavals. Similarly, while it would be a mess to program specifically for an ASIC/GPU, it wouldn't be at all difficult to just have a few "multimedia" instructions in your code, and have the compiler opt to route them to the GPU for processing. ie. you're still programming for the CPU, and only using the GPU as if it's an independent CPU subsystem, much like an x87/FPU.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    2. Re:Apples and Oranges by Silverlancer · · Score: 3, Interesting

      This isn't 1990 anymore; CPUs have SIMD just as graphics cards do. A modern CPU doing even a brute-force exhaustive motion search can come out on par with a GPU in terms of performance. And if you use sequential elimination instead of a brute-force search (which gives a mathematically equivalent output), a single Core 2 Quad can outperform a quad-SLI set of top-end graphics cards. Sequential elimination, however, despite being SIMD-able, is not well-suited to the threading model of CUDA and similar APIs, and so probably cannot be implemented reasonably on a GPU.

      This concept applies to many algorithms--the brute-force method is easily implementable on a GPU, but a faster and algorithmically smarter method is not well-suited to such an architecture.
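
      A sketch of the two searches side by side, assuming "sequential elimination" means what the literature usually calls the successive elimination algorithm: skip any candidate whose block-sum difference already reaches the best SAD so far, since |sum(cur) - sum(cand)| can never exceed SAD(cur, cand). The frame data, block size and search radius are all made up, and a real implementation would precompute the block sums instead of recomputing them per candidate as this toy does.

```python
import random

def block_sum(frame, y, x, n):
    """Sum of the n x n block whose top-left corner is (y, x)."""
    return sum(frame[y + dy][x + dx] for dy in range(n) for dx in range(n))

def sad(cur, ref, cy, cx, ry, rx, n):
    """Sum of absolute differences between an n x n block of cur and one of ref."""
    return sum(abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
               for dy in range(n) for dx in range(n))

def search(cur, ref, cy, cx, n, radius, eliminate):
    """Return (best_sad, best_vector, sads_computed) over a square search window."""
    height, width = len(ref), len(ref[0])
    cur_sum = block_sum(cur, cy, cx, n)
    best, best_mv, computed = None, None, 0
    for my in range(-radius, radius + 1):
        for mx in range(-radius, radius + 1):
            ry, rx = cy + my, cx + mx
            if not (0 <= ry <= height - n and 0 <= rx <= width - n):
                continue
            if eliminate and best is not None:
                # The block-sum difference is a lower bound on the SAD, so a
                # candidate whose bound reaches the best SAD cannot improve on it.
                if abs(cur_sum - block_sum(ref, ry, rx, n)) >= best:
                    continue
            cost = sad(cur, ref, cy, cx, ry, rx, n)
            computed += 1
            if best is None or cost < best:
                best, best_mv = cost, (my, mx)
    return best, best_mv, computed

rng = random.Random(0)  # fixed seed so the run is repeatable
ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
# Hypothetical "current" frame: the reference shifted by (2, 3) with wrap-around.
cur = [[ref[(y + 2) % 32][(x + 3) % 32] for x in range(32)] for y in range(32)]

full = search(cur, ref, 8, 8, 8, 4, eliminate=False)
fast = search(cur, ref, 8, 8, 8, 4, eliminate=True)
print(full, fast)
```

      Both searches return the same vector and SAD; the eliminating one just computes fewer SADs. Note that how much work it skips depends on the best match found so far, which is the kind of data-dependent control flow that is awkward on a GPU.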

    3. Re:Apples and Oranges by Louis+Savain · · Score: 0

      Integrating SIMD instructions into CPUs seems to have gone off without a hitch, and no world-ending upheavals.

      Maybe not the end of the world but if a solution already existed as you seem to imply, Intel, AMD, Microsoft and the others would not be spending tens of millions of dollars a year to find a solution. The parallel programming problem is not just a problem. It is a crisis. Stanford Computer science professor, Kunle Olukotun, said recently, "If I were the computer industry, I would be panicked, because it's not obvious what the solution is going to look like and whether we will get there in time for these new machines" (source: CNN Money).

    4. Re:Apples and Oranges by evilviper · · Score: 2, Informative

      but if a solution already existed as you seem to imply, Intel, AMD, Microsoft and the others would not be spending tens of millions of dollars a year to find a solution. The parallel programming problem is not just a problem. It is a crisis.

      You seem to not understand the difference (or that there is a difference) between multi-threaded programming, and SIMD data processing.

      The former requires dividing a single application up into independent parts (threads), where no one part needs to wait for the output of another, yet all are doing important processing, that comes together in the end to accomplish something the user wanted.

      The latter simply requires recognizing what algorithms you are performing repeatedly and sequentially in a program (like video processing) and using the appropriate instructions to send those commands to the SIMD unit in the CPU (or, potentially, the GPU, as I was hypothesizing in my last comment) so that they will be performed much more quickly, rather than each operation sequentially.

      The former (threading) is a complex problem, which isn't remotely as earth-shattering as the crazy tech press would have you believe. It's just biting the CPU manufacturers in the ass, because that's the only way they know how to make more money, and the rest of the world isn't coming along as quickly as they'd like. But most importantly, it isn't useful for, nor relevant to GPUs.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    5. Re:Apples and Oranges by Louis+Savain · · Score: 0

      You seem to not understand the difference (or that there is a difference) between multi-threaded programming, and SIMD data processing.

      I don't know what I wrote in my previous comments that led you to believe that I may not know the difference between multithreading and SIMD. You should read my blog. I can't stand threads. Nice talking to you.

    6. Re:Apples and Oranges by Bert64 · · Score: 2, Informative

      Yes, Core2 seems to have much better SSE units than the AMD chips, but this only really manifests itself when running code optimized to use SSE... And that's usually hand optimized assembly, as compilers aren't generally good at generating SSE code yet.

      John the ripper SSE2 mode on a core2 is 2-3 times faster than the generic compile...
      John the ripper SSE2 mode on an AMD (tested on a quad core phenom and dual core opterons) is slightly slower than the generic compile with gcc 4.3 and -O3.

      The core2 beats a similarly clocked phenom by a significant margin on the sse2 code (2.3ghz cores, 2200k vs 1600k per core) but the AMD is considerably quicker running the gcc compiled code

      The big question is, how much of the code you run is optimised for the SSE units found in modern processors, and how much of it uses it at all? How much of the precompiled software you download is compiled to run on a 386, and thus makes no use whatsoever of modern processor features?

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    7. Re:Apples and Oranges by evilviper · · Score: 1

      compilers aren't generally good at generating SSE code yet.

      GCC certainly isn't, but GCC is more or less the slow dog in the race. ICC does quite a bit better.

      And it doesn't necessarily have to be hand written ASM. Intrinsics seem to be gaining a bit more popularity in modern programs.

      The big question is, how much of the code you run is optimised for the SSE units found in modern processors, and how much of it uses it at all?

      I'd bet a significant portion of the CPU-intensive programs out there, particularly multimedia, can utilize it. x264 and the like certainly can.

      How much of the precompiled software you download is compiled to run on a 386, and thus makes no use whatsoever of modern processor features?

      You can "-mtune" for modern CPUs to get a big boost when running on those CPUs, while still retaining full binary compatibility with 386 systems.

      I still think AMD CPUs are generally a better choice, but mainly because of power consumption and multiprocessor memory bandwidth. Sheer number crunching performance usually isn't paramount.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  8. Still in infancy... by slashuzer · · Score: 1

    It will take at least another 18 months before GPU encoding becomes seamless and the ideal solution for most users.

    Intel is working on its own GPU, I am sure that they will exploit multimedia handling capabilities (video/photoshop) as one of the selling points of that GPU.

  9. Clone DVD Mobile by DigiShaman · · Score: 0

    I've said it before, and I'll say it again. Slysoft's Clone DVD Mobile rocks! Just point to your ripped DVD files, choose your profile (PSP, Ipod, PS3...etc), resolution, and quality. Let it bake in the oven for a few hours and it's done.

    And yes, I purchased the software after the trial period because I love it so much.

    Basically, it's a GUI interface to the mencoder application that's freely available.

    --
    Life is not for the lazy.
    1. Re:Clone DVD Mobile by MacColossus · · Score: 2, Insightful

      So if you had tried Handbrake before posting you would see you don't need to first rip the dvd's. You wouldn't have to buy slysoft. You furthermore would be able to choose ipod, psp, etc as a setting for output.

    2. Re:Clone DVD Mobile by Chris+Snook · · Score: 4, Funny

      So you paid money for a GUI that selects command-line options?

      I'm in the wrong line of work.

      --
      There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
    3. Re:Clone DVD Mobile by Silverlancer · · Score: 1

      There is a huge business made around building payware GUIs that (often silently, without giving any credit, sometimes violating GPL/LGPL) do nothing but use open-source tools to do their work. This is especially true in video encoding, where there are almost no cheap proprietary tools--only the extremely widely used open source solutions and extremely expensive "professional" ones (with some rare exceptions like DivX and Nero). Usually these GUIs are much worse than the free ones, but a sucker's born every minute.

    4. Re:Clone DVD Mobile by DigiShaman · · Score: 1

      So you paid money for a GUI that selects command-line options? I'm in the wrong line of work.

      I appreciate the humor. But seriously, have you seen how long a command can be with all the extra switches and whatnot? It can get up to 180+ characters long!

      Sure, I could write batch files or script it. However, I'm always trying new options and feature combinations. For me, a simple GUI that I can run with is what I want.

      Simply put. My computer works for me. I shouldn't have to work for it.

      --
      Life is not for the lazy.
    5. Re:Clone DVD Mobile by WDot · · Score: 1

      But if you're choosing a "profile, resolution, and quality," you could use Handbrake for free. It does all of those things. If you don't want to touch command line stuff, don't. Handbrake's GUI will generate it all for you. Plus it's open source.

    6. Re:Clone DVD Mobile by yuna49 · · Score: 2, Informative

      You mean, like these?

      http://ffmpeg.mplayerhq.hu/shame.html

      I happened to look at ConvertXtoDVD the other day. While ffmpeg itself is licensed under the LGPL, ConvertXtoDVD also appears to use both libpostproc and libswscale which are both GPL. The ffmpeg licensing page states, "If those parts get used the GPL applies to all of FFmpeg."

      I don't see any LICENSE.txt file nor any mention of the GPL or the LGPL in the version of the product I downloaded. Running strings against the binaries looking for things like "public" doesn't bring it up either.

  10. Not a valid comparison by Anonymous Coward · · Score: 0

    I have been working on h.264 video codecs for a large organization for 4 years. This is not a valid comparison; it does not tell you only about GPU vs. CPU, because the differences between the code bases are enormous. Many parts of an h.264 encoder can be parallelized - motion estimation, transform, intra prediction/decision. Others can't - CABAC because of the algorithm, the deblocking filter because the standard is broken. If you take an optimized encoder and offload just the easy parts like motion estimation to a GPU you can get about 30-50% speed improvement. If you did a really good job and/or your original code wasn't very good you might get a 2x speed gain with the same encoder. I doubt you'll ever see much more than that though, with a general interface like CUDA or DX/OpenGL.
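
    The ceiling described above is what Amdahl's law predicts when only part of a pipeline is accelerated. A minimal sketch, assuming the offloaded stages account for some fraction of the original encode time; the fractions and the 10x stage speedup below are illustrative assumptions, not measurements of x264, Badaboom, or any real encoder:

```python
def overall_speedup(offload_fraction: float, offload_speedup: float) -> float:
    """Amdahl's law: whole-encode speedup when only some stages get faster.

    offload_fraction: share of the original run time spent in the stages
        moved to the GPU (0..1). Hypothetical input, not a measured value.
    offload_speedup: how many times faster those stages run once offloaded.
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    if offload_speedup <= 0:
        raise ValueError("offload_speedup must be positive")
    serial_part = 1.0 - offload_fraction  # CABAC, deblocking, etc. stay on the CPU
    return 1.0 / (serial_part + offload_fraction / offload_speedup)


def speedup_ceiling(offload_fraction: float) -> float:
    """Upper bound: the offloaded stages take no time at all."""
    if not 0.0 <= offload_fraction < 1.0:
        raise ValueError("offload_fraction must be in [0, 1)")
    return 1.0 / (1.0 - offload_fraction)


if __name__ == "__main__":
    # Assumed fractions of encode time spent in offloadable stages.
    for fraction in (0.25, 0.35, 0.50):
        print(
            f"fraction={fraction:.2f}  "
            f"10x stage speedup -> {overall_speedup(fraction, 10.0):.2f}x overall, "
            f"ceiling {speedup_ceiling(fraction):.2f}x"
        )
```

    Under these assumptions, a 2x overall gain needs half of the encode time to sit in stages that become essentially free, which is one way to read the "about 2x at best" estimate.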

    1. Re:Not a valid comparison by maglor_83 · · Score: 1

      Not only that, but x264 is one of the very best h.264 encoders out there. You could compare it to most other CPU based encoders and it would also come up trumps. Does this mean encoding on a CPU is better than encoding on a CPU?

      Man I wish it was LGPL instead of GPL though.

    2. Re:Not a valid comparison by Anonymous Coward · · Score: 0

      x264 is a perfect example of software that should be gpl. and it pleases me that it is :)

    3. Re:Not a valid comparison by Silverlancer · · Score: 3, Interesting

      LGPL vs GPL is not actually a very big issue in my experience. I spent the summer working at Avail Media, a company that uses x264 for real-time 1080i/720p broadcast encoding for IPTV and cable television (and also funds a large portion of x264 development). They use x264 in their encoding boxes--yet their main application is proprietary! This is done by having an extremely simple open-source wrapper which is statically linked to x264; the raw frames to be encoded are passed to it over a pipe by the main program. This completely bypasses the limitations of the GPL without violating the spirit of it, since anyone who wants to can still read the source code of the wrapper, modify it, and recompile it as necessary and still use it with the main application.
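
      For readers unfamiliar with the arrangement, here is a rough sketch of the "separate process fed over a pipe" pattern. It is not Avail's code: the wrapper below is a hypothetical stand-in that only counts the frames it receives, where a real wrapper would pass each frame to the encoder library. The frame size and function names are assumptions made for the example.

```python
import subprocess
import sys

# Hypothetical stand-in for the open-source wrapper. It runs as its own
# process, reads fixed-size raw frames from stdin, and prints how many it got.
WRAPPER_SOURCE = r"""
import sys
frame_size = int(sys.argv[1])
count = 0
while True:
    frame = sys.stdin.buffer.read(frame_size)
    if len(frame) < frame_size:  # EOF (or a truncated trailing frame)
        break
    count += 1
sys.stdout.write(str(count))
"""


def send_frames(frames, frame_size):
    """Pipe raw frames to the wrapper process; return the count it reports."""
    for frame in frames:
        if len(frame) != frame_size:
            raise ValueError("every frame must be exactly frame_size bytes")
    proc = subprocess.Popen(
        [sys.executable, "-c", WRAPPER_SOURCE, str(frame_size)],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    # communicate() writes the payload, closes the pipe, and waits for exit.
    out, _ = proc.communicate(b"".join(frames))
    if proc.returncode != 0:
        raise RuntimeError("wrapper process failed")
    return int(out.decode())


if __name__ == "__main__":
    # A 4x4 frame in YUV 4:2:0 is 16 luma + 4 + 4 chroma bytes = 24 bytes.
    frame_size = 24
    frames = [bytes([i]) * frame_size for i in range(3)]
    print(send_frames(frames, frame_size))
```

      The only coupling between the two programs here is a byte stream on stdin, so either side can be rebuilt or swapped out without relinking the other.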

    4. Re:Not a valid comparison by Sique · · Score: 2, Informative

      This is done by having an extremely simple open-source wrapper which is statically linked to x264; the raw frames to be encoded are passed to it over a pipe by the main program. This completely bypasses the limitations of the GPL without violating the spirit of it, since anyone who wants to can still read the source code of the wrapper, modify it, and recompile it as necessary and still use it with the main application.

      Moreso, that is exactly how proprietary software is supposed to interact with GPL software. See Mere Aggregation, especially the last paragraph:

      By contrast, pipes, sockets and command-line arguments are communication mechanisms normally used between two separate programs. So when they are used for communication, the modules normally are separate programs.

      --
      .sig: Sique *sigh*
    5. Re:Not a valid comparison by emj · · Score: 1

      I'm pretty sure that wouldn't fly very well in a court. You are still linking to the wrapper, which has to be GPL. IANAL, but you could argue that including the binary in your workflow is linking.

    6. Re:Not a valid comparison by Dr_Barnowl · · Score: 2, Interesting

      Their wrapper is required to be GPL ; but since they don't distribute it, the source distribution clauses are not in effect.

      Their commercial software pipelines frames into their wrapper ; they are separate processes, not linked, and thus their use does not violate GPL.

      Otherwise you could argue that because you opened a Word document in OOo, that Word was now required to be GPL because it had emitted data that was now being consumed by a GPL application.

    7. Re:Not a valid comparison by chrb · · Score: 1

      Their commercial software pipelines frames into their wrapper ; they are separate processes, not linked, and thus their use does not violate GPL

      Well, if they don't distribute, then the GPL indeed doesn't apply. But if they do, then an argument could definitely be made that the GPL would apply if the GPL code were an essential part of the software as a whole (i.e. it couldn't be replaced), and they were distributing both sets of code together as a single software suite. The GPL license doesn't say anything about code running in a different process being automatically excluded from the licensing requirements; it only talks about "derivative work", without specifying exactly what that means. The same argument exists over the question of closed source kernel modules - many prominent kernel developers believe that they're illegal. The GPL FAQ says:

      "I'd like to incorporate GPL-covered software in my proprietary system. Can I do this by putting a 'wrapper' module, under a GPL-compatible lax permissive license (such as the X11 license) in between the GPL-covered part and the proprietary part?

      No. The X11 license is compatible with the GPL, so you can add a module to the GPL-covered program and put it under the X11 license. But if you were to incorporate them both in a larger program, that whole would include the GPL-covered part, so it would have to be licensed as a whole under the GNU GPL.

      The fact that proprietary module A communicates with GPL-covered module C only through X11-licensed module B is legally irrelevant; what matters is the fact that module C is included in the whole."

    8. Re:Not a valid comparison by chrb · · Score: 2, Insightful

      There's no simple answer here. As the FSF say in the answer you link to "This is a legal question, which ultimately judges will decide." And you missed the rest of the answer following your quote "But if the semantics of the communication are intimate enough, exchanging complex internal data structures, that too could be a basis to consider the two parts as combined into a larger program." It could certainly be argued in a court that distributing an application that performs video encoding by transferring commands and data frames across an IPC link constitutes a derivative work, especially if there's no way for the encoding application to work if the GPL component were removed (as would be the case here). The mere aggregation clause was meant to apply more to clear cut cases like distributing a Linux distribution, where differently licensed unconnected software like, say, emacs and skype, could be incorporated on the same CD.

    9. Re:Not a valid comparison by m50d · · Score: 1

      Does too violate the spirit of the GPL. The point is that the main program should be GPL as well; if they wanted it to behave the way you describe, it would have been LGPL-licensed.

      --
      I am trolling
    10. Re:Not a valid comparison by Anonymous Coward · · Score: 0

      Stallman and friends argue that those cases are different (Other software but same situation), for two reasons:
      1) The company's software cannot function without x264, while Word can function without OpenOffice
      2) The GPL doesn't talk specifically about linking, but rather about derivative works. Static linking is just a sure way of identifying derivatives.
      Therefore the company's software is derivative from x264 and must adhere to the GPL.

      This is why GPL v2 includes exceptions for runtime libraries shipped with the OS and needed to run GPLd software. Otherwise, Sun et al could countersue saying Emacs is a derivative of Solaris, because it used Sun's C library back before GNU libc existed.

      They haven't been able to pursue this because:
      1) It messes up the popularity of GPLd servers. Any software that depends on one would have to be GPLd even if it ran on a completely different computer. Hasn't stopped them from going after runtime linkers of GNU readline though.
      2) Opposition from the community when they first proposed this for GPL v3. IIRC they are working on a modified license that does include it, but it's separate from GPL
      3) It totally loses when there is a standardized interface. All your company would have to do is to write another proprietary server program that receives the data via a pipe and writes it to disk uncompressed and can be used as a drop in replacement. That would mess up point 1 above, and you could argue that that was the original version, and your x264 driver is derived from this and your software and not the other way around.

    11. Re:Not a valid comparison by Silverlancer · · Score: 1

      No, the wrapper is not being linked to. Linking has a very specific meaning--and linking was not being done. Calling a binary via "exec" is not linking. If it was, the GPL would truly be a dangerous license! (And yes, Avail does distribute software containing GPL'd products; it's not internal-use-only).

  11. They're not encoding, they're transcoding by Animats · · Score: 2, Informative

    They're not encoding video. They're transcoding it. They're starting from one compressed representation and outputting another compressed representation. (Now, with twice the artifacts!)

    The good test for this is football. The players, ball, and field are all moving in different directions. If the motion compensation gets that right, it's doing a very good job.

    1. Re:They're not encoding, they're transcoding by Silverlancer · · Score: 1

      No, they're encoding. Transcoding means you're reusing syntax elements from the original video to inform the encoder; i.e. you're not entirely decoding it (not repeating all the process of encoding). What they're doing is encoding, because they're decoding it entirely into a raw video stream, and then sending that into the encoder.

      I wouldn't say football's real challenge is motion either--motion search is a rather simple part of most encoders and IMO definitely not the biggest challenge. The challenge of football is not decimating the grass, which requires both a strong adaptive quantization algorithm and also benefits from psychovisual optimizations that bias towards sharpness (such as x264's experimental "psy-RD").

    2. Re:They're not encoding, they're transcoding by Deadplant · · Score: 1

      No, they're encoding. Transcoding means you're reusing syntax elements from the original video to inform the encoder;

      No, transcode means decoding one format and encoding into another format. You may have had a program or project that took advantage of shortcuts in that process but those techniques are not part of the definition of the word transcode.

  12. Apples and Oranges. by aXi · · Score: 0

    They say that they are comparing apples to apples? Well, not in my humble opinion: closed- and open-source products are like apples and oranges. First of all, the open-source product is usually not intentionally crippled because of MPAA restrictions. Secondly, the closed-source app cannot depend on fixing problems at short notice.

  13. Forget GPUs by necro81 · · Score: 1

    Tell me when I can get a PCI card with one or more Cell co-processors to do the heavy lifting.

  14. Which Graphics Card? by Jthon · · Score: 1

    Did anyone catch what GPU/graphics card they used? The article mentions they used a Q6600 ($185) as their test CPU but it makes no mention of which GPU they ran with.

    Did they run this on a 9800GT? 8800GT? 8600?

    To make this a fair comparison they should be running the test on a system with a quadcore and the lowest end GPU for the CPU test. Then run the same comparison on a low end Intel CPU (same price as that low end GPU from above) and a GPU priced about the same as their Q6600.

    This would fit better with comparing what NVIDIA's been claiming with their optimized PC campaign.

  15. gpu decoding... by Anonymous Coward · · Score: 0

    although this is a thread about GPU h.264 ENcoding, here is a program that uses the GPU for h.264 DEcoding: http://mpc-hc.sourceforge.net/