Slashdot Mirror


Which Open Source Video Apps Use SMP Effectively?

ydrol writes "After building my new Core 2 Quad Q6600 PC, I was ready to unleash video conversion activity the likes of which I had not seen before. However, I was disappointed to discover that a lot of the conversion tools either don't use SMP at all, or don't balance the workload evenly across processors, or require ugly hacks to use SMP (e.g. invoking distributed encoding options). I get the impression that open source projects are a bit slow on the uptake here? Which open source video conversion apps take full native advantage of SMP? (And before you ask, no, I don't want to pick up the code and add SMP support myself, thanks.)"

26 of 262 comments

  1. x264 by Anonymous Coward · · Score: 3, Insightful

    x264 uses slices and scales pretty well across multiple cores. I use it on Windows via MeGUI, but you could easily use it on Linux as well. You could use mencoder to pipe out raw video to a FIFO and use x264 to do the actual conversion, for instance.

  2. Re:ffmpeg by Albanach · · Score: 4, Insightful

    Or just convert 2 videos at once, or 4 for a quad core etc. They did suggest they have lots to convert, and it's a pretty easy way to get all available cores working hard.
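
Editor's sketch of the "one conversion per core" idea above, assuming each conversion is an independent command-line job. The function name `run_batch` and the stand-in commands are hypothetical; a real batch would list one encoder invocation per input file.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_batch(commands, max_jobs):
    """Run each command as its own process, at most max_jobs at a time.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(argv):
        # Each job is a separate OS process, so the scheduler can place
        # the jobs on different cores; the threads here only wait.
        return subprocess.run(argv).returncode

    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run_one, commands))


# Stand-in jobs (assumption): each one just starts Python and exits.
jobs = [[sys.executable, "-c", "pass"] for _ in range(4)]
exit_codes = run_batch(jobs, max_jobs=4)
```

Setting `max_jobs` to the number of cores matches the suggestion of 2 jobs on a dual core and 4 on a quad core.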

  3. Re:ffmpeg by Z00L00K · · Score: 1, Insightful

    And it may or may not be useful to actually run more than one thread per core. How many threads you should run depends on the encoder and the application, so the best approach is to test with 1, 2 and 4 threads per core.

    --
    If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
  4. Re:Simple... by Cyrano de Maniac · · Score: 1, Insightful

    I'm still not sure where this idea that "multi-threaded programming is hard" comes from. It's not. It seems that most people are just afraid of it because they're not familiar with it.

    Or perhaps I just overestimate the mental capacity of most programmers? Having looked at a lot of code, there may be merit to that theory.

    --
    Cyrano de Maniac
  5. Load balancing: Why? by DigitAl56K · · Score: 4, Insightful

    don't balance the workload evenly across processors

    Why is balancing the load evenly important, as long as one thread is not bottlenecking the others? Loading a particular core or set of cores might even be beneficial depending on the cache implementation, especially when other applications are also contending for CPU time.

    Sure, a nice even load distribution might be an indicator for good design, but it doesn't have to apply in every case. I don't think software should be designed so you can be pleased with the aesthetics of the charts in task manager.

    1. Re:Load balancing: Why? by Scottie-Z · · Score: 2, Insightful

      Because, ideally, all four cores should be running at 100% -- the idea is to make maximal use of your available resources, right?

    2. Re:Load balancing: Why? by DigitAl56K · · Score: 4, Insightful

      It's still possible to load all cores 100%.

      A video decoder that I'm working with, for example, currently uses only as many threads as necessary for real-time playback. So for example if one core can do the job only one core is used. If the decoder looks like it might start falling behind more threads are given work to do. Ultimately, if your system is failing to keep up all cores will be fully leveraged.

      However, so long as only some cores are required the others are 100% available to other processes, including their cache (if it's independent). I'm not sure how power management is implemented but perhaps it's even possible for the unused cores to do power saving, leading to longer battery life for laptops/notebooks, etc.
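
Editor's sketch of the "only as many threads as necessary" policy, not the decoder described above. It assumes, optimistically, that decode time divides evenly across workers; the function name and parameters are hypothetical.

```python
import math


def threads_needed(decode_ms_per_frame, frame_interval_ms, max_cores):
    """Smallest worker count that keeps up with playback, capped at max_cores.

    Assumes decode time per frame divides evenly across workers.
    """
    wanted = math.ceil(decode_ms_per_frame / frame_interval_ms)
    return max(1, min(wanted, max_cores))
```

With a 40 ms frame interval, a frame that takes 10 ms to decode needs one worker, and one that takes 100 ms needs three. Once the need exceeds `max_cores`, every core is in use and the decoder falls behind.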

      the idea is to make maximal use of your available resources, right?

      No, the idea is to make the best use of your resources. I'm not trying to say that load balancing is wrong. I'm just saying that processes that don't appear to be balanced are not necessarily poorly designed or operating incorrectly.

  6. F(next) = F(current) + Delta(F(current:next)) by Lumenary7204 · · Score: 5, Insightful

    The problem with MPEG encoding and decoding is that the data itself is not well suited to multi-threaded analysis.

    Multi-threading is most efficient when it is applied to discrete data sets that have little or no dependency on each other.

    For example, suppose I have a table with four columns -- three holding input values (A, B, and C) and one holding an output value (X). If the data in a given row of the table has nothing to do with the data in any other row, multi-threading works efficiently, because none of the threads are waiting for data from any of the other threads. If I want to process multiple rows at once, I simply spawn additional threads.

    On the other hand, for data such as MPEG video, the composition of the next frame is equal to the composition of the current frame, plus some delta transformation - the changed pixels.

    This introduces a dependency which precludes efficient multi-threaded processing, because each succeeding frame depends on the output of the calculations used to generate the prior frame. Even if more than one core is dedicated to processing the video stream, one core would wind up waiting on another, because the output from the first core would be used as the input to the second.
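
Editor's sketch contrasting the two cases in this comment. The per-row formula and the integer "frames" are stand-ins chosen only to make the dependency visible.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def process_row(row):
    # Hypothetical per-row formula; it reads only its own row.
    a, b, c = row
    return a * b + c


rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Independent rows: any worker can take any row, in any order.
# (In CPython, CPU-bound work would need processes rather than threads
# for a real speed-up; threads keep the sketch short.)
with ThreadPoolExecutor(max_workers=3) as pool:
    outputs = list(pool.map(process_row, rows))

# Delta-coded frames: frame k is frame k-1 plus a delta, so the
# reconstruction is a running sum that must visit the deltas in order.
first_frame = 10
deltas = [1, -2, 5]
frames = list(accumulate(deltas, initial=first_frame))
```

The row results could be computed in any order, while each entry of `frames` needs the one before it.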

    1. Re:F(next) = F(current) + Delta(F(current:next)) by Omega996 · · Score: 4, Insightful

      Theoretically, couldn't an encoder scan the data stream for keyframes, chunk the data from one keyframe to the next, and then queue up the keyframe+delta information for multiple cores? That way, each core has something to do that isn't dependent upon the completion of something else.
      I'd think that n-1 cores/threads/whatever to process the chunked data, and the last core/thread/whatever to handle overhead and I/O scheduling, would run pretty nicely on a multi-core machine.
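
Editor's sketch of the chunking step proposed above. It assumes frame types are already known as a sequence of letters, and that no frame references data across a keyframe boundary (closed GOPs). The function name is hypothetical.

```python
def split_into_gops(frame_types):
    """Group frame indices into chunks that each start at a keyframe ('I').

    Frames before the first keyframe (if any) form their own leading chunk.
    """
    gops = []
    for index, kind in enumerate(frame_types):
        if kind == "I" or not gops:
            gops.append([])
        gops[-1].append(index)
    return gops


frame_types = list("IPPBIPPIPB")
gops = split_into_gops(frame_types)
```

Each chunk in `gops` could then be handed to a separate worker, with one more thread left for I/O as the comment suggests.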

    2. Re:F(next) = F(current) + Delta(F(current:next)) by John Betonschaar · · Score: 4, Insightful

      You could of course split each frame in slices, and process these in parallel. Or skip the video N frames between each core, with N being the number of frames between MPEG keyframes. Or have core 1 do the luma and core 2 and 3 the chroma channels. Or pipeline the whole thing and have core 1 do the DCT, core 2 the dequant etc. and have core 3 reconstruct the output reference frame while core 1 already starts the next frame.

      Plenty of ways to parallelize decoding, and even more for encoding...

    3. Re:F(next) = F(current) + Delta(F(current:next)) by init100 · · Score: 2, Insightful

      I think the only dependency between GOP encoding processes is bit allocation, which probably works well enough if you simply assign each process an equal share of the total bit budget.

      Is this even needed if you use multi-pass encoding? At least for XviD, IIRC the first pass is used to accumulate statistics used to allocate the proper bit budget to each frame. Then the individual processes should be able to use the statistics file from the first pass to get the bit allocation for their current GOP in the second pass.
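
Editor's sketch of per-GOP bit allocation from first-pass statistics. This is not how XviD does it; it assumes the first pass yields one positive integer "complexity" score per GOP and splits the budget in proportion. All names are hypothetical.

```python
def allocate_bits(gop_complexities, total_bits):
    """Split total_bits across GOPs in proportion to first-pass complexity.

    Expects a non-empty list of integers with a positive sum. Shares are
    rounded down; leftover bits go to the last GOP so the shares always
    sum to total_bits.
    """
    total_complexity = sum(gop_complexities)
    shares = [total_bits * c // total_complexity for c in gop_complexities]
    shares[-1] += total_bits - sum(shares)
    return shares
```

Because every share is fixed before the second pass starts, the GOP workers would not need to talk to each other while encoding.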

    4. Re:F(next) = F(current) + Delta(F(current:next)) by benwaggoner · · Score: 2, Insightful

      You can encode GOPs independently. I think the only dependency between GOP encoding processes is bit allocation, which probably works well enough if you simply assign each process an equal share of the total bit budget.

      That's a pretty painful constraint for anything other than very flat constant bitrate encoding. You really want to be able to move bits between GOPs to optimize for consistent quality.

  7. Re:Simple... by everphilski · · Score: 1, Insightful

    Amen

    If you truly understand the problem domain you are operating in, parallelism becomes readily apparent. Implementing it isn't difficult even on old code, again, if you truly understand where the parallelism exists.

  8. Re:Simple... by Anonymous Coward · · Score: 0, Insightful

    And writing code that can scale and balance across n number of cores/threads is extremely hard.

    You're overgeneralizing. Sometimes it's hard, and sometimes it's dirt simple easy.

  9. Re:ffmpeg by m0rph3us0 · · Score: 4, Insightful

    On a two processor system this would result in multi-threading being off.

  10. Re:Simple... by Cyrano de Maniac · · Score: 3, Insightful

    Exactly. Too many people assume that any given programmer can write any given program. What isn't generally realized (at least by the masses) is that programming really is about acquiring expertise in a particular domain and then solving problems in that domain through the use of computer programs. Generally some of the most effective programs I've seen have been written, on their first pass, by a person with intimate domain knowledge, and mediocre programming/computer knowledge. The program then becomes a standout when someone with intense programming and computer architecture knowledge improves the code from there (they need not be a subject domain expert, but it helps).

    I do take issue with sexconker assuming that I "just don't get it". Heh. If s/he only knew. Whatever, no biggie. I do agree that distributed algorithms are generally more difficult to implement/design than non-distributed, but that's not exactly the same thing as serial versus parallel algorithms (non-distributed generally involves access to data through a common address space, distributed doesn't, though even those pseudo-definitions come up a bit short).

    Again and again I read in industry rags and on various web sites that multi-threaded programming is hard, and nobody knows how to do it, and that it's difficult to debug, and all that. I believe what they're really saying is "The set of programmers who are accustomed to multi-threaded programming/debugging is (relatively) small, and thus applications aren't going to make good use of the shift to multicore CPU packages." Familiarity with a skill, and the supply of labor familiar with said skill, is distinct from it being easy or hard.

    Anyway, I stand by my belief that parallel programming is not as difficult as most people are led to believe. Some problems don't lend themselves well to parallel solutions, or don't merit the added complexity, but many many of them do. In ten years time I predict that most computer programming education will assume the use of threading, and that anyone who isn't competent with threading will severely limit their own job prospects.

    --
    Cyrano de Maniac
  11. Re:ffmpeg by sick_soul · · Score: 2, Insightful

    Just want to inform you that neither threads nor any other
    multiprogramming mechanisms are necessary for
    responsive user interfaces,
    and that IO multiplexing in particular does not require
    threads at all.

    You can solve both with threads, but you don't have to.
    And in most common cases it is much better not to;
    it seems that threads continue to be one of the most
    misused and misunderstood of the programming concepts.
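
Editor's sketch of I/O multiplexing in a single thread, using Python's standard `selectors` module. The two socket pairs are stand-ins for two independent input sources.

```python
import selectors
import socket

selector = selectors.DefaultSelector()

# Two connected socket pairs stand in for two independent input sources.
a_send, a_recv = socket.socketpair()
b_send, b_recv = socket.socketpair()
selector.register(a_recv, selectors.EVENT_READ, data="a")
selector.register(b_recv, selectors.EVENT_READ, data="b")

a_send.sendall(b"hello")
b_send.sendall(b"world")

received = {}
# One thread services both sources: wait until something is readable,
# then read from whichever sockets are ready.
while len(received) < 2:
    for key, _events in selector.select(timeout=1.0):
        received[key.data] = key.fileobj.recv(1024)

selector.close()
for sock in (a_send, a_recv, b_send, b_recv):
    sock.close()
```

The loop only wakes when a source has data, so no thread sits blocked on a single source.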

  12. Re:ffmpeg by Anonymous Coward · · Score: 1, Insightful

    or just set it to the number of cores, set all the threads to low priority and let the OS do the scheduling. You know, the way things have been done for years.
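
Editor's sketch of "one worker per core, low priority", using only the Python standard library. Lowering the priority of the whole process is used here as a stand-in for per-thread priority, which is an assumption; both function names are hypothetical.

```python
import os


def default_worker_count():
    # os.cpu_count() can return None when the count is unknown.
    return os.cpu_count() or 1


def lower_priority(increment=10):
    """Lower this process's scheduling priority where the platform allows it."""
    if hasattr(os, "nice"):
        return os.nice(increment)  # Unix only; returns the new niceness
    return None
```

After that, placing the workers on cores is left to the operating system's scheduler.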

  13. Not as simple as you would think by sjf · · Score: 4, Insightful

    As other commenters have said, decoding video is not, per se, a trivially parallelized algorithm. Especially for modern codecs with lots of temporal encoding. MJPEG would be easily parallelized, but you'd have to be dealing with fairly ancient sources...MediaComposer 1 for instance.

    However, there are different classes of "video app" that are good targets for parallelization. Real world video editing for instance: consider multiple streams of video with overlays, rotations, effects etc. Video and audio decoding can happen in parallel, you can pipeline the effects stages so that each effect is handed off to another core. Modern video editing systems do this with aplomb.
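
Editor's sketch of a pipelined effects chain, where each stage runs in its own thread and hands results to the next through a queue. The integer "frames" and arithmetic "effects" are stand-ins; this is not how any particular editing product does it.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(func(item))


def run_pipeline(frames, funcs):
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(STOP)
    results = []
    while (item := queues[-1].get()) is not STOP:
        results.append(item)
    for t in threads:
        t.join()
    return results


# Stand-in "effects": real stages would operate on decoded frames.
effects = [lambda x: x + 1, lambda x: x * 2]
output = run_pipeline([1, 2, 3], effects)
```

While the second stage works on frame 1, the first stage can already start on frame 2, which is what lets the stages occupy different cores.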

    I'm from the commercial end of this, so I can't comment much on open source alternatives. But I will say that a lot of the algorithms in certain products are highly tuned to the particular CPU type.
    And they're smart enough to distribute work across only as many cores as actually exist.

    Finally. Don't forget that optimization is hard. You have to consider the speed of the hard drive, the cost of sharing data between threads and cpu caches and a bunch of other real constraints. Any half decent cpu of the last five years or so can easily decode most video faster than it can be read and written to disk. So long as this is true, you won't get any benefit from parallelization.

  14. Re:ffmpeg by hedwards · · Score: 5, Insightful

    Apple has spent a lot of time and money convincing everybody that they don't sell PCs, they sell Macs. I'm not sure what the point of arguing with both the general public as well as Apple is.

    At this point, the term PC does not include Apple computers. It's a change to the definition which happens when the vast majority of people decide amongst themselves that the definition should change.

    In terms of the topic at hand, most video apps really should be capable of using multiple cores; tasks of this sort are quite easy to finish in parallel, either by having each worker do every nth frame or by subdividing the image into a number of regions which can be completed separately and joined at the end before writing the frame to disk.
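
Editor's sketch of the "subdivide the image into regions and join at the end" approach. It assumes a per-pixel operation where no region needs data from another. The inversion filter and function names are hypothetical, and codecs with motion compensation across regions would not split this cleanly.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(height, parts):
    """Return (start, stop) row ranges that cover 0..height in `parts` bands."""
    bounds = [height * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def invert_band(image, start, stop):
    # Stand-in per-pixel operation: invert 8-bit grey values.
    return [[255 - px for px in row] for row in image[start:stop]]


def process_image(image, parts):
    bands = split_rows(len(image), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        results = list(pool.map(lambda band: invert_band(image, *band), bands))
    # pool.map keeps band order, so concatenation rebuilds the frame.
    return [row for band in results for row in band]


image = [[0, 10], [20, 30], [40, 50], [60, 70], [80, 90]]
result = process_image(image, parts=2)
```

The bands are joined in their original order, so the output frame has the same layout as the input.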

  15. Re:ffmpeg by Albanach · · Score: 2, Insightful

    I thought about that but, seriously, transcoding is usually CPU limited. I'd really suspect it'd take a lot of simultaneous encoding to make it I/O bound.

  16. Re:ffmpeg by 3vi1 · · Score: 5, Insightful

    No - HP did (for their calculators), way before there "was" an Apple.

    Also, I don't even think Apple marketing would agree with you - or they wouldn't have "I'm a Mac... and I'm a PC" adverts.

  17. Re:ffmpeg by TheLink · · Score: 2, Insightful

    Ah but figuring out "make" might require too much wetware CPU time for most people ;).

    "Why is it not working? Oops messed up tabs and spaces", etc.

  18. Re:ffmpeg by slimjim8094 · · Score: 2, Insightful

    Perhaps. But threads are far more versatile - if they're done well.

    So our video app has a sound-processing thread, a video processing thread, and a UI thread. If it's implemented well (don't read or write twice, have a common buffer), it'll run with the same or near performance as a one-threaded program on a one-processor/core system.

    But on a multicore/processor system no extra work is needed to take advantage of the cores. If we have three cores, it'll run automatically across cores for a massive performance gain. And we automatically take advantage of scheduling improvements.

    Yes, it can be done crappily. But threads exist for a very good reason, and writing your program in one thread is more complex and far, far less flexible.

    --
    I have developed a truly marvelous proof of this comment, which this signature is too narrow to contain.
  19. Re:ffmpeg by fprintf · · Score: 2, Insightful

    It seems we are in the same place years and years later. Way back when overclocking Celeron 400s was the rage, I bought a multi-processor motherboard to run twin Intel Pentium IIs. I bought a SuSE Linux package after reading that Windows 95 would not support dual processors... you can see where this is going... except for rolling my own kernel and a few other things (like compiling code), the system largely ran on one processor even with SMP turned on in the kernel.

    So it seems we have similar complaints 8 or more years later. How disappointing. I only wish I knew how to program to the level where I could help solve this.

    --
    This post brought to you by your friendly neighborhood MBA.
  20. Re:ffmpeg by Ultra64 · · Score: 2, Insightful

    Ok, I've got to hear this one.
    What the fuck does threading have to do with video quality?