Which Open Source Video Apps Use SMP Effectively?
ydrol writes "After building my new Core 2 Quad Q6600 PC, I was ready to unleash video conversion activity the likes of which I had not seen before. However, I was disappointed to discover that a lot of the conversion tools either don't use SMP at all, or don't balance the workload evenly across processors, or require ugly hacks to use SMP (e.g. invoking distributed encoding options). I get the impression that open source projects are a bit slow on the uptake here? Which open source video conversion apps take full native advantage of SMP? (And before you ask, no, I don't want to pick up the code and add SMP support myself, thanks.)"
Use the -threads switch.
Interested in open source engine management for your Subaru?
transcode uses separate processes for everything.
My blog
You have to design it as SMP from the ground up; you cannot just hack it in later. Not to forget that multi-threaded programming is hard. Give it a few years and there will be more OSS solutions. Multi-core processors have not been mainstream for that long.
x264 uses slices and scales pretty well across multiple cores. I use it on Windows via meGUI, but you could easily use it in Linux as well. You could use mencoder to pipe out raw video to a fifo and use x264 to do the actual conversion, for instance.
simple answer: www.x264.nl
...makes excellent use of multiple cores. It is however Mac-only. Interestingly, what it does is split a file into chunks and spawns multiple ffmpeg processes to do the conversion. Which is to say, perhaps you can do some (relatively simple) scripting with ffmpeg that will do the job.
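A rough sketch of that kind of scripting, in Python: it only computes the chunk boundaries and builds one ffmpeg command line per chunk, without running anything. The file names and the 6000-second duration are made up for illustration, and this is not a description of how VisualHub itself does it. `-ss` (seek) and `-t` (duration) are standard ffmpeg options; the codec settings are left at whatever ffmpeg defaults to.

```python
import shlex

def chunk_bounds(duration_s, n_chunks):
    """Split [0, duration_s) into n_chunks contiguous (start, length) pairs."""
    base = duration_s / n_chunks
    return [(i * base, base) for i in range(n_chunks)]

def ffmpeg_commands(src, duration_s, n_chunks):
    """Build one ffmpeg argument list per chunk; nothing is executed here."""
    cmds = []
    for i, (start, length) in enumerate(chunk_bounds(duration_s, n_chunks)):
        out = f"chunk{i:02d}.mp4"  # hypothetical output name
        cmds.append(["ffmpeg", "-ss", f"{start:.3f}", "-t", f"{length:.3f}",
                     "-i", src, out])
    return cmds

# A 100 minute source (6000 s, assumed) split four ways, one chunk per core.
for cmd in ffmpeg_commands("input.avi", 6000, 4):
    print(shlex.join(cmd))
```

Each command could then be started as its own process (for example with `subprocess.Popen`) and the resulting chunks concatenated afterwards.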
The secret to creativity is knowing how to hide your sources. - Albert Einstein
x264 and AviSynth can make pretty decent use of threads. Check out meGUI.
x264 via meGUI from Doom9 is what I use to compress HD-DVD and BD movies - also on a quad core. I have some tutorials posted out and about on how I'm doing it. Near as I can tell you cannot dupe the process on Linux due to the crypto - Slysoft's AnyDVD-HD is needed.
Playback - I use XBMC for Linux. It is also SMP enabled using the ffmpeg cabac patch. The developers of this project have been VERY aggressive at taking cutting edge improvements to the likes of ffmpeg and incorporating them into the code. Since Linux has no video acceleration of H.264, SMP really helps on high bitrate video!
Build it, Drive it, Improve it! Hybridz.org
don't balance the workload evenly across processors
Why is balancing the load evenly important, as long as one thread is not bottlenecking the others? Loading a particular core or set of cores might even be beneficial depending on the cache implementation, especially when other applications are also contending for CPU time.
Sure, a nice even load distribution might be an indicator for good design, but it doesn't have to apply in every case. I don't think software should be designed so you can be pleased with the aesthetics of the charts in task manager.
Beggars can't always be choosers, but you can always be a prick.
But I actually really hope the person who "asked Slashdot" this dies in a fire. Honestly. Is Google THAT broken these days?
Yes, this is a troll. Mark it as such and feel the peace. I don't mean to "troll" as such, I don't care for replies. I just reiterate my first sentence. Fire, die in one. Use your God/creation given brain next time.
Since his suggestion was to do some scripting that does essentially what VisualHub does using ffmpeg I'm not sure I see how he missed the Open Source requirement.
Handbrake has always used both of the cores on my system for transcoding.
OP is asking for open source tools. You cited a commercial one that doesn't provide source.
VisualHub (the front-end app) may be closed, but ffmpeg is LGPL.
And the GP was suggesting using ffmpeg, not VisualHub.
How can I believe you when you tell me what I don't want to hear?
The problem with MPEG encoding and decoding is that the data itself is not well suited to multi-threaded analysis.
Multi-threading is most efficient when it is applied to discrete data sets that have little or no dependency on each other.
For example, suppose I have a table with four columns -- three holding input values (A, B, and C) and one holding an output value (X). If the data in a given row of the table has nothing to do with the data in any other row, multi-threading works efficiently, because none of the threads are waiting for data from any of the other threads. If I want to process multiple rows at once, I simply spawn additional threads.
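To make the table example concrete, here is a minimal Python sketch. The formula for X is a placeholder (the comment does not give one); the point is only that each row is computed without looking at any other row, so the rows can be handed to a pool of workers in any order.

```python
from concurrent.futures import ThreadPoolExecutor

def compute_x(row):
    """Compute the output X for one row from its inputs A, B and C."""
    a, b, c = row
    return a * b + c  # placeholder formula; any per-row function works

# Each tuple is one row of the table: (A, B, C).
rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]

# No row depends on another, so the pool can work on several at once.
with ThreadPoolExecutor(max_workers=4) as pool:
    xs = list(pool.map(compute_x, rows))

print(xs)
```

In CPython, threads will not actually speed up pure-Python arithmetic like this; `ProcessPoolExecutor` from the same module is the usual substitute when the per-row work is CPU-bound.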
On the other hand, for data such as MPEG video, the composition of the next frame is equal to the composition of the current frame, plus some delta transformation - the changed pixels.
This introduces a dependency which precludes efficient multi-threaded processing, because each succeeding frame depends on the output of the calculations used to generate the prior frame. Even if more than one core is dedicated to processing the video stream, one core would wind up waiting on another, because the output from the first core would be used as the input to the second.
Huh? I am using AGK and my CPU never does anything. It is always waiting for I/O. I must be doing something wrong...
And told him how it uses an open source program in an easily-replicatable way.
Actually, the MPEG stream resets itself every n frames or so (n is often a number like 8, but can vary depending on the video content). These resets are called keyframes (I-frames), and the delta frames (P and B frames) are generated against them. Because of this, it is really easy to apply parallel processing to video encoding.
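A toy illustration of why keyframes help, in Python. Each "frame" here is just a number, and the "codec" stores a keyframe whole followed by differences from the previous frame. That is a drastic simplification of real MPEG, but it shows the dependency structure: deltas depend only on frames inside the same group, so groups can be encoded independently.

```python
from concurrent.futures import ThreadPoolExecutor

def split_gops(frames, n):
    """Cut the frame list into groups of n, each starting with a keyframe."""
    return [frames[i:i + n] for i in range(0, len(frames), n)]

def encode_gop(gop):
    """Store the keyframe whole, then each frame as a delta from the previous one."""
    key = gop[0]
    deltas = [cur - prev for prev, cur in zip(gop, gop[1:])]
    return key, deltas

def decode_gop(encoded):
    """Rebuild the frames of one group from its keyframe and deltas."""
    key, deltas = encoded
    frames = [key]
    for d in deltas:
        frames.append(frames[-1] + d)
    return frames

frames = [10, 12, 15, 15, 40, 41, 43, 47, 90, 91]
gops = split_gops(frames, 4)  # keyframe interval of 4, chosen arbitrarily

# No group needs another group's output, so they can be encoded concurrently.
with ThreadPoolExecutor() as pool:
    encoded = list(pool.map(encode_gop, gops))

decoded = [f for e in encoded for f in decode_gop(e)]
```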
Is there anything out there that can play a high-bitrate obese .mkv Blu-ray backup rip efficiently on 2 or 4 cores?
The transform at the heart of MPEG is the DCT (discrete cosine transform). If this is parallelizable, then MPEG encoding/decoding should be, although there is no way a general-purpose processor can beat an ASIC in silicon.
Part of the reason you find a lack of SMP is that it actually negatively impacts the quality of the encode (though not greatly). A lot of the time it's there, but hidden. The encoder looks at the frames around the current one being encoded and changes its output based on what is found, to make things run smoother on future frames. When you start adding additional threads you have to somehow break up the file into sections, or have each thread do sequential frames. The result is that the encoder can't use its wizardry to its full effect.
You don't say if you're running on Windows or Linux or something else. If you are running on Windows, the latest versions of VirtualDub have made big improvements to SMT/SMP encoding.
VirtualDub home
VirtualDub 1.8.1 announcement
VirtualDub downloads
Make sure you grab 1.8.3 - 1.8.1 was pretty good, but had a few teething problems. 1.8.2 has a major regression which is fixed in 1.8.3. The comments in the 1.8.1 announcement contain a few important tips for using the new features (some of which I posted BTW).
The two major new features that would be of interest to you are:
1. You can run all VirtualDub processing in one thread, and the codec in another. This works very well in conjunction with a multi-threaded codec - this one change improved my CPU utilisation from approx 75% to 95% on my dual-core machines - with an equivalent increase in encoding performance.
2. VD now has simple support for distributed encoding. You can use a shared queue across either multiple instances of VD on a single machine, or across multiple machines (must use UNC paths for multiple machines). Each instance of VD will pick the next job in the queue when it finishes its current job. Instances can be started in slave mode (in which case they will automatically start processing the queue).
I use 3 machines for encoding (all dual-core). With VD 1.8.x I start VD on two of the machines in slave mode, and one in master mode. I add jobs to the queue on the master instance, and the other two instances immediately pick up the new jobs and start encoding. When I've added all the jobs, I then start the master instance working on the job queue.
To achieve a similar effect on your quad-core, start two instances of VD on the same machine - one slave, the other master.
It's not perfect (if you've only got one job, you won't use your maximum capacity) but it has greatly simplified my transcoding tasks, and reduced the time to transcode large numbers of files.
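The shared-queue pattern described above can be sketched in a few lines of Python. This is not VirtualDub's implementation, just the general idea: several workers pull the next job from one queue whenever they finish the previous one. The job names and the fake "encode" step are placeholders.

```python
import queue
import threading

def worker(name, jobs, done, lock):
    """Keep taking the next job from the shared queue until it is empty."""
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return
        result = job.upper()  # stand-in for the real encode
        with lock:
            done.append((name, job, result))

titles = ["holiday.avi", "wedding.avi", "birthday.avi", "concert.avi", "recital.avi"]
jobs = queue.Queue()
for title in titles:
    jobs.put(title)

done = []
lock = threading.Lock()
workers = [threading.Thread(target=worker, args=(f"instance-{i}", jobs, done, lock))
           for i in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

As with the VirtualDub setup, a single job keeps only one worker busy; the gain comes from having at least as many jobs as workers.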
I've noticed a lot of talk about commandline options, but not the nice guis that use them. Avidemux is open source, cross-platform, gives you a decent interface, and uses multithreaded libraries like ffmpeg and x264 on the backend to do the encoding, so it generally makes optimal use of your multicore system.
-- sudo.ca
As posted elsewhere, it is difficult to divide a project up that is really pretty linear. Instead, you should try to do more jobs at once. Encode four videos at once.
And surprisingly, this is not something that we can blame on BillG. XP has supported multiple processors from the beginning. Multi processor motherboards were just too expensive for the average consumer. It wasn't until the P4 Xtreme that multi core became a reasonably priced option. For once, Microsoft was actually ahead of the curve in providing support for a technology BEFORE the market really needed it.
If you do a lot of H.264 conversion, look into picking up a hardware encoder. There's the Turbo.264; it's Mac-only, but I'm fairly sure it's a rebranded PC device. Plug into a USB port, and it speeds up H.264 encoding -- even on single-core systems. Imagine that with your quad-core. It's not a free solution, but if you find yourself doing a *lot* of encodes, it may be worth your money.
(And before you ask, no, I don't want to pick up the code and add SMP support myself, thanks.)
Tell you what...why don't you FedEx your videos to a developer and ask him to do it for you? I'm sure they'd be happy to help a nice, polite, motivated person like yourself.
Stasis is death. Embrace change.
As other commenters have said, decoding video is not, per se, a trivially parallelized algorithm, especially for modern codecs with lots of temporal encoding. MJPEG would be easily parallelized, but you'd have to be dealing with fairly ancient sources...MediaComposer 1 for instance.
However, there are different classes of "video app" that are good targets for parallelization. Real world video editing for instance: consider multiple streams of video with overlays, rotations, effects etc. Video and audio decoding can happen in parallel, you can pipeline the effects stages so that each effect is handed off to another core. Modern video editing systems do this with aplomb.
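The "hand each effect off to another core" idea is a classic pipeline. A minimal Python sketch, with one thread per stage and queues in between; the stage functions (decode, rotate, overlay) are stand-ins that just tag a string, not real video operations.

```python
import queue
import threading

STOP = object()  # sentinel that tells each stage to shut down

def stage(func, inbox, outbox):
    """Apply func to every item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(func(item))

def run_pipeline(frames, funcs):
    """Run frames through funcs, one thread per stage, preserving order."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for frame in frames:
        queues[0].put(frame)
    queues[0].put(STOP)
    out = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out

def decode(n):
    return f"frame{n}"

def rotate(frame):
    return frame + "+rotated"

def overlay(frame):
    return frame + "+overlay"

result = run_pipeline(range(3), [decode, rotate, overlay])
```

While the overlay stage works on frame 0, the rotate stage can already be working on frame 1 and the decoder on frame 2, which is where the parallelism comes from.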
I'm from the commercial end of this, so I can't comment much on open source alternatives. But I will say that a lot of the algorithms in certain products are highly tuned to the particular CPU type.
And they're smart enough to distribute work across only as many cores as actually exist.
Finally, don't forget that optimization is hard. You have to consider the speed of the hard drive, the cost of sharing data between threads and CPU caches, and a bunch of other real constraints. Any half-decent CPU of the last five years or so can easily decode most video faster than it can be read and written to disk. So long as this is true, you won't get any benefit from parallelization.
Any video re-encode is gonna involve a bunch of steps. Run 'em together on different cores. There ya go.
Once you actually learn something about this black art (it is somewhat of one in Linux, anyway), you'll find you can use all your cores, no problem.
The version of Cinelerra from heroinewarrior.com uses SMP. It's highly dependent on the supporting libraries & who implemented the feature. In the worst case, use renderfarm mode & nodes for each processor. Sometimes the libraries work in SMP mode & sometimes they don't. Sometimes the feature was intended for everyone to use on any number of processors & sometimes it was written for 1 person's cheap single processor.
Now I'm a bit curious.
Given that all of the "usual suspects" of encoding apps support SMP on almost every platform, and have done so for quite some time, what was this guy using that didn't support it?
ffmpeg and x264 are just about the only players in town these days.
-- If you try to fail and succeed, which have you done? - Uli's moose
DeVeDe is a really good GUI and adds a lot of functionality. It uses mencoder, which as of 1.0rc2 implements SMP quite nicely. I've been using it the last few days for family videos, on an AMD64 X2, and it is working flawlessly using both cores.
As posted elsewhere, it is difficult to divide a project up that is really pretty linear. Instead, you should try to do more jobs at once. Encode four videos at once.
Do you mean split a 100 minute movie into four 25 minute chunks, encode one chunk on each core, and concatenate them? Great idea.
Cheers. I also found these Acidrip patches. PS In case anyone missed it, I really meant to ask about the front end GUI/script tools rather than the engines. PPS I'm actually using Mandriva.
Especially when dealing with TV shows, there are often blank frames before commercial breaks. By scanning for several blank frames and then chopping at that point, the video file can easily be divided up into chunks for parallel processing.
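A sketch of that scan in Python, assuming you already have one average-brightness number per frame (extracting those from a real file is a separate step not shown here). The threshold of 16 and the minimum run of 3 frames are arbitrary assumptions to tune against real footage.

```python
def find_cut_points(brightness, threshold=16, min_run=3):
    """Return the middle frame index of each run of at least min_run dark frames."""
    cuts = []
    run_start = None
    # A bright sentinel at the end closes any run that reaches the last frame.
    for i, level in enumerate(list(brightness) + [threshold + 1]):
        if level <= threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                cuts.append((run_start + i) // 2)
            run_start = None
    return cuts

# Hypothetical per-frame brightness: two blank stretches and one stray dark frame.
levels = [120, 118, 5, 3, 4, 2, 130, 125, 6, 128, 1, 2, 3, 140]
print(find_cut_points(levels))
```

The single dark frame at index 8 is ignored because it is shorter than the minimum run, which is the point of requiring several blank frames in a row.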
I get the impression that open source projects are a bit slow on the uptake here?
Open source isn't alone in this regard. Many closed source applications also lag behind. Obviously there are exceptions but many apps just haven't caught up to multi-cores, whether that be just 2 (which is ancient tech by now) or 8 cores in a single system.
this nation, under God, shall have a new birth of freedom. -- Lincoln, Gettysburg Address
But Mac users have been living with SMP since 2001
Just for reference:
UNIX System V R4-MP 1993
Windows NT 1993
OS/2 2.11 1993
Linux 2.0 1996
Avidemux
I may have missed something, but in light of the article here: http://tech.slashdot.org/article.pl?sid=08/05/31/1633214, and the wealth of information being offered in this topic, if you are willing to re-make something like ffmpeg to take advantage of the processing capability of your video card you may achieve tremendous efficiency for your task. (My test blew up from mis-managing memory, but before it did it dynamically allocated 22 or 23 threads. The results were uncertain because the system crashed before logging the current status. This is just a concept-learning test written off the cuff in Java, so a real engineered system should be able to do something significant. I will probably get back to it when my current workload slows down.) I'm assuming that if you're doing video work you don't have a lame video card, but the video card should be mostly idle during the conversion process.
"The mind works quicker than you think!"
Actually, motion compensation parallelises very well and is one of the most time-consuming stages in compression. Effectively the image is split into N macroblocks, say 16x16 pixels in size. The closest match to each of these macroblocks is searched for in the following (and/or preceding) frame within a +/- X/Y pixel range. Each macroblock search (a typical image may have 1200 macroblocks) can be parallelised. Therefore up to 1200 cores could be used to parallelise motion compensation. The quantisation stage can also be parallelised at the macroblock level, as can the DCT (MPEG-1/2, H.263, MPEG-4 ASP) or the DCT-like integer transform (H.264). The only stage that needs to be done serially is the final entropy coding (Huffman in MPEG-1/2).
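For reference, 1200 is what you get for a 640x480 image with 16x16 blocks (40 x 30). The per-block search can be sketched in Python as below. It is a brute-force search using the sum of absolute differences on tiny made-up frames with 4x4 blocks, so treat it as an illustration of the independence between blocks, not of how any real encoder does it.

```python
from concurrent.futures import ThreadPoolExecutor

def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a shifted block in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))

def best_vector(cur, ref, bx, by, size, search):
    """Try every offset within +/- search pixels and keep the cheapest one."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + size <= w and
                      0 <= by + dy and by + dy + size <= h)
            if not inside:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (bx, by), (best[1], best[2])

def motion_search(cur, ref, size=4, search=2):
    """Search every block independently; the blocks share no state."""
    blocks = [(bx, by) for by in range(0, len(cur), size)
              for bx in range(0, len(cur[0]), size)]
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(
            lambda b: best_vector(cur, ref, b[0], b[1], size, search), blocks))

# Made-up 8x8 frames: cur is ref moved one pixel to the right.
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[row[0]] + row[:-1] for row in ref]
vectors = motion_search(cur, ref)
```

For the right-hand blocks the search finds the offset (-1, 0), meaning the block's content sits one pixel to the left in the reference frame.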
There are two main FOSS video conversion apps to be aware of:
WinFF - great GUI for ffmpeg, makes batch conversion simple. Soon to appear in Debian/Ubuntu repos. VLC also includes a more basic video conversion wizard GUI, but can't compete with WinFF. Windows and Linux versions available.
DeVeDe - Already mentioned here, I see. Makes DVD and SVCD/VCD/*CD creation under Linux simple. Again, another cross-platform FOSS app.
Both these apps have SMP support, but only as good as their respective ffmpeg and mencoder backends.
The new QT4 version of KDEnlive is a total re-write of the app and is said to be SMP friendly but has yet to have a proper release.
like kdawson actually posted something interesting, who let him at the coffee pot?
Multi-threaded programming is getting to be like artificial intelligence. People flip out about how hard it is, and when you point out mundane, useful, easy kinds of multi-threading, they say, "Well, that's not really what I was talking about." If multi-threading only means scalable performance-critical code, or code with lots of fine-grained locking, or code written with no language or library support, then hell yeah, it's hard. Multi-threaded programming is full of hard problems, but you can get plenty of work done without ever facing up to them.
There is a MPlayer fork called MPlayerXP. The purpose of the fork is to make a multithreaded version of MPlayer.
http://mplayerxp.sourceforge.net/
Many of today's video codecs compress data by only storing the differences between frames. As such they do not lend themselves well to that type of splitting up.
But in practice, how much space are you going to lose by inserting only three extra keyframes into a 100 minute film? Look at the three keyframes that a four-core encoder would insert, then compare that to how many cuts in a film already need a keyframe after them. If you're worried that this will insert too many extra keyframes once encoding scales up to dozens of cores, you could just have one core finding cuts and the rest encoding each interval between cuts.
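A back-of-the-envelope check of that argument in Python. The frame rate (25 fps) and the regular keyframe interval (one every 250 frames) are assumptions picked for illustration; only the 100 minute length and the four chunks come from the thread.

```python
def extra_keyframes(n_chunks):
    """Splitting into n chunks forces one new keyframe at each internal boundary."""
    return n_chunks - 1

def regular_keyframes(duration_min, fps, keyframe_interval):
    """Keyframes the encoder would insert anyway at a fixed interval."""
    total_frames = int(duration_min * 60 * fps)
    return -(-total_frames // keyframe_interval)  # ceiling division

forced = extra_keyframes(4)
baseline = regular_keyframes(100, 25, 250)
print(f"{forced} forced keyframes on top of {baseline} regular ones "
      f"({100 * forced / baseline:.1f}% more)")
```

Under those assumptions the chunk boundaries add half a percent to the keyframe count, before even counting the keyframes that scene cuts would add.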
Virtualdub is free, open source and is quite capable of running with several processors.
As for encoding, I'm not yet interested in x264 because of the weak processor (only a D805 dual core ) but I am using the XVID experimental build from Koepi.info (http://www.koepi.info) which has SMP support.
It maxes out my two cores and you can specify in the configuration how many threads it should use.
As for decoding, I'm using Media Player Classic Home Cinema, which has a DXVA codec built in (hardware decoding).
On 1080p videos that the video card (ATI 4850) can decode in hardware, the CPU usage is about 10-12%. If it's not possible to decode in hardware, the decoding is passed to CoreAVC, which uses about 45% of CPU (and yes, it's SMP enabled).
He's not even right about Apple either: the first MP Mac was the PowerMac 9500, released in 1995, with the SMP option released in 1996.
Source
Menzoberranzan Networks
On a dual-CPU system, you will see 100% CPU usage on both when using dvdrip/transcode. I would love to see how it looks on a quad-core system.
I actually at first thought you meant open source video editing tools, at which point I was going to say good luck.
I have had similar experiences trying to find multi-core 64-bit video encoders / converters / editors. The problem actually is not usually the application, but the codec. The codec has to be written to take advantage of multi-core systems and 64-bit extensions, not just the program using the codecs. I think Xvid is one of the few codecs that actually has this ability, but if I remember right, the 64-bit code project is not as active as the main project and is usually several versions behind. I actually finally went back to the 32-bit codec as it was less buggy.
While using the 64-bit version of Vista, the workload seems to get balanced out between the cores as well. I am not sure if that is due to how the processor works, how the OS works, or how the software works, but all the video editing tools I have used seem to balance the workload over both cores rather well. This is true when using Adobe Premiere, Canopus Procoder, XMpeg, and other encoders / decoders. So while it may not be OPTIMIZED for it, it certainly does take advantage of it. Of course, your mileage may vary.
Not at all, I literally mean four different movies. So four 100 minute movies, one for each core.
How many external DVD-ROM drives would one have to buy to make this effective? I don't think a typical tower PC case for the home market can hold four DVD-ROM drives.
SMP enabled H.264 decoder, works for me! @3GHZ I can play most anything no problem. I do tend to transcode my BD and HD-DVD rips though to save space.
I am new to Slashdot.
Wow, these comments make me feel really stupid.