Which Open Source Video Apps Use SMP Effectively?

Posted by kdawson on Wednesday July 23, 2008 @08:54AM from the on-the-one-core-on-the-other-core dept.

ydrol writes "After building my new Core 2 Quad Q6600 PC, I was ready to unleash video conversion activity the likes of which I had not seen before. However, I was disappointed to discover that a lot of the conversion tools either don't use SMP at all, or don't balance the workload evenly across processors, or require ugly hacks to use SMP (e.g. invoking distributed encoding options). I get the impression that open source projects are a bit slow on the uptake here? Which open source video conversion apps take full native advantage of SMP? (And before you ask, no, I don't want to pick up the code and add SMP support myself, thanks.)"

262 comments

ffmpeg by bconway · 2008-07-23 08:55 · Score: 5, Informative

Use the -threads switch.

--
Interested in open source engine management for your Subaru?
1. Re:ffmpeg by pak9rabid · 2008-07-23 08:58 · Score: 1
  
  Agreed. ffmpeg worked quite nicely for me during my DVD-ripping heyday. Although, it seems that it would rip audio and video in separate threads. While an improvement over the traditional, linear way of doing things, I would still see 1 CPU maxed out (video encoding), while the CPU encoding audio was only at about 1/3 capacity.
2. Re:ffmpeg by morgan_greywolf · 2008-07-23 08:58 · Score: 5, Informative
  
  Similarly, mencoder supports threads=# where # is something between 1 and 8.
  
  --
  My blog
3. Re:ffmpeg by Albanach · 2008-07-23 09:03 · Score: 4, Insightful
  
  Or just convert 2 videos at once, or 4 for a quad core etc. They did suggest they have lots to convert, and it's a pretty easy way to get all available cores working hard.
4. Re:ffmpeg by Z00L00K · 2008-07-23 09:06 · Score: 1, Insightful
  
  And it may or may not be useful to actually run more than one thread per kernel. How many threads you should run depends on the encoder and application, so it is best to test with 1, 2 and 4 threads per kernel.
  
  --
  If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
5. Re:ffmpeg by fm6 · 2008-07-23 09:16 · Score: 1, Interesting
  
  So why is threading off by default? In a CPU-intensive application like this, multithreading always makes sense, even on a single-core system.
6. Re:ffmpeg by sp332 · 2008-07-23 09:19 · Score: 3, Informative
  
  And it may or may not be useful to actually run more than one thread per kernel. How many threads you should run depends on the encoder and application, so it is best to test with 1, 2 and 4 threads per kernel.
  Isn't that per-core, not per-kernel?
7. Re:ffmpeg by mweather · 2008-07-23 09:22 · Score: 5, Informative
  
  Apple computers ARE PCs. They coined the damn term.
8. Re:ffmpeg by Z00L00K · 2008-07-23 09:46 · Score: 1
  
  Of course... Not my best day today! Maybe I shall think more of that pillow...
  
  --
  If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
9. Re:ffmpeg by m0rph3us0 · 2008-07-23 09:47 · Score: 4, Informative
  
  No, it doesn't. The only time you want to use multi-threading in a single-CPU environment is when asynchronous methods for IO are unavailable or the code would be too difficult to re-architect to use asynchronous IO. If the application is seriously IO bound, threads can even make the situation worse by causing random IO patterns.
  Ideally, the number of threads a program uses should be no more than the number of processors available. Otherwise, you are wasting time context switching instead of processing.
10. Re:ffmpeg by obstalesgone · 2008-07-23 09:48 · Score: 1
  
  Threading has an overhead, making it a waste of resources if you don't have multiple cores. Multi-threading on a single core is always slower than single-threading. There is a common misconception about this because multi-threaded applications can feel more responsive, but in fact, they take longer to accomplish the same unit of work.
11. Re:ffmpeg by tzot · 2008-07-23 09:50 · Score: 1
  
  He could mean "threads per kernel task", but I wouldn't fathom how one controls that. In any case, I believe you are right.
  
  --
  I speak England very best
12. Re:ffmpeg by sexconker · 2008-07-23 09:56 · Score: 1
  
  And potentially kill I/O in the process.
13. Re:ffmpeg by i.r.id10t · 2008-07-23 09:56 · Score: 2, Informative
  
  Yup, with separate disks to work on to remove (mostly) the disk i/o contention, just let each process run happily away.
  
  --
  Don't blame me, I voted for Kodos
14. Re:ffmpeg by kesuki · 2008-07-23 09:57 · Score: 0
  
  then the most logical way to do things is to count the number of cores, and do threads -1 from that total.
  always leave 1 core free... by default. with quad cores out and 8 cores promised, and no sign of things changing, it's time to rethink defaults.
  oh hey, any open source program that supports multiple instances doesn't even need to thread, just run x copies, if it's a batch encode/transcode process... but threading is useful for n-pass encoding.
  
  --
  https://www.gnu.org/philosophy/free-sw.html
15. Re:ffmpeg by sexconker · 2008-07-23 09:59 · Score: 0, Troll
  
  Too bad Final Cut Pro is trash.
  Any of the various free and extensible encoders / tools are infinitely better than FCP for video conversion.
  Editing video with lame effects and such is another story, since the free (open) shit tends to not have GUIs worth a damn. But what do you expect? It's all geared at converting commercial stuff for piracy.
16. Re:ffmpeg by init100 · 2008-07-23 09:59 · Score: 1
  
  That's exactly what I do. I also wrote a scheduler in Python that starts new jobs when the previous ones are completed. It keeps the number of running encoding processes equal to the number of processors/cores.
  To get the optimal scheduling order, it figures out the length of each input file (using midentify from the mplayer/mencoder distribution), and then sorts the jobs so that the longest jobs will be processed first (it assumes that processing time is roughly proportional to input file length (in seconds, not bytes)). This minimizes the time when one or more cores will be kept idle by remaining jobs. After all jobs are finished, it optionally powers the system down, which is nice when you're running jobs at night.
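A rough sketch of the longest-first idea described above, in Python. This is not the poster's script (it was never posted in the thread); `lpt_schedule` and its parameters are hypothetical names, and the durations stand in for the lengths that midentify would report. It only simulates the schedule rather than launching encoders.

```python
import heapq

def lpt_schedule(durations, workers):
    """Simulate longest-job-first scheduling; return (makespan, per-worker job lists)."""
    # Each heap entry is (time at which the worker becomes free, worker index).
    free_at = [(0.0, w) for w in range(workers)]
    heapq.heapify(free_at)
    assigned = [[] for _ in range(workers)]
    # Longest jobs first, so the short jobs fill the gaps at the end.
    for d in sorted(durations, reverse=True):
        t, w = heapq.heappop(free_at)      # next worker to become idle
        assigned[w].append(d)
        heapq.heappush(free_at, (t + d, w))
    makespan = max(t for t, _ in free_at)
    return makespan, assigned

if __name__ == "__main__":
    # Four files (lengths in minutes) on two cores.
    print(lpt_schedule([30, 60, 90, 120], workers=2))
```

Handing each job to whichever core frees up first is what "start a new job when a previous one completes" amounts to once the list is sorted by length.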
17. Re:ffmpeg by m0rph3us0 · 2008-07-23 10:06 · Score: 4, Insightful
  
  On a two processor system this would result in multi-threading being off.
18. Re:ffmpeg by civilizedINTENSITY · 2008-07-23 10:26 · Score: 2, Interesting
  
  strange that quoting history correctly and in context gets you modded flamebait...
19. Re:ffmpeg by ydrol · 2008-07-23 10:26 · Score: 4, Informative
  
  Darn, I forgot a minor detail in my question. I was really asking about the various front-end apps (dvd::rip, k9copy, acidrip etc), I got the impression that none seem to notice they are running on an SMP platform and pass the necessary switches by default to the backend.
  Some may argue this is a good thing, but for the time being SMP is the way forward for faster processing, as MHz has maxed out in consumer PCs. So when people start buying octo-core CPUs, they don't expect their software to run at 1/8th speed by default.
  I was also being a bit lazy. I could have checked up on each app in turn, but I asked /. instead.
20. Re:ffmpeg by Tanktalus · 2008-07-23 10:29 · Score: 5, Interesting
  
  That sounds like a lot of work... I just used make:
  
  %.mpg: %.avi
  	tovid -ntsc -dvd -noask -ffmpeg -in "$<" -out "$(basename $@)"

  all: $(subst .avi,.mpg,$(wildcard */*.avi))
  
  Then I just ran "make -j4". All four processors working like mad, with a minimum of effort.
  (You may need to change the wildcard for your own scenario.)
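For anyone who would rather not write a Makefile, the same "N jobs at once, one process per file" idea can be sketched in Python. `run_parallel` is a hypothetical helper, and the commands in the demo are placeholders rather than real encoder invocations.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_parallel(commands, jobs):
    """Run each argv list as its own process, at most `jobs` at a time."""
    def run_one(argv):
        # The threads only wait on child processes; the real work happens in the children.
        return subprocess.run(argv).returncode
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Return codes come back in the same order as `commands`.
        return list(pool.map(run_one, commands))

if __name__ == "__main__":
    # Placeholder commands; in practice each would be one encoder invocation per file.
    cmds = [[sys.executable, "-c", "pass"] for _ in range(8)]
    print(run_parallel(cmds, jobs=4))
```

Unlike make, this sketch does not skip files whose output already exists, so the Makefile still wins when a batch is interrupted and restarted.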
21. Re:ffmpeg by Macman408 · 2008-07-23 10:31 · Score: 1
  
  In a CPU-intensive application like this, multithreading always makes sense, even on a single-core system.
  No, it doesn't. Performing a task in multiple threads always has some amount of communication overhead. Depending on the type of task being performed and the algorithm being used, that overhead can vary quite a bit. In any case, a multithreaded app will do at least a little bit more work, and in some of the worse cases, it might have a lot of conflicts over shared data, causing a significant slowdown. I'd expect to see anywhere from a couple percent performance hit all the way up to 50% less than an ideal speedup for a particularly bad application. The benefit of multithreading comes when you can be running multiple threads at once - so a 2-threaded app on one CPU might run at 0.95 times the speed of the same app when single-threaded... But, run it on two cores, and it runs at 1.9 times the speed of the single-threaded version. (waves hands, ignoring lots of variables)
  Of course, there are a lot of variables - like is the single-threaded version actually doing the same thing? It might skip all the communication steps or mutex locking that a multithreaded application would do, or it might be doing them anyway. And the OS's scheduler and cache have an effect, too. If you have multiple threads fighting over one CPU (and its cache), they can slow each other down if the cache isn't large enough to hold the working set of both threads at the same time. The TLB can also suffer in the same way.
  Moral of the story: it's an extremely rare application that will get a speedup on a single-core CPU (without simultaneous multithreading - hyperthreading is a lot like adding a second core).
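The hand-waved numbers above can be written as a toy model. It assumes a single `efficiency` factor captures all threading overhead, which is exactly the simplification the comment admits to; `relative_speed` is a hypothetical name.

```python
def relative_speed(threads, cores, efficiency):
    """Toy model: speed relative to a single-threaded run.

    `efficiency` is the fraction of ideal speed left after threading overhead.
    """
    if threads == 1:
        return 1.0  # no threading overhead at all
    usable = min(threads, cores)  # threads beyond the core count add nothing
    return efficiency * usable

if __name__ == "__main__":
    print(relative_speed(2, cores=1, efficiency=0.95))  # two threads, one core
    print(relative_speed(2, cores=2, efficiency=0.95))  # two threads, two cores
```

Cache and TLB contention, scheduler behaviour and lock conflicts are all folded into that one factor, so treat the result as an illustration, not a prediction.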
22. Re:ffmpeg by Fred_A · 2008-07-23 10:52 · Score: 2, Interesting
  
  Is creating a copy of my DVD for my Cowon D2 piracy now ?
  Legally it probably is in many places since I'm probably not even allowed to read them on my PC (Linux), but still...
  
  --
  
  May contain traces of nut.
  Made from the freshest electrons.
23. Re:ffmpeg by TeacherOfHeroes · 2008-07-23 10:54 · Score: 1
  
  Ideally, the number of threads a program uses should be no more than the number of processors available. Otherwise, you are wasting time context switching instead of processing.
  An exception to this kind of rule should really be made for graphical user interfaces. In the case of GUI applications, time wasted in context switching is less important than keeping the UI responsive and the user happy.
  Any kind of heavy lifting (IO-blocking or otherwise) should really be done on a different thread than the one that is responsible for handling the user interface. This allows the user interface to stay responsive, providing the user with feedback (progress bar, time estimate, reassurance that the programme hasn't locked up, etc...) and the ability to cancel the work in progress.
  Sometimes you can modify the resource intense code to play nice with a GUI by working in chunks, but usually this is, IMHO, the wrong approach, since it means changing the logic/implementation of a programme to cater to a specific UI.
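A minimal sketch of the worker-thread pattern described above, using only the Python standard library. The names `encode_worker` and `run_job` are made up for illustration, and the "work" is a stand-in loop, not real encoding.

```python
import queue
import threading

def encode_worker(total_steps, progress, cancel):
    """Stand-in for heavy work: report progress and stop early if cancelled."""
    for step in range(1, total_steps + 1):
        if cancel.is_set():
            progress.put(("cancelled", step - 1))
            return
        # A real worker would encode one chunk here.
        progress.put(("progress", step))
    progress.put(("done", total_steps))

def run_job(total_steps, cancel_first=False):
    """Start the worker on its own thread and collect its messages."""
    progress = queue.Queue()
    cancel = threading.Event()
    if cancel_first:
        cancel.set()  # simulates the user pressing Cancel
    worker = threading.Thread(target=encode_worker,
                              args=(total_steps, progress, cancel))
    worker.start()
    # A GUI would poll `progress` from its event loop instead of blocking here.
    worker.join()
    messages = []
    while not progress.empty():
        messages.append(progress.get())
    return messages

if __name__ == "__main__":
    print(run_job(3))
    print(run_job(3, cancel_first=True))
```

The worker never touches the UI directly; it only posts messages, which keeps the encoding logic independent of whichever toolkit draws the progress bar.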
24. Re:ffmpeg by Anonymous Coward · 2008-07-23 11:11 · Score: 3, Informative
  
  If thread 1 is doing work while thread 2 is blocked (io, semaphores, etc), then multithreading will be faster.
25. Re:ffmpeg by sexconker · 2008-07-23 11:15 · Score: 2, Informative
  
  If you're making another copy of it to play on another device (format shifting or whatever bullshit term they used), yeah, you can probably get sued for it if some asshat wants to target you.
  Illegal? No.
  Wrong? Hell no.
  My point is that encoding apps often exist separately from editing apps (such as FCP). This is due in large part to piracy, especially when talking about free/open encoders and sites like doom9.
  Pirates are not concerned with editing/creating, they're concerned with copying and converting/compressing. Likewise, the writers of such tools/interfaces are concerned with pleasing a certain crowd of people...
  I was calling FCP trash in terms of a converter/encoder. It's much more geared toward editing. Thus, I think the AC's mention of FCP in this context is not very pertinent.
26. Re:ffmpeg by VGPowerlord · 2008-07-23 11:16 · Score: 4, Informative
  
  True, but in most contexts, "PC" is the shortened form of IBM-compatible PC (which is really outdated), and usually just stands for Windows these days.
  
  --
  GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
27. Re:ffmpeg by sick_soul · 2008-07-23 11:25 · Score: 2, Insightful
  
  Just want to inform you that neither threads nor any other multiprogramming mechanisms are necessary for responsive user interfaces, and that IO multiplexing in particular does not require threads at all.
  You can solve both with threads, but you don't have to. And in most common cases it is much better not to; it seems that threads continue to be one of the most misused and misunderstood of the programming concepts.
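To illustrate the IO multiplexing point, here is a small single-threaded sketch using Python's `selectors` module. The socket pairs stand in for whatever input sources a real program would watch, and `read_ready` is a hypothetical helper name.

```python
import selectors
import socket

def read_ready(sources, timeout=1.0):
    """Read one message from each socket using a single thread and a selector."""
    sel = selectors.DefaultSelector()
    for name, sock in sources.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)
    received = {}
    while len(received) < len(sources):
        events = sel.select(timeout)
        if not events:
            break  # nothing became readable within the timeout
        for key, _ in events:
            received[key.data] = key.fileobj.recv(4096)
            sel.unregister(key.fileobj)
    sel.close()
    return received

if __name__ == "__main__":
    a_send, a_recv = socket.socketpair()
    b_send, b_recv = socket.socketpair()
    a_send.sendall(b"audio chunk")
    b_send.sendall(b"video chunk")
    print(read_ready({"audio": a_recv, "video": b_recv}))
    for s in (a_send, a_recv, b_send, b_recv):
        s.close()
```

One thread services both sources by asking the OS which of them is ready, which is the alternative to giving each source its own blocking thread.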
28. Re:ffmpeg by elgaard · 2008-07-23 11:27 · Score: 2, Informative
  
  I have not tried it. But e.g. k9copy uses mencoder.
  So if you just put something like "x264encopts=threads=auto" in your mencoder.conf file, it might also work from k9copy.
  k9copy also has a settings menu where you can tune the options passed to mencoder for various codecs.
29. Re:ffmpeg by SimonTheSoundMan · 2008-07-23 11:29 · Score: 2, Informative
  
  Yeah, Compressor is pretty damn good. It doesn't just use all your cores, but it can also distribute the workload to other machines on a network. Whole render farms.
  
  Logic Node is somewhat better, although it only does audio. We have two eight-core Mac Pros and three Xserve machines in our studio. The Xserve machines will be binned when the new version of Logic Pro, which supports processing audio on the GPU, is out.
30. Re:ffmpeg by fatphil · 2008-07-23 11:33 · Score: 1
  
  Ugh, no - that's just another "ugly hack", even if it is in the man page.
  
  --
  Also FatPhil on SoylentNews, id 863
31. Re:ffmpeg by maglor_83 · 2008-07-23 11:35 · Score: 5, Funny
  
  On a single core system this would result in not being able to run anything!
32. Re:ffmpeg by Anonymous Coward · 2008-07-23 11:38 · Score: 1, Insightful
  
  or just set it to the number of cores, set all the threads to low priority and let the OS do the scheduling. You know, the way things have been done for years.
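As a sketch of that approach: count the cores, pass the count to the encoder's thread option, and run the whole thing at low priority. `encoder_command` and the file names are hypothetical, and the `reserve` parameter is only there to show the "leave a core free" variant suggested earlier in the thread. Whether a given codec honours `-threads` depends on the ffmpeg build, as other comments point out.

```python
import os

def encoder_command(src, dst, reserve=0):
    """Build an argv list: one encoder thread per core, run at low priority."""
    cores = os.cpu_count() or 1
    threads = max(1, cores - reserve)   # never drop below one thread
    # `nice -n 19` asks the OS scheduler to give the job the lowest priority.
    return ["nice", "-n", "19", "ffmpeg", "-i", src,
            "-threads", str(threads), dst]

if __name__ == "__main__":
    print(" ".join(encoder_command("input.avi", "output.mp4")))
```

The `max(1, ...)` guard covers the objection raised above that "cores minus one" leaves nothing to run on a single-core machine.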
33. Re:ffmpeg by hedwards · 2008-07-23 12:08 · Score: 5, Insightful
  
  Apple has spent a lot of time and money convincing everybody that they don't sell PCs, they sell Macs. I'm not sure what the point of arguing with both the general public as well as Apple is.
  At this point, the term PC does not include Apple computers. It's a change to the definition which happens when the vast majority of people decide amongst themselves that the definition should change.
  In terms of the topic at hand, most video apps really should be capable of using multiple cores; tasks of this sort are quite easy to parallelize, either by splitting the work every n frames or by subdividing the image into a number of regions that can be completed separately and joined at the end before writing the frame to disk.
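The frame-splitting idea can be sketched like this, assuming contiguous chunks of frames per worker. `split_frames` is a hypothetical helper, and real codecs have dependencies between frames that this ignores.

```python
def split_frames(total_frames, workers):
    """Divide frame indices 0..total_frames-1 into contiguous ranges, one per worker."""
    base, extra = divmod(total_frames, workers)
    ranges = []
    start = 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)  # spread the remainder over the first workers
        ranges.append(range(start, start + size))
        start += size
    return ranges

if __name__ == "__main__":
    for r in split_frames(10, 4):
        print(list(r))
```

Splitting a single frame into regions works the same way, with row or tile indices in place of frame numbers.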
34. Re:ffmpeg by Albanach · 2008-07-23 12:08 · Score: 2, Insightful
  
  I thought about that but, seriously, transcoding is usually CPU limited. I'd really suspect it'd take a lot of simultaneous encoding to make it I/O bound.
35. Re:ffmpeg by hedwards · 2008-07-23 12:13 · Score: 1
  
  Threading is sometimes broken on the OS, or sometimes it varies between revisions.
  FreeBSD for instance has been in the middle of changes to the threading system and there was a bug in the 6.x branch which wasn't in either 7.x or current. Defaulting to off if you're not sure how well threading is going to be handled is probably better than defaulting to something that could be broken.
  Anybody who knows that they need threading and decides to turn it on is likely to know whether or not threading is broken. Or to discover that it's broken and not enable it the next time.
36. Re:ffmpeg by Anonymous Coward · 2008-07-23 12:14 · Score: 0
  
  Apparently enabling the option for SMP is an ugly hack..
37. Re:ffmpeg by 3vi1 · 2008-07-23 12:30 · Score: 5, Insightful
  
  No - HP did (for their calculators), way before there "was" an Apple.
  Also, I don't even think Apple marketing would agree with you - or they wouldn't have "I'm a Mac... and I'm a PC" adverts.
38. Re:ffmpeg by networkBoy · 2008-07-23 12:57 · Score: 3, Informative
  
  I hit I/O throttling when I do the following:
  * rip 2 dvds (two DVDR Drives)
  * transcoding previous DVD rips to XVID
  * Moving completed rips to server over 1 Gbps Ethernet link.
  At this point I can see CPU load start to drop as PCI bus I/O saturates.
  At no point do I hit disk I/O or memory limits.
  Disks are non-RAID non-striped, but rips are to separate disks (thus DVDA rips to HDA DVDB to HDB) and server upload pulls from whatever disk is not currently transcoding (transcode file on HDA, when done start transcode on HDB and move file from HDA).
  -nB
  
  --
  whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
39. Re:ffmpeg by Anonymous Coward · 2008-07-23 13:13 · Score: 0
  
  Actually, that is a good suggestion.
  Encoding multiple videos at a time is more efficient than encoding one at a time using multiple cores. It will finish faster.
40. Re:ffmpeg by hellwig · 2008-07-23 13:21 · Score: 1
  
  I was getting about 70 FPS with -threads 2 on my Athlon X2. I removed the -threads switch, and started getting about 90+ FPS, that's right, with 1 thread. I was even able to run two conversions at once, and each got about 80-90 FPS conversion rates (though I think a shared-resource violation caused a BSOD).
  
  Any idea why a single thread would be more efficient than 2 threads on a dual-core AMD machine? Could it have been the Windows XP binary I was using? (Sorry, Debian/Ubuntu doesn't understand the NVIDIA RAID5 I'm using, so I couldn't install Linux on my machine.) Could it be the fact I was using a 32-bit binary on 64-bit Windows? Could it just be Windows?
  
  On a more broad topic, since Intel introduced hyperthreading in the P4 3+ years ago, why are so many programs still single-threaded? I mean for chrysaque, what's the point of buying a 4-core machine when 99% of applications still don't support it? Yeah, it's nice to convert a video to DIVX and browse the web at the same time, but I have enough computers to make that happen regardless. Seems to me that this problem isn't simply restricted to open-source video applications.
  
  --
  Eggs
  Milk
  Bread
  Cat Litter
  Soda
  ...
41. Re:ffmpeg by TheLink · 2008-07-23 13:23 · Score: 2, Insightful
  
  Ah but figuring out "make" might require too much wetware CPU time for most people ;).
  
  "Why is it not working? Oops messed up tabs and spaces", etc.
  --
  
  Too many replies beneath your current threshold
42. Re:ffmpeg by slimjim8094 · 2008-07-23 13:51 · Score: 2, Insightful
  
  Perhaps. But threads are far more versatile - if they're done well.
  So our video app has a sound-processing thread, a video processing thread, and a UI thread. If it's implemented well (don't read or write twice, have a common buffer), it'll run with the same or near performance as a one-threaded program on a one-processor/core system.
  But on a multicore/processor system no extra work is needed to take advantage of the cores. If we have three cores, it'll run automatically across cores for a massive performance gain. And we automatically take advantage of scheduling improvements.
  Yes, it can be done crappily. But threads exist for a very good reason and writing your program in one thread is more complex and far, far less flexible
  
  --
  I have developed a truly marvelous proof of this comment, which this signature is too narrow to contain.
43. Re:ffmpeg by 1tsm3 · 2008-07-23 14:06 · Score: 1
  
  Unfortunately the -threads switch doesn't work for xvid in the stable mencoder release. At least that's the case with the latest mencoder that ships with Fedora 9. The -threads switch works only in the development version in svn. Isn't xvid supposed to be better at encoding quality than ffmpeg? That's pretty much the reason I'm not using ffmpeg.
  
  --
  -ItsME
44. Re:ffmpeg by MadnessASAP · 2008-07-23 14:07 · Score: 4, Informative
  
  If I may offer a suggestion: I'm not too sure what your setup is, but on mine I have 2 DVD drives, each on a separate IDE bus, and 2 SATA drives (also on separate buses). I rip from the DVD to drive 1 and encode from drive 1 to drive 2. Of course it all depends on a variety of factors, but that layout certainly helped.
  
  --
  I may agree with what you say, but I will defend to the death your right to face the consequences of saying it.
45. Re:ffmpeg by Nikker · 2008-07-23 14:18 · Score: 4, Informative
  
  Running multiple cores with an ide interface is going to kill you regardless because you are only encoding in memory not really storing much there. Basically you have a cap of about 40MB/s for anything larger than about 40MB.
  
  --
  A loop, by its nature, continues. If that didn't make sense, start reading this sentence again.
46. Re:ffmpeg by Anonymous Coward · 2008-07-23 14:24 · Score: 1, Interesting
  
  I don't think you're actually trolling, so I'll bite.
  Jahshaka has quite a good GUI - it needs a bit of stability and a few extra features, but it's free so who am I to complain?
47. Re:ffmpeg by adamchou · 2008-07-23 14:41 · Score: 1
  
  According to the make manpage, -j only threads make, not ffmpeg.
  -j [jobs], --jobs[=jobs]
  Specifies the number of jobs (commands) to run simultaneously. If there is more than one -j option, the last one is effective. If the -j option is given without an argument, make will not limit the number of jobs that can run simultaneously.
48. Re:ffmpeg by ksheff · 2008-07-23 15:21 · Score: 5, Informative
  
  That's the point. If the xvid encoder is single threaded, then to keep all the cores busy, one must run multiple instances of ffmpeg with each one encoding a different file. For the given Makefile, that is what make will do when the -j switch is used.
  
  --
  the good ground has been paved over by suicidal maniacs
49. Re:ffmpeg by Daengbo · 2008-07-23 16:19 · Score: 1
  
  Devede has a box for "Use optimizations for multi-core CPUs."
  
  --
  Put identity in the browser.
50. Re:ffmpeg by SoupIsGoodFood_42 · 2008-07-23 17:05 · Score: 1
  
  Those ads are new. Apple has been around for quite a while. Things change. Not saying Apple coined the term...
51. Re:ffmpeg by QuoteMstr · 2008-07-23 17:21 · Score: 3, Informative
  
  You're still missing the OP's point. Let me spell it out for you:
  Say you have four videos to encode, and four cores.
  1) You can either use one core at a time and encode one video at a time. Let's say that takes time T.
  2) You can encode one video at a time, but use all four cores while doing it. Your total time is T/4.
  3) You can encode four videos at a time, one on each core. Your total time is T/4.
  The OP was advocating strategy #3. It's a fine approach.
52. Re:ffmpeg by Anonymous Coward · 2008-07-23 17:26 · Score: 0
  
  "Apple computers ARE PCs. They coined the damn term."
  In 1985 when I got my first computer, if you said to anyone you got a PC they knew it was an IBM or compatible. And when you went into a computer store in that time frame the software was marked PC or Apple. That is the way it was.
53. Re:ffmpeg by QuoteMstr · 2008-07-23 17:31 · Score: 2, Interesting
  
  Yes, you can use threads well. But with less effort (taking into account synchronization and debugging), you can make the asynchronous tasks independent programs instead of threads. Your video and sound processing threads sound like perfect candidates for being made into independent programs.
  A task being an independent program affords several advantages. For example, it's easier to test an independent program, especially in a test harness. An independent program can be run by itself. And it's very clear what an independent program's data dependencies are. There is no risk of accidentally racing in memory access, assuming the programs don't share memory. Don't do that.
  Performance simply is not a problem. Any modern operating system will have IPC primitives that are more than good enough.
  For something like a video processing application, all three programs sharing file descriptors open to a video buffer sounds ideal. And before you complain that "disk access" is slow: on modern operating systems, main memory is just a cache for the disk anyway. With a modern page cache, using a disk file will be just as efficient as pretending you can keep arbitrarily large data structures in memory. See Varnish's architecture.
  Even if you must use threads, you should always program them as if they were independent programs, use message-passing, sockets, and so on for communication, and treat the shared address space more as a dangerous misfeature than a communication medium.
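A small sketch of the "independent programs plus message passing" style, using two throwaway Python one-liners as the stages and an OS pipe between them. The stage programs and the `pipeline` helper are invented for illustration; real stages would be the audio and video tools.

```python
import subprocess
import sys

# Each stage is a separate program that reads stdin and writes stdout.
UPPER = "import sys; sys.stdout.write(sys.stdin.read().upper())"
REVERSE = "import sys; sys.stdout.write(sys.stdin.read()[::-1])"

def pipeline(data):
    """Connect two independent programs with a pipe and return the final output."""
    first = subprocess.Popen([sys.executable, "-c", UPPER],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                             text=True)
    second = subprocess.Popen([sys.executable, "-c", REVERSE],
                              stdin=first.stdout, stdout=subprocess.PIPE,
                              text=True)
    first.stdout.close()      # the second process now owns the read end of the pipe
    first.stdin.write(data)
    first.stdin.close()       # end of input lets the first stage finish
    output = second.stdout.read()
    second.stdout.close()
    first.wait()
    second.wait()
    return output

if __name__ == "__main__":
    print(pipeline("abc"))
```

Each stage can be run and tested by hand from a shell, and the only thing the stages share is the byte stream between them.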
54. Re:ffmpeg by SilverJets · 2008-07-23 17:38 · Score: 1
  
  In the case of the OP make isn't compiling anything. Make can do much more than just run your compiler.
55. Re:ffmpeg by init100 · 2008-07-23 18:18 · Score: 1
  
  I didn't post a link since I don't have it available online. Maybe I should put it somewhere. :)
56. Re:ffmpeg by init100 · 2008-07-23 18:24 · Score: 1
  
  I know Make well enough to do that, except I didn't want the progress output from several different encoders mixed up, which I assume would happen if I had used Make.
57. Re:ffmpeg by Anonymous Coward · 2008-07-23 22:45 · Score: 0
  
  dvd::rip has done this by default for years, and years, and years. I used it to rip DVDs back in the day on a dual celeron 466 running on an ABit BP6 board. It maxed both CPUs no trouble.
58. Re:ffmpeg by eastlight_jim · 2008-07-23 23:29 · Score: 1
  
  The problem with strategy three is that it only gives a total time of T/4 if all the videos are the same length. Let's say that there's 4 videos of 30 minutes, 1 hour, 1 hour 30 and 2 hours length. Assume that one core can encode one video at real time (i.e. take 1 hour to encode 1 hour's footage), the times taken by each strategy are:
  1) 30 minutes + 1 hour + 1 hour 30 + 2 hours = 5 hours
  2) 5 hours / 4 = 1.25 hours
  3) Core 1 finished in 30 minutes - then sits idle for 1 hour 30.
  Core 2 finished in 1 hour - then sits idle for 1 hour.
  Core 3 finished in 1 hour 30 - then sits idle for 30 minutes.
  Core 4 finished in 2 hours. No idle time.
  Total time to completion of job: 2 hours (45 minutes slower than option 2 with 3 hours idle core time wasted).
  Given the option between options 1 and 3 I'll choose number 3. If, however, someone offered me option 2 I'd take that in preference to number 3.
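The arithmetic above, as a short Python sketch. It assumes ideal scaling for the threaded case and no more files than cores, both of which are simplifications; `strategy_times` is a hypothetical name.

```python
def strategy_times(durations, cores):
    """Wall-clock hours for each strategy, assuming ideal scaling."""
    if len(durations) > cores:
        raise ValueError("sketch assumes no more files than cores")
    sequential = sum(durations)        # 1) one file at a time on one core
    threaded = sequential / cores      # 2) one file at a time, all cores, perfect speedup
    per_file = max(durations)          # 3) every file at once, one core each
    # Core-hours spent waiting for the longest file in strategy 3.
    idle = sum(per_file - d for d in durations)
    return sequential, threaded, per_file, idle

if __name__ == "__main__":
    print(strategy_times([0.5, 1, 1.5, 2], cores=4))
```

With many more files than cores, the idle time at the end of strategy 3 shrinks relative to the whole batch, which is why running one process per file remains attractive for large collections.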
59. Re:ffmpeg by fprintf · 2008-07-23 23:53 · Score: 2, Insightful
  
  It seems we are in the same place years and years later. Way back when overclocking Celeron 400s was the rage, I bought a multi-processor motherboard to run twin Intel Pentium IIs. I bought a SuSE Linux package after reading that Windows 95 would not support dual processors... you can see where this is going... except for rolling my own kernel and a few other things (like compiling code), the system largely ran on one processor even with SMP turned on in the kernel.
  So it seems we have similar complaints 8 or more years later. How disappointing. I only wish I knew how to program to the level where I could help solve this.
  
  --
  This post brought to you by your friendly neighborhood MBA.
60. Re:ffmpeg by evilviper · 2008-07-24 00:37 · Score: 0, Troll
  
  Or just convert 2 videos at once, or 4 for a quad core etc. They did suggest they have lots to convert, and it's a pretty easy way to get all available cores working hard.
  More to the point, multi-threading introduces overhead, and quality loss in the encoded video. So, if you do have at least as many videos as you have cores, encoding each, single-threaded, on its own core, will be slightly faster, and slightly higher quality than encoding those same videos sequentially, using a codec with multi-threading.
  
  --
  Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
61. Re:ffmpeg by Anonymous Coward · 2008-07-24 00:49 · Score: 0
  
  the -j option with make is for compiling on X number of cores...
62. Re:ffmpeg by ncc74656 · 2008-07-24 01:29 · Score: 1
  
  Similarly, mencoder supports threads=# where # is something between 1 and 8.
  
  When encoding H.264, though, it only tends to max out two cores. That's good enough on a dual-core box, but with a quad-core box, you're leaving cycles on the table. (Yes, I already knew about the threads option and have set it accordingly. If you have multiple videos to encode, you could do two of them at once...but that's what the submitter is trying to avoid.)
  
  --
  20 January 2017: the End of an Error.
63. Re:ffmpeg by skeeto · 2008-07-24 01:34 · Score: 1
  
  Mark Dominus over at A Universe of Discourse came up with a program called runN which would accomplish a similar task but without having to write out a specific Makefile for it.
  You provide a command, number of jobs, and a bunch of files to run it on (after a "--" argument),
  
  # Untar files in parallel, 4 at a time
  runN -n 4 tar -xzf -- *.tar.gz
  # Encode 4 videos at a time (from parent's example, but I don't know how tovid actually works)
  runN -n 4 tovid -ntsc -dvd -noask -ffmpeg -in -- *.avi
  
  Of course, your Makefile option provides a bit more control as you can specify an output filename, and if you must specify an output filename, runN will not work. In any case, any solution which uses processes instead of threads is a good solution.
  M. Crane wrote a follow-up sometime later describing how to make it more useful (and make the above examples actually work).
64. Re:ffmpeg by Ultra64 · 2008-07-24 01:41 · Score: 2, Insightful
  
  Ok, I've got to hear this one.
  What the fuck does threading have to do with video quality?
65. Re:ffmpeg by mweather · 2008-07-24 02:21 · Score: 1
  
  Last I checked, OSX runs on IBM-PC hardware.
66. Re:ffmpeg by squizzar · 2008-07-24 03:15 · Score: 1
  
  I'm no expert, but essentially you need to understand how to do safe multi-threading.
  Look at stuff like:
  http://en.wikipedia.org/wiki/Dining_philosophers_problem
  Learn about semaphores, mutexes, spinlocks and other crazy magic and why you need them. Then you're set to start writing multi-threaded applications. Modern Operating Systems by Andrew Tanenbaum was a recommended book whilst I was at uni, and might be quite helpful. PThreads is used a lot on unix systems, and Java has a lot of concurrency and synchronisation stuff built in, so there are a good few places you can play with things.
67. Re:ffmpeg by Anonymous Coward · 2008-07-24 04:20 · Score: 0
  
  Strange? Bullocks. Apparently you don't live in Christendom, or any religious society. We deal with this everyday. Where exactly do you live?
68. Re:ffmpeg by fm6 · 2008-07-24 04:32 · Score: 1
  
  Just want to inform you that threads nor any other
  multiprogramming mechanisms are necessary for
  responsive user interfaces,
  Huh? Suppose your program is transferring a large amount of data from A to B (copying a file, displaying a graphics-intensive web page, etc.) and you want the user interface to remain responsive during this activity, if only to allow the user a chance to cancel it. How do you continue to respond without some kind of "multiprogramming method"?
69. Re:ffmpeg by Tetsujin · 2008-07-24 04:46 · Score: 1
  
  Ok, I've got to hear this one.
  What the fuck does threading have to do with video quality?
  Couldn't tell you for sure - but the multi-threaded version of a task may be a whole different algorithm. So if you're not splitting the job on a convenient boundary (like CPU1 for audio, CPU2 for video, for instance - if you're using more than one CPU for a single task) then it may be a different algorithm with a different implementation. That's where you can see behavior diverge.
  
  --
  Bow-ties are cool.
70. Re:ffmpeg by jannesha · 2008-07-24 05:12 · Score: 2, Funny
  
  Clearly correct, as highlighted by Apple's own advertising:
  http://en.wikipedia.org/w/index.php?title=Mac_vs_PC
  --jjj
71. Re:ffmpeg by sexconker · 2008-07-24 05:19 · Score: 1
  
  None of what you said makes sense.
72. Re:ffmpeg by La+Camiseta · 2008-07-24 07:10 · Score: 1
  
  Still, if it's a multi-cored system and from the make file it seems that there's going to be many files to transcode (I'd be willing to bet that it's DVD collection large), then I'd be willing to take the hit of a couple hours idle CPU time assuming that you can keep all cores going full throttle for an extended period of time.
  e.g. 48 hours running, 3 hours at the end with cpu time idle (3 cores idle for an hour or so) - 192 potential cpu hours (48*4), 189 hours used. 189/192=98.44% usage of the cpus - that's good enough for me. And the more files that you have to transcode, the better the usage of your cpus gets. In reality what's probably going to be your bottleneck will be your I/O, not the CPU time.
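  The same arithmetic as a tiny Python helper, in case anyone wants to plug in their own numbers (the function and parameter names are just for illustration):

```python
def cpu_utilisation(cores, wall_hours, idle_core_hours):
    """Return (used core-hours, potential core-hours, fraction used)."""
    potential = cores * wall_hours
    used = potential - idle_core_hours
    return used, potential, used / potential


# 4 cores, 48 hours of wall time, 3 core-hours idle at the tail end
used, potential, fraction = cpu_utilisation(cores=4, wall_hours=48, idle_core_hours=3)
print(f"{used}/{potential} core-hours used = {fraction:.2%}")
```

  Longer runs with the same idle tail push the fraction closer to 100%, which is the point being made above.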
73. Re:ffmpeg by Anonymous Coward · 2008-07-24 07:38 · Score: 0
  
  A friend of mine did this, his IDE hard-drive died after two weeks or so.
  Now that NCQ is implemented in SATA hard drives, it shouldn't be a problem any more, though.
74. Re:ffmpeg by DarthJohn · 2008-07-24 16:31 · Score: 1
  
  MPEG-4 encoding is an iterative process.
  It basically works by only every once in a while having a full frame with every pixel, the rest of the time just storing the difference from the previous frame.
  A big part of making that look good is deciding on good locations for the full frames.
  The easiest way to split that job up on multiple threads is to keep a fixed full frame position so you have predictably sized chunks for each thread to crunch on.
  Keeping the full frames at constant intervals will either waste space or result in lower quality: some scenes can't go for long without new full frames, some don't change much and having them too often would be wasteful.
  It's a problem that's been worked on for quite some time, so I'd bet there are known solutions, but I'm not an expert. I do however have a slashdot account. :-)
75. Re:ffmpeg by evilviper · 2008-07-24 20:06 · Score: 1
  
  What the fuck does threading have to do with video quality?
  Video encoding is highly inter-dependent, and is not given to parallel processing. You can't split it across cores without making certain assumptions, and skipping (or performing out of order) certain calculations, which will reduce the quality/efficiency (at a given bitrate).
  For reference: http://www.vassilios-chouliaras.com/pubs/c25.pdf
  
  --
  Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
76. Re:ffmpeg by grepper · 2008-07-24 22:54 · Score: 1
  
  mpeg-2 encoding is not threaded with ffmpeg, so option 3 is the best you will do in this case anyway. h264 encoding would of course benefit from strategy 2.
77. Re:ffmpeg by Rysc · 2008-07-24 23:17 · Score: 0
  
  xvid and ffmpeg are not comparable, so one cannot be better than the other.
  
  --
  I want my Cowboyneal
78. Re:ffmpeg by Anonymous Coward · 2008-07-25 02:18 · Score: 0
  
  cores, kernels ... squirrels don't mind
79. Re:ffmpeg by Anonymous Coward · 2008-07-25 12:06 · Score: 0
  
  Fast modern disks read/write in the 80-120 MB/s range. PCI busses are waaaaay faster than this.
  There is a huge probability that your disk saturates (most probably your DVD), not your bus. A single Express slot can handle more than 1 GB/s nowadays.
  Remember that a disk saturates, not when it reaches maximum theoretical throughput, but when it cannot serve requests faster. Max throughput is reached only with ideal (pure sequential) access pattern; even with a good filesystem, some seeks will delay I/Os, and with a bad one, seeks can quickly dominate, and you end up with awful performance. Yet your disk does reach saturation: that is, for the kind of load it's under.
  You should check stats such as the percentage of disk time during which requests are being serviced by the disk: if it's very close to 100%, then your disk is the performance bottleneck.
transcode, of course! by morgan_greywolf · 2008-07-23 08:55 · Score: 5, Informative

transcode uses separate processes for everything.

--
My blog
Simple... by Anonymous Coward · 2008-07-23 08:58 · Score: 0

You have to design it as SMP from the ground up, you cannot just hack it in later. Not to forget that multi-threaded programming is hard. Give it a few years and there will be more OSS solutions. Multi core processors have not been mainstream for that long.
1. Re:Simple... by Cyrano+de+Maniac · 2008-07-23 09:08 · Score: 1, Insightful
  
  I'm still not sure where this idea that "multi-threaded programming is hard" comes from. It's not. It seems that most people are just afraid of it because they're not familiar with it.
  Or perhaps I just overestimate the mental capacity of most programmers? Having looked at a lot of code, there may be merit to that theory.
  
  --
  Cyrano de Maniac
2. Re:Simple... by j00r0m4nc3r · 2008-07-23 09:21 · Score: 5, Informative
  
  Running multiple instances of the same code concurrently in multiple threads is simple. Even running mutually exclusive parts of the same code concurrently in separate threads is easy. Converting complex serial algorithms to effectively utilize multiple cores is generally not simple. And writing code that can scale and balance across n number of cores/threads is extremely hard. There are all sorts of synchronization issues to deal with, scheduling issues, data transport issues, etc.. and it becomes increasingly hard to debug code the more cores/threads you throw in. I think the stigma is justified.
3. Re:Simple... by everphilski · 2008-07-23 09:24 · Score: 1, Insightful
  
  Amen
  
  If you truly understand the problem domain you are operating in, parallelism becomes readily apparent. Implementing it isn't difficult even on old code, again, if you truly understand where the parallelism exists.
4. Re:Simple... by Anonymous Coward · 2008-07-23 09:35 · Score: 0, Insightful
  
  And writing code that can scale and balance across n number of cores/threads is extremely hard.
  You're overgeneralizing. Sometimes it's hard, and sometimes it's dirt simple easy.
5. Re:Simple... by Bert64 · 2008-07-23 09:47 · Score: 1
  
  Multi core no....
  But unix apps have been running on multi processor systems for years, and geeks have had access to such systems for years too. I did video encoding in 2000 on a quad cpu alphaserver and a dual cpu sparc, but i just did as someone else suggested and ran multiple encodes simultaneously.
  
  --
  http://spamdecoy.net - free throwaway anonymous email - avoid spam!
6. Re:Simple... by sexconker · 2008-07-23 10:05 · Score: 2, Informative
  
  How the hell is this modded interesting (as opposed to informative)?
  Do people really not know this stuff (thus making it interesting to them)?
  For the gp and the others who still don't get it.
  Multi-threaded programming (getting your shit to run in separate threads) is easy, now.
  Multi-threaded / distributed algorithms (getting your shit to do some coherent, useful shit while scaling well) are not easy at all.
7. Re:Simple... by brokenin2 · 2008-07-23 10:40 · Score: 1
  
  Yep.. you understood your problem domain, and easily recognized where parallelism existed. Then you stated your solution like a practical intelligent person, not like some moron trying to claim that everything is always simple because he is so damn smart that he transcodes all his videos using a neural interface to his own brain while he sleeps. It was simple you know, because brains are massively parallel, and can kick the shit out of your PC when it comes to overall processing power.
8. Re:Simple... by Cyrano+de+Maniac · 2008-07-23 11:20 · Score: 3, Insightful
  
  Exactly. Too many people assume that any given programmer can write any given program. What isn't generally realized (at least by the masses) is that programming really is about acquiring expertise in a particular domain and then solving problems in that domain through the use of computer programs. Generally some of the most effective programs I've seen have been written, on their first pass, by a person with intimate domain knowledge, and mediocre programming/computer knowledge. The program then becomes a standout when someone with intense programming and computer architecture knowledge improves the code from there (they need not be a subject domain expert, but it helps).
  I do take issue with sexconker assuming that I "just don't get it". Heh. If s/he only knew. Whatever, no biggie. I do agree that distributed algorithms are generally more difficult to implement/design than non-distributed, but that's not exactly the same thing as serial versus parallel algorithms (non-distributed generally involves access to data through a common address space, distributed doesn't, though even those pseudo-definitions come up a bit short).
  Again and again I read in industry rags and on various web sites that multi-threaded programming is hard, and nobody knows how to do it, and that it's difficult to debug, and all that. I believe what they're really saying is "The set of programmers who are accustomed to multi-threaded programming/debugging is (relatively) small, and thus applications aren't going to make good use of the shift to multicore CPU packages." Familiarity with a skill, and the supply of labor familiar with said skill, is distinct from it being easy or hard.
  Anyway, I stand by my belief that parallel programming is not as difficult as most people are led to believe. Some problems don't lend themselves well to parallel solutions, or don't merit the added complexity, but many many of them do. In ten years time I predict that most computer programming education will assume the use of threading, and that anyone who isn't competent with threading will severely limit their own job prospects.
  
  --
  Cyrano de Maniac
9. Re:Simple... by AlterRNow · 2008-07-23 11:50 · Score: 1
  
  As can be demonstrated in Windows XP. In theory, you should be able to run two tasks at once, right?
  So open up Notepad and set that process to 'Realtime' and watch as one core will max out and the other core is completely idle while Windows becomes nearly completely unresponsive ( even if you set Notepad to the second core ).
  At least, this is what it did when I tried it, naturally YMMV.
  
  --
  The disappearing pencil trick. Let me show you it.
10. Re:Simple... by Anonymous Coward · 2008-07-23 12:37 · Score: 0
  
  I've not done real time, but way back in the day, I used to do a lot of video cd encoding, and I used TMPGEnc, and I got in the habit of just queuing up encodes overnight and letting them run because it was faster to let a single app keep running in foreground at near real time than to have multiple instances running in the background. I continued when I started dealing with DVDs (transcoding Divs to MP2 to watch on my tv). I kept the same habit even when I moved to a dual core machine until the day I noticed that I was nearly maxing one core, but the other was sitting at 5%, so I tried a little experiment and divided my encoding into 2 batches, and opened 2 instances of the encoder and loaded a batch in each and let em rip. Encoding time halved. So it did indeed now max both cores, thus eliminating the need for SMP support. Like said before, sometimes the solution is simpler than you think.
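  The "two batches, two encoder instances" trick is easy to script. Below is a rough Python sketch, not anyone's actual tool: encode_one is a hypothetical callable that would launch whatever single-threaded encoder you use (for example through subprocess.run) and wait for it. The worker threads only sit waiting on those child processes, so one worker per core is enough to keep every core busy.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(files, n_batches):
    """Deal the files round-robin into n_batches lists."""
    return [files[i::n_batches] for i in range(n_batches)]


def run_batches(files, n_instances, encode_one):
    """Run n_instances workers, each encoding its own batch in order.

    encode_one is assumed to block until one file has been encoded.
    """
    batches = split_into_batches(files, n_instances)

    def run_batch(batch):
        return [encode_one(f) for f in batch]

    with ThreadPoolExecutor(max_workers=n_instances) as pool:
        return list(pool.map(run_batch, batches))
```

  Something like run_batches(my_files, 2, my_encoder) on a dual core reproduces the experiment above. Round-robin dealing is a crude way to balance the batches; if file lengths vary a lot, a shared queue would even things out better.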
11. Re:Simple... by Panaflex · 2008-07-23 18:21 · Score: 1
  
  While I agree with you in principle... some algorithms are NOT SIMPLE. Yes, we can write great code that handles "single instance" things like fetch web pages or dump report. We can write "distributed" solution systems using threads - like chess simulations, raytracers and nuclear physics.
  But if an algorithm has linear dependencies for forward state then threading it is much, much more difficult.
  
  --
  I said no... but I missed and it came out yes.
12. Re:Simple... by ivan256 · 2008-07-24 07:55 · Score: 1
  
  Converting complex serial algorithms to effectively utilize multiple cores is generally not simple.
  Fortunately, most algorithms that end-users care about don't fall into that category.
  Video encoding certainly doesn't fall into that category. It is almost trivial to split a video up into sections of length (total length/number of cores), and then concatenate the encoded sections after you're done. Transcoding is even easier. Core 0 decodes, Core 1 encodes.... Etc.
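  As a sketch of the "sections of length (total length/number of cores)" idea, here is the boundary calculation in Python. The function name is made up, and this is the naive version: a real splitter would presumably have to cut at points the codec can restart from, and the encoded pieces still need concatenating in order afterwards.

```python
def section_bounds(total_seconds, cores):
    """Split a video of total_seconds into one (start, end) section per core."""
    length = total_seconds / cores
    return [(i * length, (i + 1) * length) for i in range(cores)]


# A two-hour film on a quad core: four 30-minute sections
for start, end in section_bounds(7200, 4):
    print(f"encode from {start:.0f}s to {end:.0f}s")
```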
  Synchronization, scheduling, and data transport issues are largely the same as multi-threaded programming on a single core. The problems are well understood.
  Multi-threaded programming is hard because most programmers don't understand the theory. They only learned about the tools.
13. Re:Simple... by AlterRNow · 2008-07-24 20:56 · Score: 1
  
  I think it kind of misses the point of SMP though ( starting two instances I mean ). Isn't it supposed to be transparent to the end user?
  
  --
  The disappearing pencil trick. Let me show you it.
x264 by Anonymous Coward · 2008-07-23 08:58 · Score: 3, Insightful

x264 uses slices and scales pretty well across multiple cores. I use it on windows via megui, but you could easily use it in Linux as well. You could use mencoder to pipe out raw video to a fifo and use x264 to do the actual conversion, for instance.
1. Re:x264 by kesuki · 2008-07-23 10:21 · Score: 0, Offtopic
  
  now if only h264 didn't use atrocious, buggy, awful non-burned in subtitles that don't render correctly if you're missing the 'fonts' (especially the foreign language fonts!) the encoder assumed everyone in the world has!
  there is a reason the non-burned in subtitles in DVDs are so atrocious looking it's so that EVERY DVD player could do subtitles right.
  I've even had non-burned in subtitles CRASH VLC media player!!! WTF do they only think of windows media player version 20.9.1029.3! or whatever it is they use?
  if a file is available in avi and MKV format i always go with AVI because the subtitles look so much better! they don't crowd each other, they don't 'grow larger than the screen border' when you maximize the screen, they don't 'stay so long you can't read the next piece of text'
  ugh i hate mkvs you know they could just burn in the subtitles, but because mkv is a container they don't bother.. it's like slacking on the subtitle quality control, they don't even need to preview how it works, cause it's not like people are going to say 'make a version 2 so i can see all the subtitles right!'
  
  --
  https://www.gnu.org/philosophy/free-sw.html
2. Re:x264 by TheDreadedGMan · 2008-07-23 10:31 · Score: 1
  
  what does this have to do with SMP video apps... plus, complain, but also you should report bugs with subtitles...
  Burned in Subtitles are great if you want to re-encode each language of the movie separately.
  If the movie player worked properly then it would look fine...
  Also, fonts should be selectable in the movie player, not locked to the subtitle file...
  WMP is v11 not v20... and AFAIK doesn't do subtitles out of the box, anyone?
3. Re:x264 by Anonymous Coward · 2008-07-23 10:49 · Score: 0
  
  Soft subtitles have a lot of advantages and its VLC's fault for rendering them incorrectly.
  If you'd like to watch them correctly, use Mplayer.
  I like SMplayer as a front end for Mplayer.
4. Re:x264 by kesuki · 2008-07-23 10:50 · Score: 1
  
  wmp does whatever you have codecs installed for it to do. and i am not a windows media player guy, i liked version 2 or whatever the small simple gui one... then they made it winamp, but without the 'classic' winamp skin (i always use winamps classic skin if i use it at all)
  the thing was it was a rant against containers like h264 and ogm containers sound great on the outside, as do 'non burned in subtitles' but in reality they suck hardcore, every media player handles containers differently, not to mention containers can contain malicious code such as launching a web browser to a malware site that tries to trick the user into downloading and running a trojan horse..
  but yeah i should have taken away karma bonus on that post...
  
  --
  https://www.gnu.org/philosophy/free-sw.html
5. Re:x264 by Anonymous Coward · 2008-07-23 11:53 · Score: 0
  
  What I like about this comment is that you are essentially bitching about the quality of pirated copyrighted materials. Oh sure, you never came right out and said it, but come on.
  You may possibly win the "least grateful human on earth" award for that.
6. Re:x264 by Sangui · 2008-07-23 11:53 · Score: 1
  
  The term is softsubs.
  And the reason VLC crashed is because it is a shit program with terrible softsub support.
7. Re:x264 by TheLink · 2008-07-23 13:36 · Score: 1
  
  If VLC crashes due to subtitles, it's a flaw in VLC.
  
  VLC has a long way to go before I consider it a good player.
8. Re:x264 by coolsnowmen · 2008-07-24 06:43 · Score: 1
  
  ...not to mention containers can contain malicious code such as launching a web browser to a malware site that tries to trick the user into downloading and running a trojan horse...
  The container doesn't do that, it is the player that would choose to do such a thing when faced with a codec request it doesn't recognize.
Simple question.. by Anonymous Coward · 2008-07-23 08:58 · Score: 0

simple answer: www.x264.nl
VisualHub... by e4g4 · 2008-07-23 08:59 · Score: 3, Informative

...makes excellent use of multiple cores. It is however Mac-only. Interestingly, what it does is split a file into chunks and spawns multiple ffmpeg processes to do the conversion. Which is to say, perhaps you can do some (relatively simple) scripting with ffmpeg that will do the job.

--
The secret to creativity is knowing how to hide your sources. - Albert Einstein
1. Re:VisualHub... by Anonymous Coward · 2008-07-23 09:22 · Score: 0
  
  If you are going to use platform specific you could just use Compressor in FCP. It will use your CPU power from other computers too after installing Qmaster (comes with FCP) on them. I often turn on my MacBook Pro and include it in the job and it shaves off ~40% of the time transcoding my video by using two computers (6 cores total). The CPU monitor bars on both machines shows all cores busy.
2. Re:VisualHub... by Anonymous Coward · 2008-07-23 10:24 · Score: 0
  
  VisualHub does NOT do this, it spawns a single ffmpeg process but with many threads, it doesn't split a file into chunks. I think what you're thinking of is when you do an H.264 conversion - what it does here is make good use of x264 and all of its parallelisation.
  P.S. I currently alpha test for Techspanion
3. Re:VisualHub... by Anonymous Coward · 2008-07-23 11:48 · Score: 0
  
  This is a good way of doing it. Any app that keeps multiple threads busy at the *same time* is the best way to make use of multiple cores. Not always an easy thing to do. Multithreaded doesn't mean it's going to make use of multiple cores, because most multithreaded apps likely just dish off work to another thread while the others remain idle doing nothing. This is quite a change in s/w design IMO. It's very application dependent also, as you have to think of how processing can be split across threads equally.
4. Re:VisualHub... by Anonymous Coward · 2008-07-23 15:55 · Score: 0
  
  Macs support two cores? I didn't even know they supported two buttons! /rimshot
5. Re:VisualHub... by e4g4 · 2008-07-24 03:17 · Score: 1
  
  You are, of course, quite right - it uses the -threads flag in ffmpeg. The details may be wrong - but the original point still stands - ffmpeg does a great job with multiple cores.
  
  --
  The secret to creativity is knowing how to hide your sources. - Albert Einstein
x264 and avisynth by PhrostyMcByte · 2008-07-23 09:01 · Score: 2, Informative

x264 and avisynth can make pretty decent use of threads. check out meGUI.
1. Re:x264 and avisynth by figleaf · 2008-07-23 09:13 · Score: 1
  
  Yeah x264 is great. There is a slight quality degradation (albeit you have to look really hard to visually determine the difference) if you use multiple threads.
  I once used a batch file to encode several gigs of my family vacation MJPEG videos to H.264 using x264 in a single background thread over a period of 10 days.
  With some heavy-duty post processing (for noise removal etc) it encoded about a 1 GB source/day. There was no perf. degradation with my other apps (games, email etc.) on account of the video encode.
2. Re:x264 and avisynth by Elbart · 2008-07-23 09:14 · Score: 0
  
  LOL meGUI. An encoder-GUI which needs admin-rights on Vista. No comment.
3. Re:x264 and avisynth by Henriok · 2008-07-23 09:15 · Score: 1
  
  ffmpegX for OSX uses x264 and it's transcoding like mad on my eight core Mac Pro. A 2h Video_TS film conversion to iPhone-ready double pass h264/MPEG4.. in less than 20 minutes. Using 720-760% CPU, i.e. just the right amount for me, since I use the machine for other tasks as well.
  
  --
  
  - Henrik
  
  - when the Shadows descend -
4. Re:x264 and avisynth by Parag2k3 · 2008-07-23 09:17 · Score: 1
  
  AviSynth is single threaded, so complicated avs's won't effectively use all possible threads.
5. Re:x264 and avisynth by figleaf · 2008-07-23 09:19 · Score: 1
  
  That's not correct. The admin-rights are only needed to update MeGUI. Video encode works fine without admin permissions.
  You can install MeGUI in a non-standard location like c:\tools\megui and not require admin permissions to update.
6. Re:x264 and avisynth by PhrostyMcByte · 2008-07-23 09:48 · Score: 1
  
  The newer version supports SetMTMode which works quite well in many cases.
7. Re:x264 and avisynth by Anonymous Coward · 2008-07-23 09:55 · Score: 0
  
  ffmpegx is not opensource but shareware
8. Re:x264 and avisynth by Henriok · 2008-07-23 10:46 · Score: 1
  
  It's only the GUI that's shareware, what I just told everyone was that the open source codec x264 is threaded and performing very good on SMP systems.
  
  --
  
  - Henrik
  
  - when the Shadows descend -
9. Re:x264 and avisynth by nawcom · 2008-07-23 11:42 · Score: 1
  
  It's a GUI they want you to pay for. In fact they don't come with the open source apps mplayer/mencoder, ffmpeg and mpeg2enc, they make you download it separately. So, if you don't have issues at all with command lines, anything on ffmpegX is simple to run via the terminal, for free.
10. Re:x264 and avisynth by foxyshadis · 2008-07-23 12:17 · Score: 1
  
  There is no quality degradation with threads anymore in x264. Maybe you're thinking of the old threading scheme from over a year and a half ago, where there was a tradeoff? With the new method, threading neither reduces nor increases quality.
11. Re:x264 and avisynth by figleaf · 2008-07-23 14:02 · Score: 1
  
  So x264 has finally become deterministic?
  i.e. Different runs with multiple threads create the same output file?
12. Re:x264 and avisynth by Utopia · 2008-07-24 02:18 · Score: 1
  
  According to http://mewiki.project357.com/wiki/X264_Settings there is some degradation even with the new threading system.
  If this is not right you should update the wiki.
13. Re:x264 and avisynth by Elbart · 2008-07-24 06:03 · Score: 0
  
  Then why does the UAC always come up, although I've installed in a non-standard location?
Beat me to it! by BLKMGK · 2008-07-23 09:05 · Score: 4, Informative

x264 via meGUI from Doom9 is what I use to compress HD-DVD and BD movies - also on a quad core. I have some tutorials posted out and about on how I'm doing it. Near as I can tell you cannot dupe the process on Linux due to the crypto - Slysoft's AnyDVD-HD is needed.
Playback - I use XBMC for Linux. It is also SMP enabled using the ffmpeg cabac patch. The developers of this project have been VERY aggressive at taking cutting edge improvements to the likes of ffmpeg and incorporating them into the code. Since Linux has no video acceleration of H.264 SMP really helps on high bitrate video!

--
Build it, Drive it, Improve it! Hybridz.org
Load balancing: Why? by DigitAl56K · 2008-07-23 09:09 · Score: 4, Insightful

don't balance the workload evenly across processors
Why is balancing the load evenly important, as long as one thread is not bottlenecking the others? Loading a particular core or set of cores might even be beneficial depending on the cache implementation, especially when other applications are also contending for CPU time.
Sure, a nice even load distribution might be an indicator for good design, but it doesn't have to apply in every case. I don't think software should be designed so you can be pleased with the aesthetics of the charts in task manager.
1. Re:Load balancing: Why? by Scottie-Z · 2008-07-23 09:17 · Score: 2, Insightful
  
  Because, ideally, all four cores should be running at 100% -- the idea is to make maximal use of your available resources, right?
2. Re:Load balancing: Why? by DigitAl56K · 2008-07-23 09:34 · Score: 4, Insightful
  
  It's still possible to load all cores 100%.
  A video decoder that I'm working with, for example, currently uses only as many threads as necessary for real-time playback. So for example if one core can do the job only one core is used. If the decoder looks like it might start falling behind more threads are given work to do. Ultimately, if your system is failing to keep up all cores will be fully leveraged.
  However, so long as only some cores are required the others are 100% available to other processes, including their cache (if it's independent). I'm not sure how power management is implemented but perhaps it's even possible for the unused cores to do power saving, leading to longer battery life for laptops/notebooks, etc.
  
  the idea is to make maximal use of your available resources, right?
  No, the idea is to make the best use of your resources. I'm not trying to say that load balancing is wrong. I'm just saying that processes that don't appear to be balanced are not necessarily poorly designed or operating incorrectly.
3. Re:Load balancing: Why? by Anonymous Coward · 2008-07-23 09:38 · Score: 0
  
  Yes, the idea is to maximize the usage of your resources. But as the parent just said
  
  Loading a particular core or set of cores might even be beneficial depending on the cache implementation, especially when other applications are also contending for CPU time.
  He wrote a whole sentence just to tell you why you wouldn't need to post what you just posted, but you didn't read it.
4. Re:Load balancing: Why? by Anonymous Coward · 2008-07-23 09:45 · Score: 0
  
  What an absolutely ridiculous question. Balancing the load isn't an exercise in aesthetics. It's about minimizing the run-time. If you give three of your CPUs 10% of the work each and the other one gets 70% then your task is going to take 70% of the time it took when running serially. That's four spanking new CPUs that don't even do the job twice as fast as the original.
  If, on the other hand, you balance everything nicely, your task will take 25% of the time. That 4x speed up is guaranteed to make you feel a lot better about the mullah you just parted with. Of course, when you realize that there's more to life than ripping video and that you could do it all in batch at night anyway, you'll no doubt suffer a little buyer's remorse, but that's another story.
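  Those numbers fall straight out of "the job is done when the busiest CPU is done". A small Python sketch, under the simplifying assumption that there is no threading overhead and no waiting between CPUs (the function names are just for illustration):

```python
def parallel_runtime(shares):
    """Run time relative to the serial run, given each CPU's share of the work.

    Assumes no overhead: wall-clock time is set by the most loaded CPU.
    """
    return max(shares) / sum(shares)


def speedup(shares):
    """How many times faster than the serial run."""
    return 1 / parallel_runtime(shares)


for shares in ([10, 10, 10, 70], [25, 25, 25, 25]):
    print(shares, f"{parallel_runtime(shares):.0%} of serial time,",
          f"{speedup(shares):.2f}x speedup")
```

  The lopsided split comes out at roughly a 1.43x speedup, the even split at 4x.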
5. Re:Load balancing: Why? by DigitAl56K · 2008-07-23 10:18 · Score: 1
  
  What an absolutely ridiculous question.
  What an absolutely ridiculous answer. Go read my other post:
  http://tech.slashdot.org/comments.pl?sid=623707&cid=24311347
  Thanks.
6. Re:Load balancing: Why? by Anonymous Coward · 2008-07-23 10:32 · Score: 0
  
  Why? Heat! Balancing load is balancing heat.
7. Re:Load balancing: Why? by fgouget · 2008-07-23 23:22 · Score: 1
  
  Because, ideally, all four cores should be running at 100% -- the idea is to make maximal use of your available resources, right?
  Right. If your code is not maxing out the CPU, then start a few extra threads running 'while (1);'. Your resource usage will be much improved!
  Remember: 100% CPU usage may be the sign that the code is awfully inefficient. And <100% CPU usage may be the sign that your I/O handling is inefficient.
Re:Which part of Open Source didn't you get? by Anonymous Coward · 2008-07-23 09:13 · Score: 0

Beggars can't always be choosers, but you can always be a prick.
I know this is wrong to say by Anonymous Coward · 2008-07-23 09:15 · Score: 0, Flamebait

But I actually really hope the person who "asked Slashdot" this dies in a fire. Honestly. Is Google THAT broken these days?
Yes, this is a troll. Mark it as such and feel the peace. I don't mean to "troll" as such, I don't care for replies. I just reiterate my first sentence. Fire, die in one. Use your God/creation given brain next time.
Re:Which part of Open Source didn't you get? by phuul · 2008-07-23 09:16 · Score: 2, Informative

So is ffmpeg not open source? It uses the LGPL license and from their license FAQ:

"FFmpeg is licensed under the GNU Lesser General Public License (LGPL). However, FFmpeg incorporates several optional modules that are covered by the GNU General Public License (GPL), notably libpostproc and libswscale. If those parts get used the GPL applies to all of FFmpeg. Read the license texts to learn how this affects programs built on top of FFmpeg or reusing FFmpeg. You may also wish to have a look at the GPL FAQ. "
Since his suggestion was to do some scripting that does essentially what VisualHub does using ffmpeg I'm not sure I see how he missed the Open Source requirement.
Handbrake by vfs · 2008-07-23 09:18 · Score: 5, Informative

Handbrake has always used both of the cores on my system for transcoding.
1. Re:Handbrake by Anonymous Coward · 2008-07-23 09:38 · Score: 1, Informative
  
  Handbrake has always used both of the cores on my system for transcoding.
  ... and is only good for transcoding DVDs. Sure it's nice and simple for that one thing, but I assume the submitter wants more than that.
2. Re:Handbrake by catmistake · 2008-07-23 09:57 · Score: 4, Informative
  
  that's because Handbrake uses ffmpeg
  
  --
  The Admin and the Engineer
3. Re:Handbrake by crmarvin42 · 2008-07-23 10:53 · Score: 2, Informative
  
  It's good for Video_TS folders in general. In fact, a handful of DVD's can't be ripped directly from the disk using handbrake and need to be copied to HD via something like MacTheRipper before being transcoded by Handbrake. I don't know what format the guy is trying to transcode from, but most people only need to transcode DVD's.
  
  --
  Bureaucracy expands to meet the needs of the expanding bureaucracy.-Oscar Wilde
4. Re:Handbrake by MsGeek · 2008-07-23 11:17 · Score: 1
  
  But the program will NOT transcode from .VOB to .DV. That's all I want to do. I want to point Handbrake to a VOB and have it transcode direct to .DV, particularly Final Cut-friendly .DV. Yeah I'm on a Mac. MacBook Core 2 Duo (Merom) 2GHz. I converted from .VOB to .MKV, then I took the .MKV into VisualHub. The transcode in VisualHub died silently towards the end. Fail.
  There has GOT to be a better way. On Mac. I'm willing to learn command line apps to do this if I can take a .VOB and convert direct to .DV.
  
  --
  Knowledge is power. Knowledge shared is power multiplied.
5. Re:Handbrake by SoupIsGoodFood_42 · 2008-07-23 17:12 · Score: 1
  
  Except if you use the x264 option, which I have no idea of how it affects this issue.
6. Re:Handbrake by SoupIsGoodFood_42 · 2008-07-23 17:17 · Score: 1
  
  The other thing it doesn't do in general is take any .vob file(s) that isn't part of a full DVD rip.
7. Re:Handbrake by Anonymous Coward · 2008-07-23 17:22 · Score: 1, Informative
  
  Handbrake uses x264, though you can use ffmpeg if you want. And Handbrake can take as many cores as you want to throw at it...
8. Re:Handbrake by catmistake · 2008-07-24 03:00 · Score: 1
  
  Hmm... I always thought it used ffmpeg to use x264.
  
  --
  The Admin and the Engineer
9. Re:Handbrake by Anonymous Coward · 2008-07-24 04:31 · Score: 0
  
  I have found that Handbrake's (or FFMPEG's?) H.264 encoder uses ~100% of CPU time, while its DivX encoder only uses ~80% on an Intel Core 2 Duo machine.
10. Re:Handbrake by SoupIsGoodFood_42 · 2008-07-25 15:46 · Score: 1
  
  They're listed as separate encoders in the GUI, but that's hardly definitive.
Re:Which part of Open Source didn't you get? by pushing-robot · 2008-07-23 09:20 · Score: 4, Informative

OP is asking for open source tools. You cited a commercial one that doesn't provide source.
VisualHub (the front-end app) may be closed, but ffmpeg is LGPL.
And the GP was suggesting using ffmpeg, not VisualHub.

--
How can I believe you when you tell me what I don't want to hear?
F(next) = F(current) + Delta(F(current:next)) by Lumenary7204 · 2008-07-23 09:21 · Score: 5, Insightful

The problem with MPEG encoding and decoding is that the data itself is not well suited to multi-threaded analysis.
Multi-threading is most efficient when it is applied to discrete data sets that have little or no dependency on each other.
For example, suppose I have a table with four columns -- three holding input values (A, B, and C) and one holding an output value (X). If the data in a given row of the table has nothing to do with the data in any other row, multi-threading works efficiently, because none of the threads are waiting for data from any of the other threads. If I want to process multiple rows at once, I simply spawn additional threads.
On the other hand, for data such as MPEG video, the composition of the next frame is equal to the composition of the current frame, plus some delta transformation - the changed pixels.
This introduces a dependency which precludes efficient multi-threaded processing, because each succeeding frame depends on the output of the calculations used to generate the prior frame. Even if more than one core is dedicated to processing the video stream, one core would wind up waiting on another, because the output from the first core would be used as the input to the second.
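A minimal sketch of the distinction drawn above, in Python. The function names and the arithmetic are illustrative assumptions, not taken from any real encoder: the row-wise computation can be mapped across a worker pool, while the frame chain has to be walked in order.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate

def compute_x(row):
    """Hypothetical per-row computation: X depends only on A, B, C of this row."""
    a, b, c = row
    return a * b + c

def process_rows(rows, workers=4):
    # Rows are independent, so any worker can take any row in any order.
    # Threads are used here for portability; CPU-bound pure-Python work
    # would need processes or native code to occupy several cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compute_x, rows))

def reconstruct_frames(first_frame, deltas):
    # F(next) = F(current) + Delta: each result feeds the next step,
    # so this chain is inherently sequential.
    return list(accumulate(deltas, initial=first_frame))

rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
print(process_rows(rows))                    # [5, 26, 65]
print(reconstruct_frames(10, [1, -2, 5]))    # [10, 11, 9, 14]
```

Frames are modelled as plain integers only to keep the dependency visible; a real frame would be an image buffer.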
1. Re:F(next) = F(current) + Delta(F(current:next)) by Lumenary7204 · 2008-07-23 09:25 · Score: 2, Informative
  
  Note that the above example is about the video component only of a single MPEG audio/video stream.
  There is no reason that an encoder/decoder can't process audio in one thread and video in another, thereby using more than one core (which has already been discussed in other posts relating to this article).
2. Re:F(next) = F(current) + Delta(F(current:next)) by Omega996 · 2008-07-23 09:36 · Score: 4, Insightful
  
  theoretically, couldn't an encoder scan the data stream for keyframes, chunk the data from keyframe to the next keyframe, and then queue up the keyframe+delta information for multiple cores? That way, each core has something to do that isn't dependent upon the completion of something else.
  i'd think that n-1 cores/threads/whatever to process the chunked data, and the last core/thread/whatever to handle overhead and i/o scheduling would run pretty nicely on a multi-core machine.
3. Re:F(next) = F(current) + Delta(F(current:next)) by Anonymous Coward · 2008-07-23 09:37 · Score: 0
  
  Learn about I-, P- and B-Frames before you write long-winded bologna about stuff you clearly don't understand.
4. Re:F(next) = F(current) + Delta(F(current:next)) by ZachPruckowski · 2008-07-23 09:38 · Score: 1
  
  MPEG uses keyframes, right? So you'll still have a full frame in there every few frames. When I play back a MP4 I encoded, I wind up with something like a full frame every second or two (with the intermediate frames being the transformations you mentioned). So you can split at those frames. That's not infinitely parallel, but if we split it up by minute-sized segments, we'd have 90-150 segments (based on movie length), which is plenty for any prosumer computer for the foreseeable future, and even plenty for smaller clusters (that's 30 quad-cores or so).
5. Re:F(next) = F(current) + Delta(F(current:next)) by Zygfryd · 2008-07-23 09:40 · Score: 2, Informative
  
  http://en.wikipedia.org/wiki/Group_of_pictures
  You can encode GOPs independently. I think the only dependency between GOP encoding processes is bit allocation, which probably works well enough if you simply assign each process an equal share of the total bit budget.
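  A rough Python sketch of that idea, assuming closed GOPs (no frame references across a GOP boundary) and a frame-type string that is already known. Both helper names are hypothetical.

```python
def split_into_gops(frame_types):
    """Group frame indices into runs that each start at an I-frame."""
    gops = []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "I" or not gops:
            gops.append([])
        gops[-1].append(index)
    return gops

def equal_bit_budget(total_bits, gop_count):
    # Equal share per GOP; leftover bits go to the first few GOPs.
    share, remainder = divmod(total_bits, gop_count)
    return [share + (1 if i < remainder else 0) for i in range(gop_count)]

gops = split_into_gops("IPPBIPPIB")
print(gops)                               # [[0, 1, 2, 3], [4, 5, 6], [7, 8]]
print(equal_bit_budget(1000, len(gops)))  # [334, 333, 333]
```

  Each list of indices could then be handed to a separate encoding process along with its share of the budget.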
6. Re:F(next) = F(current) + Delta(F(current:next)) by John+Betonschaar · 2008-07-23 09:41 · Score: 4, Insightful
  
  You could of course split each frame in slices, and process these in parallel. Or skip the video N frames between each core, with N being the number of frames between MPEG keyframes. Or have core 1 do the luma and core 2 and 3 the chroma channels. Or pipeline the whole thing and have core 1 do the DCT, core 2 the dequant etc. and have core 3 reconstruct the output reference frame while core 1 already starts the next frame.
  Plenty of ways to parallelize decoding, and even more for encoding...
7. Re:F(next) = F(current) + Delta(F(current:next)) by Anonymous Coward · 2008-07-23 09:45 · Score: 0
  
  Doesn't MPEG have key frames? Surely each core could grind out work units of N delta frames, starting from different key frames?
8. Re:F(next) = F(current) + Delta(F(current:next)) by tjugo · 2008-07-23 09:50 · Score: 1
  
  Your explanation is not accurate.
  Most video compression techniques including MPEG set a maximum number of frames between base frames. A base frame can be decoded without any information about previous or future frames.
  All the motion vectors or deltas are calculated against the closest previous base frame. Theoretically you can parallelize the decoding into the total number of base frames your video stream has. If you are decoding a 60 minute video encoded using a base frame every 1s you can split the job into 3600 independent tasks.
  Video decoding is by nature well suited to multi-threaded systems.
9. Re:F(next) = F(current) + Delta(F(current:next)) by ubergeek65536 · 2008-07-23 09:53 · Score: 1
  
  You're just plain wrong. There are lots of ways to fully use SMP when either decoding or encoding MPEG streams. Not only are frames grouped into something called a GOP, starting with a JPEG-like keyframe, but each frame is also constructed of blocks, usually 16 pixels square, each of which can be processed on a different thread. All the audio streams and video streams can be processed by multiple threads too.
10. Re:F(next) = F(current) + Delta(F(current:next)) by Anonymous Coward · 2008-07-23 09:59 · Score: 0
  
  You totally missed what he wanted to do. He wanted video conversion. This generally means 1-N decode and 1-N encode processes. Multithreading is well suited to that, since you can very easily split the two.
  Most apps however do not do it this way, as the author pointed out, because instead of having a managed buffer that is shared it's far easier to just have 1 function that basically does a decodeSrc(); encodeDst(); and loop. It ends up being inexperience or laziness that results in this.
11. Re:F(next) = F(current) + Delta(F(current:next)) by liusu119 · 2008-07-23 10:01 · Score: 1
  
  The dependency only holds within one segment between keyframes during decoding. For encoding, there is no such thing as F(current); the encoder should know all of F. Current MPEG encoder implementations may not work like this, but for static transcoding the delta of all frames could be computed with no dependency on each other. So I don't think it's a limitation of MPEG itself. It's a problem of how frames are served and how the encoder takes frames.
12. Re:F(next) = F(current) + Delta(F(current:next)) by semiotec · 2008-07-23 10:02 · Score: 1
  
  The problem with MPEG encoding and decoding is that the data itself is not well suited to multi-threaded analysis.
  Not quite true. Someone above already explained some of this re VisualHub.
  The video data/frame at 0:00 is very likely completely unrelated to the data/frame at 5:00, thus you can simply chop up the raw file into a number of segments and process them in parallel.
  Some clever stitching is probably required to put the whole thing back together in the end.
  
  Multi-threading is most efficient when it is applied to discrete data sets that have little or no dependency on each other.
  Exactly, so you chop up the raw input into segments and they become discrete data sets.
13. Re:F(next) = F(current) + Delta(F(current:next)) by init100 · 2008-07-23 10:13 · Score: 2, Insightful
  
  I think the only dependency between GOP encoding processes is bit allocation, which probably works well enough if you simply assign each process an equal share of the total bit budget.
  Is this even needed if you use multi-pass encoding? At least for XviD, IIRC the first pass is used to accumulate statistics used to allocate the proper bit budget to each frame. Then the individual processes should be able to use the statistics file from the first pass to get the bit allocation for their current GOP in the second pass.
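  A hedged sketch of how per-GOP budgets could be derived from first-pass statistics, in Python. The complexity numbers and the function name are made up for illustration; this is not XviD's actual stats format or rate-control algorithm.

```python
def allocate_bits(gop_complexities, total_bits):
    """Split total_bits across GOPs in proportion to first-pass complexity."""
    total_complexity = sum(gop_complexities)
    if total_complexity == 0:
        raise ValueError("first pass reported no complexity")
    # Integer division may leave a few bits unallocated; fine for a sketch.
    return [total_bits * c // total_complexity for c in gop_complexities]

# Three GOPs whose first-pass complexity is in the ratio 1:3:4.
print(allocate_bits([1, 3, 4], 8000))   # [1000, 3000, 4000]
```

  Once every GOP has its own budget, the second-pass workers no longer need to talk to each other about bit allocation.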
14. Re:F(next) = F(current) + Delta(F(current:next)) by Anonymous Coward · 2008-07-23 10:20 · Score: 0
  
  I would think that each GOP could be worked on by separate cores. Additionally, macroblocks in a frame can also be I, P, or B...you'd think an encoder could have cores work on their own macroblocks as well.
15. Re:F(next) = F(current) + Delta(F(current:next)) by Anonymous Coward · 2008-07-23 10:22 · Score: 0
  
  Your explanation is overly simplistic.
  If you chunk by all the frames between the I frames, each one of those chunks can be encoded on different core. Depending on the stream format, the GOP (group of pictures) is mandated. So it is fairly straight forward to parallelize encoding.
  http://en.wikipedia.org/wiki/MPEG-1#Frame.2FPicture.2FBlock_Types
16. Re:F(next) = F(current) + Delta(F(current:next)) by Fry-kun · 2008-07-23 11:01 · Score: 1
  
  But MPEG has keyframes - you need them for scene changes and error recovery. There's one at least every few seconds. For offline video, the threads can work on different keyframes & their respective deltas.
  For online video, it's harder... but it can still be done. Similar to how two-videocard setups work, you can split the image into pieces and have each CPU work on a particular piece, since there's little relation between the pieces. Of course it becomes very hard to scale beyond a certain point... but 2-4 cores/CPUs should be doable, algorithm-wise.
  
  --
  Did you know that "FTW" ("for the win") is a direct translation of "Sieg Heil"?
17. Re:F(next) = F(current) + Delta(F(current:next)) by Lumenary7204 · 2008-07-23 11:04 · Score: 1
  
  > Most video compression techniques including MPEG set a maximum number of
  > frames between base frames. A base frame can be decoded without any
  > information about previous or future frames.
  >
  > All the motion vectors or deltas are calculated against the closest
  > previous base frame.
  Yeah, I forgot to include the whole keyframes thing... My bad. I should have said "not always."
  However, the problem with keyframes is that their placement is often artificial; i.e., every 30 frames or so.
  This limits the "compressibility" of the stream by forcing the stream to carry a full-size frame for each second of video (at 30 frames/second). Note that I am not talking about the "lossy" compression applied to the pixels of the keyframe itself.
  In some instances, the video stream could be compressed much more efficiently by analyzing the stream and including only a few keyframes during certain scenes, and using a full complement of keyframes during more "active" scenes.
  A good example of where this method could be used is the scene in Star Wars IV: A New Hope where the camera is looking at a sand dune, and Threepio is slowly revealed as he climbs the dune toward the camera from the other side. You could probably get away with just a few keyframes -- perhaps one for every 150 frames -- for the bulk of the scene. On the other hand, the space battle sequences over Endor in Star Wars VI: Return of The Jedi, being much more active scenes, would probably require one keyframe per 30 frames.
  This could, of course, impose quite a performance penalty when originally encoding the video, as the encoder would need to cache quite a few frames -- let's say 1,200 frames (40 seconds at 30 frames/second) -- then analyze the chain of frames to determine how quickly the scene is changing. After the current 1,200 frame "window" is processed, the encoder would shift the window 120 frames downstream (for a 1,080 frame overlap), and repeat the process.
  (Note again, for simplicity, I am not including audio synchronization as part of the discussion. Some formats require regular keyframes to keep the audio and video synchronized.)
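  The sliding analysis window described above can be sketched in a few lines of Python. The generator name is hypothetical; the defaults are the 1,200-frame window and 120-frame shift from the post, which leave a 1,080-frame overlap.

```python
def analysis_windows(total_frames, window=1200, step=120):
    """Yield (start, end) frame ranges; neighbours overlap by window - step."""
    start = 0
    while True:
        end = min(start + window, total_frames)
        yield start, end
        if end == total_frames:
            break               # the last window reached the end of the stream
        start += step

for start, end in analysis_windows(1500):
    print(start, end)
# 0 1200
# 120 1320
# 240 1440
# 360 1500
```

  Every frame is analyzed up to ten times with these numbers, which is where the performance penalty comes from.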
18. Re:F(next) = F(current) + Delta(F(current:next)) by Anonymous Coward · 2008-07-23 11:49 · Score: 0
  
  This is not true. Only some of the frames depend upon previous frames. It is trivial to parallelize this by splitting the movie apart at the "I" frames. Each "I frame" can be rendered independently of the prior video stream. This is how you can start watching a show from satellite TV in the middle of the show (you just wait for the first I frame, and start displaying from there).
19. Re:F(next) = F(current) + Delta(F(current:next)) by sog_abq · 2008-07-23 12:10 · Score: 1
  
  Perhaps I'm over-simplifying the problem, but: You could split the image frame into sections (perhaps halves if you only have two cores) and have separate threads processing each half in parallel and then assemble them later on with a third thread. Assuming that you can support the data bandwidth, then you have decreased your processing time because the processing queues and the assembling thread are all running in parallel, which means you're limited only by the number of cores available and the data bandwidth to get the bits to the cores. Since this seems so obvious, what am I missing?
20. Re:F(next) = F(current) + Delta(F(current:next)) by foxyshadis · 2008-07-23 12:24 · Score: 2, Informative
  
  Several implementations of this exist: For x264, there's x264farm (which includes network encoding as well).
21. Re:F(next) = F(current) + Delta(F(current:next)) by fatphil · 2008-07-23 12:44 · Score: 1
  
  Just think of pipelining. If you buffer enough input, there's always enough work for the output stage(s) to work on.
  
  --
  Also FatPhil on SoylentNews, id 863
22. Re:F(next) = F(current) + Delta(F(current:next)) by adisakp · 2008-07-23 13:23 · Score: 1
  
  The problem with MPEG encoding and decoding is that the data itself is not well suited to multi-threaded analysis.
  
  You can also break down a frame into discrete rectilinear regions and have each separate region be processed by a different thread. This block-based approach is a simple (although probably not 100% optimal) way to get parallelism with an operation involving only two buffers (current and output) in any 2D filter / transform from MPEG frame decoding to JPEG decompression to a Photoshop filter.
  
  For a bit better performance, make sure to align the blocks so you're not getting cache sharing misses on the block boundaries if possible.
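  A minimal Python sketch of the block-based approach, with horizontal bands as the regions and a trivial invert filter standing in for the real transform. All names are hypothetical, and the frame is a plain list of rows rather than a real image buffer.

```python
from concurrent.futures import ThreadPoolExecutor

def split_rows(height, parts):
    """Return (start, end) row ranges covering [0, height) in `parts` bands."""
    base, extra = divmod(height, parts)
    bounds, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds

def invert_band(frame, rows):
    # Stand-in 2D filter: invert 8-bit samples in the given rows only.
    start, end = rows
    return [[255 - px for px in row] for row in frame[start:end]]

def filter_frame(frame, workers=4):
    bands = split_rows(len(frame), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each band reads the shared input and produces its own output rows.
        results = list(pool.map(lambda rows: invert_band(frame, rows), bands))
    # pool.map preserves input order, so bands reassemble top to bottom.
    return [row for band in results for row in band]

frame = [[0, 10], [20, 30], [40, 50], [60, 70], [80, 90]]
print(filter_frame(frame, workers=2))
```

  In CPython the threads here will not run the pure-Python filter on several cores at once; a real implementation would do the per-band work in native code or separate processes.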
23. Re:F(next) = F(current) + Delta(F(current:next)) by benwaggoner · 2008-07-23 13:38 · Score: 2, Insightful
  
  You can encode GOPs independently. I think the only dependency between GOP encoding processes is bit allocation, which probably works well enough if you simply assign each process an equal share of the total bit budget.
  That's a pretty painful constraint for anything other than very flat constant bitrate encoding. You really want to be able to move bits between GOPs to optimize for consistent quality.
  
  --
  
  My video compression blog
24. Re:F(next) = F(current) + Delta(F(current:next)) by zaffir · 2008-07-23 14:03 · Score: 1
  
  Totally not true. Just slice the screen into N parts, where N is the number of cores you have. Ta-da, problem solved.
  
  --
  "Upon attaching the waterblock to my penis, I began to notice that I know nothing about computers." -- JRockway
25. Re:F(next) = F(current) + Delta(F(current:next)) by Trixter · 2008-07-23 14:31 · Score: 1
  
  The problem with MPEG encoding and decoding is that the data itself is not well suited to multi-threaded analysis.
  Spoken like someone who has never written an encoder. There are dozens of opportunities to thread, even if it's just as simple as each stage breaking a frame up into N bands, one for each core. DCT transforms, motion vector synthesis, etc. are not dependent on the entire frame being encoded as a whole (although they do need to be assembled correctly once all threads finish).
26. Re:F(next) = F(current) + Delta(F(current:next)) by sunderland56 · 2008-07-23 15:45 · Score: 1
  
  theoretically, couldn't an encoder scan the data stream for keyframes
  Well, no. The input is uncompressed video - there are no keyframes. One of the jobs that an encoder has to do is to generate keyframes - i.e. to decide where to insert one. This decision has little to do with the input video.
  (It is generally efficient to put a keyframe where the input video changes dramatically - for instance, a cut from one scene to another - but that can't be your only algorithm).
  MPEG video is based on macroblocks - 16 x 16 square blocks of pixels - and it is possible to split the processing of these blocks (or more commonly horizontal bands of these blocks) between different processors. However, this requires one parent process to handle the splitting of the input data and the assembling of the final output data - so the parallelism must be built into the algorithm.
27. Re:F(next) = F(current) + Delta(F(current:next)) by im_thatoneguy · 2008-07-23 18:23 · Score: 1
  
  To take it further, MPEG interframe compression usually only spans about 5-10 frames. In which case you could, say, start at the end and the beginning and work towards one another.
28. Re:F(next) = F(current) + Delta(F(current:next)) by Bert64 · 2008-07-23 20:27 · Score: 1
  
  But the audio is much simpler to encode, resulting in maybe 20% usage of the second core while the first is 100% used encoding video...
  The best bet really, is to just encode multiple files at once, use something like xargs to ensure that there are always 4 encoding jobs running (assuming 4 cores), as someone else said - multi threading works better when there is no interdependency of the data.
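  For those who would rather script this than use xargs, here is a small Python sketch of the same idea: keep at most N external encoding jobs running until the queue is empty. The `run_jobs` helper is hypothetical, and the commands are whatever encoder invocation you already use.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_jobs(commands, max_parallel=4):
    """Run each command (an argv list) with at most max_parallel at a time.

    Returns the exit codes in the same order as the commands.
    """
    def run(argv):
        return subprocess.run(argv).returncode

    # Each thread just waits on a child process, so the heavy lifting
    # happens in the encoder processes, one per core.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, commands))

# Example (file names are placeholders):
# commands = [["ffmpeg", "-i", src, src + ".avi"] for src in ["a.vob", "b.vob"]]
# exit_codes = run_jobs(commands, max_parallel=4)
```

  Because each file is a separate process, this sidesteps the interdependency problem entirely, at the cost of not speeding up any single encode.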
  
  --
  http://spamdecoy.net - free throwaway anonymous email - avoid spam!
29. Re:F(next) = F(current) + Delta(F(current:next)) by gnasher719 · 2008-07-23 23:45 · Score: 1
  
  All the motion vectors or deltas are calculated against the closest previous base frame. Theoretically you can parallelize the decoding into the total number of base frames your video stream has. If you are decoding a 60 minute video encoded using a base frame every 1s you can split the job into 3600 independent tasks.
  The problem is that you usually don't actually want to _decode_ a video stream, you want to display it. So having thread #3599 decode the frames that you want to see one hour from now is quite pointless. You will always have to keep decoded frames in RAM until they are displayed (and are not needed when decoding further frames), so that is limiting you.
  
  1080p needs about 3 MByte per frame, that is 90 MByte per second at 30 fps. Let's say you have two cores, each capable of decoding 20 fps. You'd start them both decoding the first and the second set of frames, and wait a while until you start displaying. You might have to stop decoding because you run out of memory to store more frames (and you wouldn't want them to be paged out; 90 MByte per second is a bit much for paging). I guess you need enough RAM to store the longest sequence of frames.
  
  Now if you are transcoding from one format to another, that is a different matter.
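  The memory figures above can be checked with a few lines of Python. This assumes 8-bit YUV 4:2:0 frames (1.5 bytes per pixel), which is an assumption on my part, and the 512 MiB buffer is only an example size.

```python
def frame_bytes(width, height, bytes_per_pixel=1.5):
    # 1.5 bytes/pixel assumes 8-bit YUV 4:2:0: full-resolution luma plus
    # two chroma planes subsampled by 2 in each direction.
    return int(width * height * bytes_per_pixel)

def frames_in_budget(buffer_bytes, bytes_per_frame):
    """How many decoded frames fit in a buffer of the given size."""
    return buffer_bytes // bytes_per_frame

per_frame = frame_bytes(1920, 1080)
per_second = per_frame * 30

print(per_frame)     # 3110400 bytes, about 3 MByte
print(per_second)    # 93312000 bytes, about 90 MByte per second
print(frames_in_budget(512 * 1024 * 1024, per_frame))   # 172 frames, under 6 seconds
```

  So a decoder running far ahead of the display fills half a gigabyte in a few seconds of 1080p video.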
30. Re:F(next) = F(current) + Delta(F(current:next)) by evilviper · 2008-07-24 00:12 · Score: 1
  
  You could of course split each frame in slices, and process these in parallel.
  That is commonly how it's done, but you do lose some bitrate efficiency that way, and it doesn't parallelize perfectly as many operations need to be performed across the whole frame after the slices are reassembled, before the next frame/slices can be processed. x264 did this, but (somewhat recently) dumped this method for something better.
  
  Or skip the video N frames between each core, with N being the number of frames between MPEG keyframes.
  No (remotely decent) MPEG encoder made in the past 2 decades uses a fixed GOP-size. That's horribly inefficient.
  
  Or have core 1 do the luma and core 2 and 3 the chroma channels.
  Do *WHAT* to the luma and chroma channels?
  
  Or pipeline the whole thing and have core 1 do the DCT, core 2 the dequant etc.
  That sounds like a core relay race... each one sitting idle most of the time, as it waits for data from the previous. It doesn't matter how fast you can do DCT and quantization, since the (slow) motion estimation, scene-change detection, and rate control first need to process the frame to decide how to change what happens to the next frame.
  Parallel quantization is useless, because quant levels are different for each frame due to bitrate constraints and frame complexity. Parallel DCT is useless, since you need to know whether it's going to be an I, P, or B frame, and the differential from the previous frame, to do anything. etc.
  
  Plenty of ways to parallelize decoding, and even more for encoding...
  A little bit of knowledge is a dangerous thing... You've mostly come up with lots of ways to keep the CPUs maxed-out, doing busy work that can't help speed-up encoding at all.
  
  --
  Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
31. Re:F(next) = F(current) + Delta(F(current:next)) by John+Betonschaar · 2008-07-24 02:41 · Score: 1
  
  That is commonly how it's done, but you do lose some bitrate efficiency that way, and it doesn't parallelize perfectly as many operations need to be performed across the whole frame after the slices are reassembled, before the next frame/slices can be processed. x264 did this, but (somewhat recently) dumped this method for something better.
  Video encoders chop up frames into slices anyway for different reasons (different slice characteristics, concealment for transmission errors). Full-frame slices were common for MPEG-1 but modern codecs almost invariably use sliced frames now.
  
  No (remotely decent) MPEG encoder made in the past 2 decades uses a fixed GOP-size. That's horribly inefficient.
  I think you are wrong about that; lots of encoders still use fixed-size GOPs. There's nothing inherently 'inefficient' about it, as fixed GOP *size* does not imply fixed-GOP *structure*. Even then, with 2-pass encoding (which you want anyway if you're concerned about quality) you already figured out the video characteristics after the 1st pass, including GOP structure, so parallelization on the GOP level is still an option.
  
  Do *WHAT* to the luma and chroma channels?
  Luma and chroma-U/chroma-V are independent, so you can process them in parallel. Modern codecs can even use different motion vectors for luma/chroma, so with these you can even parallelize motion estimation.
  
  That sounds like a core relay race... each one sitting idle most of the time, as it waits for data from the previous. It doesn't matter how fast you can do DCT and quantization, since the (slow) motion estimation, scene-change detection, and rate control first need to process the frame to decide how to change what happens to the next frame.
  It was only a suggestion; of course you would be limited by the most expensive stage, but assigning different encoder/decoder stages to different cores is still a whole lot better than single-threaded encoding/decoding. It's exactly what heterogeneous multicore systems like the Cell BE use, with cores to spare and autonomous DMA transfers.
  
  Parallel quantization is useless, because quant levels are different for each frame due to bitrate constraints and frame complexity. Parallel DCT is useless, since you need to know whether it's going to be an I, P, or B frame, and the differential from the previous frame, to do anything. etc.
  I wasn't talking about parallelizing quantization (which is cheap anyway) or DCT.
  
  A little bit of knowledge is a dangerous thing... You've mostly come up with lots of ways to keep the CPUs maxed-out, doing busy work that can't help speed-up encoding at all.
  Excuse me, but isn't keeping the cores busy exactly what you want, as long as each core does something useful? It's idle cores that don't speed up anything.
  As for the 'little bit of knowledge': I've written my own MPEG-1 parallel decoder for the PS3 so I guess I know what I'm talking about. Although not nearly complete or useful for anything other than educational purposes, it does use some of the techniques I described with fine results.
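  To put rough numbers on the pipelining argument, here is a small Python model. The stage times are invented for illustration, and it assumes each stage can start on the next frame as soon as it has handed off the current one, which is precisely what the parent post disputes for encoders.

```python
def serial_time(stage_times, frames):
    """One core runs every stage of every frame back to back."""
    return sum(stage_times) * frames

def pipelined_time(stage_times, frames):
    # The first frame has to pass through every stage; after that one frame
    # completes per interval of the slowest stage.
    return sum(stage_times) + (frames - 1) * max(stage_times)

unbalanced = [8, 2, 2]   # ms per frame; one expensive stage, two cheap ones
balanced = [4, 4, 4]     # same total work, evenly split

print(serial_time(unbalanced, 100))      # 1200
print(pipelined_time(unbalanced, 100))   # 804
print(pipelined_time(balanced, 100))     # 408
```

  With an unbalanced pipeline, three cores buy roughly a 1.5x speedup and two of them sit idle most of the time; with balanced stages it approaches 3x.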
32. Re:F(next) = F(current) + Delta(F(current:next)) by evilviper · 2008-07-24 19:58 · Score: 1
  
  Video encoders chop up frames into slices anyway for different reasons (different slice characteristics, concealment for transmission errors). Full-frame slices were common for MPEG-1 but modern codecs almost invariably use sliced frames now.
  Simply not true. Some may, but most certainly do not. I can tell you for a fact that x264 does not, neither do any of the ffmpeg encoders.
  
  There's nothing inherently 'inefficient' about it, as fixed GOP *size* does not imply fixed-GOP *structure*.
  Fixed GOP size requires you to place an I-frame at a specific interval. That I-frame placement will invariably NOT fall on a scene change, therefore wasting a substantial number of bits. And no, most encoders do not use a fixed GOP size (just a maximum, and sometimes also a minimum).
  
  Even then, with 2-pass encoding (which you want anyway if you're concerned about quality) you already figured out the video characteristics after the 1st pass, including GOP structure, so parallelization on the GOP level is still an option.
  Yes, that's true. You can get a speed-up, but solely on the second-pass.
  
  Luma and chroma-U/chroma-V are independent, so you can process them in parallel. Modern codecs can even use different motion vectors for luma/chroma, so with these you can even parallelize motion estimation.
  
  I wasn't talking about parallelizing quantization (which is cheap anyway) or DCT.
  Well then, I'm still waiting to hear exactly what operations you think can be parallelized.
  
  As for the 'little bit of knowledge': I've written my own MPEG-1 parallel decoder for the PS3 so I guess I know what I'm talking about.
  A DECODER is vastly different from an ENCODER. A decoder has very few iterative processes that can't be parallelized well, while an encoder has a very large number.
  
  --
  Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
33. Re:F(next) = F(current) + Delta(F(current:next)) by Omega996 · 2008-07-25 02:02 · Score: 1
  
  did the OP mention he was working with uncompressed streams? I missed that. I must admit I am guilty of presuming he was using some sort of codec, dv or mpeg2.
Max CPU? by HaeMaker · 2008-07-23 09:22 · Score: 1

Huh? I am using AGK and my CPU never does anything. It is always waiting for I/O. I must be doing something wrong...
1. Re:Max CPU? by reikoshea · 2008-07-23 09:27 · Score: 1
  
  I am of the same mind. On my E6400 Disk IO is my biggest bottleneck when running conversions. The 10k Drive boosted conversion speed from 3x to about 3.7x for a 700MB 1hr TV show. Not a great result, but not bad either.
2. Re:Max CPU? by Barny · 2008-07-23 12:34 · Score: 2, Interesting
  
  With video conversion faster storage (not low latency) is the big winner, with huge cache being a close second.
  If you want the fastest video encodes with no care to cost, get an 8-way pci-e raid card and 8 laptop sata HDD, small and very very fast in a stripe raid.
  
  --
  ...
  /me sighs
3. Re:Max CPU? by Anonymous Coward · 2008-07-23 19:51 · Score: 0
  
  Huh? I am using AGK and my CPU never does anything. It is always waiting for I/O. I must be doing something wrong...
  I'd say so. Benchmark your HDDs and stuff and see whether you've got an unexpected bottleneck. DVD video spec is a max of 10.something MB/sec, which any hard disk should do without stressing. If you're I/O bound something must be a little strange.
Re:Which part of Open Source didn't you get? by mweather · 2008-07-23 09:24 · Score: 1

And told him how it uses an open source program in an easily-replicatable way.
keyframes by Anonymous Coward · 2008-07-23 09:29 · Score: 5, Informative

Actually, the MPEG stream resets itself every n frames or so (n is often a number like 8, but can vary depending on the video content). These are called keyframes (K) and the delta frames (called P and I frames) are generated against them. Because of this, it is really easy to apply parallel processing to video encoding.
1. Re:keyframes by Anonymous Coward · 2008-07-23 09:45 · Score: 0
  
  Your HDV camera's MPEG output will be a long GOP with the key frame every 15. And if you are going to be doing fancy editing effects, it's often best just to get out of the GOP format and edit with a keyframe codec such as ProRes 422.
2. Re:keyframes by DigitAl56K · 2008-07-23 10:14 · Score: 4, Informative
  
  Actually, the MPEG stream resets itself every n frames or so (n is often a number like 8, but can vary depending on the video content).
  That is not true for MPEG-4 unless you have specifically constrained the I/IDR interval to an extremely short interval, and doing so severely impacts the efficiency of the encoder because I-frames are extremely expensive compared to other types.
  Keyframes are usually inserted when temporal prediction fails for some percentage of blocks, or using some RD evaluation based on the cost of encoding the frame. Therefore unless the encoder has reached the maximum key interval the I frame position requires that motion estimation is performed, and thus you can't know in advance where to start a new GOP.
  In H.264, due to multiple references, you would certainly have issues to contend with, since long references might cross I-frame boundaries. That is why there is the distinction of "IDR" frames, and it means threading at the plain keyframe level would certainly not be possible.
  Granted, for MPEG1&2 encoders threading at keyframes is a possibility, although still not one I'd personally favor.
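  A toy Python version of the keyframe decision described above. The 40% threshold and the 250-frame maximum interval are invented values, and real encoders use more elaborate rate-distortion tests. The point is the input: the fraction of blocks where temporal prediction failed is only known after motion estimation has run on that frame.

```python
def choose_frame_type(failed_block_fraction, frames_since_key,
                      threshold=0.4, max_interval=250):
    """Return 'I' if prediction failed for too many blocks or the
    maximum key interval has been reached, otherwise 'P'."""
    if frames_since_key >= max_interval:
        return "I"      # forced keyframe, known in advance
    if failed_block_fraction >= threshold:
        return "I"      # scene change or similar, known only after motion estimation
    return "P"

print(choose_frame_type(0.05, 12))    # P
print(choose_frame_type(0.75, 12))    # I
print(choose_frame_type(0.05, 250))   # I
```

  Only the forced case can be scheduled ahead of time, which is why GOP boundaries are not available up front for splitting work between threads.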
3. Re:keyframes by statemachine · 2008-07-23 10:42 · Score: 1
  
  How did you get a +5 Informative when you're wrong?
  First off, which MPEG spec has a K-frame? An I-frame is not a delta frame, it's more like your "keyframe." P and B are the delta frames.
  Secondly, there's very little to parallelize if you're working with open Groups of Pictures (GOP), that is to say every GOP references into the next GOP. If you have closed GOPs, then you can do this a little better by putting the next GOP on another core/CPU.
  But will you gain a significant speedup? The problem is not just chugging away on code. It's all the data that needs to fly around. Your core will be IO bound while your data cache and bus gets hammered.
  You'll find more benefits from encoding shortcuts than you will by simply flinging another core at it.
4. Re:keyframes by elgaard · 2008-07-23 11:15 · Score: 1
  
  Yes, but you would only need one keyframe per cpu/core.
  E.g. on a dualcore let one core encode the first half and the other core the second half.
5. Re:keyframes by TwinkieStix · 2008-07-23 11:26 · Score: 3, Informative
  
  This may be true for sending entire frames to threads, but in mpeg4, frames are broken up into chunks. Motion vectors are created that allow these chunks to move about the image from frame to frame. Other filters are used to remove blockiness, compress the image, do motion detection and macroblock detection, and do various other tasks. MPEG4, especially H.264, can be easily multi-threaded:
  http://ietisy.oxfordjournals.org/cgi/content/abstract/E88-D/7/1623
  http://adsabs.harvard.edu/abs/2004SPIE.5308..384L
  http://www.electronicsweekly.com/Articles/2007/05/02/41296/aspex-targets-parallel-processor-at-blu-ray-dvd.htm
  When doing a two-pass encode, this is even easier because the keyframes are discovered on the first (faster) pass, so (if encoding already couldn't be threaded) it could by taking advantage of the known keyframe markers in at least the second pass. But, that's not necessary. I use handbrake to create H.264 videos under Linux all the time on my dual core machine, and both processors stay between 80%-90% utilization from start to finish regardless of the number of passes.
6. Re:keyframes by srw · 2008-07-23 12:11 · Score: 2, Informative
  
  Slight correction: in MPEG, the keyframes are called I-Frames. The delta frames are B and P frames. Most MPEG2 encoders that I have used default to a 15 frame GOP.
7. Re:keyframes by DigitAl56K · 2008-07-23 12:30 · Score: 1
  
  I agree that MPEG4 can be easily multithreaded, but it is not threaded by entire GOPs, as the GP suggests, in any encoder that I know of for the reasons I gave. Frame-level and slice-level threading are the two common techniques. I do actually work on MPEG-4 codecs.
8. Re:keyframes by Anonymous Coward · 2008-07-23 13:45 · Score: 1, Informative
  
  These are called keyframes (K) and the delta frames (called P and I frames) are generated against them.
  This is essentially correct, but a little bit off in the terminology. The keyframes are called Intra-coded (I) frames. The delta frames are called Predictive-coded (P) or Bidirectionally-predictive-coded (B) frames depending on whether or not they depend on frames from the future.
9. Re:keyframes by Prefader · 2008-07-23 14:41 · Score: 1
  
  Actually, keyframes are called "I" frames. There is no such thing as a "K" frame in an MPEG video GOP. The other two types are predictive ("P" frames) and bi-predictive ("B" frames).
10. Re:keyframes by Prefader · 2008-07-23 14:49 · Score: 1
  
  Wow - I swear that reply from an hour before mine was NOT there 10 minutes ago.
11. Re:keyframes by evilviper · 2008-07-24 00:31 · Score: 1
  
  Actually, the MPEG stream resets itself every n frames or so (n is often a number like 8, but can vary depending on the video content).
  "Or so" and "can vary" being the key points.
  Until the encoder processes the entire video, and decides where it is most appropriate to place "keyframes" (I-frames), you can't parallelize anything. The same goes for rate-control (unless you're strictly CBR, which is similarly terribly wasteful).
  This is why x264 performs its first pass almost entirely on a single core (perhaps throwing 20% of the work onto a second core), and only on the second pass is it fully multi-threaded.
  
  --
  Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
What about playing? by Godji · 2008-07-23 09:30 · Score: 1

Is there anything out there that can play a high-bitrate obese .mkv Blu-ray backup rip efficiently on 2 or 4 cores?
1. Re:What about playing? by LordHatrus · 2008-07-23 11:04 · Score: 1
  
  both mplayer and VLC run that sort of thing fine on my E6700 (2 cores) without noticeably dropping frames, even in action scenes. Are you using hardware acceleration? It helps a lot...
2. Re:What about playing? by Godji · 2008-07-23 11:35 · Score: 1
  
  Well, I'm using Linux, and as far as I know, the video card does absolutely nothing (well, maybe scaling). What acceleration are you talking about? I have the exact same CPU by the way, and it works fine on most things. But try something _really_ high bitrate - like 5 GB for 30 minutes or something - and see what happens. It stutters, but only slightly, so if the other core was actually doing something to help, it would be fine.
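For scale, the "5 GB for 30 minutes" figure can be turned into an average bitrate. A quick sketch, assuming decimal gigabytes (10**9 bytes) and ignoring container overhead:

```python
def average_bitrate_mbit(size_bytes, duration_seconds):
    """Average bitrate in megabits per second (1 Mbit = 10**6 bits)."""
    return size_bytes * 8 / duration_seconds / 1e6


rate = average_bitrate_mbit(5 * 10**9, 30 * 60)
print(f"{rate:.1f} Mbit/s")  # prints "22.2 Mbit/s"
```

That is only the average; peaks in busy scenes would be higher.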
3. Re:What about playing? by Barny · 2008-07-23 12:40 · Score: 1
  
  With the windows media player classic, you can indeed have your video hardware speed things up, it does it by making a 2 triangle direct3d window and rendering the video stream as a texture, with today's low end video cards this takes a load off the CPU having to do overlays in 2d windows.
  Also (and I know it's not OSS) corecodec does a great job. ffmpeg under windows is very bad at threading h.264 content, to the point where a fast AMD dual core will struggle with 1080p, but corecodec plays it back smooth. Also they have a history of being OSS project friendly (after the initial knee-jerk by a law dog of theirs of course).
  
  --
  ...
  /me sighs
MPEG Algorithm by c0d3r · 2008-07-23 09:38 · Score: 1

The mpeg algorithm is called DCT Cosine. If this is parallelizable, then mpeg encoding/decoding should be, although there is no way a general processor can beat an asic in silicon.
1. Re:MPEG Algorithm by Thundersnatch · 2008-07-27 08:50 · Score: 1
  
  The mpeg algorithm is called DCT Cosine.
  No, the Discrete Cosine Transform is just one mathematical operation used by MPEG video encoders. It (and its inverse) are generally available as efficient vector-oriented hardware instructions in recent CPUs.
  Much of the CPU time used during an MPEG2 or 4 encode is spent on motion estimation (essentially finding similar blocks of pixels in the current frame from other frames). Motion estimation is trivially parallelizable if you have shared read access to all the frames in memory. One thread searches for matches in frame 1, another searches in frame #2, etc.
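A rough sketch of that idea, assuming frames are plain 2D lists of luma values and using an exhaustive sum-of-absolute-differences search. All names are hypothetical, and in CPython the threads mostly illustrate the structure, since the interpreter lock keeps pure-Python arithmetic from running truly in parallel. A real encoder would do this with native threads:

```python
from concurrent.futures import ThreadPoolExecutor


def sad(block, frame, top, left):
    """Sum of absolute differences between block and the frame region at (top, left)."""
    return sum(
        abs(block[y][x] - frame[top + y][left + x])
        for y in range(len(block))
        for x in range(len(block[0]))
    )


def best_match(block, frame):
    """Exhaustively search one reference frame; return (cost, top, left)."""
    bh, bw = len(block), len(block[0])
    candidates = (
        (sad(block, frame, top, left), top, left)
        for top in range(len(frame) - bh + 1)
        for left in range(len(frame[0]) - bw + 1)
    )
    return min(candidates)


def search_references(block, reference_frames):
    """Search every reference frame concurrently; return (cost, frame_index, top, left)."""
    with ThreadPoolExecutor() as pool:
        # One task per reference frame; all tasks only read the shared frames.
        results = list(pool.map(lambda f: best_match(block, f), reference_frames))
    return min((cost, i, top, left) for i, (cost, top, left) in enumerate(results))
```

Because every task only reads the frames, no locking is needed, which is what makes this step easy to spread across cores.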
Often it's there, but hidden because it is a hindrance by Anonymous Coward · 2008-07-23 09:39 · Score: 0

Part of the reason you find a lack of SMP is because it actually negatively impacts the quality of the encode (though not greatly). A lot of the time it's there, but hidden. The encoder looks at the frames around the current one being encoded and changes its output based on what is found, to make things run smoother on future frames. When you start adding additional threads you have to somehow break up the file into sections, or have each thread do sequential frames. The result is that the encoder can't use its wizardry to its full effect.
Windows? VirtualDub 1.8.x + ffdshow-tryouts by tdelaney · 2008-07-23 09:44 · Score: 3, Informative

You don't say if you're running on Windows or Linux or something else. If you are running on Windows, the latest versions of VirtualDub have made big improvements to SMT/SMP encoding.
VirtualDub home
VirtualDub 1.8.1 announcement
VirtualDub downloads
Make sure you grab 1.8.3 - 1.8.1 was pretty good, but had a few teething problems. 1.8.2 has a major regression which is fixed in 1.8.3. The comments in the 1.8.1 announcement contain a few important tips for using the new features (some of which I posted BTW).
The two major new features that would be of interest to you are:
1. You can run all VirtualDub processing in one thread, and the codec in another. This works very well in conjunction with a multi-threaded codec - this one change improved my CPU utilisation from approx 75% to 95% on my dual-core machines - with an equivalent increase in encoding performance.
2. VD now has simple support for distributed encoding. You can use a shared queue across either multiple instances of VD on a single machine, or across multiple machines (must use UNC paths for multiple machines). Each instance of VD will pick the next job in the queue when it finishes its current job. Instances can be started in slave mode (in which case they will automatically start processing the queue).
I use 3 machines for encoding (all dual-core). With VD 1.8.x I start VD on two of the machines in slave mode, and one in master mode. I add jobs to the queue on the master instance, and the other two instances immediately pick up the new jobs and start encoding. When I've added all the jobs, I then start the master instance working on the job queue.
To achieve a similar effect on your quad-core, start two instances of VD on the same machine - one slave, the other master.
It's not perfect (if you've only got one job, you won't use your maximum capacity) but it has greatly simplified my transcoding tasks, and reduced the time to transcode large numbers of files.
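The shared-queue behaviour described above (each instance takes the next job when it finishes its current one) can be modelled roughly like this. It is only a sketch of the pattern, not VirtualDub's implementation; `run_shared_queue` and the `encode` callback are hypothetical:

```python
import queue
import threading


def run_shared_queue(jobs, n_workers, encode):
    """Run each job through `encode` using n_workers threads pulling from one queue."""
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    done = []
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, this instance goes idle
            result = encode(job)
            with lock:
                done.append((name, job, result))

    threads = [
        threading.Thread(target=worker, args=(f"instance-{i}",))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

With a single job in the queue only one worker is ever busy, which is the limitation mentioned above.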
1. Re:Windows? VirtualDub 1.8.x + ffdshow-tryouts by Anonymous Coward · 2008-07-23 10:46 · Score: 0
  
  Ugh.. VDub essentially uses vfw 16-bit technology and stores files in an avi container. It only supports more recent codecs (xvid/divx) through hacks in the avi format (e.g. storing frames out of order). H.264 is only partially supported (e.g. no b-frame pyramid), and hence vfw support for it was dropped in x264 r581. You shouldn't use VDub for anything other than lossless (intra-frame) encoding and move on to CLI + a sensible container (mp4/mkv) for your encoding work.
2. Re:Windows? VirtualDub 1.8.x + ffdshow-tryouts by trawg · 2008-07-23 11:30 · Score: 1
  
  Holy shit! Somehow I missed all these VDub releases. Thanks for the notice.
  Out of interest, what sort of stuff are you encoding from/to? Are you aware of any mpeg4/h264 codecs that will work happily in Virtualdub?
3. Re:Windows? VirtualDub 1.8.x + ffdshow-tryouts by tdelaney · 2008-07-23 14:57 · Score: 1
  
  Primarily anime, but other stuff as well. What I tend to do is use mencoder to transcode the source material (usually h264/x264/XviD) to Huffyuv video/PCM audio (single episode goes from 150MB to 7GB!), then use VirtualDub to add hard subtitles and encode as XviD (using ffdshow-tryouts). I do the mencoder bit because if you've got a VBR audio stream, it's quite likely to end up out of sync (esp. coming from a Matroska container) if you use VirtualDubMod to extract the streams. So I don't. Can't remember the parameters I pass to mencoder off the top of my head.
  I don't tend to do this much nowadays though as I've got one of my machines running TVersity (DLNA server) and transcoding and subtitling on-the-fly to stream to my PS3 (using ffdshow-tryouts filters). I now only do it if I need to play something on an XviD-compatible DVD player, or it's only got VobSub subtitles (ffdshow-tryouts has a bug where it will only display the very first line in the subtitles).
  Try the latest ffdshow-tryouts H.264 encoder (take the latest SVN build, or beta 5 which has just come out) and see how you go.
avidemux by Unit3 · 2008-07-23 09:46 · Score: 5, Informative

I've noticed a lot of talk about commandline options, but not the nice guis that use them. Avidemux is open source, cross-platform, gives you a decent interface, and uses multithreaded libraries like ffmpeg and x264 on the backend to do the encoding, so it generally makes optimal use of your multicore system.

--
-- sudo.ca
1. Re:avidemux by packman · 2008-07-23 22:50 · Score: 1
  
  Exactly what I was thinking, works brilliantly on both my Windows and Linux boxes, and uses all cores.
Do more jobs rather than one job more quickly by myz24 · 2008-07-23 09:48 · Score: 1, Informative

As posted elsewhere, it is difficult to divide a project up that is really pretty linear. Instead, you should try to do more jobs at once. Encode four videos at once.
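A minimal sketch of running several conversions side by side, assuming `ffmpeg` is on the PATH and that its default codec choices are acceptable. The file names and the `convert_all` helper are hypothetical:

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(src, dst):
    """Build a plain ffmpeg conversion command; codec options are left at ffmpeg's defaults."""
    return ["ffmpeg", "-i", src, dst]


def convert_all(pairs, max_jobs=None):
    """Run one ffmpeg process per (src, dst) pair, at most max_jobs at a time."""
    max_jobs = max_jobs or os.cpu_count() or 1
    commands = [build_command(src, dst) for src, dst in pairs]
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        # Each worker thread only waits on its child process, so threads are enough here.
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


if __name__ == "__main__":
    # Hypothetical inputs: four videos, one job per core on a quad core.
    jobs = [(f"movie{i}.vob", f"movie{i}.avi") for i in range(1, 5)]
    print(convert_all(jobs, max_jobs=4))
```

Defaulting `max_jobs` to `os.cpu_count()` keeps the number of simultaneous encodes at or below the number of cores the OS reports.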
1. Re:Do more jobs rather than one job more quickly by TheSync · 2008-07-23 13:49 · Score: 1
  
  As posted elsewhere, it is difficult to divide a project up that is really pretty linear. Instead, you should try to do more jobs at once. Encode four videos at once.
  This is possible, but you might end up being hard drive I/O limited before you are CPU limited (depending on video resolution and codec type). But assuming you are not...
  With any long-GOP codec, you could split your movie into GOPs and encode one GOP on a core at a time. Or just split the video into four segments and encode each segment on a core.
  A more brain-dead way is to split the screen into four quadrants, compress each, and join them up. This is less efficient compression (no motion vectors over the quadrant boundaries), but it is a way to parallelize live encoding.
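The quadrant split could be computed along these lines. This is just the geometry, under the assumption that each rectangle is then handed to a separate encoder instance; `quadrants` is a hypothetical helper:

```python
def quadrants(width, height):
    """Return four (x, y, w, h) rectangles that together cover a width x height frame."""
    half_w, half_h = width // 2, height // 2
    return [
        (0, 0, half_w, half_h),                             # top-left
        (half_w, 0, width - half_w, half_h),                # top-right
        (0, half_h, half_w, height - half_h),               # bottom-left
        (half_w, half_h, width - half_w, height - half_h),  # bottom-right
    ]


if __name__ == "__main__":
    for rect in quadrants(1920, 1080):
        print(rect)
```

A real encoder would probably also want the split to fall on a macroblock boundary, which this sketch does not enforce.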
Re:Use Mac OS X... by Anonymous Coward · 2008-07-23 09:56 · Score: 0

And surprisingly, this is not something that we can blame on BillG. XP has supported multiple processors from the beginning. Multi processor motherboards were just too expensive for the average consumer. It wasn't until the P4 Xtreme that multi core became a reasonably priced option. For once, Microsoft was actually ahead of the curve in providing support for a technology BEFORE the market really needed it.
Also consider this. by SignOfZeta · 2008-07-23 10:25 · Score: 2, Interesting

If you do a lot of H.264 conversion, look into picking up a hardware encoder. There's the Turbo.264; it's Mac-only, but I'm fairly sure it's a rebranded PC device. Plug into a USB port, and it speeds up H.264 encoding -- even on single-core systems. Imagine that with your quad-core. It's not a free solution, but if you find yourself doing a *lot* of encodes, it may be worth your money.
1. Re:Also consider this. by Anonymous Coward · 2008-07-23 10:47 · Score: 0
  
  One question, is the wikipedia article right and this device is mac only and 800x600 is the maximum resolution?, the alternatives look way better.
2. Re:Also consider this. by SignOfZeta · 2008-07-23 11:38 · Score: 1
  
  I've never had any high-def content to throw at it, so I've never noticed that resolution limitation. The Turbo.264 is Mac-only, but I heard somewhere that it's but a rebranded version of a PC product.
Lazy much? by SleepyHappyDoc · 2008-07-23 10:49 · Score: 0, Flamebait

(And before you ask, no, I don't want to pick up the code and add SMP support myself, thanks.)
Tell you what...why don't you FedEx your videos to a developer and ask him to do it for you? I'm sure they'd be happy to help a nice, polite, motivated person like yourself.

--
Stasis is death. Embrace change.
1. Re:Lazy much? by X0563511 · 2008-07-23 11:06 · Score: 1
  
  It's refreshing to see that, rather than having us all answer questions and think about it, only to THEN find out he doesn't want to do any work.
  
  --
  For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
2. Re:Lazy much? by ydrol · 2008-07-23 11:44 · Score: 1
  
  I simply don't have the time to grok video encoding AND efficient SMP algorithms, and do my day job, but I do want to use them.
  And FWIW I have contributed patches in the past to both the avidemux AND nzbget projects, and they have been accepted, but these have addressed more trivial aspects of the software.
3. Re:Lazy much? by X0563511 · 2008-07-23 13:39 · Score: 1
  
  Hey, I know exactly what you mean - and what I said was directed at the parent, not to you... sorry if that came across wrong!
  
  --
  For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
Not as simple as you would think by sjf · 2008-07-23 11:51 · Score: 4, Insightful

As other commenters have said, decoding video is not, per se, a trivially parallelized algorithm. Especially for modern codecs with lots of temporal encoding. MJPEG would be easily parallelized, but you'd have to be dealing with fairly ancient sources...MediaComposer 1 for instance.
However, there are different classes of "video app" that are good targets for parallelization. Real world video editing for instance: consider multiple streams of video with overlays, rotations, effects etc. Video and audio decoding can happen in parallel, you can pipeline the effects stages so that each effect is handed off to another core. Modern video editing systems do this with aplomb.
I'm from the commercial end of this so, I can't comment much on open source alternatives. But I will say that a lot of the algorithms in certain products are highly tuned to the particular CPU type.
And they're smart enough to distribute work across only as many cores as actually exist.
Finally. Don't forget that optimization is hard. You have to consider the speed of the hard drive, the cost of sharing data between threads and cpu caches and a bunch of other real constraints. Any half decent cpu of the last five years or so can easily decode most video faster than it can be read and written to disk. So long as this is true, you won't get any benefit from parallelization.
1. Re:Not as simple as you would think by LordKronos · 2008-07-24 01:59 · Score: 1
  
  As other commenters have said, decoding video is not, per se, a trivially parallelized algorithm. Especially for modern codecs with lots of temporal encoding.
  But don't pretty much all video algorithms use keyframes, meaning that you can start from the keyframe without needing any prior data? So you just give each thread a block starting from the next keyframe and let it work. Rather than having every thread working on the current point of the video, you just have threads preparing the next block of video so that when you need it, it's already done. This would work just as well, with the exception of when you are doing playback (rather than background transcoding) and a single core is too slow to process in realtime. When you first start playback or skip to a random spot, there will be a slight delay while the single core prepares the first block.
2. Re:Not as simple as you would think by tatsu69 · 2008-07-24 02:11 · Score: 1
  
  H.264 does use keyframes but the way it works is in such a way that only a certain type of keyframe would work the way you suggest. It is perfectly possible to have a keyframe that by itself doesn't refer to anything else but you wouldn't be able to decode any frames after that one because following frames could refer to an even earlier keyframe. Each frame that can be predicted from is put into a queue and rotated out as newer frames enter the queue (with some other exceptions). So you can have, for instance, 100 frames you could choose to predict from, 10 of which could be keyframes.
  You could search the stream for a special type of keyframe that tells it that no frame following this one refers to anything prior. But the standard doesn't say you ever have to have one of those except for the first frame of the file.
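In H.264 that special keyframe is the IDR (instantaneous decoding refresh) picture. A small sketch of picking split points, assuming the stream has already been parsed into a list of frame-type labels in decode order. The labels and helper names are made up for illustration:

```python
def safe_split_points(frame_types):
    """Indices where decoding can start without referring to any earlier frame."""
    return [i for i, kind in enumerate(frame_types) if kind == "IDR"]


def chunks(frame_types):
    """Split the stream into independently decodable half-open (start, end) ranges."""
    starts = safe_split_points(frame_types)
    ends = starts[1:] + [len(frame_types)]
    return list(zip(starts, ends))


if __name__ == "__main__":
    stream = ["IDR", "P", "B", "I", "P", "IDR", "P"]
    # The plain "I" frame at index 3 is not a safe split point.
    print(chunks(stream))
```

If a stream carries only the one mandatory IDR at the start, this yields a single chunk and nothing can be handed to a second core this way.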
3. Re:Not as simple as you would think by sjf · 2008-07-25 06:42 · Score: 1
  
  How do you find the key frames in the data?
  By reading through the file identifying them, which entails file IO which we've established is slower than decode.
  And, for reasons pointed out elsewhere, don't try this with MPEG4, H.264 or DiVX. Which pretty much leaves MPEG1 and 2.
  The point is that you'd do so much work parsing the stream in order to chunk it that you might as well just decode it.
  Particularly if you can spend the time while the file IO is blocked decoding frames.
Lazy by PenGun · 2008-07-23 11:52 · Score: 1

Any video re encode is gonna involve a bunch of steps. Run em' together on different cores. There ya go.
Once you actually learn something about this somewhat, in Linux anyway, black art you'll find you can use all your cores no problem.
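One way to picture "run the steps together": each step becomes a stage with its own thread, connected by queues, so frame N can be filtered while frame N+1 is still being decoded. A toy sketch with hypothetical stage functions. Real tools would more likely chain separate processes with pipes:

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and pass the result on to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Chain funcs as concurrent stages connected by queues; return the outputs in order."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

The slowest stage still sets the overall pace, so this helps most when the steps cost roughly the same.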
heroinewarrior.com by heroine · 2008-07-23 12:09 · Score: 2, Informative

The version of Cinelerra from heroinewarrior.com uses SMP. It's highly dependant on the supporting libraries & who implemented the feature. In the worst case, use renderfarm mode & nodes for each processor. Sometimes the libraries work in SMP mode & sometimes they don't. Sometimes the feature was intended for everyone to use on any number of processors & sometimes it was written for 1 person's cheap single processor.
Hmm by moosesocks · 2008-07-23 12:35 · Score: 1

Now I'm a bit curious.
Given that all of the "usual suspects" of encoding apps support SMP on almost every platform, and have done so for quite some time, what was this guy using that didn't support it?
ffmpeg and x264 are just about the only players in town these days.

--
-- If you try to fail and succeed, which have you done? - Uli's moose
1. Re:Hmm by ydrol · 2008-07-23 13:29 · Score: 1
  
  A lot of the GUI front ends don't seem to have obvious options to pass the flag through to the engine. Devede mentioned below is one that does, and the kind of thing I was looking for.
If you are looking for dvd authoring... by edutiao · 2008-07-23 12:45 · Score: 1

Devede is a really good gui and adds a lot of functionality. And uses mencoder, which as of 1.0rc2 implements SMP quite nicely. I've been using it the last few days for family videos, on an AMD64 X2, and it is working flawlessly using both cores.
1. Re:If you are looking for dvd authoring... by ydrol · 2008-07-23 13:26 · Score: 1
  
  Excellent. Just the type of thing I was looking for. A gui frontend that sensibly passes the thread options to the engine!
Split the movie into chunks by tepples · 2008-07-23 13:47 · Score: 1

As posted elsewhere, it is difficult to divide a project up that is really pretty linear. Instead, you should try to do more jobs at once. Encode four videos at once.
Do you mean split a 100 minute movie into four 25 minute chunks, encode one chunk on each core, and concatenate them? Great idea.
1. Re:Split the movie into chunks by bm_luethke · 2008-07-23 20:04 · Score: 1
  
  Many of today's video codecs compress data by only storing the differences between frames. As such they do not lend themselves well to that type of splitting up. Each frame is not processed independent from the other. You can split it into those 25 minute chunks but if each thread has to wait on the last to finish then you actually lost.
  Many introduce the idea of "keyframes" - basically frames that are stored in their whole in order to "reset" whatever drift has happened because of the compression. Older codecs tend to have them at specific places - say every X frames. However newer codecs do a statistical analysis as the video is compressed and only insert them when needed (they kill performance and compression ratios on the final product). This, again, stops one from chunking up the video into parts and then compressing.
  From what I understand (not that I'm remotely in said field) for much of the professional world the above isn't so true - quality suffers somewhat from the compression and is noticeable at the theater. Therefore it is one of the so called "ridiculously parallel paradigms" - at the very least when learning parallel processing in other fields it was presented as such. That is each frame is processed identically and is totally independent of the one before it - that is more what you are thinking of.
  Most consumer level video streams are nearly 100% dependent on the frame before it being fully processed (again - starting from keyframes that usually are not set before processing). This means that the process is almost entirely serial as each frame has to be done in succession. While it isn't that simple and *some* parallelism can be had even in the ones that set keyframes when they are needed (instead of at set intervals), it generally isn't really worth the effort - more often than not the splitting takes more effort (in processor power) than is gained by the additional parallelism.
  If you are using an older codec then your mileage may vary, but then I find (as cheap as it is) storage to be a more expensive quantity than the encoding process. Not to mention that the extra processing power done once is *MUCH* more efficient than the extra done each time the video is watched.
  
  --
  ------- Sorry about the spelling, I suffer from two problems. Dyslexia makes it difficult to spell well, lazy makes it
2. Re:Split the movie into chunks by myz24 · 2008-07-24 01:29 · Score: 1
  
  Not at all, I literally mean four different movies. So four 100 minute movies, one for each core.
AcidRip patches by ydrol · 2008-07-23 13:48 · Score: 2, Informative

Cheers. I also found these Acidrip patches. PS In case anyone missed it, I really meant to ask about the front end GUI/script tools rather than the engines. PPS I'm actually using Mandriva.
blank frames? by Anonymous Coward · 2008-07-23 15:17 · Score: 0

Especially when dealing with tv shows, there are often blank frames before commercial breaks. By scanning for several blank frames and then chopping at that point, the video file can easily be divided up into chunks for parallel processing.
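A sketch of that scan, assuming some earlier step has already produced one mean luma value per frame. The threshold of 16 (8-bit video black level) and the minimum run length of 5 are arbitrary assumptions, and `blank_runs` is a hypothetical helper:

```python
def blank_runs(mean_luma, threshold=16, min_run=5):
    """Return half-open (start, end) index ranges of at least min_run consecutive dark frames."""
    runs = []
    start = None
    for i, value in enumerate(mean_luma):
        if value <= threshold:
            if start is None:
                start = i  # a dark run begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    # Handle a dark run that lasts until the end of the clip.
    if start is not None and len(mean_luma) - start >= min_run:
        runs.append((start, len(mean_luma)))
    return runs
```

Cutting in the middle of each returned run would give chunk boundaries that fall on black frames.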
Not just open source by glitch23 · 2008-07-23 16:06 · Score: 1

I get the impression that open source projects are a bit slow on the uptake here?

Open source isn't alone in this regard. Many closed source applications also lag behind. Obviously there are exceptions but many apps just haven't caught up to multi-cores, whether that be just 2 (which is ancient tech by now) or 8 cores in a single system.

--
this nation, under God, shall have a new birth of freedom. -- Lincoln, Gettysburg Address
Re:Use Mac OS X... by Anonymous Coward · 2008-07-23 16:44 · Score: 3, Informative

But Mac users have been living with SMP since 2001

Just for reference:
UNIX System V R4-MP 1993
Windows NT 1993
OS/2 2.11 1993
Linux 2.0 1996
Anonymous Coward by Anonymous Coward · 2008-07-23 16:56 · Score: 0

Avidemux
Additional threads, massive effectiveness? by meburke · 2008-07-23 18:16 · Score: 1

I may have missed something, but in light of the article here: http://tech.slashdot.org/article.pl?sid=08/05/31/1633214, and the wealth of information being offered in this topic, if you are willing to re-make something like ffmpeg to take advantage of the processing capability of your video card you may achieve tremendous efficiency for your task. (My test blew up from mis-managing memory, but before it did it dynamically allocated 22 or 23 threads..the results were uncertain because the system crashed before logging the current status. This is just a concept-learning test written off the cuff in Java, so a real engineered system should be able to do something significant. I will probably get back to it when my current workload slows down.) I'm assuming that if you're doing video work you don't have a lame video card, but the video card should be mostly idle during the conversion process.

--
"The mind works quicker than you think!"
Actually very parallelisable.... by Anonymous Coward · 2008-07-23 18:35 · Score: 0

Actually motion compensation parallelises very well and is one of the most time consuming stages in compression. Effectively the image is split into N macro-blocks, say 16x16 pixels in size. The closest match to each of these macro-blocks is searched for in the following (and/or preceding) frame within a +/- X/Y pixel range. Each macroblock search (a typical image may have 1200 macroblocks) can be parallelised. Therefore up to 1200 cores could be used to parallelise motion-compensation. Also the Quantisation stage can be parallelised on a macro-block level, as can the DCT(mpeg1&2, H263) or wavelet(mpeg4, H264) transform. The only stage that needs to be done serially is the final Huffman encoding.
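The "1200 macroblocks" figure matches a 640x480 frame cut into 16x16 blocks. A quick check, with the frame sizes chosen here only as examples:

```python
def macroblock_count(width, height, block=16):
    """Number of block x block macroblocks needed to cover the frame (edges padded up)."""
    across = -(-width // block)  # ceiling division
    down = -(-height // block)
    return across * down


print(macroblock_count(640, 480))    # 1200
print(macroblock_count(720, 576))    # 1620
print(macroblock_count(1920, 1080))  # 8160
```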
winff, devede and KDEnlive/QT4 by danboid · 2008-07-23 19:09 · Score: 1

There are two main apps FOSS video converters need to be aware of:
winff - great gui for FFMPEG, makes batch conversion simple. Soon to appear in Debian/Ubuntu repos. vlc also includes a more basic video conversion wizard gui, but can't compete with winff. win and lin versions available.
DeVeDe- Already mentioned here I see. Makes DVD and SVCD/VCD/*CD creation under linux simple. Again, another cross platform FOSS app.
Both these apps have SMP support, but only as good as their respective ffmpeg and mencoder backends.
The new QT4 version of KDEnlive is a total re-write of the app and is said to be SMP friendly but has yet to have a proper release.
OMG! by Anonymous Coward · 2008-07-23 21:30 · Score: 0

like kdawson actually posted something interesting, who let him at the coffee pot?
Stigma is never good by try_anything · 2008-07-23 23:46 · Score: 1

Multi-threaded programming is getting to be like artificial intelligence. People flip out about how hard it is, and when you point out mundane, useful, easy kinds of multi-threading, they say, "Well, that's not really what I was talking about." If multi-threading only means scalable performance-critical code, or code with lots of fine-grained locking, or code written with no language or library support, then hell yeah, it's hard. Multi-threaded programming is full of hard problems, but you can get plenty of work done without ever facing up to them.
MPlayerXP by onnerby · 2008-07-23 23:58 · Score: 1

There is a MPlayer fork called MPlayerXP. The purpose of the fork is to make a multithreaded version of MPlayer.
http://mplayerxp.sourceforge.net/
You add three keyframes. by tepples · 2008-07-24 00:31 · Score: 1

Many of today's video codecs compress data by only storing the differences between frames. As such they do not lend themselves well to that type of splitting up.
But in practice, how much space are you going to lose by inserting only three extra keyframes into a 100 minute film? Look at the three keyframes that a four-core encoder would insert, then compare that to how many cuts in a film already need a keyframe after them. If you're worried that this will insert too many extra keyframes once encoding scales up to dozens of cores, you could just have one core finding cuts and the rest encoding each interval between cuts.
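To put a rough number on it: a back-of-the-envelope sketch, assuming 25 fps and a regular keyframe about every 250 frames. Both values are assumptions picked for illustration, not taken from any particular encoder:

```python
def extra_keyframe_share(minutes, fps, keyframe_interval, n_chunks):
    """Fraction by which forced chunk-boundary keyframes increase the keyframe count."""
    total_frames = int(minutes * 60 * fps)
    regular = -(-total_frames // keyframe_interval)  # ceiling division
    forced = n_chunks - 1  # one extra keyframe at each interior chunk boundary
    return forced / regular


share = extra_keyframe_share(minutes=100, fps=25, keyframe_interval=250, n_chunks=4)
print(f"{share:.2%}")  # prints "0.50%"
```

Under those assumptions the three forced keyframes sit on top of about 600 that the encoder would have placed anyway.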
VirtualDub by mariushm · 2008-07-24 00:42 · Score: 1

Virtualdub is free, open source and is quite capable of running with several processors.
As for encoding, I'm not yet interested in x264 because of the weak processor (only a D805 dual core ) but I am using the XVID experimental build from Koepi.info (http://www.koepi.info) which has SMP support.
It maxes out my two cores and you can specify in the configuration how many threads it should use.
As for decoding, I'm using Media Player Classic Home Cinema which has a DXVA codec built in (hardware decoding).
On 1080p videos that the video card (ATI 4850) can decode in hardware, the CPU usage is about 10-12%. If hardware decoding is not possible, the decoding is passed to CoreAVC which uses about 45% of CPU (and yes, it's smp enabled).
Re:Use Mac OS X... by Drizzt+Do'Urden · 2008-07-24 01:00 · Score: 1

He's not even right about Apple either, the first MP Mac was the PowerMac 9500, released in 1995 with the SMP option released in 1996.
Source

--
Menzoberranzan Networks
system monitor while using dvdrip/transcode by keneng · 2008-07-24 02:43 · Score: 1

On a dual-CPU system, you will see 100% CPU usage on both when using dvdrip/transcode. I would love to see how it looks on a quad-core system.
1. Re:system monitor while using dvdrip/transcode by gatkinso · 2008-07-26 10:21 · Score: 1
  
  On some Intel based systems with dual processors, enabling hyperthreading is a half assed way to emulate a 4 processor system.
  
  --
  I am very small, utmostly microscopic.
codecs by gravis777 · 2008-07-24 03:46 · Score: 1

I actually at first thought you meant open source video editing tools, at which point, I was going to say good luck.
I have had similar experiences trying to find multi-core 64 bit video encoders / converters / editors. The problem actually is not usually the application, but the codec. The codec has to be written to take advantage of multi-core systems and 64 bit extensions, not just the program using the codecs. I think XVid is one of the few codecs that actually has this ability, but if I remember right, the 64 bit code project is not as active as the main project and is usually several versions behind. I actually finally went back to the 32 bit codec as it was less buggy.
While using the 64-bit version of Vista, the workload seems to get balanced out between the cores as well. I am not sure if that is due to how the processor works, how the OS works, or how the software works, but all the video editing tools I have used seem to balance the workload over both cores rather well. This is true when using Adobe Premiere, Canopus Procoder, XMpeg, and other encoders / decoders. So while it may not be OPTIMIZED for it, it certainly does take advantage of it. Of course, your mileage may vary.
Four movies, sourced from what media? by tepples · 2008-07-24 10:55 · Score: 1

Not at all, I literally mean four different movies. So four 100 minute movies, one for each core.
How many external DVD-ROM drives would one have to buy to make this effective? I don't think a typical tower PC case for the home market can hold four DVD-ROM drives.
XBMC 4 Linux by BLKMGK · 2008-07-24 13:04 · Score: 1

SMP enabled H.264 decoder, works for me! @3GHZ I can play most anything no problem. I do tend to transcode my BD and HD-DVD rips though to save space.

--
Build it, Drive it, Improve it! Hybridz.org
Anonymous Coward by Anonymous Coward · 2008-07-24 15:41 · Score: 0

I am new to Slashdot.
Wow, These comments make me feel really stupid.