Slashdot Mirror


The End of Video Coding? (medium.com)

An anonymous reader writes: Netflix's engineering team has an insightful post today that looks at how the industry is handling video coding; the differences in their methodologies; and the challenges newcomers face. An excerpt, which sums up where we are:

"MPEG-2, VC1, H.263, H.264/AVC, H.265/HEVC, VP9, AV1 -- all of these standards were built on the block-based hybrid video coding structure. Attempts to veer away from this traditional model have been unsuccessful. In some cases (say, distributed video coding), it was because the technology was impractical for the prevalent use case. In most other cases, however, it is likely that not enough resources were invested in the new technology to allow for maturity.

"Unfortunately, new techniques are evaluated against the state-of-the-art codec, for which the coding tools have been refined from decades of investment. It is then easy to drop the new technology as "not at-par." Are we missing on better, more effective techniques by not allowing new tools to mature? How many redundant bits can we squeeze out if we simply stay on the paved path and iterate on the same set of encoding tools?"

26 of 137 comments

  1. What else would one do? by MBGMorden · · Score: 5, Insightful

    Should they just adopt new and inferior solutions and hope for the best?

    To me this is the "science" part of Computer Science. Do research into new algorithms and methods of video encoding, but it would be stupid to start adopting any of that into actual products or live usage until and unless it tops the more traditional methods in performance.

    --
    "People who think they know everything are very annoying to those of us who do."-Mark Twain
    1. Re:What else would one do? by skids · · Score: 4, Interesting

      Should they just adopt new and inferior solutions and hope for the best?

      I think the idea here is that the follow-up science/engineering to academic initiatives doesn't actually get funded/done because the unoptimized first cut of a new methodology isn't instantly better than the state of the art. It's basically arguing that the technology is undergoing path dependence, which is no big surprise as it happens all the time in lots of areas.

      That said, the AV crowd has sure made a complete and utter mess of their formats. Piles of CODECs all with various levels of support for piles of video modes all bundled into piles of meta-formats with piles of methods for syncing up audio/ancillary/multistream... my eyes glaze over pretty quickly these days when faced with navigating that maze. Having options is awesome. Leaving them perpetually scattered around on the floor like a bunch of legos... not so much.

      (Still waiting to see someone with serious genetic-algorithms chops tackle lossless CODECs... there's a ready-made problem with a cut and dry fitness function right there.)

    2. Re:What else would one do? by war4peace · · Score: 3, Informative

      In many parts of the world, that's already standard. I got gigabit fiber to my home for cheap in a 3rd world country.
      So as far as me and my countrymen are concerned, we're good, even for 4K@60fps.

      --
      ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
    3. Re:What else would one do? by AmiMoJo · · Score: 4, Interesting

      Part of the problem is that we have hardware acceleration for certain operations, and if codecs want to do stuff outside that then performance can become an issue for playback. Most streaming devices don't have enough CPU power in their little ARMs to handle decoding, it has to be hardware accelerated by the GPU.

      Then again if anyone can argue successfully for hardware changes it's Netflix.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    4. Re:What else would one do? by swb · · Score: 2

      The dependence on hardware decoding is probably a major factor. Encoders want the largest possible audience and will always defer to the coding schemes with the least system impact and best performance, which will end up being hardware decoders.

      There's a lag between the development of a new coding scheme and its widespread availability as actual deployed silicon. The investment in silicon is dependent on encoder adoption and popularity, which may lag encoding development.

      Suddenly it looks like a no-win situation where new codecs have a hard time gaining entry when there are many good-enough codecs with hardware support.

    5. Re:What else would one do? by slew · · Score: 4, Interesting

      Well if you read the article and not the summary, the authors are discussing that there don't seem to be any fundamental changes coming anytime soon. Sure newer codecs are coming out but they are all the same approach. It's like if we were discussing public key cryptography and the algorithms used. Imagine if RSA was the only real technique and the only new changes coming out were merely larger keys and that other techniques like elliptic curves didn't exist.

      I think your analogy is somewhat flawed. Public key cryptography was in somewhat of the same "rut" as video codecs. Video codecs have been stuck on hybrid block techniques and public key cryptography has been stuck using modular arithmetic (RSA and elliptic curves both use modular arithmetic, although they depend on the difficulty of inverting different mathematical operations in modular arithmetic).

      There are of course other hard math problems that can be used in public key cryptography (lattices, knapsack, error-correcting codes, hash based) and they languished for years until the threat of quantum computing cracking the incumbent technology...

      Similarly, I predict hybrid block techniques will likely dominate video encoding until a disruption (or in mathematical catastrophe theory parlance a bifurcation) shows the potential for being 10x better (because 1.2x or 20% better doesn't even pay for your lunch). It doesn't have to be 10x better out of the gate, but if it can't eventually be 10x better, why spend time optimizing it as much as hybrid block encoding. Nobody wants to be developing something that doesn't have legs for a decade or more. The point isn't to find something different for the sake of difference, it's to find something that has legs (even if it isn't better today).

      The problem with finding something with "legs" in video encoding, is that we do not fully understand video. People don't really have much of a theoretical framework to measure one lossy video compression scheme against another (except for "golden-eyes" which depend on what side of the bed you wake up on). Crappy measures like PSNR and SSIM to estimate the loss-ratio vs entropy are still being used because we don't have anything better. One of the reasons people stick to hybrid block coding is that the artifacts are somewhat known even if they can't be measured so it is somewhat easier to make sure you are making forward progress. If the artifacts are totally different (as they would likely be for a different lossy coding scheme), it is much more difficult to compare if you can't objectively measure it to optimize it (the conjoint analysis problem).

      So until we have better theories about what makes a better video codec, people are using "art" to simulate science in this area, and as with most art, it's mostly subjective and it will be difficult to convince anyone of a 10x potential if it is only 80% today. If people *really* want to find something better, we need to start researching more on the measurement problem and less about the artistic aspects. It's not that people haven't tried (e.g., VQEG), but simply very little has come from the efforts to date and there has been little pressure to keep the ball moving forward.

      In contrast, the math of hard problems for public key cryptography is a very productive area of research and the post-quantum-encryption goal has been driving people pretty hard.

      Generally speaking, if you measure it, it can be improved and it's easier to measure incremental progress than big changes on a different dimension.
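For reference, PSNR, one of the measures criticized above, is just a log-scaled mean squared error. A minimal sketch in plain Python, assuming 8-bit samples (peak value 255) and frames passed as flat lists; the function name `psnr` is illustrative and not taken from any particular library:

```python
import math

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(peak ** 2 / mse)

# Two distortions with the same MSE get the same PSNR, even though one
# spreads a small error everywhere and the other puts it all in one sample.
ref = [100] * 16
spread = [102] * 16             # every sample off by 2: MSE = 4
spike = [100] * 15 + [108]      # one sample off by 8:   MSE = 64 / 16 = 4
print(psnr(ref, spread), psnr(ref, spike))
```

Both calls return the same value because the metric only sees total squared error, not where it lands, which is one reason it tracks perceived quality poorly when the artifacts of two schemes look nothing alike.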

    6. Re:What else would one do? by Sarten-X · · Score: 2

      To be fair, third-world countries have a slight advantage in that their infrastructure is all new and mostly modern, whereas the US is trying to piggyback on a lot of old POTS and first-gen fiber infrastructure.

      In a lot of cases, developing countries are completely skipping copper infrastructure, and building out wireless systems.

      --
      You do not have a moral or legal right to do absolutely anything you want.
    7. Re:What else would one do? by sexconker · · Score: 4, Funny

      it's means it is.

      It's been nice proving you wrong.

    8. Re:What else would one do? by war4peace · · Score: 2

      Exactly what happened in my country. But the more important aspect is competition. Here, competition is really tough, with at least 4-5 different (large) ISPs fighting over customers.

      --
      ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
  2. Internal combustion engine by technophebe · · Score: 2

    Video codecs are not the only example of this, there are many.

  3. Article is much more interesting than summary by SuperKendall · · Score: 4, Insightful

    This is one case where the actual article is well worth reading, with a ton of links off to other areas to explore, and more interesting detail than the summary presents... well worth taking a look if you are at all interested in video compression and where the state of the art is going.

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
    1. Re:Article is much more interesting than summary by Headw1nd · · Score: 2

      You are missing the point. The point is that the metric being used was chosen for simplicity instead of accuracy, and because the alternatives were expensive and time-consuming. Over time everything optimized around that metric, to the point where it's prohibitively difficult to make any gains. Now that we have the ability to create better metrics we should, because otherwise we risk overlooking actual performance gains because they aren't significant on the old metric.

    2. Re:Article is much more interesting than summary by davide+marney · · Score: 2

      Summary missed the big "aha" moment of the article, which was that academic researchers in new encoding techniques had been thinking that increasing the complexity of their algorithms by 3X was a hard limit, whereas production practitioners such as Netflix, Facebook, and Hulu were thinking that a 100x increase in complexity was the upper limit.

      --
      "We receive as friendly that which agrees with, we resist with dislike that which opposes us" - Faraday
    3. Re:Article is much more interesting than summary by alvinrod · · Score: 4, Insightful

      If humans truly are incapable of discerning the difference in a controlled study, doesn't that suggest that the test is flawed because it is being too strict for some arbitrary reason?

      To better illustrate what I mean, say I want to buy hosting for a service and want 99% uptime. However, the person considering providers throws out any without guarantees of 99.999% uptime. They're not actually doing what I want and I may end up paying more than I would otherwise need to for no good reason. Or suppose I have a machine that judges produce and will remove anything that it thinks shoppers won't purchase (as a result of appearance, bruising, etc.) so that I don't waste resources shipping it to a store that will eventually have to throw it out as unsold. I want that machine to be as exact as possible because if it's being more picky than the shoppers, that's wasted produce I could otherwise be selling.

  4. Huh? by Anonymous Coward · · Score: 3, Insightful

    Unfortunately, new techniques are evaluated against the state-of-the-art codec, for which the coding tools have been refined from decades of investment. It is then easy to drop the new technology as "not at-par." Are we missing on better, more effective techniques by not allowing new tools to mature?

    What a stupid statement.

    Is the expectation we adopt crappy replacements to "allow them to mature?"

    They can mature until they're as good as what we have, not replace it with something which doesn't work to give it room to grow into something which doesn't suck.

    Either you have a working replacement, or you have a good idea and a demo.

    "Not-at-par" means the latter -- you don't have a mature product, and nobody is going to adopt it if it can't do what they can do now. Saying "it will eventually be awesome" tells me that eventually we'll give a damn, but certainly not now.

    It's bad enough I have to fight my vendors that I'm not accepting a beta-rewrite and suffering through their growing pains to get to the mature product they're trying to replace. I'm not your fucking beta tester, so please don't suggest I grab your steaming turd and live with it until you make it not suck.

    Boo hoo, immature technologies which can't yet match the technology they're trying to replace aren't being allowed to blossom into something useful. Make it useful, and then come to us.

  5. Misses the real problem. by Anonymous Coward · · Score: 2, Insightful

    Let's say for argumentation that a new and much more efficient video codec was just invented.

    The trouble is that it will immediately be locked up behind patents, free implementations will be sued, and it'll be packed with DRM and require per-play online-permission.

    Our main problem isn't technology, it's the legal clusterfuck that has glommed onto the technology landscape.

  6. I'd say we're moving at a pretty healthy pace. by PhrostyMcByte · · Score: 4, Insightful

    H.264 was king. Now we've got H.265 and AV1 which have not entirely replaced H.264 due to compatibility purposes, but have still gained significant traction.

    On the audio side, AAC replaced MP3, and Opus is set to replace AAC. Opus can generally reach the same quality as MP3 in less than half the bits!

    So I don't see this stagnation they talk about. These algorithms are generally straightforward and codec devs, even if they don't have a hyper-efficient implementation yet, will be able to see the benefit -- it's just a matter of investing in their time to develop high quality code and hardware for it.

  7. Missing a word: Research by UnknowingFool · · Score: 2

    Seriously the title and summary would have been much better and easier to understand if they used a single word "Research": "The End of Video Coding Research". The article discusses that while video coding use is pretty much everywhere, there hasn't been much progress or change made into newer standards despite lots of interest and investment. New codecs are coming out but they are all variations of the "block-based hybrid video coding structure" of MPEG-2/H.264/VP9, etc. Netflix is one company that would benefit from newer encoding standards.

    --
    Well, there's spam egg sausage and spam, that's not got much spam in it.
  8. Clients aren't getting any faster by rsilvergun · · Score: 3, Interesting

    they're getting more power efficient, but not much faster. I'm no expert, but from what I could tell the revolution in video encoding came because client hardware got a _lot_ faster at decoding high def video. That led to new codecs to take advantage of the increased power. I remember in 2005 needing special software to decode a 1080p stream on my GTX 240 video card and Athlon x64. By 2013 my phone could do it with VLC.

    --
    Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
    1. Re:Clients aren't getting any faster by pz · · Score: 4, Informative

      The revolution came from stable, standardized algorithms that allowed custom hardware to be built. Doing video decoding on general-purpose CPUs is never going to hold a candle to a custom H.264 chip.

      https://www.youtube.com/watch?...

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
  9. They actually did. by DrYak · · Score: 5, Informative

    The "hired very large codec dev team" they were contributing to is called "AOMedia - Alliance for Open Media", and one of the potential rabbit holes that got considered and worked on was Daala by Xiph (tons of new crazy ideas, including stuff like extending blocks as lapped blocks, a perceptual vector quantisation that doesn't rely on residual coding, etc.)

    At the end of the day, the first thing that currently came out of AOMedia, by combining work such as Xiph's Daala, Google's VP10 and Cisco's Thor, is AV-1.
    It's much tamer than what it could have been, but still incorporates some interesting ideas.
    (they didn't go all the way to using the ANS entropy coders suggested more recently by experiments such as Daala, but at least replaced the usual arithmetic encoder with Daala's range encoder).

    By the time AV-2 gets out, we should see some more interesting stuff.

    Probably this speech was meant as a rousing speech to encourage developers to go crazy and try new stuff.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  10. The value of entropy and psychovisual perception by StandardCell · · Score: 5, Interesting

    At some point, you have to start asking why you need certain quality of experience in limited environments, and what infrastructure it takes to get there.

    The biggest ongoing cost for streaming movies today is CDN storage, in the sense of having enough bitrates and resolutions to be able to accommodate all target devices and connection speeds. As much as people would like to deliver an HD picture to a remote village in the Philippines over a mobile connection on a feature phone, it isn't feasible at the moment for two reasons: they don't need or care about that level of experience, and it isn't technically feasible. The goal of CDN storage is to ensure the edge delivers the content, and the industry has toyed with real-time edge transcoding/transrating to address some of these issues, but fundamentally we are dropping asymptotically to a point on visual quality for a given bitrate and amount of computing power that a codec can deliver at the playback device.

    In that sense, I'm shocked that Anne's post didn't mention Netflix's own VMAF, which is a composite measure of different flavors of PSNR, SSIM and some deep learning. But even here, the fundamental is that we are still using block-based codecs for operations simply because of the fundamental nature of most video, i.e. objects moving around on a background. I'm also shocked that Anne didn't discuss alternative coding methods like wavelet-based (e.g. JPEG 2000), but - again - these approaches have their own limitations and don't address interframe encoding in the same way that a block-based codec can. If there was a novel approach to coding psychovisually-equivalent video that would address computing power, bitrate and quality reasonably, I believe it would have been brought forward already.

    I think 5G deserves a big mention here that was lacking in Anne's post, because faster connections may solve many of the types of issues that affect perceived visual quality at low bitrates. Get more bandwidth, and you have a better experience. Hopefully 5G will proliferate quickly, but this will be tricky in the developing world where its inherently decentralized nature and the political environments will make its ubiquitous deployment a serious challenge.

    In the end, we're all fighting entropy, particularly when it comes to encoding video. Our ability to perceive video is affected by an imperfect system - the human eye and brain. That's why we've made such gains in digital video since the MPEG-1 days. But the fantasies of ubiquitous HD video to everyone in the world on 100kbps connections are just that. When you're struggling to get by and don't have good health care or clean drinking water, the value of streaming high-quality video isn't there from a business perspective, much less a technical perspective. Everyone will get an experience relative to the capabilities of technology and the value it brings to them accordingly. All else is idealistic pipe dreams until otherwise proven.

  11. New vs old by DrYak · · Score: 5, Insightful

    but it would be stupid to start adopting any of that into actual products or live usage until and unless it tops the more traditional methods in performance.

    The logic behind the article is that the new techniques will never top the more traditional ones (or at least have no way to get there in the current state of affairs), because most of the resources (dev time, budget, etc.) are spent optimizing the "status-quo" codecs, and not enough is spent on the newcomers.
    By the time something interesting comes up, the latest descendant of the "status-quo" would have been much more optimized.
    It doesn't matter that the PhD thesis "Using Fractal Wavelets in non-Euclidean spaces to compress video" shows some promising advantages over MPEG-5 : it will not get funded, because by then "MPEG-6 is out" and is even better just by minor tweaking everywhere.
    Thus new ideas like a PhD thesis never get funded and explored further, and only further tweaking of what already exists gets funded.

    I personally don't agree.

    The most blatant argument is the list itself.
    With the exception of AV-1, the list consists exclusively of block-based algorithms : MPEG-1 and its evolutions (up to HEVC) and things that attempt to do something similar while avoiding the patents (the VPx series by On2, Google).

    It completely ignores stuff like Dirac and Schroedinger :
    completely different approach to video compression (based on wavelets) that got funded, developed and are actually in production (by no less than the BBC).

    It completely ignores the background behind AV-1 and how it relates to Daala.

    AV-1 was designed from the ground up not as an incremental evolution (or patent circumvention) over HEVC, it was designed to go along a different direction (if nothing else, at least for the reason to avoid the patented techniques of MPEG, as avoiding patent madness was the main target behind AV-1 to begin with).
    It was done by AOMedia, where lots of groups poured resources (including Netflix themselves).

    Yes, on one side of the AV-1 saga, you have entities like Google that donate their work on VP10 to serve as a basis - so we're again at the "I can't believe it's not MPEG(tm)!" clones.

    But among other code and technique contributions (besides Cisco's Thor, which I'm not considering for the purpose of my post), there's also Xiph who provided their work on Daala.
    There's some crazy stuff that Xiph has been doing there : stuff like replacing the usual "block"-based compression with slightly different "lapped blocks", more radical stuff like throwing away the whole idea of "coding residuals after prediction" and replacing it with "Perceptual Vector Quantization", etc.
    Some of these weren't kept for AV-1, but other crazy ideas actually made it into the final product (the classic binary arithmetic coding used by the MPEG family was thrown away for integer range-encoding, though they didn't go as far as using the proposed alternative ANS - Asymmetric Numeral Systems)

    Overall, incrementally improving on MPEG (MPEG 1 -> MPEG 2 -> MPEG 4 ASP -> MPEG 4 AVC/H264 -> MPEG-H HEVC/H265) gets hit hard by the law of diminishing returns. There's only so far that you can reach by incremental improvement.

    Time to get some new approaches.

    Even if AOMedia's AV-1 isn't that revolutionary, that's more out of practical considerations (we need a patent-free codec available as fast as possible, including available quickly in hardware, so better to end up selecting things that are known to work well) than for not having tried new stuff.
    And even if some of the more out-of-the-box experiments didn't end up in AV-1, they might end up in some future AV-2 (Xiph keeps experimenting with Daala).

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:New vs old by ImprovOmega · · Score: 2

      It doesn't matter that the PhD thesis "Using Fractal Wavelets in non-Euclidean spaces to compress video" shows some promising advantages over MPEG-5 : it will not get funded, because by then "MPEG-6 is out" and is even better just by minor tweaking everywhere.

      I did A/V compression development work once upon a time, and I can tell you that almost 20 years ago we were already looking into 3d wavelet functions for video decoding. The problem comes in that it's vastly less computationally and memory efficient than the standard iFrame/bFrame block decoders, and it messes up WAY worse if there's the slightest disruption in the stream.

      I mean, sure, if an iFrame gets hosed you lose part of a second of the video, but you can at least still kind of see what it is with a bunch of displaced blocks and people shrug and move on (fun fact: you can often catch MPEG decoding artifacts in Cable TV streams if you know what to look for). But if you do something like 3d wavelet compression and your top level of the hierarchy gets boned you are going to get some highly unpredictable artifacts that would probably make H.P. Lovecraft think one of his novels came to life, forced him to drop a tab of acid, and then poured nightmare juice directly into his optic nerve.

      And, as has been mentioned before, no real hardware support for embedded devices, like your cable box or smart TV.
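The fragility of a wavelet hierarchy described above shows up even in one dimension. A toy sketch, assuming an unnormalized Haar transform (pairwise averages and differences) on a signal whose length is a power of two; real 3d wavelet codecs are far more elaborate, and all function names here are made up for illustration:

```python
def haar_forward(signal):
    """Return [overall average, coarsest detail, ..., finest details]."""
    coeffs = []
    current = list(signal)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        avgs = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        coeffs = diffs + coeffs   # finer details end up later in the list
        current = avgs
    return current + coeffs

def haar_inverse(coeffs):
    """Rebuild the signal level by level, coarsest first."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(current)]
        pos += len(current)
        nxt = []
        for a, d in zip(current, diffs):
            nxt.extend([a + d, a - d])
        current = nxt
    return current

def damaged_samples(coeffs, index, error=10.0):
    """Count output samples that change when one coefficient is corrupted."""
    clean = haar_inverse(coeffs)
    bad = list(coeffs)
    bad[index] += error
    return sum(1 for c, b in zip(clean, haar_inverse(bad)) if c != b)

signal = [3, 5, 4, 8, 9, 7, 6, 2]
coeffs = haar_forward(signal)
print(damaged_samples(coeffs, 0))   # top of the hierarchy -> 8
print(damaged_samples(coeffs, 7))   # one finest-level detail -> 2
```

A single bad value at the top of the hierarchy shifts every reconstructed sample, while the same error in a finest-level coefficient stays local, which is roughly what a damaged block looks like in a block-based decoder.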

  12. Re:Dont build a better mousetrap ??? by GonzoPhysicist · · Score: 2

    so just get a cat?

    --
    horror vacui
  13. Re:What’s the weissman score? by slew · · Score: 2

    Video compression is typically lossy. The Weissman score only applies to lossless.

    The Weissman score is fiction (a product of an HBO screenwriter's request to a professor to make up something "tech-sounding").

    It isn't even an absolute score (because it depends on the unit of time you use to measure it and that isn't defined so if you happen to use something that results in the value 1, your score is infinite). Using a Weissman score in real life is like how fanbois convolute a Hollywood-ism like... "made the Kessel Run in less than twelve parsecs" into something that isn't totally gibberish (when it actually is gibberish)...

    Even if it were actually somehow useful to measure compression, it doesn't measure anything useful for video which is generally compressed off-line...
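The unit-of-time problem is easy to check numerically. A small sketch, assuming the commonly cited form of the score, W = alpha * (r / r_ref) * (log T_ref / log T), where r is the compression ratio and T the compression time of the candidate and the reference compressor; the function name and the sample numbers are invented for illustration:

```python
import math

def weissman(r, t, r_ref, t_ref, alpha=1.0):
    """Weissman score: alpha * (r / r_ref) * (log t_ref / log t)."""
    # math.log(1.0) is 0.0, so t == 1 raises ZeroDivisionError here,
    # the "infinite score" case mentioned above.
    return alpha * (r / r_ref) * (math.log(t_ref) / math.log(t))

# Same two compressors, same measurements, different time units.
in_seconds = weissman(r=2.5, t=4.0, r_ref=2.0, t_ref=8.0)
in_millis = weissman(r=2.5, t=4000.0, r_ref=2.0, t_ref=8000.0)
print(in_seconds, in_millis)
```

The two results differ even though nothing about the compressors changed, because the logarithm of a time depends on the unit the time is expressed in.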