Slashdot Mirror


The End of Video Coding? (medium.com)

An anonymous reader writes: Netflix's engineering team has an insightful post today that looks at how the industry is handling video coding, the differences in their methodologies, and the challenges newcomers face. An excerpt, which sums up where we are:

"MPEG-2, VC1, H.263, H.264/AVC, H.265/HEVC, VP9, AV1 -- all of these standards were built on the block-based hybrid video coding structure. Attempts to veer away from this traditional model have been unsuccessful. In some cases (say, distributed video coding), it was because the technology was impractical for the prevalent use case. In most other cases, however, it is likely that not enough resources were invested in the new technology to allow for maturity.

"Unfortunately, new techniques are evaluated against the state-of-the-art codec, for which the coding tools have been refined from decades of investment. It is then easy to drop the new technology as "not at-par." Are we missing on better, more effective techniques by not allowing new tools to mature? How many redundant bits can we squeeze out if we simply stay on the paved path and iterate on the same set of encoding tools?"

6 of 137 comments

  1. Re:What else would one do? by Anonymous Coward · · Score: 0, Interesting

    There is no science in this article. It's just marketing.

    4K is already way beyond the ability of an average consumer to discern its quality. But the industry needs to sell new products: better screens, more powerful decoders, more bandwidth, etc.

    So the whole industry starts a long-term marketing effort. All the industry members (even unknowingly) create the aura of "you need better video and codecs", and they do that at multiple levels, to techies like the ones here and to normal people.

    They create a need and we will feel that need in just a few years and go buy an 8K TV even if we can't tell the difference between a 4K and an 8K image even with a microscope.

    But hey, my new TV is 8K, it's written here... look how much crisper the image is (and when people tell me things like that I just tell them "oh yeah, sure, wow, I need to buy one of those too" and just let them be happy. Who am I to destroy their dream?)

  2. Re:What else would one do? by skids · · Score: 4, Interesting

    Should they just adopt new and inferior solutions and hope for the best?

    I think the idea here is that the follow-up science/engineering to academic initiatives doesn't actually get funded/done because the unoptimized first cut of a new methodology isn't instantly better than the state of the art. It's basically arguing that the technology is undergoing path dependence, which is no big surprise as it happens all the time in lots of areas.

    That said, the AV crowd has sure made a complete and utter mess of their formats. Piles of CODECs all with various levels of support for piles of video modes all bundled into piles of meta-formats with piles of methods for syncing up audio/ancillary/multistream... my eyes glaze over pretty quickly these days when faced with navigating that maze. Having options is awesome. Leaving them perpetually scattered around on the floor like a bunch of legos... not so much.

    (Still waiting to see someone with serious genetic-algorithms chops tackle lossless CODECs... there's a ready-made problem with a cut and dry fitness function right there.)
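
    A minimal sketch of what such a fitness function could look like, assuming a toy search space where each candidate is just a delta-filter stride applied before zlib. The names `delta_encode`, `delta_decode` and `fitness` are hypothetical, and this is only the scoring step, not a genetic algorithm:

```python
import zlib


def delta_encode(data: bytes, stride: int) -> bytes:
    """Replace each byte with its difference from the byte `stride` positions earlier."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    out = bytearray(data[:stride])
    for i in range(stride, len(data)):
        out.append((data[i] - data[i - stride]) % 256)
    return bytes(out)


def delta_decode(data: bytes, stride: int) -> bytes:
    """Invert delta_encode by adding back the already-decoded earlier byte."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    out = bytearray(data[:stride])
    for i in range(stride, len(data)):
        out.append((data[i] + out[i - stride]) % 256)
    return bytes(out)


def fitness(data: bytes, stride: int) -> int:
    """Compressed size in bytes (smaller is fitter); rejects lossy candidates."""
    encoded = delta_encode(data, stride)
    if delta_decode(encoded, stride) != data:
        raise ValueError("candidate is not lossless")
    return len(zlib.compress(encoded, 9))


# Toy input (an assumption for illustration): a smooth ramp, which a
# stride-1 delta filter turns into a nearly constant byte stream.
ramp = bytes((i * 3) % 256 for i in range(4096))
scores = {stride: fitness(ramp, stride) for stride in (1, 2, 3, 4)}
best_stride = min(scores, key=scores.get)
```

    The point is only that the score is objective: output size, plus a hard pass/fail check that decoding reproduces the input exactly.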

  3. Clients aren't getting any faster by rsilvergun · · Score: 3, Interesting

    they're getting more power efficient, but not much faster. I'm no expert, but from what I could tell the revolution in video encoding came because client hardware got a _lot_ faster at decoding high def video. That led to new codecs to take advantage of the increased power. I remember in 2005 needing special software to decode a 1080p stream on my GTX 240 video card and Athlon x64. By 2013 my phone could do it with VLC.

    --
    Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
  4. The value of entropy and psychovisual perception by StandardCell · · Score: 5, Interesting

    At some point, you have to start asking why you need certain quality of experience in limited environments, and what infrastructure it takes to get there.

    The biggest ongoing cost for streaming movies today is CDN storage, in the sense of having enough bitrates and resolutions to be able to accommodate all target devices and connection speeds. As much as people would like to deliver an HD picture to a remote village in the Philippines over a mobile connection on a feature phone, it isn't feasible at the moment for two reasons: they don't need or care about that level of experience, and it isn't technically feasible. The goal of CDN storage is to ensure the edge delivers the content, and the industry has toyed with real-time edge transcoding/transrating to address some of these issues, but fundamentally we are dropping asymptotically to a point on visual quality for a given bitrate and amount of computing power that a codec can deliver at the playback device.

    In that sense, I'm shocked that Anne's post didn't mention Netflix's own VMAF, which fuses several elementary quality metrics (visual information fidelity, a detail-loss metric and a motion feature) with a machine-learned regression model. But even here, the fundamental point is that we are still using block-based codecs simply because of the nature of most video, i.e. objects moving around on a background. I'm also shocked that Anne didn't discuss alternative coding methods like wavelet-based ones (e.g. JPEG 2000), but - again - these approaches have their own limitations and don't address interframe encoding in the same way that a block-based codec can. If there were a novel approach to coding psychovisually equivalent video that addressed computing power, bitrate and quality reasonably, I believe it would have been brought forward already.
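
    To illustrate why "objects moving around on a background" suits block-based coding, here is a rough sketch of exhaustive block matching with a sum-of-absolute-differences cost. It is a toy under stated assumptions (tiny grayscale frames as nested lists, hypothetical helper names), not how any particular codec implements motion search:

```python
def make_frame(ox, oy, size=8, n=4):
    """Build a size x size frame of zeros with a textured n x n object at (ox, oy)."""
    frame = [[0] * size for _ in range(size)]
    for y in range(n):
        for x in range(n):
            frame[oy + y][ox + x] = 10 + n * y + x
    return frame


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_motion_vector(cur, ref, bx, by, n=4, search=2):
    """Try every displacement within +/- `search` pixels; return (cost, dx, dy)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# The object sits at x=2 in the reference frame and at x=3 in the current one.
ref_frame = make_frame(2, 2)
cur_frame = make_frame(3, 2)
cost, dx, dy = best_motion_vector(cur_frame, ref_frame, bx=3, by=2)
```

    Here the search finds a displacement of one pixel to the left with zero residual, so the block can be described by a motion vector instead of its pixels. Real content leaves a nonzero residual, which is then transformed and quantized.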

    I think 5G deserves a big mention here that was lacking in Anne's post, because faster connections may solve many of the types of issues that affect perceived visual quality at low bitrates. Get more bandwidth, and you have a better experience. Hopefully 5G will proliferate quickly, but this will be tricky in the developing world where its inherently decentralized nature and the political environments will make its ubiquitous deployment a serious challenge.

    In the end, we're all fighting entropy, particularly when it comes to encoding video. Our ability to perceive video is affected by an imperfect system - the human eye and brain. That's why we've made such gains in digital video since the MPEG-1 days. But the fantasies of ubiquitous HD video to everyone in the world on 100kbps connections are just that. When you're struggling to get by and don't have good health care or clean drinking water, the value of streaming high-quality video isn't there from a business perspective, much less a technical perspective. Everyone will get an experience relative to the capabilities of technology and the value it brings to them accordingly. All else is idealistic pipe dreams until otherwise proven.

  5. Re:What else would one do? by AmiMoJo · · Score: 4, Interesting

    Part of the problem is that we have hardware acceleration for certain operations, and if codecs want to do stuff outside that then performance can become an issue for playback. Most streaming devices don't have enough CPU power in their little ARMs to handle decoding; it has to be hardware accelerated by the GPU.

    Then again if anyone can argue successfully for hardware changes it's Netflix.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  6. Re:What else would one do? by slew · · Score: 4, Interesting

    Well if you read the article and not the summary, the authors are discussing that there don't seem to be any fundamental changes coming anytime soon. Sure, newer codecs are coming out, but they all take the same approach. It's like if we were discussing public key cryptography and the algorithms used. Imagine if RSA was the only real technique and the only new changes coming out were merely larger keys and that other techniques like elliptic curves didn't exist.

    I think your analogy is somewhat flawed. Public key cryptography was in somewhat of the same "rut" as video codecs. Video codecs have been stuck on hybrid block techniques and public key cryptography has been stuck using modular arithmetic (RSA and elliptic curves both use modular arithmetic, although they depend on the difficulty of inverting different mathematical operations in modular arithmetic).

    There are of course other hard math problems that can be used in public key cryptography (lattices, knapsack, error-correcting codes, hash-based) and they languished for years until the threat of quantum computing cracking the incumbent technology...

    Similarly, I predict hybrid block techniques will likely dominate video encoding until a disruption (or in mathematical catastrophe theory parlance a bifurcation) shows the potential for being 10x better (because 1.2x or 20% better doesn't even pay for your lunch). It doesn't have to be 10x better out of the gate, but if it can't eventually be 10x better, why spend time optimizing it as much as hybrid block encoding? Nobody wants to be developing something that doesn't have legs for a decade or more. The point isn't to find something different for the sake of difference, it's to find something that has legs (even if it isn't better today).

    The problem with finding something with "legs" in video encoding, is that we do not fully understand video. People don't really have much of a theoretical framework to measure one lossy video compression scheme against another (except for "golden-eyes" which depend on what side of the bed you wake up on). Crappy measures like PSNR and SSIM to estimate the loss-ratio vs entropy are still being used because we don't have anything better. One of the reasons people stick to hybrid block coding is that the artifacts are somewhat known even if they can't be measured so it is somewhat easier to make sure you are making forward progress. If the artifacts are totally different (as they would likely be for a different lossy coding scheme), it is much more difficult to compare if you can't objectively measure it to optimize it (the conjoint analysis problem).
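
    As a small illustration of why PSNR is a blunt instrument, here is a sketch using the standard definition (10 * log10(peak^2 / MSE)) on flat lists of 8-bit samples. The sample values are made up for the example:

```python
import math


def psnr(reference, distorted, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


# Two distortions of the same 16-sample flat patch (illustrative values only):
reference = [100] * 16
uniform_shift = [104] * 16           # every sample slightly brighter
single_spike = [100] * 15 + [116]    # one sample far off, the rest exact

psnr_shift = psnr(reference, uniform_shift)
psnr_spike = psnr(reference, single_spike)
```

    Both distortions have a mean squared error of 16 and therefore the same PSNR (about 36.09 dB), even though a faint overall brightness shift and a single bad pixel are likely to look quite different to a viewer. A metric that cannot tell artifact types apart gives little guidance when the artifacts themselves change.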

    So until we have better theories about what makes a better video codec, people are using "art" to simulate science in this area, and as with most art, it's mostly subjective, so it will be difficult to convince anyone of a 10x potential if it is only 80% today. If people *really* want to find something better, we need to start researching more on the measurement problem and less on the artistic aspects. It's not that people haven't tried (e.g., VQEG), but very little has come from the efforts to date and there has been little pressure to keep the ball moving forward.

    In contrast, the math of hard problems for public key cryptography is a very productive area of research and the post-quantum-encryption goal has been driving people pretty hard.

    Generally speaking, if you measure it, it can be improved and it's easier to measure incremental progress than big changes on a different dimension.