Netflix Uses AI in Its New Codec To Compress Video Scene By Scene (qz.com)

Posted by msmash on Thursday March 2, 2017 @02:01AM from the breakthrough dept.

An anonymous reader shares a Quartz report: Annoying pauses in your streaming movies are going to become less common, thanks to a new trick Netflix is rolling out. It's using artificial intelligence techniques to analyze each shot in a video and compress it without affecting the image quality, thus reducing the amount of data it uses. The new encoding method is aimed at the growing contingent of viewers in emerging economies who watch video on phones and tablets. "We're allergic to rebuffering," said Todd Yellin, a vice president of innovation at Netflix. "No one wants to be interrupted in the middle of Bojack Horseman or Stranger Things." Yellin hopes the new system, called Dynamic Optimizer, will keep those Netflix binges free of interruption when it's introduced sometime in the next "couple of months." He was demonstrating the system's results at "Netflix House," a mansion in the hills overlooking Barcelona that the company has outfitted for the Mobile World Congress trade show. In one case, the image quality from a 555 kilobits per second (kbps) stream looked identical to one on a data link with half the bandwidth.

Re:AI? by TheRaven64 · 2017-03-02 02:30 · Score: 2, Informative

Hi, welcome to 2017. AI is now defined by the media to mean 'thing using algorithms'. In related news, algorithm is now defined to mean 'scary thing the reader probably doesn't understand'.

--
I am TheRaven on Soylent News

For all the people saying this isn't AI by Solandri · 2017-03-02 02:53 · Score: 5, Informative

Netflix does use AI in developing the video compression algorithm. The problem with encoding videos with lossy algorithms is that video quality is a subjective thing. You need a person to watch it and tell you how good the video quality looks. This makes it rather slow and difficult to do A/B testing, not to mention how boring it is watching the same clips over and over with different encoding.

Netflix got around the problem by using machine learning to teach a computer when video quality looked good. They had a bunch of people watch videos with different compression and rate the quality, then told the AI that their ratings were gospel. It then analyzed the different videos and decided for itself which features were associated with good quality. Once the computer was generating the same video ratings as people, they had a rapid way to do A/B testing. That allowed them to optimize their compression algorithm in much less time than using humans to rate video quality.
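The train-on-human-ratings idea above can be sketched in a few lines. This is only a toy illustration, not Netflix's actual method: it assumes a single objective feature per encode and fits a straight line to human scores, and every number and name in it (`fit_line`, `predict`, the feature values, the scores) is made up for the example.

```python
# Toy sketch: learn to predict human quality ratings from one objective feature.
# All data below is invented for illustration; it is not Netflix measurement data.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict(model, x):
    """Predicted human rating for an encode whose feature value is x."""
    a, b = model
    return a * x + b


# Hypothetical training set: objective feature of each encode and the
# mean score (1 to 5) that human viewers gave it.
features = [30.0, 34.0, 38.0, 42.0]
human_scores = [2.0, 3.0, 4.0, 5.0]

model = fit_line(features, human_scores)

# A/B test two candidate encodes without asking anyone to watch them.
encode_a, encode_b = 36.0, 40.0
better = "A" if predict(model, encode_a) > predict(model, encode_b) else "B"
print(better)
```

A real system would presumably use many features and a stronger model, but the workflow is the same: fit once against human ratings, then score new encodes automatically.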

I'm not sure why the summary links to some popular news article which talks in general about Netflix using AI, instead of linking to the actual Netflix page describing exactly what they did. This used to be the sort of technical detail you'd expect from slashdot submissions.

Re: What exactly is Netflix doing? by Anonymous Coward · 2017-03-02 04:22 · Score: 4, Informative

Could be, but when AAC came around for audio compression, the interesting concept was that a psychoacoustic model was applied to identify which of (at the time 7) multiple compression model paths would provide the best perceived result for the given audio samples. Better encoders would choose better paths for encoding. So, the best encoders would identify which compression method to use for each audio segment and how long said audio segment should be.
H.264 and H.265 offer a massive number of opportunities to tweak the encoding of macroblocks. There are spatial considerations (block size), temporal considerations, motion considerations, etc. Each individual block can be a different type (I, P, B, etc.). Each block can be compressed relative to another block, considering time and space. Each block can select a different set of coefficients for frequency identification (DCT, for example) as well as gradients (quantization). Each block can be stored for optimal management of loss related to congestion (NAL).
To be fair, if I covered every possible case in H.265, I could be here a long time.
I used to write encoders for these standards and I would often target optimal allocation of bitrate relative to PSNR and SSIM. These are great metrics for attempting to model optimal quality following decompression. Unfortunately, I was too early to also optimize bitrate allocation for perceived quality in specific areas of interest, which is something we can do today by applying computer vision modules that simulate what is likely to be most interesting to humans and draw their attention. For example, consider the scene in "Back to the Future" where Doc types the date into the car computer: a computer can now identify that a human would most likely watch the LED digits most closely. So, allocating a greater bitrate there would be better than allocating it to Doc's fingers and head movement.
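For anyone who hasn't met PSNR, here is a minimal sketch of the standard formula, 10 * log10(MAX^2 / MSE), over flat lists of 8-bit sample values. The function name `psnr` and the sample lists are just for illustration. Note that it treats every pixel as equally important, which is exactly the limitation the attention-based approach is meant to address.

```python
import math


def psnr(reference, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between the original and the decoded samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)


# Two samples, each off by one level: MSE is 1, so PSNR is 10 * log10(255^2).
print(round(psnr([100, 100], [101, 99]), 2))
```

Higher is better, and identical inputs give infinity. SSIM is more involved because it compares local structure, so it is not sketched here.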
Modern machine learning methods, such as those used by Google to recognize a red dress in a photo and catalog it appropriately, can easily be used for this type of encoding process, and as Audun Mattias Øygard has been publishing on his blog, these algorithms are public and well known today.
I experimented with this technology for identifying optimal image compression some time back. The tech wasn't ready yet.
Bitrate allocation relative to areas of interest could easily allow for 50% bitrate savings. As the tech improves, I could see a great deal more especially in the area of quantization.
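As a rough sketch of what allocating bitrate by area of interest could look like (an assumption-laden toy, not how any real H.264/H.265 encoder does rate control): give each block a saliency weight from some attention model and split the frame's bit budget in proportion. The function `allocate_bits`, the `floor` parameter, and the saliency values are all hypothetical.

```python
def allocate_bits(saliency, total_bits, floor=0.1):
    """Split total_bits across blocks in proportion to their saliency weights.

    saliency: one non-negative weight per block (hypothetical attention scores).
    floor: minimum weight, so uninteresting blocks still get some bits.
    """
    if not saliency:
        raise ValueError("need at least one block")
    weights = [max(s, floor) for s in saliency]
    total_weight = sum(weights)
    return [total_bits * w / total_weight for w in weights]


# Three blocks: the first (say, the LED digits) is twice as interesting
# as the other two, so it receives half of the budget.
print(allocate_bits([1.0, 0.5, 0.5], 1000))
```

A real encoder would express this through per-block quantization parameters rather than a direct bit count, but the proportional idea is the same.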
Now, if we used more AI for CABAC and CAVLC to precondition the dictionaries, we may have a great model for H.266.