Will Compression Be Machine Learning's Killer App? (petewarden.com)
Pete Warden, an engineer and CTO of Jetpac, writes: When I talk to people about machine learning on phones and devices I often get asked "What's the killer application?" I have a lot of different answers, everything from voice interfaces to entirely new ways of using sensor data, but the one I'm most excited about in the near term is compression. Despite being fairly well-known in the research community, this seems to surprise a lot of people, so I wanted to share some of my personal thoughts on why I see compression as so promising.
I was reminded of this whole area when I came across an OSDI paper on "Neural Adaptive Content-aware Internet Video Delivery". The summary is that by using neural networks they're able to improve a quality-of-experience metric by 43% if they keep the bandwidth the same, or alternatively reduce the bandwidth by 17% while preserving the perceived quality. There have also been other papers in a similar vein, such as this one on generative compression [PDF], or adaptive image compression. They all show impressive results, so why don't we hear more about compression as a machine learning application?
All of these approaches require comparatively large neural networks, and the amount of arithmetic needed scales with the number of pixels. This means large images or video with high frames-per-second can require more computing power than current phones and similar devices have available. Most CPUs can only practically handle tens of billions of arithmetic operations per second, and running ML compression on HD video could easily require ten times that. The good news is that there are hardware solutions, like the Edge TPU amongst others, that offer the promise of much more compute being available in the future. I'm hopeful that we'll be able to apply these resources to all sorts of compression problems, from video and image, to audio, and even more imaginative approaches.
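To make the scaling argument concrete, here is a rough back-of-envelope sketch in Python. The resolution, frame rate, and CPU throughput are illustrative assumptions chosen to match the "tens of billions" figure above, not measurements, and the function names are hypothetical.

```python
# Back-of-envelope compute budget for per-pixel neural compression.
# All figures below are illustrative assumptions, not measurements.

def pixels_per_second(width, height, fps):
    """Number of pixels a codec must process each second."""
    return width * height * fps

def ops_budget_per_pixel(cpu_ops_per_second, width, height, fps):
    """Arithmetic operations available per pixel at real-time speed."""
    return cpu_ops_per_second / pixels_per_second(width, height, fps)

HD_1080P = (1920, 1080, 30)   # assumed resolution and frame rate
CPU_OPS = 20e9                # assumed: "tens of billions" of ops per second

rate = pixels_per_second(*HD_1080P)
budget = ops_budget_per_pixel(CPU_OPS, *HD_1080P)
print(f"{rate:,} pixels/s, roughly {budget:.0f} ops available per pixel")
```

Under these assumptions the CPU has only a few hundred operations to spend on each pixel, which is why a network needing ten times that much arithmetic cannot keep up in real time.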
I thought the Pied Piper platform already uses ML to improve compression, right?
Violence is the last refuge of the incompetent. Polar Scope Align for iOS
Perhaps you should get a patent for this vague description of a mathematical expression. We have learned that with the right Supreme Court justices, you can circumvent established case law.
It should be obvious that neural nets are great at this sort of thing. That's how our brains record and recall events. They're not registering a stream of pixels or waveforms and zipping them up, they're registering chained concepts. Every time we remember, we piece these concepts back together, so yes, there is a lot of "imagination" filling in data in even our most detailed memories. But just like that can work for our brains, it can work for computers.
"What is the difference between a Ponzi Scheme and an Investment Bank?" -- Jon Stewart
Preserving perceived quality by which metrics? Comcast recently moved to downgrading the quality of the HD cable programming it provides. Some people see a significant problem with the downgraded video, especially during action video such as sporting events. Yet Comcast says that "the perceived quality is the same." So I ask again, how is "perceived quality" going to be measured? And by whom? By those who want to push out the new technology for monetary gain, or by those who are subject to the inferior results of the new technology?
I think you are more than a little confused if you think the Supreme Court has anything to do with patents...
This is what partisanship does to your brain, kids. Don't be partisan, learn how systems actually work, and offer thoughtful critiques instead of running around in a blind panic saying things that are outright wrong everywhere you go.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
It should be obvious that neural nets are great at this sort of thing. That's how our brains record and recall events. They're not registering a stream of pixels or waveforms and zipping them up, they're registering chained concepts.
Video codec A: Han shot first!
Video codec B: Really? That's not how I remember it.
I don't care if it's 90,000 hectares. That lake was not my doing.
1. Caching.
Remember, all that matters is the bandwidth of the most constrained point you traverse. And that's typically going to be near the servers, as point-to-point connections are highly inefficient.
If you place frequently-accessed content downstream, much nearer the recipient, you eliminate the need to use any of the pipes along the constricted regions.
Experiments I did on this in the 1990s, when the problem was at its worst, showed that you could get a 60-fold improvement in quality of experience, well beyond anything compression can achieve.
2. Multicast
Most people are used to a few seconds' delay before streaming starts. If N people request the same content over a 5-second period, then delaying the first person by 5 seconds won't be perceived as abnormal. You're now transmitting one copy per path instead of N. This is an ideal way to populate the aforementioned caches; for server-based content it would be useful only if no caches exist.
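A minimal sketch of that batching idea in Python, assuming requests arrive as (timestamp, content_id, client) tuples. The function name, the tuple layout, and the 5-second default are illustrative; a real deployment would sit on top of actual IP multicast, which this does not model.

```python
def batch_requests(requests, window_seconds=5.0):
    """Group (timestamp, content_id, client) requests into multicast batches.

    A batch opens with the first request for a content item and accepts
    later requests for the same item that arrive within window_seconds.
    Returns a list of (content_id, start_time, clients) tuples.
    """
    open_batches = {}   # content_id -> (start_time, clients list)
    batches = []
    for ts, content_id, client in sorted(requests):
        current = open_batches.get(content_id)
        if current is None or ts - current[0] > window_seconds:
            # Window expired or first request: start a new transmission.
            current = (ts, [])
            open_batches[content_id] = current
            batches.append((content_id, current[0], current[1]))
        current[1].append(client)
    return batches


demo = [
    (0.0, "movie-a", "alice"),
    (2.0, "movie-a", "bob"),
    (4.9, "movie-a", "carol"),
    (6.0, "movie-a", "dave"),    # outside the first window, new batch
    (1.0, "movie-b", "erin"),
]
for content_id, start, clients in batch_requests(demo):
    print(content_id, start, clients)
```

Each batch corresponds to one transmission, so the number of copies sent depends on how many windows are opened rather than on how many people asked.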
3. Fractal compression
Technically, wavelet. Used in the BBC's Dirac codec (Schrödinger is an implementation of it). Produces far better compression than typical codecs.
4. Better pipes
Most of the rest of the world is already operating at bandwidths 100 to 10,000 times those common in the U.S., while U.S. cable companies have either prosecuted those offering higher speeds or driven heavy equipment through their cables. It is time to stop accepting this as the cost of doing business.
The U.S. should mandate 50 Gbps to the home, the highest speed available elsewhere on a large scale. If you're going to be the best, you have to be the best. The U.S. should not be an also-ran. A fat tree is impossible at those speeds. Realistically, I doubt you could get a block to handle more than 200 Gbps. Which is adequate.
Tier 1 is fine, just have a mesh network. Current SDM transmission rates are 111 Tbps, which means you can support 555 blocks of houses off a single seven-core cable. GMING can supply the info on how you'd actually get that to work on a metro level.
You're really not going to need to do a whole lot of compression at those speeds.
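The capacity arithmetic in the "better pipes" point can be checked in a few lines of Python. The figures are the ones quoted above; the variable names are just for illustration.

```python
# Figures quoted in the comment above, expressed in Gbps.
CABLE_GBPS = 111_000   # 111 Tbps over one seven-core SDM cable
BLOCK_GBPS = 200       # assumed ceiling per block of houses
HOME_GBPS = 50         # proposed per-home rate

blocks_per_cable = CABLE_GBPS // BLOCK_GBPS
homes_at_full_rate_per_block = BLOCK_GBPS // HOME_GBPS

print(blocks_per_cable, "blocks per cable")
print(homes_at_full_rate_per_block, "homes per block at full rate simultaneously")
```

The second figure shows that a 200 Gbps block only serves a handful of homes at the full 50 Gbps at once, so the scheme relies on homes rarely saturating their links at the same time.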
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
To rival humans at voice recognition, these assistants would need to do at least two things that actual humans already do:
1: Constantly listen in on your life (and possibly watch it with a camera), so that it can maintain a real-time context to interpret any ambiguous verbal information. Waking up the assistant only after it hears its name is not sufficient because without context you need to be extremely clear to establish what you're talking about. I've noticed that even talking to people, when you switch the topic to a totally new subject, it often requires a "handshake" where you tell them what you're going to talk about and then they acknowledge that they're on the same page with you.
2: Have the assistant interrupt the human immediately in mid-sentence when it doesn't understand something, with something like "Huh?" or "What?". Real people do this so often that we don't even notice it, but right now it would seem incredibly rude if a bot did that.
Most CPUs can only practically handle tens of billions of arithmetic operations per second,
Got to wonder how a computer scientist from 30 years ago would react to this casual statement. I'm going with something along the lines of, "Great Scott!"
You are in a twisty maze of processor lines, all alike.
There is a lot of hype here.