How Google's Pixel 2 'Now Playing' Song Identification Works (venturebeat.com)
An anonymous reader shares a report from VentureBeat, written by Emil Protalinski: The most interesting Google Pixel 2 and Pixel 2 XL feature, to me, is Now Playing. If you've ever used Shazam or SoundHound, you probably understand the basics: The app uses your device's microphone to capture an audio sample and creates an acoustic fingerprint to compare against a central song database. If a match is found, information such as the song title and artist is sent back to the user. Now Playing achieves this with two important differentiators. First, Now Playing detects songs automatically without you explicitly asking -- the feature works when your phone is locked and the information is displayed on the Pixel 2's lock screen (you'll eventually be able to ask Google Assistant what's currently playing, but not yet). Second, it's an on-device and local feature: Now Playing functions completely offline (we tested this, and indeed it works with mobile data and Wi-Fi turned off). No audio is ever sent to Google.
Sure, why not? Do you honestly think that something which amounts to a checksum takes very much space? Probably a few bytes per song.
If we figure 32 bytes per song times 50,000 songs, that's only like 1.6MB of space needed.
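To spell out that arithmetic, here is a quick sketch. Both inputs (32 bytes per fingerprint, 50,000 songs) are the guesses from the comment above, not figures published by Google.

```python
# Back-of-the-envelope storage estimate.
# Both inputs are the commenter's guesses, not published figures.
BYTES_PER_FINGERPRINT = 32
SONG_COUNT = 50_000

total_bytes = BYTES_PER_FINGERPRINT * SONG_COUNT
total_mb = total_bytes / 1_000_000  # decimal megabytes

print(f"{total_bytes:,} bytes = {total_mb:.1f} MB")  # 1,600,000 bytes = 1.6 MB
```

Even if the real fingerprints were 100 times larger, the same calculation would give roughly 160 MB.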
Yet another lump of unremovable pre-installed stuff taking precious space on your phone.
If you don't turn it on, it doesn't ever download the fingerprint database.
How in the actual fuck is this possible? They have an audio signature of every song built in?
Yes. And this is not surprising; the data needed to identify songs is tiny. Essentially it's just vectors (big numerical arrays), so they don't need to store the whole mp3.
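As a rough illustration of the "just vectors" idea, here is a minimal sketch of a nearest-fingerprint lookup. It is not how Now Playing is actually implemented: the song titles, the vector values, the `identify` helper and the `max_distance` threshold are all made up for the example, and real fingerprints would be much longer.

```python
import math

# Hypothetical toy database: title -> fingerprint vector (values are made up).
FINGERPRINTS = {
    "Song A": [0.9, 0.1, 0.3, 0.7],
    "Song B": [0.2, 0.8, 0.5, 0.1],
    "Song C": [0.4, 0.4, 0.9, 0.6],
}

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(sample, database, max_distance=0.25):
    """Return the closest title, or None if nothing is close enough."""
    title, fingerprint = min(
        database.items(), key=lambda item: distance(sample, item[1])
    )
    return title if distance(sample, fingerprint) <= max_distance else None

# A slightly noisy version of Song A's vector still matches Song A.
noisy_sample = [0.85, 0.15, 0.3, 0.65]
print(identify(noisy_sample, FINGERPRINTS))          # Song A

# A vector far from everything in the database yields no match.
print(identify([0.0, 0.0, 0.0, 0.0], FINGERPRINTS))  # None
```

The point is only that a lookup needs the small vectors, not the audio itself.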
More and more can be done locally on devices. For instance, look at what is actually needed to detect English speech using CMU Sphinx:
https://github.com/cmusphinx/p...
(look at the hmm model)
This used to require huge computing power and storage, but now it can work on a mobile device.
Another example: once upon a time you needed Google datacenters to do gender and age recognition on photos. Now you can download pre-trained models for that, and the result can fit on a mobile device. Or you can download the entire dataset (500k photos of celebs) and train it yourself on your own servers:
https://data.vision.ee.ethz.ch...
Or you want a model to recognize basically any kind of object in a photo?
https://github.com/tensorflow/...
(there's a model specifically designed to run on mobile devices)
I know it's disturbing, but this is where things are today. Just a few years ago, this XKCD comic was true:
https://xkcd.com/1425/
Now you can actually download the code and models to do that completely offline and in a few ms.
lucm, indeed.
According to the article, the local song database is updated once per week based on the changing popularity of songs on Google Play. The least popular songs are replaced rather than expanding the database in perpetuity, and if you never enable the feature, the database is never downloaded.
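A minimal sketch of what such a fixed-size, popularity-ranked update could look like. The `weekly_update` function, the popularity scores and the capacity are hypothetical; the article does not describe the actual mechanism in this detail.

```python
def weekly_update(current, popularity, capacity):
    """Pick which titles stay on the device after an update.

    current:    set of titles currently stored
    popularity: dict of title -> score (hypothetical numbers)
    capacity:   fixed number of songs the database may hold
    """
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    keep = set(ranked[:capacity])
    added = keep - current    # newly popular songs to download
    dropped = current - keep  # least popular songs to evict
    return keep, added, dropped

current = {"A", "B", "C"}
popularity = {"A": 90, "B": 10, "C": 50, "D": 70}

keep, added, dropped = weekly_update(current, popularity, capacity=3)
print(sorted(keep), sorted(added), sorted(dropped))
# ['A', 'C', 'D'] ['D'] ['B']
```

The database size stays at `capacity` no matter how many updates run.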