How Google's Pixel 2 'Now Playing' Song Identification Works (venturebeat.com)
An anonymous reader shares a report from VentureBeat, written by Emil Protalinski: The most interesting Google Pixel 2 and Pixel 2 XL feature, to me, is Now Playing. If you've ever used Shazam or SoundHound, you probably understand the basics: The app uses your device's microphone to capture an audio sample and creates an acoustic fingerprint to compare against a central song database. If a match is found, information such as the song title and artist are sent back to the user. Now Playing achieves this with two important differentiators. First, Now Playing detects songs automatically without you explicitly asking -- the feature works when your phone is locked and the information is displayed on the Pixel 2's lock screen (you'll eventually be able to ask Google Assistant what's currently playing, but not yet). Secondly, it's an on-device and local feature: Now Playing functions completely offline (we tested this, and indeed it works with mobile data and Wi-Fi turned off). No audio is ever sent to Google.
Yet another lump of unremovable pre-installed stuff taking precious space on your phone.
If you don't turn it on, it doesn't ever download the fingerprint database.
How in the actual fuck is this possible? They have an audio signature of every song built in?
Yes. And this is not surprising; the data needed to identify songs is tiny. Essentially it's just vectors (numerical arrays); they don't need to store the whole mp3.
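To make the "just vectors" idea concrete, here is a minimal toy sketch in Python. It is not how Now Playing actually works (the thread doesn't say); the band list, frame size, song names and the "loudest band per frame" fingerprint are all made up for illustration. Each 50 ms frame of audio is reduced to one small integer, and a noisy clip is matched by sliding its fingerprint over the stored ones.

```python
import math
import random

RATE = 8000                         # samples per second (toy value)
FRAME = 400                         # samples per frame (50 ms)
BANDS = [300, 500, 700, 900, 1100]  # Hz, hypothetical analysis bands

def synth(notes):
    """Build a toy 'song': one pure tone per frame."""
    return [math.sin(2 * math.pi * f * n / RATE) for f in notes for n in range(FRAME)]

def band_energy(frame, freq):
    """Energy of one frame at one frequency (a single DFT bin)."""
    re = sum(s * math.cos(2 * math.pi * freq * n / RATE) for n, s in enumerate(frame))
    im = sum(s * math.sin(2 * math.pi * freq * n / RATE) for n, s in enumerate(frame))
    return re * re + im * im

def fingerprint(samples):
    """One small integer per frame: the index of the loudest band."""
    fp = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[start:start + FRAME]
        energies = [band_energy(frame, f) for f in BANDS]
        fp.append(energies.index(max(energies)))
    return fp

def best_match(query_fp, database):
    """Slide the query over each stored fingerprint; fewest mismatches wins."""
    best = (None, len(query_fp) + 1)
    for title, fp in database.items():
        for offset in range(len(fp) - len(query_fp) + 1):
            window = fp[offset:offset + len(query_fp)]
            mismatches = sum(a != b for a, b in zip(query_fp, window))
            if mismatches < best[1]:
                best = (title, mismatches)
    return best

# Hypothetical two-song "database": only the fingerprints are stored.
song_a = synth([300, 700, 500, 1100, 900, 300, 500, 700])
song_b = synth([900, 900, 300, 1100, 700, 500, 300, 1100])
database = {"song_a": fingerprint(song_a), "song_b": fingerprint(song_b)}

# A frame-aligned clip from the middle of song_b, with microphone-style noise added.
rng = random.Random(0)
clip = [s + rng.gauss(0, 0.3) for s in song_b[2 * FRAME:6 * FRAME]]
title, mismatches = best_match(fingerprint(clip), database)
print(title, mismatches)
```

Here 3200 samples per song shrink to 8 small integers. A real system needs fingerprints that survive clips that aren't frame-aligned, speaker distortion and so on, but the storage argument is the same.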
More and more can be done locally on the devices. For instance, look at what is actually needed to detect English speech using CMU sphinx:
https://github.com/cmusphinx/p...
(look at the hmm model)
This used to require huge computing power and storage, but now it can work on a mobile device.
Another example: once upon a time you needed Google datacenters to do gender and age recognition on photos. Now you can download pre-trained models for that, and the result can fit on a mobile device. Or you can download the entire dataset (500k photos of celebs) and train it yourself on your own servers:
https://data.vision.ee.ethz.ch...
Or you want a model to recognize basically any kind of object in a photo?
https://github.com/tensorflow/...
(there's a model specifically designed to run on mobile devices)
I know it's disturbing but this is where things are today. Just a few years ago, this XKCD comic was true:
https://xkcd.com/1425/
Now you can actually download the code and models to do that completely offline and in a few ms.
lucm, indeed.
Although I think you're being funny, no, this couldn't be used in that way. Noise cancelling headphones work by using destructive interference, which requires an exact opposite waveform of the sound being cancelled out. Since the analog waveform of the music would be affected by any number of factors (the quality of the speakers playing it, the equalizer settings of their audio equipment, the bitrate of their source, the echoing of the sound off various objects, multiple speakers playing the audio, which would result in multiple "copies" of the music reaching your ear just very slightly delayed from one another, etc, etc), you couldn't use a "canned" waveform (the original MP3) to cancel out the actual waveform reaching your ears.
Now, while it might be possible, using AI, to try to do a best match of the ambient sound against a canned waveform, and cancel out only the ambient sound that seems to match, it still would not work perfectly. That would result in echoes and certain portions of the frequency spectrum still being heard, which would sound very strange.
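A rough numerical sketch of the point about destructive interference, in Python. The 1 kHz tone, the 2-sample delay and the 0.4-gain echo are arbitrary values picked for illustration, not measurements of any real room or headphone. The inverted "canned" waveform cancels an identical copy exactly, but leaves a large residual once the sound reaching the ear is slightly delayed and has an echo mixed in.

```python
import math

RATE = 8000   # samples per second (toy value)
N = 800       # number of samples (0.1 s)

def tone(freq, delay=0, gain=1.0):
    """Sine tone, optionally delayed by a whole number of samples and scaled."""
    return [gain * math.sin(2 * math.pi * freq * (n - delay) / RATE) for n in range(N)]

def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))

canned = tone(1000)            # the "original MP3" waveform
anti = [-s for s in canned]    # its exact inverse, used for cancellation

# Ideal case: the ear hears exactly the canned waveform.
perfect = rms([h + a for h, a in zip(canned, anti)])

# Assumed real-world case: the direct sound arrives 2 samples (0.25 ms) late,
# plus a quieter echo arriving 5 samples late.
direct = tone(1000, delay=2)
echo = tone(1000, delay=5, gain=0.4)
heard = [d + e for d, e in zip(direct, echo)]
leftover = rms([h + a for h, a in zip(heard, anti)])

print("original level:      ", rms(canned))
print("residual, exact copy:", perfect)
print("residual, real room: ", leftover)
```

With these particular numbers the "cancelled" result is nowhere near silent, which is why real noise-cancelling headphones measure the sound at the ear instead of relying on a stored waveform.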
Better known as 318230.
Why shit on mp3 and try to re-invent the wheel with vectors?
First, nobody is shitting on mp3. As for the reason to use tiny vectors instead of storing big mp3 files, I'm not sure why I have to explain it to you but it comes down to two things.
1) Storage
2) Availability of advanced, high quality vector processing libraries like BLAS or LAPACK
This being said, it was just my guess, for all I know maybe they are storing data in sqlite3 or in the headers of a jpeg file that shows your mom pleasuring herself with a maglite.
Jazz songs?
Why do you want to stress their app by sending random data?
32 thousand CDs, using slim jewel cases at 5mm thickness, means you have a CD tower 160 metres tall. Given a standard height of three metres per floor, your CD stack is over 53 stories high.
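The arithmetic above checks out. A quick Python sanity check, using only the figures from the comment (32,000 CDs, 5 mm per slim case, 3 m per floor):

```python
cds = 32_000            # number of CDs
case_mm = 5             # thickness of one slim jewel case, in mm
floor_m = 3             # assumed height of one floor, in metres

height_m = cds * case_mm / 1000   # stack height in metres
floors = height_m / floor_m       # equivalent number of floors

print(height_m, "m tall,", round(floors, 1), "floors")
```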
#DeleteFacebook