Analyzing YouTube's Audio Fingerprinter

Posted by timothy on Wednesday April 22, 2009 @06:12AM from the streeeeeehtch-thiiiiingsss-ouuuuuut dept.

Al Benedetto writes "I stumbled across this article which analyzes the YouTube audio content identification system in-depth. Apparently, since YouTube's system has no transparency, the behaviors had to be determined based on dozens of trial-and-error video uploads. The author tries things like speed/pitch adjustment, the addition of background noise, as well as other audio tweaks to determine exactly what you'd need to adjust before the fingerprinter started mis-identifying material. From the article: 'When I muted the beginning of the song up until 0:30 (leaving the rest to play) the fingerprinter missed it. When I kept the beginning up until 0:30 and muted everything from 0:30 to the end, the fingerprinter caught it. That indicates that the content database only knows about something in the first 30 seconds of the song. As long as you cut that part off, you can theoretically use the remainder of the song without being detected. I don't know if all samples in the content database suffer from similar weaknesses, but it's something that merits further research.'"

8 of 116 comments

pHash by b1ng0 · 2009-04-22 06:21 · Score: 5, Interesting

This seems like a good time to pump my own open source project: pHash. pHash is a perceptual hashing library that computes hashes for audio, video and image files, with text and PDF hashing coming soon. We use an algorithm similar to YouTube's audio fingerprinting method but we do not only take into account the first 30 seconds. Although, it's impossible to tell from this basic test whether their algorithm truly only looks at the first 30 seconds, or if the algorithm considers them to be different audio files. If the song is only 1 minute in duration, and 30 seconds is blank, is that really the same audio file as the full 1 minute version? At some point the audio files are not really the same anymore, although the perceptual hashes should be somewhat close to each other. Please give pHash a try. We could use some feedback from the OSS community and would appreciate it greatly.
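To make "somewhat close to each other" concrete, here is a minimal sketch of one common way to compare perceptual hashes: count the bits that differ (Hamming distance) and treat a small fraction as a likely match. This is not pHash's actual API or YouTube's method. The function names, the 64-bit hash width, and the 0.25 threshold are all assumptions chosen for illustration.

```python
def hamming_distance(h1: int, h2: int) -> int:
    """Count the bit positions where two integer hashes differ."""
    return bin(h1 ^ h2).count("1")


def looks_similar(h1: int, h2: int, bits: int = 64, threshold: float = 0.25) -> bool:
    """Treat two hashes as 'perceptually close' when the fraction of
    differing bits is at or below the threshold.

    Both `bits` and `threshold` are hypothetical defaults, not values
    taken from pHash or any real fingerprinter.
    """
    return hamming_distance(h1, h2) / bits <= threshold


if __name__ == "__main__":
    original = 0xF0F0F0F0F0F0F0F0
    slightly_changed = original ^ 0b111  # flip three low bits
    print(hamming_distance(original, slightly_changed))
    print(looks_similar(original, slightly_changed))
```

Under this kind of scheme, a clip with half its audio blanked would probably differ in many bits, which fits the point that at some stage the two files stop counting as "the same".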
1. Re:pHash by FredFredrickson · 2009-04-22 06:26 · Score: 3, Interesting
  
  Out of curiosity, how well could pHash be used to find similar songs from a list of songs? Maybe not actually similar, but similar sounding (or same mood)...?
  
  Any ideas how one would go about doing this sort of thing?
  
  --
  Belief? Hope? Preference? The Existential Vortex
2. Re:pHash by ash211 · 2009-04-22 07:44 · Score: 4, Interesting
  
  The problem you're describing is known in the Music Information Retrieval (MIR) world as content-based recommendation (CBR). There are a number of ways to do it, but they're all based on measuring similarity.
  The idea is that people perceive songs as similar based on the characteristics they have, which are termed features. By representing a song's features in a model you can compare the models to see how "distant" they are, and then choose songs from a set that are least-distant. The work that my research group is pursuing represents songs based on timbral features (MFCCs) and rhythmic features (bpm, pulse clarity, syncopation, etc).
  If you're interested in the approach, see http://paragchordia.com/research/cbr.html
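The "least-distant" idea above can be sketched in a few lines. This is only an illustration of nearest-neighbour ranking over feature vectors, not the research group's actual model. The song names and feature values are made up, and the sketch assumes every feature has already been scaled to a comparable range (real MFCC and rhythm features would need normalizing first).

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_songs(query, library, k=2):
    """Return the names of the k songs whose features are closest to `query`."""
    ranked = sorted(library.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]


# Hypothetical, pre-normalized features, e.g. (timbre, tempo, syncopation).
library = {
    "song_a": [0.90, 0.10, 0.30],
    "song_b": [0.20, 0.80, 0.50],
    "song_c": [0.85, 0.15, 0.35],
}

if __name__ == "__main__":
    print(nearest_songs([0.88, 0.12, 0.30], library, k=2))
```

Euclidean distance is just the simplest choice here; other distance measures or a learned model would slot into the same ranking step.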
Doesn't really matter for most people by American+Terrorist · 2009-04-22 06:28 · Score: 3, Interesting

The big issue here is what Lessig talked about years ago: Free Culture

Then a car commercial parody I made (arguably one of my better videos) was taken down because I used an unlicensed song. That pissed me off. I couldn't easily go back and re-edit the video to remove the song, as the source media had long since been archived in a shoebox somewhere. And I couldn't simply re-upload the video, as it got identified and taken down every time. I needed to find a way to outsmart the fingerprinter. I was angry and I had a lot of free time. Not a good combination.
The guy who wrote TFA is upset that his largely unviewed videos didn't pass an automated test.

My beef with the system is that culturally significant videos such as the Chinese "Caonima" get taken down because the song violates some copyright held by a company I've never heard of, on a song I'd never in a million years think of buying.

Hope that link works; I had to copy it from Google since I can't even access YouTube anymore here in China.
Re:Whew! by Tsunayoshi · 2009-04-22 07:00 · Score: 2, Interesting

Therein lies the rub with the "all information should be free" mindset...EVERYONE gets to look at it.
I think that is a feature, not a bug.

--
"Get a bicycle. You will not regret it, if you live." - Mark Twain, "Taming the Bicycle"
Re:music ip? by Joe+Snipe · 2009-04-22 07:25 · Score: 2, Interesting

If the first 30 seconds of a song are missing, maybe that makes YouTube confident that it could be considered fair use
Or if 30 seconds of additional blank footage were tacked on to the beginning?
And FWIW, there is a very valid reason for assuming they aren't using this fingerprint system: they already had their own in-house system that they based on their thumbnail maker program. It is also limited to within 30 sec of a clip, if I recall.

--
Sometimes, life itself is sarcasm...
Yeah by Brain-Fu · 2009-04-22 07:26 · Score: 3, Interesting

Music Genome
Someone should do this for imeem.com by illectro · 2009-04-22 09:54 · Score: 2, Interesting

imeem has been doing this for the last few years, and they don't use Audible Magic; they used the Snocap fingerprint system, which apparently was good enough for them to buy Snocap. Their business model has always been built around using the content identification system to make sure the right people get paid for audio played on the site.
imeem is primarily used by people uploading and sharing audio, so using an audio fingerprinting system seems more appropriate there than YouTube relying on an audio fingerprinter for video content.