Analyzing YouTube's Audio Fingerprinter

Posted by timothy on Wednesday April 22, 2009 @06:12AM from the streeeeeehtch-thiiiiingsss-ouuuuuut dept.

Al Benedetto writes "I stumbled across this article which analyzes the YouTube audio content identification system in-depth. Apparently, since YouTube's system has no transparency, the behaviors had to be determined based on dozens of trial-and-error video uploads. The author tries things like speed/pitch adjustment, the addition of background noise, as well as other audio tweaks to determine exactly what you'd need to adjust before the fingerprinter started mis-identifying material. From the article: 'When I muted the beginning of the song up until 0:30 (leaving the rest to play) the fingerprinter missed it. When I kept the beginning up until 0:30 and muted everything from 0:30 to the end, the fingerprinter caught it. That indicates that the content database only knows about something in the first 30 seconds of the song. As long as you cut that part off, you can theoretically use the remainder of the song without being detected. I don't know if all samples in the content database suffer from similar weaknesses, but it's something that merits further research.'"
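The two test conditions quoted above (silence before 0:30, then silence after 0:30) amount to zeroing a time range of samples before re-uploading. A minimal sketch of that step, assuming the audio has already been decoded into a plain list of samples; the helper name `mute_range` and its parameters are hypothetical and do not come from the article:

```python
def mute_range(samples, rate, start_s, end_s=None):
    """Return a copy of `samples` with [start_s, end_s) replaced by silence.

    `rate` is samples per second; `end_s=None` means "to the end of the clip".
    """
    start = max(0, int(start_s * rate))
    end = len(samples) if end_s is None else min(len(samples), int(end_s * rate))
    out = list(samples)
    for i in range(start, end):
        out[i] = 0
    return out

# Toy clip: 60 "seconds" at 1 sample per second, so indexes equal seconds.
clip = [1] * 60
intro_muted = mute_range(clip, rate=1, start_s=0, end_s=30)  # the variant that was missed
rest_muted = mute_range(clip, rate=1, start_s=30)            # the variant that was caught
```

Only the silencing step is shown; decoding, encoding and uploading are out of scope here.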

30 of 116 comments

music ip? by FredFredrickson · 2009-04-22 06:13 · Score: 4, Informative

There's the open-source library - libOFA - developed by Music IP (http://code.google.com/p/musicip-libofa/) which happens to create PUIDs on the first 135 seconds of audio in a track. It's used in the music-IP mixer (for mood mixes) but is also used by music database projects such as MusicBrainz.

From what I've seen, it's pretty decent audio fingerprinting, but I'm sure it would be subject to the same limitations: if you remove the first 30 seconds of a clip, it would produce a very different fingerprint.

There's no reason to believe YouTube isn't using this library or a derivative. There's also no reason to believe this result isn't intended. If the first 30 seconds of a song are missing, maybe that makes YouTube confident that it could be considered fair use.

Either way, I could imagine that creating a fingerprint based on different sections of a song has the same problem an MD5 hash would: each fingerprint would be entirely different. Unless you do something other than compare bit-to-bit, it'll be impossible to catch ALL permutations. And the fact is, that's a lot of computing power anyhow.
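To make the MD5 comparison concrete, here is a rough sketch contrasting an exact hash with a toy fingerprint compared by Hamming distance. Everything in it is an assumption made for illustration: the synthetic test signal, the one-bit-per-frame energy scheme, and the function names. It is not how libOFA or YouTube computes fingerprints.

```python
import hashlib
import math

RATE = 8000  # samples per second (an arbitrary choice for this sketch)

def test_signal(gain=1.0, seconds=1.0):
    """A 440 Hz tone with a slow 3 Hz volume swell, as floats in [-1, 1]."""
    n = int(seconds * RATE)
    return [
        gain * (0.5 + 0.5 * math.sin(2 * math.pi * 3 * i / RATE))
        * math.sin(2 * math.pi * 440 * i / RATE)
        for i in range(n)
    ]

def md5_of(samples):
    """Exact hash of the 8-bit quantized samples."""
    raw = bytes(int((s + 1.0) * 127.5) for s in samples)
    return hashlib.md5(raw).hexdigest()

def toy_fingerprint(samples, frames=32):
    """One bit per frame: 1 if the energy rises into the next frame."""
    size = len(samples) // (frames + 1)
    energy = [
        sum(s * s for s in samples[i * size:(i + 1) * size])
        for i in range(frames + 1)
    ]
    return [1 if energy[i + 1] > energy[i] else 0 for i in range(frames)]

def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

original = test_signal()
quieter = test_signal(gain=0.9)  # same material, slightly lower volume

print("MD5 digests equal:", md5_of(original) == md5_of(quieter))
print("fingerprint bits that differ:",
      hamming(toy_fingerprint(original), toy_fingerprint(quieter)))
```

The point of the design is the comparison step: an exact digest can only say "identical" or "not identical", while a fingerprint that is compared by distance can tolerate small changes, at the cost of needing a threshold and more computation per lookup.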

--
Belief? Hope? Preference? The Existential Vortex
1. Re:music ip? by Joe+Snipe · 2009-04-22 07:25 · Score: 2, Interesting
  
  If the first 30 seconds of a song are missing, maybe that makes YouTube confident that it could be considered fair use.
  Or if 30 seconds of additional blank footage were tacked on to the beginning?
  And FWIW, there is a very valid reason for assuming they aren't using this fingerprint system: they already had their own in-house system, which they based on their thumbnail maker program. It is also limited to within 30 sec of a clip, if I recall.
  
  --
  Sometimes, life itself is sarcasm...
This makes it pointless right? by Rayeth · 2009-04-22 06:16 · Score: 2, Insightful

I thought the purpose (however misguided it may be) was to prevent people from uploading copyrighted songs/music videos and re-mixing them. So if I only use portions of the song that aren't in the first 30s, I'm home free? That seems silly. The system must still be under refinement, or it is only there to stop the most blatant offenders.
Whew! by Serenissima · 2009-04-22 06:17 · Score: 5, Funny

It's a good thing no one at Youtube reads Slashdot. Otherwise they might come up with a fix! So, everyone keep this a secret! SHHHH!

--
Give a man a fire and he'll be warm for a day. But light a man on fire and he'll be warm for the rest of his life.
1. Re:Whew! by Tsunayoshi · 2009-04-22 07:00 · Score: 2, Interesting
  
  There-in lies the rub with the "all information should be free" mindset...EVERYONE gets to look at it.
  I think that is a feature, not a bug.
  
  --
  "Get a bicycle. You will not regret it, if you live." - Mark Twain, "Taming the Bicycle"
Slashdot brainstorm here by eclectro · 2009-04-22 06:21 · Score: 5, Insightful

Here's an idea. Start out the video with a useless narrative for the first thirty seconds "blah blah blah skip until :30 and ignore this intro blah blah" then start the music. That way everybody is happy. All google employees are too elitist to read slashdot, right?

--
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
1. Re:Slashdot brainstorm here by spydabyte · 2009-04-22 07:08 · Score: 4, Funny
  
  Sounds like packaging copyright material between thousands of papers and delivering it in PDF format to my university printing service to print out all my textbooks for free... except with less wasted paper.
2. Re:Slashdot brainstorm here by DriedClexler · 2009-04-22 07:20 · Score: 5, Funny
  
  Heh. I think people have already tried that.
  Hi, I'm an amateur movie critic. Today I'm going to show you an example of poor film-making. blah blah blah ...
  *Plays entire Star Wars: Episode I*
  So, as you can see by the [cinematography jargon] and [screen writing jargon], this movie sucked and I hope you learn from it in making your own movies.
  One week later:
  "No! You can't take down my video. This is CLEARLY fair use, since I have OBVIOUSLY used it for educational commentary, and the entire clip was VITAL for showing how much Episode I sucked."
  
  --
  Information theory is life. The rest is just the KL divergence.
3. Re:Slashdot brainstorm here by Locklin · 2009-04-22 07:23 · Score: 3, Funny
  
  Here's an idea. Start out the video with a useless narrative for the first thirty seconds "blah blah blah skip until :30 and ignore this intro blah blah" then start the music.
  Like on the radio?
  
  --
  "Knowledge is the only instrument of production that is not subject to diminishing returns" -Journal of Political Econom
4. Re:Slashdot brainstorm here by Hurricane78 · 2009-04-22 07:26 · Score: 2, Insightful
  
  ...and then you realize, that YouTube changed the algorithm, and that its compression makes your song sound so shitty anyway, that you actually want all the uploads to be taken down. ^^
  
  --
  Any sufficiently advanced intelligence is indistinguishable from stupidity.
5. Re:Slashdot brainstorm here by Anonymous Coward · 2009-04-22 07:42 · Score: 3, Insightful
  
  I see where you're going with this: put an advertisement at the start :-)
6. Re:Slashdot brainstorm here by VeryLargeNumber · 2009-04-22 08:32 · Score: 3, Insightful
  
  Even better - upload the video backwards.
  Someone should make a backward YouTube plugin for Firefox. It might even autodetect backward songs and play them properly.
pHash by b1ng0 · 2009-04-22 06:21 · Score: 5, Interesting

This seems like a good time to pump my own open source project: pHash. pHash is a perceptual hashing library that computes hashes for audio, video and image files, with text and PDF hashing coming soon. We use an algorithm similar to YouTube's audio fingerprinting method but we do not only take into account the first 30 seconds. Although, it's impossible to tell from this basic test whether their algorithm truly only looks at the first 30 seconds, or if the algorithm considers them to be different audio files. If the song is only 1 minute in duration, and 30 seconds is blank, is that really the same audio file as the full 1 minute version? At some point the audio files are not really the same anymore, although the perceptual hashes should be somewhat close to each other. Please give pHash a try. We could use some feedback from the OSS community and would appreciate it greatly.
1. Re:pHash by FredFredrickson · 2009-04-22 06:26 · Score: 3, Interesting
  
  Out of curiosity, how well could pHash be used to find similar songs from a list of songs? Maybe not actually similar, but similar sounding (or same mood)...?
  
  Any ideas how one would go about doing this sort of thing?
  
  --
  Belief? Hope? Preference? The Existential Vortex
2. Re:pHash by Anonymous Coward · 2009-04-22 06:30 · Score: 5, Funny
  
  fucking pHashist...
3. Re:pHash by ash211 · 2009-04-22 07:44 · Score: 4, Interesting
  
  The problem you're describing is known in the Music Information Retrieval (MIR) world as content-based recommendation (CBR). There are a number of ways to do it, but they're all based on measuring similarity.
  The idea is that people perceive songs as similar based on the characteristics they have, which are termed features. By representing a song's features in a model you can compare the models to see how "distant" they are, and then choose songs from a set that are least-distant. The work that my research group is pursuing represents songs based on timbral features (MFCCs) and rhythmic features (bpm, pulse clarity, syncopation, etc).
  If you're interested in the approach, see http://paragchordia.com/research/cbr.html
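The "least-distant" idea described above can be sketched in a few lines. The feature values below are made up, the three features (bpm, brightness, pulse clarity) are a simplified stand-in for the MFCC and rhythmic features mentioned, and the min-max scaling plus Euclidean distance are assumptions chosen for brevity, not the research group's actual model.

```python
import math

# Hypothetical, hand-made feature vectors: (tempo in bpm, brightness, pulse clarity).
SONGS = {
    "song_a": (120.0, 0.60, 0.80),
    "song_b": (122.0, 0.58, 0.75),
    "song_c": (70.0, 0.20, 0.30),
}

def normalize(songs):
    """Rescale each feature to [0, 1] so that bpm does not dominate the distance."""
    columns = list(zip(*songs.values()))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return {
        name: tuple((v - lo) / span for v, lo, span in zip(vec, lows, spans))
        for name, vec in songs.items()
    }

def nearest(query, songs, k=1):
    """Return the k songs least distant from `query` (Euclidean distance)."""
    scaled = normalize(songs)
    others = [name for name in scaled if name != query]
    return sorted(others, key=lambda n: math.dist(scaled[query], scaled[n]))[:k]

print(nearest("song_a", SONGS))
```

In practice the hard part is choosing and extracting the features; once each song is a vector, ranking by distance is the easy step.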
Tragic by dedazo · 2009-04-22 06:25 · Score: 2, Insightful

That cool tech like this is being used to prevent "piracy" instead of something more useful.

--
Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo
1. Re:Tragic by Runaway1956 · 2009-04-22 15:05 · Score: 2, Insightful
  
  At one end of the scale, you'll find those who think the world owes them entertainment. At the other end of the scale, you'll find those copyright squatters who think the world owes them a lavish living for sitting on dead men's works, and for acquiring a monopoly on distribution schemes.
  On your same scale, throwing a kid in jail for wanting to hear music without paying the parasites is just about negative infinity.
  Now that we have judged each other's positions, got anything constructive to offer?
  How about selling the kids all the music they want at a penny or a nickel a song, and allow them fair use, and personal copying right? Ditch the DRM. The use of DRM costs the copyright holder more than distribution!! It's not like it COSTS to distribute - especially if the copyright holders get their heads out of their butts, and use popular distribution schemes, like The Pirate Bay.
  
  --
  "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
Doesn't really matter for most people by American+Terrorist · 2009-04-22 06:28 · Score: 3, Interesting

The big issue here is what Lessig talked about years ago: Free Culture

Then a car commercial parody I made (arguably one of my better videos) was taken down because I used an unlicensed song. That pissed me off. I couldn't easily go back and re-edit the video to remove the song, as the source media had long since been archived in a shoebox somewhere. And I couldn't simply re-upload the video, as it got identified and taken down every time. I needed to find a way to outsmart the fingerprinter. I was angry and I had a lot of free time. Not a good combination.
The guy who wrote TFA is upset that his largely unviewed videos didn't pass an automated test.

My beef with the system is when culturally significant videos such as the Chinese "Caonima" get taken down because the song violates some copyright of a company I've never heard of, on a song I'd never in a million years think of buying.

Hope that link works, I had to copy it from Google since I can't even access Youtube anymore here in China.
1. Re:Doesn't really matter for most people by Anonymous Coward · 2009-04-22 06:34 · Score: 2, Insightful
  
  My beef with the system is when culturally significant videos such as the Chinese "Caonima" get taken down because the song violates some copyright of a company I've never heard of, on a song I'd never in a million years think of buying.
  So copyrights only apply to companies you've personally heard of and it's a song you'd buy? That's pretty stupid.
Research? by mi · 2009-04-22 06:30 · Score: 2, Insightful

but it's something that merits further research.
Why exactly does it merit any research? This is not a riddle posed by Nature: people devised this device (ha-ha), and know all the answers perfectly already, they just don't want to tell you. You are not advancing scientific progress by figuring out somebody's scheme.
You may be advancing your own knowledge and skills, but calling it "research" has no more merit than paparazzi's "research" into celebrities' lives...

--
In Soviet Washington the swamp drains you.
1. Re:Research? by radtea · 2009-04-22 07:33 · Score: 4, Insightful
  
  This is not a riddle posed by Nature
  This is one of the wonderful things about science: it doesn't matter where the puzzle comes from, the same techniques work to solve it.
  Reverse engineering of this kind is one of the most useful areas of applied science, and it is as much research as any other area of scientific enquiry. It is frequently the case that there are many ways to find the answer to a puzzle, and this guy has chosen one of them based on the resources he has available. More power to him for demonstrating how good science can be used to discover what others want to keep secret.
  
  --
  Blasphemy is a human right. Blasphemophobia kills.
Yes, but who analyzes the analyzers? by thomasdz · 2009-04-22 06:31 · Score: 5, Funny

And who fingerprints the analyzers who analyze the analyzers?

--
Karma: Excellent. 15 moderator points expire sometime.
I'd rather lose the last 30 seconds by Knave75 · 2009-04-22 06:46 · Score: 5, Insightful

An unfortunate result. The last 30 seconds of most songs are not usually as interesting as the first 30 seconds.
I wonder if he tried mangling the first 30 seconds at all. For example, keep the first 5 seconds, mess up the 6th and 7th seconds, and then continue on. Or perhaps adding in a bass line that would be hard to hear. Or something at the high end of the audio frequency spectrum, to annoy all those teenagers while I listen to my free music in peace.
Yeah by Brain-Fu · 2009-04-22 07:26 · Score: 3, Interesting

Music Genome
1. Re:Yeah by Anonymous Coward · 2009-04-22 08:42 · Score: 4, Informative
  
  Dear Pandora Visitor,
  We are deeply, deeply sorry to say that due to licensing constraints, we can no longer allow access to Pandora for listeners located outside of the U.S. We will continue to work diligently to realize the vision of a truly global Pandora, but for the time being we are required to restrict its use. We are very sad to have to do this, but there is no other alternative.
  If you believe we have made a mistake, we apologize and ask that you please contact us at pandora-support@pandora.com
  If you are a paid subscriber, please contact us at pandora-support@pandora.com and we will issue a pro-rated refund to the credit card you used to sign up. If you have been using Pandora, we will keep a record of your existing stations and bookmarked artists and songs, so that when we are able to launch in your country, they will be waiting for you.
  We will be notifying listeners as licensing agreements are established in individual countries. If you would like to be notified by email when Pandora is available in your country, please enter your email address below. The pace of global licensing is hard to predict, but we have the ultimate goal of being able to offer our service everywhere.
  We share your disappointment and greatly appreciate your understanding.
2. Re:Yeah by ausekilis · 2009-04-22 08:46 · Score: 2, Informative
  
  Perhaps a better link for information: Music Genome Project. A little more detail from Pandora's blog.
Patent filed with explanation of fingerprinting by bipbop · 2009-04-22 08:05 · Score: 3, Informative

Youtube uses Audible Magic's audio fingerprinting technology, which is based on this patent by MuscleFish: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=5918223.PN.&OS=PN/5918223&RS=PN/5918223
1. Re:Patent filed with explanation of fingerprinting by BabyDuckHat · 2009-04-22 08:38 · Score: 2, Informative
  
  This is actually very useful information for someone looking for ways to defeat the filter, in that it lists the features of the audio that are used for generating the fingerprint. A successful work-around would most likely require modifications to several aspects of the signal.
  
  From the patent:
  
  The feature vector thus consists of the mean and standard deviation of each of the trajectories (amplitude, pitch, brightness, bass, bandwidth, and MFCCs, plus the first derivative of each of these). These numbers are the only information used in the content-based classification and retrieval of these sounds. It is possible to see some of the essential characteristics of the sound by analyzing these numbers.
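As a rough illustration of the quoted passage, the sketch below builds a feature vector from the mean and standard deviation of each trajectory and of its first derivative. The trajectory values are invented, the "first derivative" is approximated by a frame-to-frame difference, and the population standard deviation is an assumption; the patent excerpt does not specify these details, and extracting real pitch, brightness or MFCC trajectories is not shown.

```python
import statistics

def first_difference(trajectory):
    """Frame-to-frame change, used here as a stand-in for the first derivative."""
    return [b - a for a, b in zip(trajectory, trajectory[1:])]

def summarize(trajectory):
    """Mean and standard deviation of a trajectory and of its first difference."""
    delta = first_difference(trajectory)
    return [
        statistics.fmean(trajectory), statistics.pstdev(trajectory),
        statistics.fmean(delta), statistics.pstdev(delta),
    ]

def feature_vector(trajectories):
    """Concatenate the per-trajectory summaries in a fixed (sorted) key order."""
    vec = []
    for name in sorted(trajectories):
        vec.extend(summarize(trajectories[name]))
    return vec

# Hypothetical per-frame measurements for one short sound (at least 2 frames each).
example = {
    "amplitude": [1.0, 2.0, 3.0],
    "pitch": [440.0, 440.0, 440.0],
}
print(feature_vector(example))
```

Because only these summary numbers survive, two sounds with the same statistics look identical to such a scheme regardless of the order of their frames, which is consistent with the comment that several aspects of the signal would have to change at once.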
Someone should do this for imeem.com by illectro · 2009-04-22 09:54 · Score: 2, Interesting

imeem have been doing this for the last few years, and they don't use Audible Magic; they used the Snocap fingerprint system, which apparently was good enough for them to buy Snocap. Their business model has always been built around using the content identification system to make sure the right people get paid for audio played on the site.
imeem is primarily used by people uploading and sharing audio, so using an audio fingerprinting system seems more appropriate there than YouTube relying on an audio fingerprinter for video content.