Analyzing YouTube's Audio Fingerprinter
Al Benedetto writes "I stumbled across this article which analyzes the YouTube audio content identification system in-depth. Apparently, since YouTube's system has no transparency, the behaviors had to be determined based on dozens of trial-and-error video uploads. The author tries things like speed/pitch adjustment, the addition of background noise, as well as other audio tweaks to determine exactly what you'd need to adjust before the fingerprinter started mis-identifying material. From the article: 'When I muted the beginning of the song up until 0:30 (leaving the rest to play) the fingerprinter missed it. When I kept the beginning up until 0:30 and muted everything from 0:30 to the end, the fingerprinter caught it. That indicates that the content database only knows about something in the first 30 seconds of the song. As long as you cut that part off, you can theoretically use the remainder of the song without being detected. I don't know if all samples in the content database suffer from similar weaknesses, but it's something that merits further research.'"
There's the open-source library - libOFA - developed by Music IP (http://code.google.com/p/musicip-libofa/) which happens to create PUIDs on the first 135 seconds of audio in a track. It's used in the music-IP mixer (for mood mixes) but is also used by music database projects such as MusicBrainz.
From what I've seen, it's pretty decent audio fingerprinting, but I'm sure it would be subject to the same limitation: if you remove the first 30 seconds of a clip, it would produce a very different fingerprint.
There's no reason to believe YouTube isn't using this library or a derivative. There's also no reason to believe this result isn't intended. If the first 30 seconds of a song are missing, maybe that makes YouTube confident that it could be considered fair use.
Either way, I could imagine that creating a fingerprint based on different sections of a song has the same problem an MD5 hash would: each fingerprint would be entirely different. If you don't just compare bit-to-bit, it'll be impossible to catch ALL permutations. And the fact is, that's a lot of computing power anyhow.
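To illustrate the MD5 comparison, here is a minimal sketch. The byte string is a made-up stand-in for raw audio, not real PCM data, and nothing here reflects how libOFA or YouTube actually compute fingerprints. It only shows that a cryptographic hash of a trimmed track has nothing in common with the hash of the full track.

```python
import hashlib


def md5_hex(samples: bytes) -> str:
    """Return the MD5 digest of a chunk of raw audio bytes as hex."""
    return hashlib.md5(samples).hexdigest()


# Hypothetical stand-in for a 60-second track: 1000 bytes per "second".
track = bytes((i * 7) % 256 for i in range(60_000))

# The same track with the first 30 "seconds" cut off.
trimmed = track[30_000:]

full_hash = md5_hex(track)
trimmed_hash = md5_hex(trimmed)

print("full   :", full_hash)
print("trimmed:", trimmed_hash)
print("identical:", full_hash == trimmed_hash)
```

A fingerprint built only from a fixed section behaves the same way whenever that section is missing, which is why exact hashes are a poor fit for this job.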
Belief? Hope? Preference? The Existential Vortex
I thought the purpose (however misguided it may be) was to prevent people from uploading copyrighted songs/music videos and re-mixing them. So if I only use portions of the song that aren't in the first 30s I'm home free? That seems silly, the system must still be under refinement or is only there to stop the most blatant offenders.
It's a good thing no one at Youtube reads Slashdot. Otherwise they might come up with a fix! So, everyone keep this a secret! SHHHH!
Give a man a fire and he'll be warm for a day. But light a man on fire and he'll be warm for the rest of his life.
Here's an idea. Start out the video with a useless narrative for the first thirty seconds ("blah blah blah skip until :30 and ignore this intro blah blah"), then start the music. That way everybody is happy. All Google employees are too elitist to read Slashdot, right?
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
This seems like a good time to pump my own open source project: pHash. pHash is a perceptual hashing library that computes hashes for audio, video and image files, with text and PDF hashing coming soon. We use an algorithm similar to YouTube's audio fingerprinting method but we do not only take into account the first 30 seconds. Although, it's impossible to tell from this basic test whether their algorithm truly only looks at the first 30 seconds, or if the algorithm considers them to be different audio files. If the song is only 1 minute in duration, and 30 seconds is blank, is that really the same audio file as the full 1 minute version? At some point the audio files are not really the same anymore, although the perceptual hashes should be somewhat close to each other. Please give pHash a try. We could use some feedback from the OSS community and would appreciate it greatly.
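As a rough illustration of what "somewhat close to each other" means for perceptual hashes, here is a small sketch that compares two hashes by Hamming distance. The 64-bit values, the function names, and the 10-bit threshold are all hypothetical; this is not pHash's API or its actual hash format.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which two equal-width hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def looks_like_same_audio(hash_a: int, hash_b: int,
                          max_differing_bits: int = 10) -> bool:
    """Treat two hashes as a match if only a few bits differ.

    The threshold is an assumption chosen for this example.
    """
    return hamming_distance(hash_a, hash_b) <= max_differing_bits


# Made-up 64-bit hash values for illustration only.
original = 0xF0F0F0F0F0F0F0F0
slightly_altered = original ^ 0b111   # same hash with three bits flipped
unrelated = 0x0123456789ABCDEF

print(hamming_distance(original, slightly_altered))
print(looks_like_same_audio(original, slightly_altered))
print(looks_like_same_audio(original, unrelated))
```

Unlike a cryptographic hash, the comparison is a distance rather than an equality test, so a lightly edited file can still land within the threshold.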
That cool tech like this is being used to prevent "piracy" instead of something more useful.
Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo
Then a car commercial parody I made (arguably one of my better videos) was taken down because I used an unlicensed song. That pissed me off. I couldn't easily go back and re-edit the video to remove the song, as the source media had long since been archived in a shoebox somewhere. And I couldn't simply re-upload the video, as it got identified and taken down every time. I needed to find a way to outsmart the fingerprinter. I was angry and I had a lot of free time. Not a good combination.
The guy who wrote TFA is upset that his largely unviewed videos didn't pass an automated test.
My beef with the system is that culturally significant videos such as the Chinese "Caonima" get taken down because the song violates some copyright held by a company I've never heard of, on a song I'd never in a million years think of buying.
Hope that link works, I had to copy it from Google since I can't even access Youtube anymore here in China.
Why exactly does it merit any research? This is not a riddle posed by Nature. People devised this device (ha-ha), and know all the answers perfectly already; they just don't want to tell you. You are not advancing scientific progress by figuring out somebody's scheme.
You may be advancing your own knowledge and skills, but calling it "research" has no more merit than the paparazzi's "research" into celebrities' lives...
In Soviet Washington the swamp drains you.
And who fingerprints the analyzers who analyze the analyzers?
Karma: Excellent. 15 moderator points expire sometime.
Now we are going to have a ton of 30sec Ops of soundless Text about how cool the author is now.
There is also an option to claim fair use (although I think it uses different words) after it identifies a song. I did this for an artwork of mine on Youtube that included the first 30 seconds of a Cure song, and the video stayed. I really do think that in my case, it was fair use. But if you're just trying to upload an old Pearl Jam video, then this probably won't help.
I'll release a new app that will pad all your pirated songs, and at only $10 per album it's a great deal!
From the article:
"I couldn't easily go back and re-edit the video to remove the song, as the source media had long since been archived in a shoebox somewhere."
This is easy to do. Avidemux can separate the video and audio streams and recombine them.
You don't need to go back to the raw files in your NLE to do this.
An unfortunate result. The last 30 seconds of most songs are not usually as interesting as the first 30 seconds.
I wonder if he tried mangling the first 30 seconds at all. For example, keep the first 5 seconds, mess up the 6th and 7th seconds, and then continue on. Or perhaps adding in a bass line that would be hard to hear. Or something at the high end of the audio frequency spectrum, to annoy all those teenagers while I listen to my free music in peace.
This doesn't really identify a hole in the technology itself, just in the implementation. I'm more interested in hearing some audio that sounds the same while defeating the fingerprinting scheme. Much more interesting.
Does anyone know how Shazam works? It does a remarkably good job of song identification, even based on small samples from the middle of a song.
I've noticed that it picks up vocals very well. There are some songs on YouTube that have had their individual parts lifted from Guitar Hero and uploaded for people to learn each part. Instrumental parts are fine, as are most full instrumental songs. However, vocal parts ARE picked up by the fingerprinter. (Search Muse acapella for examples)
I dont care who ya are that there's funny
have you seen my sig? there are many others like it but none that are the same
do the phase shift. A little phase shifting won't ruin the song much more than the compression already does.
MusicBrainz works exactly as described.
Support my political activism on Patreon.
First, if this is really how the system works, why cut the first thirty seconds when you can just pad the beginning with 30 seconds of... introduction?
More importantly, I can't believe they'd design the fingerprinting to be that trivial to fool. We have tons of knowledge about fingerprinting methods and there are dozens of ways to make this smarter.
A much more robust system will take a random sample of small clips from the entire video. The sample can still be 30sec long (total), but harder to fool. We can take N random samples from each of the original songs, create fingerprints and put them in a data structure which makes retrieval efficient. When a video is uploaded we choose at random one of the N fingerprinting schemes and check against the fingerprint database. Even if users learn all of the N schemes, they can succeed only with probability 1/N for each clip - much worse if they don't know the schemes. Also a user can be blocked if an attempt to upload the same (unauthorized) thing twice is detected. This is nothing like crypto security but it's better than fingerprinting a fixed prefix of the video. I am not taking into account the error probability of the fingerprints themselves.
With reasonable N efficiency should not suffer much (in particular the efficiency of checking a new video will be about the same as in the single fingerprint system - making the fingerprint databases can take longer, but it does not have to be in real time).
I am not saying that the thing I wrote above is that good or let alone sophisticated. Just pointing out that it is very easy to do better.
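The scheme described above can be sketched roughly as follows. This is a toy under stated assumptions: the clip sizes, the function names, and the use of an exact SHA-1 over the selected clips are all made up for illustration. A real system would need a noise-tolerant fingerprint and some way to align clips in time, neither of which is attempted here.

```python
import hashlib
import random

CLIP_LEN = 5          # samples per clip (toy value)
CLIPS_PER_SCHEME = 6  # 6 clips of 5 samples stand in for "30 seconds total"


def make_schemes(n, track_len, seed=0):
    """Return n schemes; each is a sorted list of clip start offsets."""
    rng = random.Random(seed)
    last_start = track_len - CLIP_LEN
    return [sorted(rng.sample(range(last_start + 1), CLIPS_PER_SCHEME))
            for _ in range(n)]


def fingerprint(audio: bytes, scheme) -> str:
    """Toy exact-match fingerprint: hash only the clips the scheme selects."""
    h = hashlib.sha1()
    for start in scheme:
        h.update(audio[start:start + CLIP_LEN])
    return h.hexdigest()


def build_index(songs, schemes):
    """Precompute one fingerprint per (song, scheme); done offline."""
    index = {}
    for title, audio in songs.items():
        for k, scheme in enumerate(schemes):
            index[(k, fingerprint(audio, scheme))] = title
    return index


def check_upload(audio, schemes, index, rng=random):
    """Pick one scheme at random and do a single lookup for the upload."""
    k = rng.randrange(len(schemes))
    return index.get((k, fingerprint(audio, schemes[k])))


# Hypothetical reference "songs", 100 samples each.
songs = {
    "song_a": bytes(range(100)),
    "song_b": bytes(reversed(range(100))),
}
schemes = make_schemes(n=8, track_len=100)
index = build_index(songs, schemes)

print(check_upload(songs["song_a"], schemes, index))  # unmodified upload
print(check_upload(bytes([7]) * 100, schemes, index))  # unrelated audio
```

Checking an upload costs one fingerprint and one dictionary lookup regardless of N; only building the index grows with the number of schemes.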
Music Genome
...just to pass some filter, then you must think the song's very well-worth listening to.. which, to me, implies it ought to be worth buying.
If it's just some background music piece - I dunno, try another song.. plenty of royalty-free and even completely free ones (nope, they're probably not in the billboard top 100 right now - so sorry).
I'm more curious about the cases where there really IS fair use involved.. what happens in those cases.. do you get to hit a checkbox saying "I believe this is fair use, please proceed to accept my upload and continue with any potential infringement processes you believe are required."?
Why exactly does it merit any research? This is not a riddle posed by Nature... You are not advancing scientific progress by figuring out somebody's scheme.
Good grief. If you're trying to find out something that you can't just go look up at the library, and you're forming and testing hypotheses to do so, that could reasonably be called "research". Don't be so pedantic. :-)
(Anyway, it may well turn out that Nature is just stuff that someone already knows all the answers perfectly to and just doesn't want to tell you.)
"Not an actor, but he plays one on TV."
** In 2.48 pt.
There fixed that for you.
I didn't read TFA, but why should I care how YouTube does this??? It's not any kind of AI breakthrough, and the only reason to subvert the system is to do something illegal...
As a music lover of all genres, I say that the first 30 seconds sets the mood for the rest of the track. I can't properly get into some tracks without the intros.
The last 30 seconds of most songs is a wind down, repetitious chanting of the chorus, or random instrument bashing session anyway.
Probably Google has to create the fingerprints itself, and it's slightly more promising to ask for the first 30 seconds of each track than to say to the rightsholders "give us a full copy of every song you ever made, we want to build a copyright tool".
I tried this for myself. I muted the first 30 seconds of a copyrighted song and tried uploading it, but it didn't make any difference. Same for adding 30 seconds of silence in front, changing the pitch, etc.
Meh, I don't know what I'm doing wrong. Maybe YouTube just doesn't have 433 in its database yet.
Youtube uses Audible Magic's audio fingerprinting technology, which is based on this patent by MuscleFish: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=5918223.PN.&OS=PN/5918223&RS=PN/5918223
It's called fair use.
---- Booth was a patriot ----
I met the founder of Auditude.com, a competitor to the company that supplies audio fingerprinting for YouTube. Fascinating guy, but even more fascinating technology. They claim they can identify any clip as short as 5 seconds from any portion of the original recording.
You can test them out on myspace, I'd be interested to see how well they stack up in real world tests to YouTube's provider.
what if you added 30 seconds of background noise to the beginning of the audio file (instead of deleting 30 seconds)?
The real tragedy is that a rickroll is no longer as easy. You never know how long a link might work.
Don't submit videos with music by the big labels. Using creative commons music or music from labels who approve of the free advertising will simultaneously keep you from having your videos taken down and provide more visibility to non-RIAA-label artists, helping to make their cartel useless.
imeem have been doing this for the last few years, and they don't use audible magic, they used the Snocap fingerprint system which apparently was good enough for them to buy Snocap. Their business model has always been built around using the content identification system to make sure the right people get paid for audio played on the site.
imeem is primarily used by people uploading and sharing audio, so using an audio fingerprinting system seems more appropriate than youtube relying on an audio fingerprinter for video content.
If they start using whatever Shazam uses, we're screwed. In any case I'm sure this is the start of an arms race in which the fingerprinter keeps getting more and more elaborate to counter the effects of people trying to fool it.
Drill baby drill - on Mars
Any use that doesn't result in me getting obscene amounts of money is NOT FAIR!
</RIAA>
People .. stop using youtube and build your own video streaming server.
All you need is 10mbit / 20mbit upload, some space and http://www.longtailvideo.com/players/jw-flv-player/ that can be downloaded here: http://netsky.org/SpcVideo/flvplayer.zip.
I've created my own test server and it does the job better than youtube.
Check here:
http://netsky.org/SpcVideo/sea1.htm
This is a concert I recorded from TV over satellite.
If you create your own server then no one will delete your videos. :)
http://foosic.org/ this is the most accurate and reliable fingerprinting algorithm I've used.
But the author had already told us why this was:
I.e., the system was designed to identify complete songs. It is a fair assumption that someone producing a pirated music compilation CD will include the entire song, including its first 30 seconds, so only checking the first 30 seconds is a perfectly sound strategy.
It wasn't designed for the YouTube environment, and put bluntly YouTube only want to be seen to be doing something. It's in their interest to get as many views as possible, and music helps! Audible Magic is "industry standard", so they can say that they're doing everything that can be expected of them. Voilà, their backsides are covered and their faces well and truly saved, and they have a show of "good faith" to hold up in any future legal action.
HAL.
Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
you rock. anything to bring back the excitement that was once youtube. fuck wmg.