Slashdot Mirror


Analyzing YouTube's Audio Fingerprinter

Al Benedetto writes "I stumbled across this article which analyzes the YouTube audio content identification system in depth. Apparently, since YouTube's system has no transparency, the behaviors had to be determined based on dozens of trial-and-error video uploads. The author tries things like speed/pitch adjustment, the addition of background noise, as well as other audio tweaks to determine exactly what you'd need to adjust before the fingerprinter started misidentifying material. From the article: 'When I muted the beginning of the song up until 0:30 (leaving the rest to play) the fingerprinter missed it. When I kept the beginning up until 0:30 and muted everything from 0:30 to the end, the fingerprinter caught it. That indicates that the content database only knows about something in the first 30 seconds of the song. As long as you cut that part off, you can theoretically use the remainder of the song without being detected. I don't know if all samples in the content database suffer from similar weaknesses, but it's something that merits further research.'"
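
The two test uploads described in the summary (intro muted vs. everything after the intro muted) can be sketched roughly as follows. This is only an illustration of how such test clips might be prepared, assuming the audio is already decoded into a plain list of samples; `mute_range`, the toy sample rate, and the dummy clip are all hypothetical, and nothing here talks to YouTube or reproduces its fingerprinter.

```python
def mute_range(samples, sample_rate, start_s, end_s):
    """Return a copy of samples with [start_s, end_s) replaced by silence."""
    out = list(samples)
    # Convert seconds to sample indices and clamp them to the clip length.
    start = max(0, int(start_s * sample_rate))
    end = min(len(out), int(end_s * sample_rate))
    for n in range(start, end):
        out[n] = 0
    return out


# Hypothetical 60-second mono clip at a toy rate of 10 samples per second.
sample_rate = 10
song = [1] * (60 * sample_rate)

# Variant A: first 30 seconds muted (the article says this one was missed).
no_intro = mute_range(song, sample_rate, 0, 30)

# Variant B: everything after 0:30 muted (the article says this one was caught).
intro_only = mute_range(song, sample_rate, 30, 60)
```

Each variant would then be uploaded separately to see which one gets flagged.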

7 of 116 comments

  1. Whew! by Serenissima · · Score: 5, Funny

    It's a good thing no one at YouTube reads Slashdot. Otherwise they might come up with a fix! So, everyone keep this a secret! SHHHH!

    --
    Give a man a fire and he'll be warm for a day. But light a man on fire and he'll be warm for the rest of his life.
  2. Slashdot brainstorm here by eclectro · · Score: 5, Insightful

    Here's an idea. Start out the video with a useless narrative for the first thirty seconds ("blah blah blah skip until :30 and ignore this intro blah blah"), then start the music. That way everybody is happy. All Google employees are too elitist to read Slashdot, right?

    --
    Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
    1. Re:Slashdot brainstorm here by DriedClexler · · Score: 5, Funny

      Heh. I think people have already tried that.

      Hi, I'm an amateur movie critic. Today I'm going to show you an example of poor film-making. blah blah blah ...

      *Plays entire Star Wars: Episode I*

      So, as you can see by the [cinematography jargon] and [screenwriting jargon], this movie sucked and I hope you learn from it in making your own movies.

      One week later:

      "No! You can't take down my video. This is CLEARLY fair use, since I have OBVIOUSLY used it for educational commentary, and the entire clip was VITAL for showing how much Episode I sucked."

      --
      Information theory is life. The rest is just the KL divergence.
  3. pHash by b1ng0 · · Score: 5, Interesting

    This seems like a good time to pump my own open source project: pHash. pHash is a perceptual hashing library that computes hashes for audio, video and image files, with text and PDF hashing coming soon. We use an algorithm similar to YouTube's audio fingerprinting method, but we do not take only the first 30 seconds into account. That said, it's impossible to tell from this basic test whether their algorithm truly only looks at the first 30 seconds, or whether it simply considers the two clips to be different audio files. If the song is only 1 minute in duration, and 30 seconds is blank, is that really the same audio file as the full 1 minute version? At some point the audio files are not really the same anymore, although the perceptual hashes should be somewhat close to each other. Please give pHash a try. We could use some feedback from the OSS community and would appreciate it greatly.
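
The point about hashes being "somewhat close" rather than equal can be illustrated with a generic bitwise comparison. The sketch below is not pHash's API and not YouTube's method; it assumes, purely for illustration, that each clip has been reduced to a single 64-bit integer hash, and the two hash values are made up.

```python
def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def similarity(hash_a, hash_b, bits=64):
    """Fraction of matching bits: 1.0 means identical hashes."""
    return 1.0 - hamming_distance(hash_a, hash_b) / bits


# Made-up 64-bit hashes: the second differs from the first in 8 bits,
# standing in for a clip where part of the audio was muted.
full_hash = 0xF0F0F0F0F0F0F0F0
muted_hash = full_hash ^ 0xFF

distance = hamming_distance(full_hash, muted_hash)
score = similarity(full_hash, muted_hash)
```

A matcher built this way would report a match when the score passes some threshold, and where that threshold sits decides whether a half-muted clip still counts as the same song.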

    1. Re:pHash by Anonymous Coward · · Score: 5, Funny

      fucking pHashist...

  4. Yes, but who analyzes the analyzers? by thomasdz · · Score: 5, Funny

    And who fingerprints the analyzers who analyze the analyzers?

    --
    Karma: Excellent. 15 moderator points expire sometime.
  5. I'd rather lose the last 30 seconds by Knave75 · · Score: 5, Insightful

    An unfortunate result. The last 30 seconds of most songs are not usually as interesting as the first 30 seconds.

    I wonder if he tried mangling the first 30 seconds at all. For example, keep the first 5 seconds, mess up the 6th and 7th seconds, and then continue on. Or perhaps adding in a bass line that would be hard to hear. Or something at the high end of the audio frequency spectrum, to annoy all those teenagers while I listen to my free music in peace.
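
One way to prepare that kind of test clip is to mix a quiet sine tone into just the 6th and 7th seconds (5.0 s to 7.0 s). This is a rough sketch under stated assumptions: samples are floats in a plain list, and `add_tone`, the 40 Hz frequency, the 0.05 amplitude, and the silent dummy clip are hypothetical choices. Whether such a tweak affects detection is exactly what would still need to be tested.

```python
import math


def add_tone(samples, sample_rate, freq_hz, amplitude, start_s, end_s):
    """Return a copy of samples with a sine tone mixed into [start_s, end_s)."""
    out = list(samples)
    # Convert seconds to sample indices and clamp them to the clip length.
    start = max(0, int(start_s * sample_rate))
    end = min(len(out), int(end_s * sample_rate))
    for n in range(start, end):
        out[n] += amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
    return out


# Hypothetical 10-second silent clip at 8000 samples per second.
sample_rate = 8000
clip = [0.0] * (10 * sample_rate)

# Mix a low 40 Hz tone into the 6th and 7th seconds only.
tweaked = add_tone(clip, sample_rate, 40.0, 0.05, 5, 7)
```

Swapping in a much higher `freq_hz` (with a sample rate high enough to represent it) would give the high-frequency variant instead.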