Slashdot Mirror


Analyzing YouTube's Audio Fingerprinter

Al Benedetto writes "I stumbled across this article which analyzes the YouTube audio content identification system in-depth. Apparently, since YouTube's system has no transparency, the behaviors had to be determined based on dozens of trial-and-error video uploads. The author tries things like speed/pitch adjustment, the addition of background noise, as well as other audio tweaks to determine exactly what you'd need to adjust before the fingerprinter started mis-identifying material. From the article: 'When I muted the beginning of the song up until 0:30 (leaving the rest to play) the fingerprinter missed it. When I kept the beginning up until 0:30 and muted everything from 0:30 to the end, the fingerprinter caught it. That indicates that the content database only knows about something in the first 30 seconds of the song. As long as you cut that part off, you can theoretically use the remainder of the song without being detected. I don't know if all samples in the content database suffer from similar weaknesses, but it's something that merits further research.'"

116 comments

  1. music ip? by FredFredrickson · · Score: 4, Informative

    There's the open-source library - libOFA - developed by Music IP (http://code.google.com/p/musicip-libofa/) which happens to create PUIDs on the first 135 seconds of audio in a track. It's used in the music-IP mixer (for mood mixes) but is also used by music database projects such as MusicBrainz.

    From what I've seen, it's pretty decent audio fingerprinting, but I'm sure it would be subject to the same limitation: if you remove the first 30 seconds of a clip, it would produce a very different fingerprint.

    There's no reason to believe YouTube isn't using this library or a derivative. There's also no reason to believe this result isn't intended. If the first 30 seconds of a song are missing, maybe that makes YouTube confident that it could be considered fair use.

    Either way, I could imagine that creating fingerprints from different sections of a song has the same problem an MD5 hash would: each fingerprint would be entirely different. If you don't just compare bit-to-bit, it'll be impossible to catch ALL permutations. And the fact is, that's a lot of computing power anyhow.

    --
    Belief? Hope? Preference? The Existential Vortex
    1. Re:music ip? by Captain+Splendid · · Score: 0, Offtopic

      John, I read your book and I want my money back, you boring bastard!

      --
      Linux, you magnificent bastard, I read the fucking manual!
    2. Re:music ip? by Hadlock · · Score: 0

      Rommel, I read your book, you magnificent bastard!
       
      Name that movie, using other obscure movie references. Go.

      --
      moox. for a new generation.
    3. Re:music ip? by Jurily · · Score: 1

      If the first 30 seconds of a song are missing, maybe that makes YouTube confident that it could be considered fair use.

      Nope. The principle of CYA says that if there's any possibility of a lawsuit, nuke it from orbit.

    4. Re:music ip? by Anonymous Coward · · Score: 0

      nuke it from orbit.

      it's the only way to be sure.

    5. Re:music ip? by Joe+Snipe · · Score: 2, Interesting

      If the first 30 seconds of a song are missing, maybe that makes YouTube confident that it could be considered fair use

      Or if 30 seconds of additional blank footage were tacked on to the beginning?

      And FWIW, there is a very valid reason for assuming they aren't using this fingerprint system: They already had their own in-house system that they based on their thumbnail maker program. It is also limited to within 30 sec of a clip if I recall.

      --
      Sometimes, life itself is sarcasm...
    6. Re:music ip? by Nova77 · · Score: 1

      There's also last.fm fingerprint library which is open source: svn://svn.audioscrobbler.net/recommendation/MusicID/lastfm_fplib

    7. Re:music ip? by NuclearError · · Score: 1

      Get out of here, you damn dirty apes!

      --
      Nuclear engineers build weapons. Civil engineers build targets.
    8. Re:music ip? by quickOnTheUptake · · Score: 1

      crap through a goose

      --
      Mod points: Guaranteed to remove your sense of humor.
      Side effects may include gullibility and temporary retardation
    9. Re:music ip? by Jurily · · Score: 1

      Exactly.

    10. Re:music ip? by dwbassett42 · · Score: 1

      I'm not sure what it says for my taste in movies that I instantly recognized this quote, but none of the earlier ones.

    11. Re:music ip? by Anonymous Coward · · Score: 1

      TL;DR
      Screw you guys, I'm going home.

    12. Re:music ip? by treeves · · Score: 1

      "I'll be back."

      --
      ...the future crusty old bastards are already drinking the Kool-Aid.
    13. Re:music ip? by Kagura · · Score: 0, Redundant

      Yeah.

    14. Re:music ip? by KevinIsOwn · · Score: 1

      Either way, I could imagine that creating fingerprints from different sections of a song has the same problem an MD5 hash would: each fingerprint would be entirely different. If you don't just compare bit-to-bit, it'll be impossible to catch ALL permutations. And the fact is, that's a lot of computing power anyhow.

      To be honest, I'd be fairly surprised if they used a method that boiled down to a hash for exactly the problem you point out. I would make a bet they either currently use, or will use, a method that stems from Broder's method of identifying near duplicate documents (Paper: Identifying and filtering near-duplicate documents). Using such a method, removing the first 30 seconds of a song won't necessarily fool the fingerprinting method. It might, but there is a high probability that enough of the "shingles" will match between the song and the fingerprint to raise an alarm.

      But who knows, I'm not an expert in song fingerprinting (IANAEISF?)!
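
      To make the "shingles" idea concrete, here is a minimal Python sketch. It is not YouTube's method, nor Broder's full algorithm (which samples shingles rather than comparing complete sets); the names shingles and containment and the made-up feature sequences are assumptions for illustration only. It contrasts an MD5 digest, which changes completely once the start is trimmed, with shingle overlap, which does not.

```python
import hashlib


def shingles(seq, k=4):
    """Return the set of all length-k contiguous windows ("shingles") of seq."""
    return {tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def containment(clip, reference, k=4):
    """Fraction of the clip's shingles that also occur in the reference."""
    a, b = shingles(clip, k), shingles(reference, k)
    return len(a & b) / len(a) if a else 0.0


# Hypothetical stand-ins for songs: sequences of quantized audio features.
song = [(i * 7919) % 251 for i in range(300)]
trimmed = song[30:]                                    # same song, start cut off
other = [(i * 104729 + 13) % 251 for i in range(300)]  # an unrelated sequence

# An exact hash sees the trimmed clip as a completely different file.
md5_full = hashlib.md5(bytes(song)).hexdigest()
md5_trimmed = hashlib.md5(bytes(trimmed)).hexdigest()

print("MD5 equal:", md5_full == md5_trimmed)
print("trimmed vs song:", containment(trimmed, song))
print("other vs song:", containment(other, song))
```

      Here the trimmed clip still shares all of its shingles with the full song even though the two MD5 digests differ, while the unrelated sequence shares none. Real audio would need features that survive re-encoding before shingling makes sense.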

    15. Re:music ip? by 16Chapel · · Score: 1

      "I believe in slime and stink and every crawling, putrid thing... every possible ugliness and corruption, you son of a bitch. I believe... in you. "

  2. This makes it pointless right? by Rayeth · · Score: 2, Insightful

    I thought the purpose (however misguided it may be) was to prevent people from uploading copyrighted songs/music videos and re-mixing them. So if I only use portions of the song that aren't in the first 30s I'm home free? That seems silly, the system must still be under refinement or is only there to stop the most blatant offenders.

    1. Re:This makes it pointless right? by tepples · · Score: 1

      So if I only use portions of the song that aren't in the first 30s I'm home free? That seems silly, the system must still be under refinement or is only there to stop the most blatant offenders.

      I'm inclined to believe the latter. If a video doesn't use more than about 30 seconds of a recording at a time, it's likely that the video's author attempted to use the work fairly. I guess my video got flagged because I opened with a vocal-cut version of one of the songs on which I was commenting.

  3. Whew! by Serenissima · · Score: 5, Funny

    It's a good thing no one at Youtube reads Slashdot. Otherwise they might come up with a fix! So, everyone keep this a secret! SHHHH!

    --
    Give a man a fire and he'll be warm for a day. But light a man on fire and he'll be warm for the rest of his life.
    1. Re:Whew! by Tsunayoshi · · Score: 2, Interesting

      Therein lies the rub with the "all information should be free" mindset...EVERYONE gets to look at it.

      I think that is a feature, not a bug.

      --
      "Get a bicycle. You will not regret it, if you live." - Mark Twain, "Taming the Bicycle"
    2. Re:Whew! by Anonymous Coward · · Score: 1, Funny

      Which leads to the metaphysical question bugging us all: does a WHOOSH count as information?!

    3. Re:Whew! by Nicolay77 · · Score: 1

      WHOOSHes are overrated.

      In this case the reply was more interesting than the joke.

      --
      We are Turing O-Machines. The Oracle is out there.
  4. Slashdot brainstorm here by eclectro · · Score: 5, Insightful

    Here's an idea. Start out the video with a useless narrative for the first thirty seconds "blah blah blah skip until :30 and ignore this intro blah blah" then start the music. That way everybody is happy. All google employees are too elitist to read slashdot, right?

    --
    Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
    1. Re:Slashdot brainstorm here by Anonymous Coward · · Score: 0

      I think what they meant is that you should not use the first 30 seconds of the copyrighted material anywhere in your video, not that you shouldn't put any copyrighted material in the first 30 seconds of your youtube video.

    2. Re:Slashdot brainstorm here by eclectro · · Score: 1

      That may not be the case as it may be too computationally intensive to process the whole five minute video that is uploaded.

      --
      Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
    3. Re:Slashdot brainstorm here by spydabyte · · Score: 4, Funny

      Sounds like packaging copyright material between thousands of papers and delivering it in PDF format to my university printing service to print out all my textbooks for free... except with less wasted paper.

    4. Re:Slashdot brainstorm here by DriedClexler · · Score: 5, Funny

      Heh. I think people have already tried that.

      Hi, I'm an amateur movie critic. Today I'm going to show you an example of poor film-making. blah blah blah ...

      *Plays entire Star Wars: Episode I*

      So, as you can see by the [cinematography jargon] and [screen writing jargon], this movie sucked and I hope you learn from it in making your own movies.

      One week later:

      "No! You can't take down my video. This is CLEARLY fair use, since I have OBVIOUSLY used it for educational commentary, and the entire clip was VITAL for showing how much Episode I sucked."

      --
      Information theory is life. The rest is just the KL divergence.
    5. Re:Slashdot brainstorm here by Locklin · · Score: 3, Funny

      Here's an idea. Start out the video with a useless narrative for the first thirty seconds "blah blah blah skip until :30 and ignore this intro blah blah" then start the music.

      Like on the radio?

      --
      "Knowledge is the only instrument of production that is not subject to diminishing returns" -Journal of Political Econom
    6. Re:Slashdot brainstorm here by Hurricane78 · · Score: 2, Insightful

      ...and then you realize, that YouTube changed the algorithm, and that its compression makes your song sound so shitty anyway, that you actually want all the uploads to be taken down. ^^

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    7. Re:Slashdot brainstorm here by Anonymous Coward · · Score: 3, Insightful

      I see where you're going with this: put an advertisement at the start :-)

    8. Re:Slashdot brainstorm here by VeryLargeNumber · · Score: 3, Insightful

      Even better - upload the video backwards.
      Someone should make a backward YouTube plugin for Firefox. It might even autodetect backward songs and play them properly.

    9. Re:Slashdot brainstorm here by Anonymous Coward · · Score: 0

      blah blah blah, skip this comment and go straight to the next one modded 5.

    10. Re:Slashdot brainstorm here by WhatsAProGingrass · · Score: 1

      Why not encrypt the audio and have users download a special player that will decrypt at runtime. End user hears the intended audio and youtube only hears noise.

      --
      Mark
  5. pHash by b1ng0 · · Score: 5, Interesting

    This seems like a good time to pump my own open source project: pHash. pHash is a perceptual hashing library that computes hashes for audio, video and image files, with text and PDF hashing coming soon. We use an algorithm similar to YouTube's audio fingerprinting method but we do not only take into account the first 30 seconds. Although, it's impossible to tell from this basic test whether their algorithm truly only looks at the first 30 seconds, or if the algorithm considers them to be different audio files. If the song is only 1 minute in duration, and 30 seconds is blank, is that really the same audio file as the full 1 minute version? At some point the audio files are not really the same anymore, although the perceptual hashes should be somewhat close to each other. Please give pHash a try. We could use some feedback from the OSS community and would appreciate it greatly.
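
    As a rough illustration of what "somewhat close" perceptual hashes could mean, here is a small Python sketch. It does not use pHash's API or its actual algorithm; toy_hash, hamming_fraction and the energy values are hypothetical, chosen only to show that similar audio can map to nearby bit strings that are then compared by Hamming distance.

```python
def toy_hash(energies):
    """One bit per adjacent pair of frames: 1 if energy rises, else 0.

    A hypothetical scheme for illustration, not pHash's algorithm.
    """
    return [1 if b > a else 0 for a, b in zip(energies, energies[1:])]


def hamming_fraction(h1, h2):
    """Fraction of differing bits between two equal-length bit lists."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have equal length")
    return sum(x != y for x, y in zip(h1, h2)) / len(h1)


# Made-up per-frame energies for three clips.
original = [3, 5, 4, 8, 9, 2, 6, 7, 1, 4, 6, 5, 9, 3, 2, 8, 7]
louder = [2 * e + 1 for e in original]  # uniform gain keeps rises and falls
unrelated = [9, 1, 8, 2, 7, 3, 6, 4, 5, 5, 4, 6, 3, 7, 2, 8, 1]

print(hamming_fraction(toy_hash(original), toy_hash(louder)))
print(hamming_fraction(toy_hash(original), toy_hash(unrelated)))
```

    A matcher built this way would treat two clips as the same when the fraction of differing bits falls below some threshold; where to put that threshold is exactly the "at some point the files are not the same anymore" question.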

    1. Re:pHash by FredFredrickson · · Score: 3, Interesting

      Out of curiosity, how well could pHash be used to find similar songs from a list of songs? Maybe not actually similar, but similar sounding (or same mood)...?

      Any ideas how one would go about doing this sort of thing?

      --
      Belief? Hope? Preference? The Existential Vortex
    2. Re:pHash by Anonymous Coward · · Score: 5, Funny

      fucking pHashist...

    3. Re:pHash by ash211 · · Score: 4, Interesting

      The problem you're describing is known in the Music Information Retrieval (MIR) world as content-based recommendation (CBR). There are a number of ways to do it, but they're all based on measuring similarity.

      The idea is that people perceive songs as similar based on the characteristics they have, which are termed features. By representing a song's features in a model you can compare the models to see how "distant" they are, and then choose songs from a set that are least-distant. The work that my research group is pursuing represents songs based on timbral features (MFCCs) and rhythmic features (bpm, pulse clarity, syncopation, etc).

      If you're interested in the approach, see http://paragchordia.com/research/cbr.html
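
      The "least-distant" idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the feature vectors below are invented, the three dimensions (tempo, pulse clarity, brightness) are placeholders, and plain Euclidean distance stands in for whatever model comparison a real MIR system would use.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, library, n=2):
    """Return the n song names whose features are closest to the query song."""
    candidates = [name for name in library if name != query]
    candidates.sort(key=lambda name: distance(library[query], library[name]))
    return candidates[:n]


# Hypothetical, already-normalized features:
# (tempo, pulse clarity, brightness), each scaled to 0..1.
library = {
    "ballad_a": (0.30, 0.20, 0.25),
    "ballad_b": (0.32, 0.25, 0.30),
    "dance_a": (0.85, 0.90, 0.70),
    "dance_b": (0.80, 0.85, 0.75),
}

print(most_similar("ballad_a", library, n=1))
print(most_similar("dance_a", library, n=1))
```

      Features on different scales would need normalizing first, otherwise the largest-valued feature dominates the distance.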

    4. Re:pHash by Anonymous Coward · · Score: 0

      fucking cOward...

    5. Re:pHash by bcrowell · · Score: 1

      This seems like a good time to pump my own open source project: pHash. pHash is a perceptual hashing library that computes hashes for audio, video and image files, with text and PDF hashing coming soon.

      Cool! The history of these algorithms, and of databases like CDDB, is kind of depressing to an open-source guy like me. It's great to see someone doing this as an open-source project. How stable is the algorithm? If I compute a pHash today, will it still be compatible with pHashes computed next year? Any plans to make a database of pHashes? I have a music collection that contains a lot of digitized LPs, and music that's sometimes one album per file but sometimes one track per file. It would be really convenient to be able to crank out pHashes for them and get useful data automatically.

    6. Re:pHash by Anonymous Coward · · Score: 0

      Yeah, I'm with you. It really sucks when people don't realize that they should give their work away for free so that basement-dwelling social rejects can avoid getting a job and still entertain themselves as they see fit.

  6. Tragic by dedazo · · Score: 2, Insightful

    That cool tech like this is being used to prevent "piracy" instead of something more useful.

    --
    Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo
    1. Re:Tragic by Anonymous Coward · · Score: 0

      you mean like the other cool tech that's being used for piracy instead of something cool? let's at least try to have a shred of honesty here, piracy caused an issue that made people throw money at the problem. if you think it was a waste of money then don't give them cause by supporting piracy.

      or is this just another ill-considered swipe at an industry that is trying to do what any other industry would do if their product was threatened? it has all the cool factor of the 15 year old kid wearing a dead kennedys t-shirt smoking a 'boro in protest to cigarette bans.

    2. Re:Tragic by Runaway1956 · · Score: 1

      Put the horse before the horse shit, alright? Blatant abuse of copyright "law" caused a problem that was addressed by piracy, then in turn, the abusers of copyright "law" attempted to address THEIR problem with piracy.

      And, your holier-than-thou attitude toward cigarettes goes hand in hand with the abuse of copyright. Some kid wants to smoke, but "THE MAN" has the badge, the gun, the club, and all the money to back up his dictates that the kid can't smoke.

      Why don't you smoke a doobie and chill out with that kid, and listen to some of his pirated headbanger's music? Some head banging might do you some good, LMAO

      --
      "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
    3. Re:Tragic by The+End+Of+Days · · Score: 0, Flamebait

      The only problem piracy addresses is that of people who feel the world owes them entertainment. On the scale of zero to justice, this solution ranks somewhere around negative infinity.

    4. Re:Tragic by Runaway1956 · · Score: 2, Insightful

      At one end of the scale, you'll find those who think the world owes them entertainment. At the other end of the scale, you'll find those copyright squatters who think the world owes them a lavish living for sitting on dead men's works, and for acquiring a monopoly on distribution schemes.

      On your same scale, throwing a kid in jail for wanting to hear music without paying the parasites is just about negative infinity.

      Now that we have judged each other's positions, got anything constructive to offer?

      How about selling the kids all the music they want at a penny or a nickel a song, and allow them fair use, and personal copying right? Ditch the DRM. The use of DRM costs the copyright holder more than distribution!! It's not like it COSTS to distribute - especially if the copyright holders get their heads out of their butts, and use popular distribution schemes, like The Pirate Bay.

      --
      "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
  7. Doesn't really matter for most people by American+Terrorist · · Score: 3, Interesting

    The big issue here is what Lessig talked about years ago: Free Culture

    Then a car commercial parody I made (arguably one of my better videos) was taken down because I used an unlicensed song. That pissed me off. I couldn't easily go back and re-edit the video to remove the song, as the source media had long since been archived in a shoebox somewhere. And I couldn't simply re-upload the video, as it got identified and taken down every time. I needed to find a way to outsmart the fingerprinter. I was angry and I had a lot of free time. Not a good combination.

    The guy who wrote TFA is upset that his largely unviewed videos didn't pass an automated test.

    My beef with the system is that culturally significant videos such as the Chinese "Caonima" get taken down because the song violates some copyright of a company I've never heard of, on a song I'd never in a million years think of buying.

    Hope that link works, I had to copy it from Google since I can't even access Youtube anymore here in China.

    1. Re:Doesn't really matter for most people by American+Terrorist · · Score: 1

      It's 2:30 AM, srry typos

    2. Re:Doesn't really matter for most people by Anonymous Coward · · Score: 2, Insightful

      My beef with the system is that culturally significant videos such as the Chinese "Caonima" get taken down because the song violates some copyright of a company I've never heard of, on a song I'd never in a million years think of buying.

      So copyrights only apply to companies you've personally heard of and it's a song you'd buy? That's pretty stupid.

    3. Re:Doesn't really matter for most people by American+Terrorist · · Score: 1

      I'm drunk and tired waiting for the English Premier League games to come on, no claims to being super insightful right now. I just meant that it pisses me off even more than it already does. If that makes any sense.

    4. Re:Doesn't really matter for most people by twidarkling · · Score: 1

      So copyrights only apply to companies you've personally heard of and it's a song you'd buy? That's pretty stupid.

      I believe the point wasn't the significance of the company filing the complaint, but the content of the video being removed. If a video has a significant contribution, a larger company might be more willing to let a small infringement slide on the basis of good will, since they have other sources of income. A small company would be more likely to be zealous, since even small infringements represent a significant portion of potential income. Of course, that paradigm hardly applies in all situations, probably not even most. However, there is a point there to be made. If someone has something significant to contribute, should an allowance be made for small infringements made in pursuit of that contribution? How many would be silenced for fear of running afoul of copyright issues? How many would be too many? If I'm making a free video, and am earning absolutely nothing from having made it, should I have to pay for the use of a song that provides the perfect counterpoint to what I'm saying? Then again, if I'm making any kind of revenue from it, even ads on the page, yes, I should have to pay for the right to use it.

      --
      Canada: The US's more awesome sibling.
  8. Research? by mi · · Score: 2, Insightful

    but it's something that merits further research.

    Why exactly does it merit any research? This is not a riddle posed by Nature -- people devised this device (ha-ha), and know all the answers perfectly already, they just don't want to tell you. You are not advancing scientific progress by figuring out somebody's scheme.

    You may be advancing your own knowledge and skills, but calling it "research" has no more merit, than paparazzis' "research" into celebrities' lives...

    --
    In Soviet Washington the swamp drains you.
    1. Re:Research? by Jah-Wren+Ryel · · Score: 1

      So, anthropology is not research?

      --
      When information is power, privacy is freedom.
    2. Re:Research? by radtea · · Score: 4, Insightful

      This is not a riddle posed by Nature

      This is one of the wonderful things about science: it doesn't matter where the puzzle comes from, the same techniques work to solve it.

      Reverse engineering of this kind is one of the most useful areas of applied science, and it is as much research as any other area of scientific enquiry. It is frequently the case that there are many ways to find the answer to a puzzle, and this guy has chosen one of them based on the resources he has available. More power to him for demonstrating how good science can be used to discover what others want to keep secret.

      --
      Blasphemy is a human right. Blasphemophobia kills.
    3. Re:Research? by billcopc · · Score: 1

      The "research" only leads to working around the content filters, posting material the site operators explicitly do not want.

      It would be more interesting if there was a productive application for this knowledge. Putting Rihanna songs on Youtube does not fit my idea of "productive".

      --
      -Billco, Fnarg.com
    4. Re:Research? by Jah-Wren+Ryel · · Score: 1

      It would be more interesting if there was a productive application for this knowledge. Putting Rihanna songs on Youtube does not fit my idea of "productive".

      Why is it that so many people are so ready to condemn others because of their own lack of imagination?

      My niece is a working print model and aspiring actress who has starred in a few very high profile music videos (the kind that get nominated for MTV's annual awards) and been featured in a few national commercials. She has put together a youtube channel to promote her career - the goal was to include a copy of every video work she has been in and title it so that her name was explicitly associated with the video or commercial. That tactic works very well for google searches - search on her name and the first page of searches includes a list of every youtube video with her name in the title.

      Unfortunately, some of the videos are blocked. Some are fine - recognized as copyrighted but the nominal owner has entered into a revenue sharing agreement with youtube - but not all of them. Turns out that most of the blocked ones are already posted on youtube via official channels with very poor video quality (letterboxed and pillarboxed for example). So my niece is not able to upload high-quality versions (or even identical copies downloaded with one of the billion youtube downloaders) of those videos to her own channel in order to use them as part of a modern day portfolio/reel.

      If she's able to use this guy's research, it will improve her ability to promote her career by showcasing the work she has already done.

      --
      When information is power, privacy is freedom.
    5. Re:Research? by mi · · Score: 1

      Turns out that most of the blocked ones are already posted on youtube via official channels with very poor video quality (letterboxed and pillarboxed for example).

      So, the copyright belongs to someone else, and they chose to use a lower-quality version. I don't see, how this gives your niece — although she appears in the video, she is not the owner of it — the right to post her own version...

      Back to the point about advancing human progress, I don't think a particular fashion model's success or failure has any effect on it...

      --
      In Soviet Washington the swamp drains you.
    6. Re:Research? by Jah-Wren+Ryel · · Score: 1

      You are clearly unfamiliar with how hollywood works and the concepts of reels and portfolios.

      Back to the point about advancing human progress, I don't think a particular fashion model's success or failure has any effect on it...

      Did I say she was a fashion model? Or are you just projecting your own pejorative attitude about the entertainment industry? Ironic you are so dismissive of the industry and yet so quick to defend an entirely bogus preconceived notion about some of their rights.

      --
      When information is power, privacy is freedom.
    7. Re:Research? by Eil · · Score: 1

      Why exactly does it merit any research? This is not a riddle posed by Nature -- people devised this device (ha-ha), and know all the answers perfectly already, they just don't want to tell you. You are not advancing scientific progress by figuring out somebody's scheme.

      So as long as somebody knows exactly how the system works, that's good enough for you? Fine, but that's not how all of us are wired. Google's knowledge of their audio fingerprinting scheme is useless to me if I want to know how it works and they won't tell me.

      Making the details of these kinds of systems publicly available is a valuable service to society as a whole because it means each person who is interested in similar technology or systems doesn't have to waste his/her time repeating the same experiment. It also provides extremely useful information for average Google YouTube users who want to upload videos but don't want said videos unconditionally muted because Google's algorithms can't distinguish between fair-use samples and blatant copyright infringement.

    8. Re:Research? by The+End+Of+Days · · Score: 1

      Did your niece receive ownership of the various content as compensation or put up the money for the production or something along those lines? I doubt it but there's always that chance.

      What I find most likely, however, is that she was hired to perform a service and compensated according to a contract she agreed to. It seems fairly unrealistic that distribution rights were part of that compensation, and if that is indeed the case, she shouldn't be distributing no matter how personally beneficial she would find it.

      I would certainly find it lucrative if I could resell some of the software I've written under work-for-hire contracts, but I agreed to the conditions involving my employers retaining those rights in exchange for compensation.

      Do-overs don't work in the adult world, and violating the rights of others because it's personally beneficial is repugnant to say the least.

    9. Re:Research? by The+End+Of+Days · · Score: 0, Troll

      Yeah, wouldn't it be awesome if Google were forced to give away all their technology regardless of the investment they made to develop it? Sure, they'd lose out in every way, but who cares? All they did was put in the work, but they're a big corporation and you're a noble free hacker. You should win by default.

    10. Re:Research? by Jah-Wren+Ryel · · Score: 1

      I would certainly find it lucrative if I could resell some of the software I've written under work-for-hire contracts, but I agreed to the conditions involving my employers retaining those rights in exchange for compensation.

      Damn! Yet another poster without a clue as to how portfolios and reels are used within the entertainment and advertising business. Why do you guys think you know so much about something you clearly have zero knowledge of? And why are you all so fucking high and mighty about it too? Repugnant? Jesus!

      Here's your clue - it is standard procedure to send copies of your reel - i.e. a dvd with significant examples of your prior work for hire to agents (when you don't yet have an agent) and casting directors. That's distribution, but no one complains because without it the whole system would fall apart. Just as reels were once literally film reels, then vhs tapes and eventually became dvds, moving to online reels is becoming de rigueur.

      --
      When information is power, privacy is freedom.
  9. Yes, but who analyzes the analyzers? by thomasdz · · Score: 5, Funny

    And who fingerprints the analyzers who analyze the analyzers?

    --
    Karma: Excellent. 15 moderator points expire sometime.
    1. Re:Yes, but who analyzes the analyzers? by Hurricane78 · · Score: 1

      Simple. Make it a recursive loop.

      In Haskell you would do it like this:

      import Helpers (loadAudio,Analyzer)
       
      mkAnalyzer a =
        let aa = Analyzer a
        in aa : mkAnalyzer aa
       
      analyze (audio:[]) = audio
      analyze (analyzee:as) = (analyze as) analyzee
       
      main = do
        audio <- loadAudio "someAudio.wav"
        let safeAnalyzersChain = mkAnalyzer audio
        analyze safeAnalyzersChain

      Of course this is verbose and not optimized, because I am still a Haskell amateur. And of course this program will never end, because I am not an idiot. ^^ But if you change the second last line to

      let safeAnalyzersChain = take n (mkAnalyzer audio)

      where "n" is the number of analyzers you think are enough, it should end.

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    2. Re:Yes, but who analyzes the analyzers? by Anonymous Coward · · Score: 0

      But if you change the second last line to

      let safeAnalyzersChain = take n (mkAnalyzer audio)

      where "n" is the number of analyzers you think are enough, it should end.

      Hm...what about n=0?

    3. Re:Yes, but who analyzes the analyzers? by Hurricane78 · · Score: 1

      Hmm... This would give you an empty list. So it would blow up at run time with a non-exhaustive patterns error (GHC only warns about the unhandled case at compile time, and only if you turn that warning on).
      But the missing case could easily be added with

      analyze [] = []

      Of course, you would then have to handle this in the code following the last line in "main" too.

      Or you limit "n" to >0.

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
  10. Great. by Icegryphon · · Score: 1

    Now we are going to have a ton of 30sec Ops of soundless Text about how cool the author is now.

    1. Re:Great. by twidarkling · · Score: 1

      You misunderstand. It's not the video. It's the song. So cut in after the first verse, even right at the start of the song, and it's likely to pass inspection.

      --
      Canada: The US's more awesome sibling.
  11. Fair Use by Paul+Slocum · · Score: 1

    There is also an option to claim fair use (although I think it uses different words) after it identifies a song. I did this for an artwork of mine on Youtube that included the first 30 seconds of a Cure song, and the video stayed. I really do think that in my case, it was fair use. But if you're just trying to upload an old Pearl Jam video, then this probably won't help.

  12. New song padding service by Anonymous Coward · · Score: 0

    I'll release a new app that will pad all your pirated songs, and at only $10 per album it's a great deal!

  13. Easily solved. by Anonymous Coward · · Score: 0

    From the article:

    "I couldn't easily go back and re-edit the video to remove the song, as the source media had long since been archived in a shoebox somewhere."

    This is easy to do. Avidemux can separate the video and audio streams and recombine them.
    You don't need to go back to the raw files in your NLE to do this.

  14. I'd rather lose the last 30 seconds by Knave75 · · Score: 5, Insightful

    An unfortunate result. The last 30 seconds of most songs are not usually as interesting as the first 30 seconds.

    I wonder if he tried mangling the first 30 seconds at all. For example, keep the first 5 seconds, mess up the 6th and 7th seconds, and then continue on. Or perhaps adding in a bass line that would be hard to hear. Or something at the high end of the audio frequency spectrum, to annoy all those teenagers while I listen to my free music in peace.

    1. Re:I'd rather lose the last 30 seconds by Hurricane78 · · Score: 1

      And to make your animals go crazy? And I mean pathologically crazy, as in "If it were a human, it would need strong medication and a padded room."

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    2. Re:I'd rather lose the last 30 seconds by syousef · · Score: 1

      I wonder if he tried mangling the first 30 seconds at all. For example, keep the first 5 seconds, mess up the 6th and 7th seconds, and then continue on.

      If you're going to do that, you're going to ruin the song. If that's what you want to do it would be much more effective to sing it at the top of your lungs off key in the shower and put that in as your soundtrack instead. As a bonus, you're still violating copyright.

      --
      These posts express my own personal views, not those of my employer
  15. Cutting off the first 30 seconds by dmomo · · Score: 1

    This doesn't really identify a hole in the technology itself, just in the implementation. I'm more interested in hearing some audio that sounds the same while defeating the fingerprinting scheme. Much more interesting.

    1. Re:Cutting off the first 30 seconds by maxume · · Score: 1

      Hearing audio that sounds the same doesn't seem that interesting to me.

      Being able to generate it, on the other hand...

      --
      Nerd rage is the funniest rage.
  16. Shazam by Anonymous Coward · · Score: 0

    Does anyone know how Shazam works? It does a remarkably good job of song identification, even based on small samples from the middle of a song.

  17. Acapellas are flagged! by Riceo · · Score: 1

    I've noticed that it picks up vocals very well. There are some songs on YouTube that have had their individual parts lifted from Guitar Hero and uploaded for people to learn each part. Instrumental parts are fine, as are most full instrumental songs. However, vocal parts ARE picked up by the fingerprinter. (Search "Muse acapella" for examples.)

  18. funny not troll by ganjadude · · Score: 1, Funny

    I dont care who ya are that there's funny

    --
    have you seen my sig? there are many others like it but none that are the same
  19. least harmful alteration by Anonymous Coward · · Score: 0

    do the phase shift. A little phase shifting won't ruin the song much more than the compression already does.

  20. MusicBrainz by bluefoxlucid · · Score: 1

    MusicBrainz works exactly as described.

  21. Two comments by Anonymous Coward · · Score: 0

    First, if this is really how the system works, why cut the first thirty seconds when you can just pad the beginning with 30 seconds of... introduction?

    More importantly, I can't believe they'd design the fingerprinting to be that trivial to fool. We have tons of knowledge about fingerprinting methods and there are dozens of ways to make this smarter.

    A much more robust system will take a random sample of small clips from the entire video. The sample can still be 30sec long (total), but harder to fool. We can take N random samples from each of the original songs, create fingerprints and put them in a data structure which makes retrieval efficient. When a video is uploaded we choose at random one of the N fingerprinting schemes and check against the fingerprint database. Even if users learn all of the N schemes, they can succeed only with probability 1/N for each clip - much worse if they don't know the schemes. Also a user can be blocked if an attempt to upload the same (unauthorized) thing twice is detected. This is nothing like crypto security but it's better than fingerprinting a fixed prefix of the video. I am not taking into account the error probability of the fingerprints themselves.

    With reasonable N efficiency should not suffer much (in particular the efficiency of checking a new video will be about the same as in the single fingerprint system - making the fingerprint databases can take longer, but it does not have to be in real time).

    I am not saying that what I wrote above is that good, let alone sophisticated. Just pointing out that it is very easy to do better.
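
    The random-sampling idea above can be sketched roughly as follows. This is only a toy under loud assumptions: audio is a plain `bytes` object, a SHA-256 hash of the sampled clips stands in for a real (noise-tolerant) fingerprint, and every name and constant here (`make_schemes`, `check_upload`, `CLIP_LEN`, and so on) is made up for illustration. It is not how YouTube or any vendor actually does it.

```python
import hashlib
import random

CLIP_LEN = 4          # samples per clip (toy value)
CLIPS_PER_SCHEME = 3  # clips sampled by each scheme
N_SCHEMES = 5         # the "N" from the comment


def make_schemes(min_song_len, rng):
    """Each scheme is a sorted tuple of clip start offsets, chosen once and reused."""
    max_start = min_song_len - CLIP_LEN
    return [
        tuple(sorted(rng.sample(range(max_start + 1), CLIPS_PER_SCHEME)))
        for _ in range(N_SCHEMES)
    ]


def fingerprint(samples, scheme):
    """Hash the clips one scheme selects (exact-match stand-in for a real fingerprint)."""
    h = hashlib.sha256()
    for start in scheme:
        h.update(samples[start:start + CLIP_LEN])
    return h.hexdigest()


def build_index(songs, schemes):
    """Map (scheme number, fingerprint) -> song title; this part can run offline."""
    return {
        (i, fingerprint(samples, scheme)): title
        for title, samples in songs.items()
        for i, scheme in enumerate(schemes)
    }


def check_upload(samples, schemes, index, rng):
    """Pick one scheme at random and look the upload up under that scheme only."""
    i = rng.randrange(len(schemes))
    return index.get((i, fingerprint(samples, schemes[i])))


rng = random.Random(0)
songs = {"song A": bytes(range(64)), "song B": bytes(range(64, 128))}
schemes = make_schemes(64, rng)
index = build_index(songs, schemes)
print(check_upload(songs["song A"], schemes, index, rng))  # unmodified upload: song A
```

    An uploader who mangles exactly the regions one scheme looks at only slips through when that scheme happens to be drawn, which is the 1/N figure from the comment. Each upload still costs a single fingerprint computation and one dictionary lookup.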

  22. Yeah by Brain-Fu · · Score: 3, Interesting
    1. Re:Yeah by Anonymous Coward · · Score: 4, Informative

      Dear Pandora Visitor,

      We are deeply, deeply sorry to say that due to licensing constraints, we can no longer allow access to Pandora for listeners located outside of the U.S. We will continue to work diligently to realize the vision of a truly global Pandora, but for the time being we are required to restrict its use. We are very sad to have to do this, but there is no other alternative.

      If you believe we have made a mistake, we apologize and ask that you please contact us at pandora-support@pandora.com

      If you are a paid subscriber, please contact us at pandora-support@pandora.com and we will issue a pro-rated refund to the credit card you used to sign up. If you have been using Pandora, we will keep a record of your existing stations and bookmarked artists and songs, so that when we are able to launch in your country, they will be waiting for you.

      We will be notifying listeners as licensing agreements are established in individual countries. If you would like to be notified by email when Pandora is available in your country, please enter your email address below. The pace of global licensing is hard to predict, but we have the ultimate goal of being able to offer our service everywhere.

      We share your disappointment and greatly appreciate your understanding.

    2. Re:Yeah by ausekilis · · Score: 2, Informative

      Perhaps a better link for information: Music Genome Project. A little more detail from Pandora's blog.

  23. If you'd have to go through all that trouble... by Animaether · · Score: 1

    ...just to pass some filter, then you must think the song's very well-worth listening to.. which, to me, implies it ought to be worth buying.

    If it's just some background music piece - I dunno, try another song.. plenty of royalty-free and even completely free ones (nope, they're probably not in the billboard top 100 right now - so sorry).

    I'm more curious about the cases where there really IS fair use involved.. what happens in those cases.. do you get to hit a checkbox saying "I believe this is fair use, please proceed to accept my upload and continue with any potential infringement processes you believe are required."?

  24. away pedant! by mkcmkc · · Score: 1

    Why exactly does it merit any research? This is not a riddle posed by Nature... You are not advancing scientific progress by figuring out somebody's scheme.

    Good grief. If you're trying to find out something that you can't just go look up at the library, and you're forming and testing hypotheses to do so, that could reasonably be called "research". Don't be so pedantic. :-)

    (Anyway, it may well turn out that Nature is just stuff that someone already knows all the answers perfectly to and just doesn't want to tell you.)

    --
    "Not an actor, but he plays one on TV."
  25. Re:What is the sound of one cock slapping? by Anonymous Coward · · Score: 0

    ** In 2.48 pt.

    There fixed that for you.

  26. Who Cares?? by jowilkin · · Score: 0, Troll

    I didn't read TFA, but why should I care how YouTube does this??? It's not any kind of AI breakthrough, and the only reason to subvert the system is to do something illegal...

    1. Re:Who Cares?? by gobbo · · Score: 1

      the only reason to subvert the system is to do something illegal...

      Wrong, wrong, wrong.

      1: There are many fair use possibilities that this system could infringe on.

      2: Copyright infringement's illegality varies depending on your local legal system.

      3: The fuzzy areas of fair use can arguably be extended pretty far into mashups and remixing.

    2. Re:Who Cares?? by The+End+Of+Days · · Score: 1

      I get how noble (fair use just sounds noble) those reasons are, but why is it Google's responsibility to publish things they don't want to publish? Being big and really good at what they do doesn't remove their right to control what they serve.

  27. I'd rather lose the first 30 seconds by Anonymous Coward · · Score: 0

    As a music lover of all genres, I say that the first 30 seconds sets the mood for the rest of the track. I can't properly get into some tracks without the intros.

    The last 30 seconds of most songs is a wind down, repetitious chanting of the chorus, or random instrument bashing session anyway.

  28. google is not able to crop anything by Anonymous Coward · · Score: 0

    Probably Google has to create the fingerprints itself, and it's slightly more promising to ask for the first 30 seconds of each track than to say to the rightsholders "give us a full copy of every song you ever made, we want to build a copyright tool".

  29. I can't reproduce this... by Anonymous Coward · · Score: 0

    I tried this for myself. I muted the first 30 seconds of a copyrighted song and tried uploading it, but it didn't make any difference. Same for adding 30 seconds of silence in front, changing the pitch, etc.
    Meh, I don't know what I'm doing wrong. Maybe YouTube just doesn't have 433 in its database yet.

    1. Re:I can't reproduce this... by Anonymous Coward · · Score: 0

      That should have been 4'33". You ruined it, /.!

  30. Patent filed with explanation of fingerprinting by bipbop · · Score: 3, Informative
    1. Re:Patent filed with explanation of fingerprinting by Anonymous Coward · · Score: 0

      Absolutely correct. From the Audible Magic website ( http://www.audiblemagic.com/clients-partners/contentsvcs.asp ) :

      Google's video sharing site, YouTube, has partnered with Audible Magic for content identification. Audible Magic provides content identification services that identify copyrighted music on user videos uploaded to YouTube.

    2. Re:Patent filed with explanation of fingerprinting by BabyDuckHat · · Score: 2, Informative

      This is actually very useful information for someone looking for ways to defeat the filter, in that it lists the features of the audio that are used for generating the fingerprint. A successful work-around would most likely require modifications to several aspects of the signal.

      From the patent:

      The feature vector thus consists of the mean and standard deviation of each of the trajectories (amplitude, pitch, brightness, bass, bandwidth, and MFCCs, plus the first derivative of each of these). These numbers are the only information used in the content-based classification and retrieval of these sounds. It is possible to see some of the essential characteristics of the sound by analyzing these numbers.
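
      To make the quoted description concrete, here is a minimal sketch of that kind of summary vector. It assumes the per-frame trajectories (amplitude, pitch, and so on) have already been extracted somehow, treats each MFCC coefficient as just another trajectory, uses a frame-to-frame difference as the "first derivative", and uses the population standard deviation. The patent excerpt does not spell out any of those choices, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def first_difference(trajectory):
    """Discrete stand-in for the first derivative: frame-to-frame change."""
    return [b - a for a, b in zip(trajectory, trajectory[1:])]


def feature_vector(trajectories):
    """Return mean and std of every trajectory and of its first difference.

    `trajectories` maps a feature name (e.g. "amplitude") to its per-frame
    values; each needs at least two frames. Names are visited in sorted
    order so the layout of the vector is stable.
    """
    vector = []
    for name in sorted(trajectories):
        values = trajectories[name]
        for series in (values, first_difference(values)):
            vector.append(mean(series))
            vector.append(pstdev(series))
    return vector


frames = {
    "amplitude": [0.0, 1.0, 2.0, 3.0],
    "pitch": [440.0, 440.0, 440.0, 440.0],
}
print(feature_vector(frames))  # four numbers per trajectory
```

      Because only these summary statistics survive, changing a single aspect of the signal moves only a few entries of the vector, which fits the point above that a work-around would probably have to alter several aspects at once.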

  31. If you cut off the first 30 seconds.. by nurb432 · · Score: 1

    It's called fair use.

    --
    ---- Booth was a patriot ----
  32. Auditude by hitfu · · Score: 1

    I met the founder of Auditude.com, a competitor to the company that supplies audio fingerprinting for YouTube. Fascinating guy, but even more fascinating technology. They claim they can identify any clip as short as 5 seconds from any portion of the original recording.

    You can test them out on myspace, I'd be interested to see how well they stack up in real world tests to YouTube's provider.

  33. 30 seconds of silence by Anonymous Coward · · Score: 0

    what if you added 30 seconds of background noise to the beginning of the audio file (instead of deleting 30 seconds)?

  34. Rick Roll Tragedy by omnichad · · Score: 1

    The real tragedy is that a rickroll is no longer as easy. You never know how long a link might work.

  35. Here's a thought... by DrEldarion · · Score: 1

    Don't submit videos with music by the big labels. Using creative commons music or music from labels who approve of the free advertising will simultaneously keep you from having your videos taken down and provide more visibility to non-RIAA-label artists, helping to make their cartel useless.

    1. Re:Here's a thought... by fiannaFailMan · · Score: 1

      I once uploaded a vid with music by Ulrich Schnauss, published by Domino Recording Co. I don't know if they're a big label or not but they were cool with me using the soundtrack because they get to post an iTunes 'buy now' button and an amazon.com 'buy now' button on my movie's page.

      On the plus side, an enlightened record company can use this as a means of getting free advertising and driving sales as long as they don't find the video objectionable.

      On the negative side, who's to say that all record companies are going to be so enlightened, and who's to say that their idea of objectionable is the same as the end user's? I'd hate to upload a video criticising, say, George W "shit for brains" Bush only to have it taken down because some record label executive happens to be a Fox News fan.

      --
      Drill baby drill - on Mars
  36. Someone should do this for imeem.com by illectro · · Score: 2, Interesting

    imeem have been doing this for the last few years, and they don't use audible magic, they used the Snocap fingerprint system which apparently was good enough for them to buy Snocap. Their business model has always been built around using the content identification system to make sure the right people get paid for audio played on the site.

    imeem is primarily used by people uploading and sharing audio, so using an audio fingerprinting system seems more appropriate than youtube relying on an audio fingerprinter for video content.

  37. What about Shazam? by fiannaFailMan · · Score: 1

    If they start using whatever Shazam uses, we're screwed. In any case I'm sure this is the start of an arms race in which the fingerprinter keeps getting more and more elaborate to counter the effects of people trying to fool it.

    --
    Drill baby drill - on Mars
  38. Obligatory cynicism by drwav · · Score: 1


    <RIAA>
    Any use that doesn't result in me getting obscene amounts of money is NOT FAIR!
    </RIAA>

  39. Youtube.. by Spc01 · · Score: 1

    People .. stop using youtube and build your own video streaming server. All you need is 10mbit / 20mbit upload, some space and http://www.longtailvideo.com/players/jw-flv-player/ that can be downloaded here: http://netsky.org/SpcVideo/flvplayer.zip. I've created my own test server and it does the job better than youtube. Check here: http://netsky.org/SpcVideo/sea1.htm This is a concert I recorded from a TV over a satellite. If you create your own server then no one will delete your videos. :)

  40. foosic by tick-tock-atona · · Score: 1

    http://foosic.org/ this is the most accurate and reliable fingerprinting algorithm I've used.

  41. Wrong tool...? by Half-pint+HAL · · Score: 1
    From TFA:

    It's downright dumb: Wrap your heads around this. When I muted the beginning of the song up until 0:30 (leaving the rest to play) the fingerprinter missed it. When I kept the beginning up until 0:30 and muted everything from 0:30 to the end, the fingerprinter caught it. That indicates that the content database only knows about something in the first 30 seconds of the song. As long as you cut that part off, you can theoretically use the remainder of the song without being detected. I don't know if all samples in the content database suffer from similar weaknesses, but it's something that merits further research.

    But the author had already told us why this was:

    Audible Magic originally wrote software for CD duplication companies. When you handed a master disc off to a duplication house, they'd check it with an Audible Magic system first. The goal was to positively identify every song on the disc, as well as the copyright/licensing status, before the company ran off 10,000 copies of your potentially pirated disc.

    I.e., the system was designed to identify complete songs. It is a fair assumption that someone producing a pirated music compilation CD will include the entire song, including its first 30 seconds, so only checking the first 30 seconds is a perfectly sound strategy.
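
    If that reading is right, the behaviour in TFA falls out of a prefix-only check. The toy below illustrates just that one point: it hashes the first 30 seconds of a made-up "song" and looks the hash up in a one-entry database. The exact-match SHA-256, the 10 samples per second rate, and all names are assumptions for the sake of the example, not anything known about Audible Magic's system.

```python
import hashlib

PREFIX_SECONDS = 30
SAMPLE_RATE = 10  # toy rate: 10 samples per second


def prefix_fingerprint(samples):
    """Hash only the first PREFIX_SECONDS of audio (exact-match stand-in)."""
    prefix = samples[:PREFIX_SECONDS * SAMPLE_RATE]
    return hashlib.sha256(prefix).hexdigest()


def mute(samples, start_s, end_s):
    """Return a copy with the samples in [start_s, end_s) seconds set to zero."""
    a = min(start_s * SAMPLE_RATE, len(samples))
    b = min(end_s * SAMPLE_RATE, len(samples))
    return samples[:a] + bytes(b - a) + samples[b:]


# A made-up 60-second "song" and a database that knows only its prefix.
song = bytes((i * 7 + 1) % 256 for i in range(60 * SAMPLE_RATE))
database = {prefix_fingerprint(song): "known song"}

muted_start = mute(song, 0, 30)   # beginning muted, rest left to play
muted_rest = mute(song, 30, 60)   # beginning kept, rest muted

print(database.get(prefix_fingerprint(muted_start)))  # None: missed
print(database.get(prefix_fingerprint(muted_rest)))   # known song: caught
```

    A check like this is cheap and works fine when the input is always a complete track, which is the CD-duplication case. It only looks dumb once people start uploading excerpts.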

    It wasn't designed for the YouTube environment, and put bluntly YouTube only want to be seen to be doing something. It's in their interest to get as many views as possible, and music helps! Audible Magic is "industry standard", so they can say that they're doing everything that can be expected of them. Voilà, their backsides are covered and their faces well and truly saved, and they have a show of "good faith" to hold up in any future legal action.

    HAL.

    --
    Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
  42. dude by its74associates · · Score: 1

    you rock. anything to bring back the excitement that was once youtube. fuck wmg.