Slashdot Mirror


Audio Watermark Web Spider Starts Crawling

DippityDo writes "A new web tool is scanning the net for signs of copyright infringement. Digimarc's patented system searches video and audio files for special watermarks that would indicate they are not to be shared, then reports back to HQ with the results. It sounds kind of creepy, but has a long way to go before it makes a practical difference. 'For the system to work, players at multiple levels would need to get involved. Broadcasters would need to add identifying watermarks to their broadcast, in cooperation with copyright holders, and both parties would need to register their watermarks with the system. Then, in the event that a user capped a broadcast and uploaded it online, the scanner system would eventually find it and report its location online. Yet the system is not designed to hop on P2P networks or private file sharing hubs, but instead crawls public web sites in search of watermarked material.'"

31 of 173 comments

  1. Scrubbing Watermarks? by Skewray · · Score: 4, Funny

    So if the watermarks are public, they can be identified and scrubbed before posting?

  2. "is" scanning, or "will be" scanning? by User 956 · · Score: 3, Insightful

    A new web tool is scanning the net for signs of copyright infringement ... 'For the system to work, players at multiple levels would need to get involved. Broadcasters would need to add identifying watermarks to their broadcast, in cooperation with copyright holders, and both parties would need to register their watermarks with the system.

    So, basically, their web tool is scanning for things that don't yet exist. Bully!

    --
    The theory of relativity doesn't work right in Arkansas.
  3. Ahem! by Stanistani · · Score: 2, Insightful

    Time to examine how this works, and how to block it from your website.

    You are allowed to protect unwanted use and access of your copyrighted information, after all!

    1. Re:Ahem! by suv4x4 · · Score: 2, Interesting

      Time to examine how this works, and how to block it from your website. You are allowed to protect unwanted use and access of your copyrighted information, after all!

      Don't be a hypocrite. It'll do nothing to your "copyrighted information" but match it against a set of hashes and discard it if it doesn't match. If it matches, an operator would look for signs of illegal activity.

      In other words, nothing that the industry isn't doing right now, but now more automated.

      No one likes RIAA suing grandmas and 10 yo girls, or terrible DRM schemes and so on. Doesn't mean you gotta get silly and react "by default" against any technology designed to help protect industry's intellectual rights.

      ---

      I'm only concerned with those crawlers going mad and sucking the bandwidth out of a site which hosts plenty of media files. Or dumbly downloading everything (zips, executables) and you having to foot the bill for the spent traffic in the end.

      Google's Mozilla-based bot was found doing such damage on some sites (crawling at incredible speed, bringing the sites down with it), which I suppose were a number of isolated incidents since this bot is still being worked on.

      Still, Google wouldn't download large binary files it can't understand, and this is likely to do so, and match everything against the "watermark", otherwise it'd be too simple to fool it. I just hope they implement it properly, if only because they'll have to pay for this bandwidth as well (aggregated).

    2. Re:Ahem! by Stanistani · · Score: 2, Interesting

      I find your whole post interesting, and a cogent reply.

      I especially like:
      >I'm only concerned with those crawlers going mad and sucking the bandwidth out of a site which hosts plenty of media files. Or dumbly downloading everything (zips, executables) and you having to foot the bill for the spent traffic in the end.

      That's a concern of mine, too.

      I wanted in my post to get people thinking about the contradiction between how well industry's intellectual property is protected and how poorly ours is.

      If I had substituted 'bandwidth' for 'copyrighted information' I suspect the results would have been better.

  4. *cough* robots.txt *cough* by $RANDOMLUSER · · Score: 2, Insightful
    --
    No folly is more costly than the folly of intolerant idealism. - Winston Churchill
    1. Re:*cough* robots.txt *cough* by TCM · · Score: 4, Interesting

      Don't forget to blacklist a client as soon as it violates the robots.txt.
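
      A minimal sketch of that idea, assuming requests have already been parsed out of an access log as (ip, user_agent, path) tuples. The robots.txt contents, the sample log, and the names build_parser and find_violators are made up for illustration; only urllib.robotparser is a real standard-library module. Ordinary browsers never read robots.txt, so in practice this works best when the disallowed paths are ones no human would normally visit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /media/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def find_violators(parser, requests):
    """Return the set of client IPs that fetched a path robots.txt disallows.

    requests is an iterable of (ip, user_agent, path) tuples.
    """
    blacklist = set()
    for ip, user_agent, path in requests:
        if not parser.can_fetch(user_agent, path):
            blacklist.add(ip)
    return blacklist


# Made-up log entries using documentation-reserved IP addresses.
sample_log = [
    ("203.0.113.5", "ExampleSpider/1.0", "/media/song.mp3"),
    ("198.51.100.7", "Mozilla/5.0", "/index.html"),
]
violators = find_violators(build_parser(ROBOTS_TXT), sample_log)
print(violators)
```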

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
    2. Re:*cough* robots.txt *cough* by TexasDex · · Score: 2, Informative

      Heck, even if they do masquerade the bot as a valid browser, just make a present but essentially invisible link on your pages that bots will follow but humans won't be able to see. You can even call it something obvious like block_me.html if you want. Any client who follows that link is almost certainly a bot instead of a human, and therefore should be automatically blocked. There is no easy way for this kind of bot to defend against this strategy, without totally losing their effectiveness. More obvious strategies include monitoring usage patterns for bot-like activity, although this is less reliable and possibly prone to false positives. Either way though, there are ways to tell a bot other than just its USER-AGENT string.
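
      The trap logic can be sketched in a few lines, assuming the web server hands each request's client IP and path to some hook. The BotTrap class and its handle method are hypothetical names, not part of any real server; block_me.html is the file name suggested above.

```python
TRAP_PATH = "/block_me.html"  # the hidden link target named in the comment


class BotTrap:
    """Block any client that ever requests the hidden trap URL."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.blocked = set()

    def handle(self, ip, path):
        """Return an HTTP-style status code for one request."""
        if ip in self.blocked:
            return 403  # already caught earlier
        if path == self.trap_path:
            self.blocked.add(ip)  # only a link-following bot gets here
            return 403
        return 200


trap = BotTrap()
statuses = [
    trap.handle("198.51.100.7", "/index.html"),       # ordinary visitor
    trap.handle("203.0.113.5", "/block_me.html"),     # bot follows the hidden link
    trap.handle("203.0.113.5", "/music/track.mp3"),   # same bot, now blocked
    trap.handle("198.51.100.7", "/music/track.mp3"),  # visitor unaffected
]
print(statuses)
```

      Blocking by IP alone is the simplest choice here; a crawler spread across many addresses would need the usage-pattern monitoring mentioned above as well.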

      --
      The Cheese Stands Alone.
    3. Re:*cough* robots.txt *cough* by computersareevil · · Score: 2, Informative

      Yep. And this bot trap that I use does just that. Works like a champ.

  5. Corporate IP infringements by kabocox · · Score: 5, Insightful

    This isn't aimed at the home user or small-time crowd. Its ideal role is finding big-name corporate offenders that have unlicensed PR crap on brochures, websites, or ads and making sure that the guy whose content it is gets his cut. It's not worth it to go against small-time folks. Think of professional photographers making sure their photos aren't run in mags or on the web without them getting their cut.

    1. Re:Corporate IP infringements by Jaqenn · · Score: 3, Informative

      Incidentally, I work next to a guy that this happened to. He's an amateur photographer, and a local PR firm grabbed some of his photos off the net and used them to promote some event. They even put his name in the credits, but never actually told him what they were doing. Through lucky coincidence he noticed what they did, and after some mild legal drama settled out of court with them for a few thousand dollars.

      --
      You are awash in a sea of fiercely stated opinions. Obvious exits are: 'File->Quit', 'Reply', and 'Page Down'.
    2. Re:Corporate IP infringements by I(rispee_I(reme · · Score: 2, Interesting

      Although the article says the spider is not crawling P2P nets, I can't help but wonder if Gnutella is exempt, as each Gnutella client is a specialized HTTP server, if memory serves... I've definitely had Gnutella clients in my google results, although it's mostly Shareaza users.

  6. Misdirection by j00r0m4nc3r · · Score: 2, Insightful

    For the system to work, players at multiple levels would need to get involved. Broadcasters would need to add identifying watermarks to their broadcast, in cooperation with copyright holders, and both parties would need to register their watermarks with the system

    For all you know they have been doing this for the past 10 years.

  7. Web Spider? by Sneakernets · · Score: 5, Funny

    I have a Web Newspaper rolled up, waiting on it.

    --
    "No freeman shall ever be debarred the use of arms." -- Thomas Jefferson
  8. Re:So what by ZachPruckowski · · Score: 4, Insightful

    Blur the watermark and they are screwed.

    Assuming the watermarks are public or traceable. If all you're doing is identifying the fact that it's copyrighted, you could have a thousand different watermarks. Their location at any of half a dozen places in the audio stream would indicate infringement. That means that the pirate needs to search for any of 6000 possible spots for the watermark, and remove it. If the watermarks don't try to distinguish some copies of the work from other copies of the work, you can't use a simple diff to root them out.

  9. oh no! by matt328 · · Score: 3, Funny

    Because, yeah, I store all my potential copyright-infringing materials on my public web server.

    --
    Check out the cave on the east side of lake Hylia. Strange and wonderful things live in it.
  10. Re:So what by MyLongNickName · · Score: 4, Funny

    A better way. Put a bunch of legitimate sound clips out on the internet, but change it to have the watermark. Make sure your files get spread all over the place. A lot of false positives would render this useless.

    And on a more sick note, you could find the "I am browsing gay porn" wav file and modify it. Can you imagine the poor schmuck who has to go review each report to see if it is true?

    --
    See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
  11. Stupid idea by pclminion · · Score: 2, Interesting

    As this thing crawls the web, suppose it encounters a page on my web site that has links to 50,000 music files. Except they are actually all the same file, a legitimate file which is dynamically served up by the web server when the spider requests it. So there's no storage space issue on my end, but now the spider has to process 50,000 files. That's going to take a damn long time. Maybe I can bog it down so badly that it can't get any real work done.
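
    A rough sketch of both sides of this, with everything in one process so it runs without a network. The serve and crawl functions and the placeholder bytes are invented for illustration; nothing here reflects how the real spider works. It shows one cheap defense a crawler might use (hashing each body and analyzing it only once), which saves the analysis time but not the 50,000 downloads.

```python
import hashlib

ONE_FILE = b"legitimate audio bytes"  # placeholder for the single real file


def serve(path):
    """Server side: every /music/<n>.mp3 URL returns the same bytes."""
    if path.startswith("/music/") and path.endswith(".mp3"):
        return ONE_FILE
    return None


def crawl(paths, analyze):
    """Crawler side: download everything, but analyze each distinct body once."""
    seen = set()
    downloads = 0
    analyses = 0
    for path in paths:
        body = serve(path)
        if body is None:
            continue
        downloads += 1
        digest = hashlib.sha256(body).hexdigest()
        if digest in seen:
            continue  # identical content already checked
        seen.add(digest)
        analyze(body)
        analyses += 1
    return downloads, analyses


links = [f"/music/{n}.mp3" for n in range(50_000)]
downloads, analyses = crawl(links, analyze=lambda body: None)
print(downloads, analyses)
```

    A server that changed a few bytes on each request would defeat this kind of deduplication, so the bandwidth and time cost on the spider's side is still real.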

  12. I hope it works! by 5pp000 · · Score: 5, Insightful

    Why does everyone here want this not to work? Seems to me this could be the alternative to DRM. It doesn't interfere with fair use at all; it only detects when copyrighted works are made widely available.

    If we want to dissuade the entertainment industry from using DRM, it seems incumbent upon us, as technologists, to propose alternatives that at least partially answer copyright owners' legitimate concerns. Seems to me this could be one of them.

    --
    Your god may be dead, but mine aren't!
    1. Re:I hope it works! by John Hasler · · Score: 2, Informative

      > It doesn't interfere with fair use at all; it only detects when copyrighted
      > works are made widely available.

      You assume that there will be no false-positives. There will be many.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    2. Re:I hope it works! by Rosco P. Coltrane · · Score: 4, Insightful

      Why does everyone here want this not to work?

      Because my friend, the way the world is going, one of these days you'll have to consult a lawyer before taking a dump, just in case the toilet seat scans your ass print and reports unauthorized use.

      You see, the entire world is slowly being privatised. All of it, including obvious commons like the air we breathe and the water we drink, and innocuous things that everybody takes for granted suddenly "belong" to someone, or you aren't allowed to do them because some "rightful owner" says so one day. You might wonder, what do music or pictures have to do with it? Sure, they don't, but it's just the trend. Watermarking music is fine, but what if some day some digital camera manufacturer decides that you can't shoot pictures of a specially painted federal building because of some anti-terrorist law for example, and you happen to take a picture of your friend with the local FBI building in the background and post it on your website? Suddenly the camera goes "tsk tsk, can't do that pal...". Would you like that?

      It's the trend that's worrying. People making machines decide for you what you may or may not do. It might be a legitimate use now, but I can see plenty of cases where this kind of technology would simply curtail civil liberties.

      --
      "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
  13. Re:So what by DrLex · · Score: 2, Interesting

    This probably involves watermarks that are hidden in the masked part of the spectrum, i.e. in the same way MP3 and similar codecs work. You can't easily remove those without distorting the audio considerably, unless you know exactly what kind of watermark it is and how to remove it. Of course you can just 'blur' the entire audio clip, but people aren't used to listening to "cassette-tape-that-has-been-lying-in-the-sun-for-too-long" kind of audio anymore.

  14. Questions by fluch · · Score: 2, Funny

    Does it respect robots.txt?
    Does it run on Linux?

  15. youtube isnt slashdot by poptones · · Score: 2, Informative

    Miss Information meet... Miss Information.

    Nowhere does it say youtube will be watermarking all content. For this to work that's the OPPOSITE of what needs to happen - but if all the content providers embrace some sort of standard watermark then it will be trivial for youtube to SCAN your "original" content and see whether or not it is ACTUALLY YOUR CONTENT. How will they know? Because YOUR content will either contain YOUR watermark or it will contain no watermark at all.

    And youtube allows you to "retract" anything you say anytime you want. You can make your content private if you like, restrict it to select "friends," or take it back completely.

    It is about copyright and who controls and distributes under that copyright, but youtube isn't slashdot. It isn't even itunes, where their business model is built around watermarking everything and charging for individual access to it.

    And for the other geniuseseses who think you can simply "blur it out," RTFA on digimarc. Duh, if it were so simple to "blur it out" then it would be pretty damn useless, now wouldn't it? Some websites have been watermarking their images for years now and contracting with companies who DO crawl p2p services and usenet looking for infringers, and while it aint 100% effective it has been pretty damn effective at stopping people from sharing their shit. This isn't a watermark like on paper, it's a DIGITAL watermark - it's "visible" (or audible) but only in the sense it adds noise to the picture or sound and degrades its quality; you can "blur" it but that won't completely obliterate the embedded information as it is essentially an encrypted piece of copyright information steganographically embedded into the media.

    I hate the way this stuff degrades the quality, but most don't even notice it. I know this because I've worked with some of these sites and I seemed to be one of the very few who ever had any complaint about it. I've shared marked and unmarked content hundreds of times and very few people seem able to tell the difference... so, without knowing what to look for in the file source, how will you even know what content to "blur" and what not to blur?

    If this were adopted widely, it seems the biggest problem would be - ironically - with "original" content composed from fairly used bits and pieces of other works. If you just rip and post a part of a movie or tv show you're going to be pissing off only one content creator - but what if you make an original montage from ten different pieces of protected media? The watermarks would all still be there, and you'd potentially be getting takedown notices and/or lawsuit papers from ten different content owners.

    The technology is useful. But what's really needed (still) is meaningful regulation of terms and fair public use policy enforcement.

  16. Re: How Digimarc's Technology Works by Anonymous Coward · · Score: 2, Informative

    All that Digimarc does (for any media including stills, music and video) is introduce "noise" into the bit stream. This noise has to be at a level or interval that it is not perceptible by humans.

    They simply introduce a bit pattern or, more often, a delta pattern (change in bits by some delta) which is less detectable. This pattern usually contains a recognition pattern and some encrypted data.
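
    To make the "noise pattern" idea concrete, here is a toy additive watermark with a correlation detector. This is a generic textbook-style sketch, not Digimarc's actual algorithm, and it carries no encrypted payload. The function names, the key, the strength, and the threshold are all assumptions picked so the example behaves predictably.

```python
import random


def make_pattern(key, length):
    """Pseudo-random sequence of +1/-1 values derived from a secret key."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(length)]


def embed(samples, key, strength=1.0):
    """Add a faint keyed pattern (the 'noise') on top of the host signal."""
    pattern = make_pattern(key, len(samples))
    return [s + strength * p for s, p in zip(samples, pattern)]


def correlation(samples, key):
    """Average product of the signal with the keyed pattern."""
    pattern = make_pattern(key, len(samples))
    return sum(s * p for s, p in zip(samples, pattern)) / len(samples)


def detect(samples, key, threshold=0.5):
    """Report a watermark when the correlation clearly exceeds chance level."""
    return correlation(samples, key) > threshold


# Stand-in for audio: 20,000 random samples, much louder than the mark itself.
host_rng = random.Random(0)
host = [host_rng.gauss(0, 10) for _ in range(20_000)]
marked = embed(host, key=1234)

print(detect(marked, key=1234))  # marked file, correct key
print(detect(host, key=1234))    # unmarked file
print(detect(marked, key=9999))  # marked file, wrong key
```

    The mark is about a tenth of the signal's typical amplitude, yet it stands out once the detector averages over many samples with the right key. That averaging is also why mild extra noise does not necessarily erase it.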

    Certain bit patterns can be used in pictures and video so that as long as you capture the video output at nearly any viewable scale you can recover this signature. This includes videotaping a TV or monitor playing a Digimarc protected image etc. This is how they can figure out who leaked early copies of major movies to the black market even once the movie has been copied to various media a number of times.

    Anyhow what you do to beat Digimarc's technology is to introduce "noise" over their "noise" in such a way as to render theirs useless. One of the simplest ways to attempt this is to downgrade the quality. Still, depending on the pattern used, they may be able to detect it.

    Another thing to remember is that their spider is limited by latency. Therefore they cannot commit a lot of time to the analysis of all files. Therefore I would have to imagine one wouldn't have to worry about using a heavy duty algorithm to erase the signature.

    I think enough people on here are smart enough that they will be able to google for Digimarc's patents and old articles to get a pretty good idea of what they do and then obfuscate their own signature. You don't need to worry about cracking encryption or anything that hard to get around their scheme. It's not a particularly strong approach.

  17. Re:So what by Ash Vince · · Score: 3, Funny

    Nice theory, but in reality the watermark will be copyrighted so they will sue you for copyright infringement anyway. :)

    --
    I dont read /. to RTFA, I read /. to offend people in ignorance.
  18. Re:So what by cheater512 · · Score: 2, Insightful

    Whoa. Deja Vu. Didn't they say that about HD-DVD and Vista's new security?

  19. Re:So what by McFadden · · Score: 5, Funny

    the pirate needs to search for any of 6000 possible spots for the watermark, and remove it.
    I'm trying to think of a nifty device that would be able to search 6,000 possible spots in a file to look for a watermark, but the name escapes me just at the moment...

    No wait... I think I've got it... Isn't it called a "computer"?
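
    For what it's worth, the search itself is only a few lines once the patterns and positions are known. The sketch below assumes a made-up database of 1,000 four-byte marks and six candidate offsets (6,000 combinations); real watermarks are not literal byte strings, and find_marks is a hypothetical helper.

```python
def find_marks(data, patterns, offsets):
    """Check every (offset, pattern) pair and return the ones that match."""
    hits = []
    for offset in offsets:
        for index, pattern in enumerate(patterns):
            if data[offset:offset + len(pattern)] == pattern:
                hits.append((offset, index))
    return hits


# Hypothetical database: 1,000 distinct non-zero 4-byte marks, 6 possible spots.
patterns = [n.to_bytes(4, "big") for n in range(1, 1001)]
offsets = [0, 100, 200, 300, 400, 500]

# A 600-byte stand-in "file" of zeros with one mark planted at offset 300.
data = bytearray(600)
data[300:304] = patterns[741]

hits = find_marks(bytes(data), patterns, offsets)
print(hits)
```

    The expensive part is not this loop but obtaining the list of patterns and offsets in the first place.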
  20. Re:So what by PopeRatzo · · Score: 4, Interesting

    The sad thing about this episode is that digital watermarks could be a wonderful tool, used by artists and their customers to guarantee a given work's authorship. Instead, it's used to punish the very people who make it possible for the artists to survive: their listeners.

    I work in an academic environment, and I can't think of a single person in my life who has not violated a copyright or user agreement. If your job is to teach, it's almost inevitable. If you're an enthusiast or fan of a particular artist, it becomes a statistical certainty that you've broken the "law" regarding intellectual property.

    I contacted Digimarc once because I wanted to find out about ways to add an identifying mark to a digital file that would let a user know that the file was the authentic work of a particular artist. Not to prevent copying, mind you, because the files in question were meant to be shared. I just wanted the users to be able to know with some certainty that what they were hearing was actually produced by who they expected.

    The reply I got from Digimarc (I still have the email) was that they weren't interested in such uses of their product, and anyway "it's priced out of reach of the individual artist or production company". Real sweethearts.

    In the last few days there have been lots of stories about people and corporations who make their money off the backs of creative folks. There are those who provide a real service (like the guy who delivers pizza to the recording studio, or the woman who fixes my digital mixing console) and there are those who live to suck the life out of what should be a source of joy for both the artist and the user. Like I've said before, parasites need to live, too. But what really galls me is when they act like they're really doing something of value to anyone but themselves and their accountants.

    Seriously, to paraphrase Jesus or Steve Albini (it's one of those religious dudes, I forget which): "It's easier to drive a Range Rover through the butthole of a camel than for a label executive or booking agent to enter the kingdom of heaven."

    --
    You are welcome on my lawn.
  21. Re:So what by Lumpy · · Score: 3, Insightful

    No you don't. Simply find a way to obscure the watermark and place it everywhere. Digimarc's watermarking for images can be thwarted incredibly easily. Simply bicubic resize the image down slightly smaller AFTER you rotate it 1-5 degrees. Poof, their watermark is no longer detectable as it has been munged hard all over the image.

    I guarantee their audio and video watermark will be as easy to defeat; Digimarc is as innovative in technology as Macrovision.

    And yes, that is a slam on them.

    --
    Do not look at laser with remaining good eye.
  22. Re:So what by ZachPruckowski · · Score: 2, Insightful

    The problem isn't checking each spot for any of a given set of watermarks, it's identifying all the watermarks and all the spots they could be. You need to do a lot of work to build that database. You'd need tens of thousands of music files to even get started.