Audio Watermark Web Spider Starts Crawling
DippityDo writes "A new web tool is scanning the net for signs of copyright infringement. Digimarc's patented system searches video and audio files for special watermarks that would indicate they are not to be shared, then reports back to HQ with the results. It sounds kind of creepy, but has a long way to go before it makes a practical difference. 'For the system to work, players at multiple levels would need to get involved. Broadcasters would need to add identifying watermarks to their broadcast, in cooperation with copyright holders, and both parties would need to register their watermarks with the system. Then, in the event that a user capped a broadcast and uploaded it online, the scanner system would eventually find it and report its location online. Yet the system is not designed to hop on P2P networks or private file sharing hubs, but instead crawls public web sites in search of watermarked material.'"
Blur the watermark and they are screwed.
~ All comments automatically moderated -1 since 2004 ~
So if the watermarks are public, they can be identified and scrubbed before posting?
A new web tool is scanning the net for signs of copyright infringement ... 'For the system to work, players at multiple levels would need to get involved. Broadcasters would need to add identifying watermarks to their broadcast, in cooperation with copyright holders, and both parties would need to register their watermarks with the system.
So, basically, their web tool is scanning for things that don't yet exist. Bully!
The theory of relativity doesn't work right in Arkansas.
Time to examine how this works, and how to block it from your website.
You are allowed to protect your copyrighted information against unwanted use and access, after all!
You can't talk about Wikipedia's flaws on Wikipedia
Ahem
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
This isn't aimed at the home user or small-time crowd. Its ideal role is finding big-name corporate offenders that have unlicensed PR crap on brochures, websites, or ads, and making sure that the guy whose content it is gets his cut. It's not worth it to go after small-time folks. Think of professional photographers making sure their photos aren't run in mags or on the web without them getting their cut.
For the system to work, players at multiple levels would need to get involved. Broadcasters would need to add identifying watermarks to their broadcast, in cooperation with copyright holders, and both parties would need to register their watermarks with the system
For all you know they have been doing this for the past 10 years.
I have a Web Newspaper rolled up, waiting on it.
"No freeman shall ever be debarred the use of arms." -- Thomas Jefferson
So they're engaging in mass downloading and scanning?
Somebody explain to me how a massive, netwide wget doesn't constitute copyright infringement.
Oh great. We all know just how this is going to work. Content will be guilty until proven innocent, and any system that relies on the vigilance of its owners will run amok when they aren't vigilant. With this kind of tripe coming out, why don't we just turn off the net and go back to tin cans and a string?
I've already blocked 198.70.x.x (their website IP) at the router, but I doubt they're running this scanner from there.
"Powers. I have them."
Posting on Slashdot is not the best way to protect your copyrighted message.
/. and you are Owned..
see the bottom of this page:
Comments are owned by the Poster. The Rest © 1997-2007 OSTG.
You own your comment, but you cannot control, delete, edit, or retract it. So post your copyrighted message on
The same goes for YouTube and Digimarc: YouTube can do what it likes with YOUR content as soon as you post it there, and Digimarc will control the distribution of your content once you enter into an agreement with them.
Moral of the story: it is not about copyright, but who controls/distributes the copyright.
Because, yeah, I store all my potential copyright-infringing materials on my public web server.
Check out the cave on the east side of lake Hylia. Strange and wonderful things live in it.
I would not be surprised if they make use of Google, Yahoo, and MSN to find these.
I prefer the "u" in honour as it seems to be missing these days.
As this thing crawls the web, suppose it encounters a page on my web site that has links to 50,000 music files. Except they are actually all the same file, a legitimate file which is dynamically served up by the web server when the spider requests it. So there's no storage space issue on my end, but now the spider has to process 50,000 files. That's going to take a damn long time. Maybe I can bog it down so badly that it can't get any real work done.
H1: This site may not be accessed by any person or computer program affiliated with the RIAA or any of its affiliates. By accessing this site as a member of that group you agree to hold this site and its contributors in indemnity for all offenses civil or criminal, and to release to the public domain all copyrights held by you and your employer.
Why does everyone here want this not to work? Seems to me this could be the alternative to DRM. It doesn't interfere with fair use at all; it only detects when copyrighted works are made widely available.
If we want to dissuade the entertainment industry from using DRM, it seems incumbent upon us, as technologists, to propose alternatives that at least partially answer copyright owners' legitimate concerns. Seems to me this could be one of them.
Your god may be dead, but mine aren't!
All anyone has to do is find the watermark for, say, the movie adaptation for 1984, and add it in early in the movie/CD.
All that would be needed to make this scheme less useful would be for some bright person to apply the audio watermark to a whole bunch of files that aren't copyrighted. Then, when the spider finds one of these bogus files, a real person has to determine that it isn't copyrighted material.
Of course, the way the **AA has acted in the past, it wouldn't surprise anyone that automated [threat] letters would be sent out. This would leave someone open to counterclaim.
"Any society that would give up a little liberty to gain a little security will deserve neither and lose both."
... if the context was actually correct, which lately it has not been: it has been blown out of proportion and distorted by both sides of the piracy debate.
If you believe in privacy, and believe you have "nothing to hide" at the same time, you're a goddammed idiot
Does it respect robots.txt?
Does it run on Linux?
... another bot that will eat away the paid bandwidth of my site. Many people have a limited upload quota for their site. Of course they won't put dozens of media files online, but suppose this bot crawls at a quite high rate, a few audio files can quickly gobble up a lot of this quota. Most likely it won't obey robots.txt, so I hope it can be blocked by other means.
How does someone with a media heavy site block this thing, or get on a "white list" ?
How often is it going to come around ?
Who pays for the added load caused by this thing when it doesn't find anything wrong ?
Wanna fight ? Bend over, stick your head up your ass, and fight for air.
It'll only be a matter of time before the watermarking scheme is figured out, after which time they will have enabled the masses to deploy their own spidering software, effectively making piracy easier. I, for one, can't wait for big companies to sign up for something like this. Online piracy will always find a way.
Lossless media files are still too big for most people to swap them across the Internet, so exact watermarks protect only a tiny fraction of the files that copyright owners care about.
MP3 is lossy, and there's lots of different MP3 data that sounds close enough to the original MP3 that a song can be transcoded to new data that sounds "the same" within the tolerance for noise that all MP3 listening demands.
Won't this watermark scheme just fall victim to the first revision of an MP3 (or MP4 video, etc) encoder that includes a "scramble" option?
--
make install -not war
10 Most mp3s are not hosted on websites.
20 The ones that are are usually on vinyl-lovin' music blogs that post semi-obscure music from the past.
30 Watermarking is only going to work on new music - how can you watermark something already released?
40 New music bites: GOTO 10
The point of this kind of watermarking is that you can trace the copy that made it onto the net back to some original copy that you gave out. So, this is great in situations where you are able to uniquely watermark a copy of something and then give it to a specific person whom you can then hold responsible for keeping it secret. It is also useful if you are going to put something like a photography portfolio online. You can use this to track down people who have snagged copies from your website or whatever.
It doesn't help so much in combating movie or music piracy, because the legitimate copies aren't uniquely watermarked for each user you give them to. So, you find the watermark in some Hollywood movie online? So what? You already knew the movie was pirated to begin with. The only way this is helpful is if you are tracing it back to a specific watermarked copy, like the ones they give out for voting on the Academy Awards. (So, for example, you know that Robert De Niro's copy of "Seabiscuit" ended up on P2P and you can hold Mr. De Niro accountable.)
But it doesn't stop anyone from buying a CD and then ripping it because the watermark doesn't implicate the pirate.
The one thing it could potentially do, though, is differentiate between a file called "Usher.mp3" that was put onto a university web site by Professor Usher for his class vs. music by the musician called Usher. So, in theory ubiquitous use of this could help prevent false positives, but the last time I checked the RIAA/MPAA doesn't give a shit about sending false take down notices.
Avoid Missing Ball for High Score
Miss Information meet... Miss Information.
Nowhere does it say youtube will be watermarking all content. For this to work that's the OPPOSITE of what needs to happen - but if all the content providers embrace some sort of standard watermark then it will be trivial for youtube to SCAN your "original" content and see whether or not it is ACTUALLY YOUR CONTENT. How will they know? Because YOUR content will either contain YOUR watermark or it will contain no watermark at all.
And youtube allows you to "retract" anything you say anytime you want. You can make your content private if you like, restrict it to select "friends," or take it back completely.
It is about copyright and who controls and distributes under that copyright, but youtube isn't slashdot. It isn't even itunes, where their business model is built around watermarking everything and charging for individual access to it.
And for the other geniuseseses who think you can simply "blur it out," RTFA on digimarc. Duh, if it were so simple to "blur it out" then it would be pretty damn useless, now wouldn't it? Some websites have been watermarking their images for years now and contracting with companies who DO crawl p2p services and usenet looking for infringers, and while it aint 100% effective it has been pretty damn effective at stopping people from sharing their shit. This isn't a watermark like on paper, it's a DIGITAL watermark - it's "visible" (or audible) but only in the sense it adds noise to the picture or sound and degrades its quality; you can "blur" it but that won't completely obliterate the embedded information as it is essentially an encrypted piece of copyright information steganographically embedded into the media.
I hate the way this stuff degrades the quality, but most don't even notice it. I know this because I've worked with some of these sites and I seemed to be one of the very few who ever had any complaint about it. I've shared marked and unmarked content hundreds of times and very few people seem able to tell the difference... so, without knowing what to look for in the file source, how will you even know what content to "blur" and what not to blur?
if this were adopted widely, it seems the biggest problem would be - ironically - with "original" content composed from fairly used bits and pieces of other works. If you just rip and post a part of a movie or tv show you're going to be pissing off only one content creator - but what if you make an original montage from ten different pieces of protected media? The watermarks would all still be there, you'd potentially be getting takedown notices and/or lawsuit papers from ten different content owners.
The technology is useful. But what's really needed (still) is meaningful regulation of terms and fair public use policy enforcement.
I'd like to see how you can possibly detect a watermark that has gone through lossy compression.
"When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
Don't our fair use rights allow us to post part of a broadcast? IANAL, but I think they do. So what if I post a small portion of a video or sound file, as allowed under fair use, and it happens to contain the watermark?
The race isn't always to the swift... but that's the way to bet!
All that Digimarc does (for any media including stills, music and video) is introduce "noise" into the bit stream. This noise has to be at a level or interval that is not perceptible by humans.
They simply introduce a bit pattern or, more often, a delta pattern (change in bits by some delta) which is less detectable. This pattern usually contains a recognition pattern and some encrypted data.
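To make the "delta pattern" idea concrete, here is a toy sketch of an additive, keyed watermark detected by correlation. This is not Digimarc's actual scheme (nothing in this thread documents it); the function names, the key string, and the `delta` strength are all made up for illustration, and real systems work on perceptually shaped signals rather than raw samples.

```python
import random

def pattern(key, n):
    """Pseudorandom +1/-1 sequence derived from a secret key (hypothetical scheme)."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]

def embed(samples, key, delta=1.0):
    """Add a small keyed delta to every sample."""
    p = pattern(key, len(samples))
    return [s + delta * c for s, c in zip(samples, p)]

def detect(samples, key):
    """Correlate the samples with the keyed pattern.

    The score is close to delta for marked data and close to 0 for
    unmarked data or for the wrong key.
    """
    p = pattern(key, len(samples))
    return sum(s * c for s, c in zip(samples, p)) / len(samples)

if __name__ == "__main__":
    rng = random.Random(0)
    audio = [rng.gauss(0, 20) for _ in range(50000)]   # stand-in for audio samples
    marked = embed(audio, key="station-42")
    noisy = [s + rng.gauss(0, 2) for s in marked]      # crude stand-in for re-encoding noise
    print("unmarked :", detect(audio, "station-42"))
    print("marked   :", detect(noisy, "station-42"))
    print("wrong key:", detect(noisy, "some-other-key"))
```

Because the score is an average over many samples, mild added noise barely moves it, which is why simply "blurring" tends not to be enough. It also shows why detection needs the key: without it the pattern looks like ordinary noise.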
Certain bit patterns can be used in pictures and video so that as long as you capture the video output at nearly any viewable scale you can recover this signature. This includes videotaping a TV or monitor playing a Digimarc-protected image, etc. This is how they can figure out who leaked early copies of major movies to the black market even once the movie has been copied to various media a number of times.
Anyhow, what you do to beat Digimarc's technology is to introduce "noise" over their "noise" in such a way as to render theirs useless. One of the simplest ways to attempt this is to downgrade the quality. Still, depending on the pattern used, they may be able to detect it.
Another thing to remember is that their spider is limited by latency. Therefore they cannot commit a lot of time to the analysis of all files. Therefore I would have to imagine one wouldn't have to worry about using a heavy duty algorithm to erase the signature.
I think enough people on here are smart enough that they will be able to google for Digimarc's patents and old articles to get a pretty good idea of what they do and then obfuscate their own signature. You don't need to worry about cracking encryption or anything that hard to get around their scheme. It's not a particularly strong approach.
ROBOTS.TXT
DIGIMARC = NO
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
You got lucky: the watermark sounds exactly like someone singing the words "bring your daughter to the slaughter", so you're covered.
Unfortunately, those lyrics will be added to every new commercial song that this system protects. Kenny G initially had some concerns, but he's on board with the plan now.
HIV Crosses Species Barrier... into Muppets
The obvious way is to try to work out which IP addresses the
crawler is operating from, and use the business end of a firewall
to block it. (Given past events, I don't think a robots.txt is likely
to work.)
A less-obvious way is to discover what the watermark is and
slap it onto a few...hundred million files that have nothing to
do with what it's looking for. Those files don't need to have
actual audio content -- as long as they meet the criteria that
the spider is looking for. So perhaps a bit of Perl, a few calls
to rand() and some well-chosen filenames might be enough.
I like watermarks. Watermarks allow copyright holders to essentially put a digital "Copyright (C) 2007 Joe Smith" onto their documents. This makes it possible to track who committed a copyright offense without stripping legitimate users of their rights. Copyright holders can prosecute infringers without having to guess that the file is copyrighted by looking at the file name or something dumb that has high false positives. (No more suing grandmas.) They can also find the original mass-distributing pirate and take them down. So the average person can have their fair use rights back, and the copyright holders can stop the major infringers. That's a win-win for everyone.
I see lots of knee-jerk reactions like "oh, I'll just put watermarks in everything to fool them" or "time to modify robots.txt!" which aren't warranted. First, if watermarks are done properly, they are cryptographic signatures and you can't put them on other things. That's good for you, because nobody can put their watermark on your files. And nobody can put your watermark on their files. Wannabe-pirates can't claim "Somebody forged that watermark to look like I distributed the file."
I would happily download a watermarked movie from BitTorrent. It means I can modify it, format-shift it, loan it to my brother, etc. But if someone puts it up on BitTorrent, then the copyright holder can track down who did it and sue them. I have no problem with the RIAA/MPAA/anyone else going after legitimate copyright offenders. Isn't that what we want?
http://ask.slashdot.org/comments.pl?sid=6823&cid=886346
echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
What makes you think he wasn't actually browsing for gay porn?
If memory serves correctly it sounds something like "WHOOOOOAAAAAAAAA". I may be off on the number of A's though.
Get the specific watermarks and start putting them on everything and anything.... Oh wait, that's copyright infringement....
Damn can't have any more fun...
DRM 'manages access' in the same way that a prison 'manages freedom'
Digimarc was great. I loved them. It was hilarious to see images marked and then 'remarked' by hacking the program to re-watermark the image. The original mark wasn't recoverable.
http://www.woodmann.com/fravia/frogdigi.htm
Food for thought.
Why does everyone here want this not to work?
I run a website with more than 6 GB of photos and video that I have created from scratch. My hosting provider is generous, but there is a finite limit to my bandwidth. Like the TurnItIn bot, this Digimarc bot will be an uninvited pest that repeatedly spiders the site, downloading all my content to sniff it.
Since these visits are of no benefit to me, I'll block it by user-agent in .htaccess, and by IP address once people figure out where this beast lives. Obviously, robots.txt depends on client-side cooperation, and this thing likely won't obey it if it intends to be as promiscuous as possible.
Seth
$5 / month hosted VPS on linux = awesome!
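For anyone who wants to see what the robots.txt route would look like, here is a small sketch using Python's standard `urllib.robotparser`. The user-agent token `DigimarcSpider` is invented for the example (the crawler's real name isn't given anywhere in this thread), and as the post above says, these rules only bind a crawler that chooses to honor them.

```python
from urllib.robotparser import RobotFileParser

# "DigimarcSpider" is a made-up token; substitute whatever the crawler
# actually sends once someone spots it in their logs.
ROBOTS_TXT = """\
User-agent: DigimarcSpider
Disallow: /

User-agent: *
Disallow: /private/
"""

def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a well-behaved crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    for agent in ("DigimarcSpider", "SomeOtherBot"):
        for url in ("http://example.com/photos/a.jpg",
                    "http://example.com/private/b.jpg"):
            print(agent, url, allowed(agent, url))
```

A crawler that ignores the file never runs a check like this, which is why the user-agent and IP blocks at the server are the fallback.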
As has been pointed out the "watermark method" must be doomed to fail.
The article implies that the whole media file has to be downloaded, and if that's the case there's a much better way of doing this. There are algorithms out there that can efficiently calculate a "signature" for the content of the file (for images, the "best" such algorithm is probably the SIFT algorithm, and there exist algorithms for other media files). These "signatures" are usually so-called "multidimensional descriptors" that are invariant to changes in the media (such as compression in images/audio/video, or stretching, cropping and various other manipulations of image files, etc).
The major obstacle so far has been how to perform an efficient search on a large collection of such signatures. That really isn't an obstacle any more.
My master's thesis (which I have just started working on) actually deals with further improvements to a new indexing architecture that has been developed at my school to deal with just such multidimensional descriptors. The improvement over previous ways of indexing multidimensional descriptors is huge. As an example, our test database contains descriptors for over 300,000 images, and searching for a similar image took 2 hours with previous techniques. Using the new index type it takes 2 seconds, and the best part is that the query speed is not dependent on the size of the database.
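As a rough illustration of the "signature plus similarity search" idea, here is a toy sketch. It is neither SIFT nor the indexing architecture described above: it swaps in a much simpler "average hash" (one bit per pixel, set where the pixel is brighter than the image mean) and a plain linear scan, which is exactly the slow part a better index is meant to replace. All names and the tiny 64-pixel "images" are assumptions made for the example.

```python
def average_hash(pixels):
    """Signature for a flat list of grayscale values: 1 where pixel > mean."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming(a, b):
    """Number of positions where two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest(query, database):
    """Linear scan over {name: signature}; returns (name, distance) of the best match."""
    return min(((name, hamming(query, sig)) for name, sig in database.items()),
               key=lambda item: item[1])

if __name__ == "__main__":
    gradient = list(range(0, 256, 4))                   # 64 "pixels", dark to light
    checker = [255 if i % 2 else 0 for i in range(64)]  # alternating pattern
    database = {"gradient": average_hash(gradient),
                "checker": average_hash(checker)}

    brightened = [p + 10 for p in gradient]             # a simple manipulation
    print(nearest(average_hash(brightened), database))
```

The uniformly brightened copy still maps to the original's signature because the hash only depends on each pixel relative to the mean. Surviving cropping or stretching takes much richer descriptors than this.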
You don't think enough... therefore you better not be!
How does the crawler plan on identifying multimedia files from any other binary files on the site? Assuming (like most spiders) it will follow the links on the website, which ones will it single out?
If it goes by extension (which it almost surely will, and even if it doesn't it's somewhat irrelevant), it can't possibly support every audio format. So, once you figure out what formats it can't interpret, just rip your music into that! OGG/Vorbis anyone?
Better still, just change the file extensions of all your music files, and tell people to download them and rename to .mp3 or what have you.
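One caveat on the rename trick: a crawler doesn't have to trust extensions. Whether this one does is unknown, but a sketch of the alternative, sniffing the first few bytes of a file for well-known format markers, is short. The function name and the list of formats are chosen for the example, and the check is a heuristic, not a full parser.

```python
# Leading "magic" bytes of a few common audio containers.
SIGNATURES = [
    (b"ID3", "mp3"),    # MP3 file starting with an ID3v2 tag
    (b"OggS", "ogg"),   # Ogg container (Vorbis and others)
    (b"fLaC", "flac"),  # FLAC stream marker
]

def sniff(header):
    """Guess an audio format from the first bytes of a file, ignoring its name."""
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    # WAV: a RIFF container whose form type is "WAVE"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # Untagged MP3: an MPEG audio frame starts with 11 set sync bits.
    # This is loose and can misfire on other data (e.g. a UTF-16 BOM).
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return None

if __name__ == "__main__":
    print(sniff(b"OggS\x00\x02" + bytes(20)))   # an Ogg file renamed to .txt still sniffs as ogg
    print(sniff(b"just some text"))
```

So renaming only defeats a spider that filters by extension. A format the detector can't decode at all is a different matter.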
Isn't it possible for individuals who publish audio and are sick of this whole debate to copyright their own works and write a license stating that the user has a right to listen to the content, but not to use it for "automatic" analysis, etc.? In this case, would not the organization looking for copyright infringement be infringing on the copyright? Assuming the owner of the site is the legal holder of the copyright.
In this case, the owner of this "original" work could sue the company performing these checks for copyright infringement since they do not have a right to analyze the work. The only way they would be exempted from this restriction would be if the work was actually not an "original" work which they wouldn't know until they accessed it.
What I'm trying to say in a roundabout way is that history has shown that the entertainment industry sues first and thinks later. How many harassment suits have been filed under the DMCA? How many times has someone been harassed for simply being connected to a P2P network?
I think we can expect that anyone found with any material on their web sites that contain a watermark will be treated as if they are guilty.
Sure, if you have the money you can defend yourself.
The race isn't always to the swift... but that's the way to bet!