Audio Watermark Web Spider Starts Crawling
DippityDo writes "A new web tool is scanning the net for signs of copyright infringement. Digimarc's patented system searches video and audio files for special watermarks that would indicate they are not to be shared, then reports back to HQ with the results. It sounds kind of creepy, but has a long way to go before it makes a practical difference. 'For the system to work, players at multiple levels would need to get involved. Broadcasters would need to add identifying watermarks to their broadcast, in cooperation with copyright holders, and both parties would need to register their watermarks with the system. Then, in the event that a user capped a broadcast and uploaded it online, the scanner system would eventually find it and report its location online. Yet the system is not designed to hop on P2P networks or private file sharing hubs, but instead crawls public web sites in search of watermarked material.'"
Blur the watermark and they are screwed.
~ All comments automatically moderated -1 since 2004 ~
So if the watermarks are public, they can be identified and scrubbed before posting?
A new web tool is scanning the net for signs of copyright infringement ... 'For the system to work, players at multiple levels would need to get involved. Broadcasters would need to add identifying watermarks to their broadcast, in cooperation with copyright holders, and both parties would need to register their watermarks with the system.
So, basically, their web tool is scanning for things that don't yet exist. Bully!
The theory of relativity doesn't work right in Arkansas.
Time to examine how this works, and how to block it from your website.
You are allowed to protect unwanted use and access of your copyrighted information, after all!
You can't talk about Wikipedia's flaws on Wikipedia
Ahem
No folly is more costly than the folly of intolerant idealism. - Winston Churchill
This isn't aimed at the home user or the small-time crowd. Its ideal role is finding big-name corporate offenders that have unlicensed PR crap on brochures, websites, or ads, and making sure that the guy whose content it is gets his cut. It's not worth it to go against small-time folks. Think of professional photographers making sure their photos aren't run in mags or on the web without them getting their cut.
For the system to work, players at multiple levels would need to get involved. Broadcasters would need to add identifying watermarks to their broadcast, in cooperation with copyright holders, and both parties would need to register their watermarks with the system
For all you know they have been doing this for the past 10 years.
I have a Web Newspaper rolled up, waiting on it.
"No freeman shall ever be debarred the use of arms." -- Thomas Jefferson
So they're engaging in mass downloading and scanning?
Somebody explain to me how a massive, netwide wget doesn't constitute copyright infringement.
Oh great. We all know just how this is going to work. Content will be guilty until proven innocent, and any system that relies on the vigilance of its owners will run amok when they aren't vigilant. With this kind of tripe coming out, why don't we just turn off the net and go back to tin cans and a string?
I've already blocked 198.70.x.x (their website IP) at the router, but I doubt they're running this scanner from there.
"Powers. I have them."
Posting on Slashdot is not the best way to protect your copyrighted message.
/. and you are Owned..
see the bottom of this page:
Comments are owned by the Poster. The Rest © 1997-2007 OSTG.
You own your comment, but you cannot control, delete, edit, or retract it. So post your copyrighted message on
Same goes for YouTube and Digimarc: YouTube can do what it likes with YOUR content as soon as you post it there, and Digimarc will control the distribution of your content once you enter into an agreement with them.
Moral of the story: it is not about copyright, but who controls/distributes the copyright.
This system also requires that pirates would have to register with InstaTrace before uploading warez.
Because, yeah, I store all my potential copyright-infringing materials on my public web server.
Check out the cave on the east side of lake Hylia. Strange and wonderful things live in it.
I would not be surprised if they make use of Google, Yahoo, and MSN to find these.
I prefer the "u" in honour as it seems to be missing these days.
As this thing crawls the web, suppose it encounters a page on my web site that has links to 50,000 music files. Except they are actually all the same file, a legitimate file which is dynamically served up by the web server when the spider requests it. So there's no storage space issue on my end, but now the spider has to process 50,000 files. That's going to take a damn long time. Maybe I can bog it down so badly that it can't get any real work done.
All that is needed is a way to reverse engineer their software to see what exactly it is looking for. From there, you can modify that out of the content, as one poster had previously said.
H1: This site may not be accessed by any person or computer program affiliated with the RIAA or any of its affiliates. By accessing this site as a member of that group you agree to hold this site and its contributors in indemnity for all offenses civil or criminal, and to release to the public domain all copyrights held by you and your employer.
Simple solution: keep your audio files dry, no watermarks!
Sheesh, corporations these days...
Next.
Why does everyone here want this not to work? Seems to me this could be the alternative to DRM. It doesn't interfere with fair use at all; it only detects when copyrighted works are made widely available.
If we want to dissuade the entertainment industry from using DRM, it seems incumbent upon us, as technologists, to propose alternatives that at least partially answer copyright owners' legitimate concerns. Seems to me this could be one of them.
Your god may be dead, but mine aren't!
You mean the judge wouldn't have bought the "information wants to be free" argument?
All anyone has to do is find the watermark for, say, the movie adaptation for 1984, and add it in early in the movie/CD.
All that would be needed to make this scheme less useful would be for some bright person to apply the audio watermark to a whole bunch of files that aren't copyrighted. Then, when the spider finds one of these bogus files, a real person has to determine that it isn't copyrighted material.
Of course, the way the **AA has acted in the past, it wouldn't surprise anyone that automated [threat] letters would be sent out. This would leave someone open to counterclaim.
"Any society that would give up a little liberty to gain a little security will deserve neither and lose both."
Does it respect robots.txt?
Does it run on Linux?
Yeah! Just like I can "scrub" FOSS and get away with it. Good thing there's no big organization trying to stop me.
... another bot that will eat away the paid bandwidth of my site. Many people have a limited upload quota for their site. Of course they won't put dozens of media files online, but suppose this bot crawls at a quite high rate, a few audio files can quickly gobble up a lot of this quota. Most likely it won't obey robots.txt, so I hope it can be blocked by other means.
How does someone with a media heavy site block this thing, or get on a "white list" ?
How often is it going to come around ?
Who pays for the added load caused by this thing when it doesn't find anything wrong ?
Wanna fight ? Bend over, stick your head up your ass, and fight for air.
My wife and I recently covered a heavy metal song (Bring your Daughter, Iron Maiden) and posted the thing on the net. We forgot to add the watermark from the original. Does anyone know what the watermark sounds like?
It'll only be a matter of time before the watermarking scheme is figured out, after which time they will have enabled the masses to deploy their own spidering software, effectively making piracy easier. I, for one, can't wait for big companies to sign up for something like this. Online piracy will always find a way.
Since they started dragging unborn children and 99+-year-old grannies (who never owned a computer) to court, the MAFIAA has lost its credibility altogether.
No respect.
Let's write a bot that will find upload spots online and upload anything there these cocksuckers deem worth protecting!
Lossless media files are still too big for most people to swap them across the Internet, so exact watermarks protect only a tiny fraction of the files that copyright owners care about.
MP3 is lossy, and there's lots of different MP3 data that sounds close enough to the original MP3 that a song can be transcoded to new data that sounds "the same" within the tolerance for noise that all MP3 listening demands.
Won't this watermark scheme just fall victim to the first revision of an MP3 (or MP4 video, etc) encoder that includes a "scramble" option?
--
make install -not war
10 Most mp3s are not hosted on websites.
20 The ones that are are usually on vinyl-lovin' music blogs that post semi-obscure music from the past.
30 Watermarking is only going to work on new music - how can you watermark something already released?
40 New music bites: GOTO 10
The point of this kind of watermarking is that you can trace the copy that made it on to the net back to some original copy that you gave out. So, this is great in situations where you are able to uniquely watermark a copy of something and then give it to a specific person who then you can hold responsible for keeping it secret. It is also useful if you are going to put something like photography portfolio online. You can use this to track down people who have snagged copies from your website or whatever.
It doesn't help so much in combating movie or music piracy, because the legitimate copies aren't uniquely watermarked for each user you give them to. So you find the watermark in some Hollywood movie online? So what? You already knew the movie was pirated to begin with. The only way this is helpful is if you are tracing it back to a specific watermarked copy, like the ones they give out for voting on the Academy Awards. (So, for example, you know that Robert De Niro's copy of "Seabiscuit" ended up on P2P and you can hold Mr. De Niro accountable.)
But it doesn't stop anyone from buying a CD and then ripping it because the watermark doesn't implicate the pirate.
The one thing it could potentially do, though, is differentiate between a file called "Usher.mp3" that was put onto a university web site by Professor Usher for his class vs. music by the musician called Usher. So, in theory ubiquitous use of this could help prevent false positives, but the last time I checked the RIAA/MPAA doesn't give a shit about sending false take down notices.
Avoid Missing Ball for High Score
Miss Information meet... Miss Information.
Nowhere does it say youtube will be watermarking all content. For this to work that's the OPPOSITE of what needs to happen - but if all the content providers embrace some sort of standard watermark then it will be trivial for youtube to SCAN your "original" content and see whether or not it is ACTUALLY YOUR CONTENT. How will they know? Because YOUR content will either contain YOUR watermark or it will contain no watermark at all.
And youtube allows you to "retract" anything you say anytime you want. You can make your content private if you like, restrict it to select "friends," or take it back completely.
It is about copyright and who controls and distributes under that copyright, but youtube isn't slashdot. It isn't even itunes, where their business model is built around watermarking everything and charging for individual access to it.
And for the other geniuseseses who think you can simply "blur it out," RTFA on digimarc. Duh, if it were so simple to "blur it out" then it would be pretty damn useless, now wouldn't it? Some websites have been watermarking their images for years now and contracting with companies who DO crawl p2p services and usenet looking for infringers, and while it aint 100% effective it has been pretty damn effective at stopping people from sharing their shit. This isn't a watermark like on paper, it's a DIGITAL watermark - it's "visible" (or audible) but only in the sense it adds noise to the picture or sound and degrades its quality; you can "blur" it but that won't completely obliterate the embedded information as it is essentially an encrypted piece of copyright information steganographically embedded into the media.
I hate the way this stuff degrades the quality, but most don't even notice it. I know this because I've worked with some of these sites and I seemed to be one of the very few who ever had any complaint about it. I've shared marked and unmarked content hundreds of times and very few people seem able to tell the difference... so, without knowing what to look for in the file source, how will you even know what content to "blur" and what not to blur?
if this were adopted widely, it seems the biggest problem would be - ironically - with "original" content composed from fairly used bits and pieces of other works. If you just rip and post a part of a movie or tv show you're going to be pissing off only one content creator - but what if you make an original montage from ten different pieces of protected media? The watermarks would all still be there, you'd potentially be getting takedown notices and/or lawsuit papers from ten different content owners.
The technology is useful. But what's really needed (still) is meaningful regulation of terms and fair public use policy enforcement.
I'd like to see how you can possibly detect a watermark that has gone through lossy compression.
"When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
Don't our fair use rights allow us to post part of a broadcast? IANAL, but I think they do. So what if I post a small portion of a video or sound file allowed under fair use and it happens to contain the watermark?
The race isn't always to the swift... but that's the way to bet!
All that Digimarc does (for any media including stills, music and video) is introduce "noise" into the bit stream. This noise has to be at a level or interval that is not perceptible by humans.
They simply introduce a bit pattern or, more often, a delta pattern (change in bits by some delta) which is less detectable. This pattern usually contains a recognition pattern and some encrypted data.
Certain bit patterns can be used in pictures and video so that as long as you capture the video output at nearly any viewable scale you can recover this signature. This includes videotaping a TV or monitor playing a Digimarc-protected image, etc. This is how they can figure out who leaked early copies of major movies to the black market even once the movie has been copied to various media a number of times.
Anyhow, what you do to beat Digimarc's technology is introduce "noise" over their "noise" in such a way as to render theirs useless. One of the simplest ways to attempt this is to downgrade the quality. Still, depending on the pattern used, they may be able to detect it.
Another thing to remember is that their spider is limited by latency. Therefore they cannot commit a lot of time to the analysis of all files. Therefore I would have to imagine one wouldn't have to worry about using a heavy duty algorithm to erase the signature.
I think enough people on here are smart enough that they will be able to google for Digimarc's patents and old articles to get a pretty good idea of what they do and then obfuscate their own signature. You don't need to worry about cracking encryption or anything that hard to get around their scheme. It's not a particularly strong approach.
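To make the "noise hidden in the signal" description above concrete, here is a toy sketch in Python. It is not Digimarc's algorithm, which the comment does not spell out. It assumes a simple additive scheme: a key-seeded pattern of +1/-1 values is added at low amplitude and later detected by correlating against the same pattern. The names `pattern`, `embed`, `detect`, `KEY` and the `STRENGTH` value are made up for illustration.

```python
import random

STRENGTH = 4  # assumed embedding amplitude, small next to the +/-100 host signal

def pattern(key, n):
    """Key-seeded pseudo-random +1/-1 sequence (the hypothetical delta pattern)."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]

def embed(samples, key):
    """Add the low-amplitude pattern on top of the host samples."""
    return [s + STRENGTH * p for s, p in zip(samples, pattern(key, len(samples)))]

def detect(samples, key):
    """Correlate with the pattern: near STRENGTH if marked, near 0 otherwise."""
    p = pattern(key, len(samples))
    return sum(s * q for s, q in zip(samples, p)) / len(samples)

rng = random.Random(0)
host = [rng.randint(-100, 100) for _ in range(50000)]  # stand-in for audio samples
KEY = 1234                                             # assumed secret key
marked = embed(host, KEY)
# "Noise over their noise": extra random noise much louder than the mark itself.
degraded = [s + rng.randint(-50, 50) for s in marked]

score_clean = detect(host, KEY)
score_marked = detect(marked, KEY)
score_degraded = detect(degraded, KEY)
score_wrong_key = detect(marked, 9999)
```

In this toy setup the correlation score for `degraded` stays close to the one for `marked`, because random noise averages out over many samples. That is consistent with the caveat above that simply lowering the quality may not be enough, although a real scheme and a real attack would both be far more involved.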
ROBOTS.TXT
DIGIMARC = NO
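For reference, robots.txt has no `NAME = NO` syntax; a rule is a `User-agent` line followed by `Disallow` lines. The sketch below assumes the crawler announces itself with a user-agent token, here the made-up name `ExampleWatermarkBot` (the thread never says what the real crawler calls itself), and assumes it chooses to honor the file at all. Python's standard `urllib.robotparser` shows how a compliant crawler would read such a file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleWatermarkBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://example.com/media/song.mp3"
bot_ok = parser.can_fetch("ExampleWatermarkBot", url)   # the named bot is refused
other_ok = parser.can_fetch("SomeOtherBot", url)        # other crawlers are allowed
```

This only works with the crawler's cooperation; nothing in robots.txt is enforced by the server.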
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
The obvious way is to try to work out which IP addresses the crawler is operating from, and use the business end of a firewall to block it. (Given past events, I don't think a robots.txt is likely to work.)
A less-obvious way is to discover what the watermark is and slap it onto a few...hundred million files that have nothing to do with what it's looking for. Those files don't need to have actual audio content -- as long as they meet the criteria that the spider is looking for. So perhaps a bit of Perl, a few calls to rand() and some well-chosen filenames might be enough.
I like watermarks. Watermarks allow copyright holders to essentially put a digital "Copyright (C) 2007 Joe Smith" onto their documents. This makes it possible to track who committed a copyright offense without stripping legitimate users of their rights. Copyright holders can prosecute infringers without having to guess that the file is copyrighted by looking at the file name or something dumb that has high false positives. (No more suing grandmas.) They can also find the original mass-distributing pirate and take them down. So the average person can have their fair use rights back, and the copyright holders can stop the major infringers. That's a win-win for everyone.
I see lots of knee-jerk reactions like "oh, I'll just put watermarks in everything to fool them" or "time to modify robots.txt!" which aren't warranted. First, if watermarks are done properly, they are cryptographic signatures and you can't put them on other things. That's good for you, because nobody can put their watermark on your files. And nobody can put your watermark on their files. Wannabe-pirates can't claim "Somebody forged that watermark to look like I distributed the file."
I would happily download a watermarked movie from BitTorrent. It means I can modify it, format-shift it, loan it to my brother, etc. But if someone puts it up on BitTorrent, then the copyright holder can track down who did it and sue them. I have no problem with the RIAA/MPAA/anyone else going after legitimate copyright offenders. Isn't that what we want?
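The claim above that a properly built mark cannot be transplanted onto other files can be illustrated with a keyed tag computed over the payload together with a digest of the content. This is only a sketch under assumptions. It uses an HMAC from Python's standard library as a stand-in for a real public-key signature, and it ignores the hard part: embedding the tag so it survives re-encoding, where an exact SHA-256 digest would no longer match. The key, payload format and function names are hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"registry-held secret"  # hypothetical key known only to the marker

def make_tag(content: bytes, payload: bytes) -> str:
    """Bind the payload (e.g. an owner/recipient ID) to this exact content."""
    digest = hashlib.sha256(content).digest()
    return hmac.new(SECRET_KEY, digest + payload, hashlib.sha256).hexdigest()

def verify_tag(content: bytes, payload: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(content, payload), tag)

song = b"...original audio bytes..."
other = b"...some unrelated file..."
tag = make_tag(song, b"owner=Joe Smith;copy=42")

valid_on_original = verify_tag(song, b"owner=Joe Smith;copy=42", tag)
valid_on_other_file = verify_tag(other, b"owner=Joe Smith;copy=42", tag)
valid_with_edited_payload = verify_tag(song, b"owner=Someone Else;copy=42", tag)
```

Without the key, copying the tag onto a different file or editing the payload makes verification fail, which is the property the comment relies on.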
What makes you think he wasn't actually browsing for gay porn?
As a long time advocate for torturing to death the children of the RIAA's lawyers, I think this is WONDERFUL!!! Thank God they are finally really looking at the song, instead of claiming that every MP3 with certain words in the title is pirated! Maybe one day I can get a job other than pursuing civil rights actions against the RIAA / MPAA / BSA, or giving away military style rifles to disgruntled grad students who will not graduate because the only current copy of their dissertation was taken down by a fraudulent DMCA notice.
Andy Out!
Get the specific watermarks and start putting them on everything and anything.... Oh wait, that's copyright infringement....
Damn can't have any more fun...
DRM 'manages access' in the same way that a prison 'manages freedom'
With apologies to Ernest Lawrence Thayer
The outlook wasn't brilliant for the student march that night;
The quads were filled with rent-a-cops and not a picket sign in sight;
With Cooney busted for possession, and Barrows, the riot laws;
A sickly silence fell upon the supporters of The Cause.
A straggling few got up to go, in deep despair. The rest
Clung to that hope which "springs eternal in the human breast;"
They thought, If only Gay Doc Ruby could be rallying that mob,
We'd put up even money now, with Doc Ruby at the quads.
But Flynn preceded Doc Ruby, as did also Jimmy Blake,
And the former was a no-good and the latter was a fake;
Forlorn, that stricken multitude discouraged by the odds,
For there seemed but little chance of Doc Ruby's getting to the quads.
But Flynn let fly a bottle, to the wonderment of all,
And Blake, the much despised, set a bomb off in the hall,
And when the dust had lifted and men saw what had occurred,
Jimmy beaned the Dean of Students, while the bombed out library burned.
Then from five thousand throats and more there rose a lusty yell,
It rumbled through the valley, it rattled in the dell,
A Harley roared up from the street, and was tearing up the sod,
And Doc Ruby, Gay Doc Ruby, was advancing through the quads.
There was ease in Doc Ruby's manner as he wheeled into his place;
There was pride in Doc Ruby's bearing and a smile on Doc Ruby's face,
And when, responding to the cheers, he lightly gave a nod,
No stranger in the crowd could doubt `twas Gay Doc Ruby at the quads.
Ten thousand eyes were on him as he gunned the throttle loud;
Five thousand tongues applauded as he signaled to the crowd.
And while the nervous officers grabbed the night sticks from their hips,
Defiance gleamed in Doc Ruby's eye, a sneer curled Doc Ruby's lip.
And now a can of tear gas came hurtling through the air,
And Doc Ruby stood a-watching it in haughty grandeur there,
Close by the haughty Doc Ruby, the can unheeded sped --
"That ain't my style," said Doc Ruby. "Break it up!" the coppers said.
From the streets, black with people, there went up a muffled roar,
Like the beating of the storm waves on a stern and distant shore.
"Kill them; kill the pigs!" shouted someone from the mob;--
And Doc Ruby guns his engine, and wipes-out on the lawn.
With a fist of protest shaking, Doc Ruby's visage shone;
He jumped back on his Harley; he bade the march go on;
The Harley takes off through the quads, 'till it hits a vicious bump;
And Doc Ruby sails through the air, landing smack upon his rump.
"Fascists!" he screeched, "Capitalist, Imperialist, Racist, Sexist pigs!"
"If I must I'll ride a tricycle, but we'll have this march - you dig?"
They saw his face grow stern and cold; they saw his muscles strain,
And they knew that Gay Doc Ruby wouldn't lose that bike again!
The sneer is gone from Doc Ruby's lip; his teeth are clenched in hate;
He sniffs with cruel derision as he lets go of the brake.
And now he throws it into first, the clutch he now lets go,
And now the air is shattered as the bike takes off - alone.
Oh! somewhere there's a campus town where they drum and chant all night.
They protest for the rain forest, and demand the wart-hog's rights.
And somewhere bongs are being passed, and somewhere radicals shout;
But there is no joy at Old State U -- Gay Doc Ruby has Wiped Out!
Digimarc was great - I loved them. It was hilarious to see images marked and then 'remarked' by hacking the program to re-watermark the image. The original mark wasn't recoverable.
http://www.woodmann.com/fravia/frogdigi.htm
Food for thought.
Why does everyone here want this not to work?
I run a website with more than 6 GB of photos and video that I have created from scratch. My hosting provider is generous, but there is a finite limit to my bandwidth. Like the TurnItIn bot, this Digimarc bot will be an uninvited pest that repeatedly spiders the site, downloading all my content to sniff it.
Since these visits are of no benefit to me, I'll block it by user-agent in htaccess and IP address once people figure out where this beast lives. Obviously, robots.txt depends on client-side cooperation, which this thing likely won't obey if it intends to be as promiscuous as possible.
Seth
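A rough sketch of the user-agent and IP blocking described above, written in Python rather than htaccess so it can be run anywhere. Everything specific here is an assumption: the user-agent token `examplewatermarkbot` is invented, and `192.0.2.0/24` is a reserved documentation range, not the crawler's real address space, which nobody in the thread has identified yet.

```python
import ipaddress

# Hypothetical block lists; fill in real values once the crawler is identified.
BLOCKED_AGENT_TOKENS = ("examplewatermarkbot",)
BLOCKED_NETWORKS = (ipaddress.ip_network("192.0.2.0/24"),)

def should_block(remote_addr: str, user_agent: str) -> bool:
    """Return True if the request matches a blocked user-agent or IP range."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_AGENT_TOKENS):
        return True
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in BLOCKED_NETWORKS)

blocked_by_ip = should_block("192.0.2.17", "Mozilla/5.0")
blocked_by_agent = should_block("203.0.113.5", "ExampleWatermarkBot/1.0")
normal_visitor = should_block("203.0.113.5", "Mozilla/5.0")
```

The user-agent check is trivially defeated by a crawler that lies about its identity, so the IP list is the part that matters if the bot does not want to be found.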
$5 / month hosted VPS on linux = awesome!
Today's IP problems stem from the above argument that middle men, not creators, are rewarded for the work that the creators do. This applies to art, music, engineering, drugs... all "IP".
As has been pointed out the "watermark method" must be doomed to fail.
The article implies that the whole media file has to be downloaded, and if that's the case there's a much better way of doing this. There are algorithms out there that can efficiently calculate a "signature" for the content of the file (for images, the "best" such algorithm is probably the SIFT algorithm, and there exist algorithms for other media files). These "signatures" are usually so-called "multidimensional descriptors" that are invariant to changes in the media (such as compression in images/audio/video, or stretching, cropping and various other manipulations of image files).
The major obstacle so far has been how to perform an efficient search on a large collection of such signatures. That really isn't an obstacle any more.
My master's thesis (which I have just started working on) actually deals with further improvements to a new indexing architecture that has been developed at my school to deal with just such multidimensional descriptors. The improvement over previous ways of indexing multidimensional descriptors is huge. As an example, our test database contains descriptors for over 300,000 images, and searching for a similar image took 2 hours with previous techniques. Using the new index type it takes 2 seconds, and the best part is that the query speed is not dependent on the size of the database.
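As a rough illustration of a content "signature" that survives small changes, here is a minimal Python sketch. It uses a plain average hash, which is far simpler than SIFT and is not the descriptor or index described above; it only shows the general idea that near-identical media map to near-identical signatures. The 8x8 grayscale grids and function names are made up for the example.

```python
def average_hash(pixels):
    """One bit per pixel: 1 if brighter than the image's mean, else 0."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if v > mean else 0 for v in flat)

def hamming(a, b):
    """Number of bit positions where two signatures differ."""
    return sum(x != y for x, y in zip(a, b))

# Tiny 8x8 grayscale "images" (rows of pixel values).
original = [[16 * x + 8 * y for x in range(8)] for y in range(8)]
brightened = [[v + 20 for v in row] for row in original]
noisy = [[v + (3 if (x + y) % 2 else -3) for x, v in enumerate(row)]
         for y, row in enumerate(original)]
unrelated = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]

d_brightened = hamming(average_hash(original), average_hash(brightened))
d_noisy = hamming(average_hash(original), average_hash(noisy))
d_unrelated = hamming(average_hash(original), average_hash(unrelated))
```

Matching then becomes a nearest-neighbour search over signatures with a small distance threshold, which is where an efficient index of the kind described above would come in.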
You don't think enough... therefore you better not be!
I don't want Creative Commons and needing to parse the terms of various styles of licenses. I don't want to encounter at all the content of the RIAA/MPAA mafiaas who are seeking to hammer down on me for using it in a way I prefer. I need my web browsing "experience" to be designed in a manner in which only "GPL'd" content is visible and accessible to me, and all the copyrighted items are drowned out and disappear as the ugly noise they are.
Digital opens up options for freely sharing, copying, and widely distributing. There are plenty of independents who are making THEIR content available in alignment with the free manner in which digital works. How can I banish the content of those who refuse to play freely from ever polluting my computer?
DRM is like a fence with which the RIAA/MPAA wall themselves in. I understand that, and applaud it. If they don't want me, with my DRM free ways, then I don't want THEM. The problem is, their DRM is not efficient enough for my needs, because it doesn't wall them in perfectly.
I think the free software community needs to work with the RIAA/MPAA to help them perfect their DRM, so that we can freely go about our business using our computers in the full-sharing mode we deserve and grew up with, without the need for ever being bothered by the RIAA's and MPAA's DRM-polluted content ever again. Our interests are 100% aligned. But by leaving it to them, their execution of DRM is so piss poor that we keep stumbling over their unwanted, encumbered crap.
What about the possibility of cleaning files of their watermarks? If the watermarks do indeed uniquely identify their source, then comparing two of the files should yield where the difference is. Would that not make it possible to remove or distort the watermark?
This isn't like an image watermark, where distorting the watermark distorts the quality of the image. Since the watermark technically cannot be heard, couldn't two files be compared and the bits that differ between the two erased?
Two digital music copies should be the same except in the place where they are watermarked.
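The comparison idea above can be sketched on toy data. This assumes the two copies are bit-identical apart from their marks, which holds for the short integer lists below but generally not for real files that have been through separate lossy encodes. All values and names are invented for the example.

```python
def differing_positions(copy_a, copy_b):
    """Indices where two copies of the same recording disagree."""
    return [i for i, (a, b) in enumerate(zip(copy_a, copy_b)) if a != b]

base = [10, 20, 30, 40, 50, 60]    # unmarked samples (never seen by the attacker)
copy_a = [10, 21, 30, 40, 51, 60]  # hypothetical mark for recipient A: indices 1 and 4
copy_b = [10, 20, 31, 40, 51, 60]  # hypothetical mark for recipient B: indices 2 and 4

suspect = differing_positions(copy_a, copy_b)
```

Note the limit this exposes: index 4 was altered identically in both copies, so the comparison never reveals it. Any part of a mark shared by all copies survives this kind of diff.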
How does the crawler plan on identifying multimedia files from any other binary files on the site? Assuming (like most spiders) it will follow the links on the website, which ones will it single out?
If it goes by extension (which it almost surely will, and even if it doesn't it's somewhat irrelevant), it can't possibly support every audio format. So, once you figure out what formats it can't interpret, just rip your music into that! OGG/Vorbis anyone?
Better still, just change the file extensions of all your music files, and tell people to download them and rename to .mp3 or what have you.
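Whether renaming helps depends on how the crawler picks files, which the article does not say. The sketch below shows the alternative to trusting extensions: checking the leading "magic" bytes of the content. The signatures listed are standard for these formats, but the function itself is a made-up example, not anything known about the real crawler.

```python
# Leading byte signatures for a few common audio containers.
MAGIC = (
    (b"ID3", "mp3"),    # MP3 file starting with an ID3v2 tag
    (b"OggS", "ogg"),   # Ogg container (Vorbis and others)
    (b"fLaC", "flac"),  # FLAC stream marker
    (b"RIFF", "riff"),  # RIFF container (WAV, AVI)
)

def sniff(data: bytes) -> str:
    """Guess the media type from content, ignoring the file name entirely."""
    for magic, kind in MAGIC:
        if data.startswith(magic):
            return kind
    # Bare MPEG audio frame: sync word of eleven set bits.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"

renamed_mp3 = sniff(b"ID3\x03\x00\x00\x00\x00\x00\x00")  # served as "notes.txt"
raw_frame = sniff(b"\xff\xfb\x90\x00")
ogg_file = sniff(b"OggS\x00\x02")
text_file = sniff(b"hello world")
```

A crawler that sniffs content this way would see through a renamed file once it fetched it, though it would still have to decide to follow a link with an unpromising extension in the first place.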
Isn't it possible for individuals who publish audio and are sick of this whole debate to copyright their own works and write a license stating that the user has a right to listen to the content, but not to use it for "automatic" analysis, etc.? In this case, would not the organization looking for copyright infringement be infringing on the copyright? Assuming the owner of the site is the legal holder of the copyright.
In this case, the owner of this "original" work could sue the company performing these checks for copyright infringement since they do not have a right to analyze the work. The only way they would be exempted from this restriction would be if the work was actually not an "original" work which they wouldn't know until they accessed it.
What I'm trying to say in a roundabout way is that history has shown that the entertainment industry sues first and thinks later. How many harassment suits have been filed under the DMCA? How many times has someone been harassed for simply being connected to a P2P network?
I think we can expect that anyone found with any material on their web sites that contain a watermark will be treated as if they are guilty.
Sure, if you have the money you can defend yourself.
The race isn't always to the swift... but that's the way to bet!