Copyright Tool Scans Web For Violations

Posted by Zonk on Tuesday December 19, 2006 @04:21AM from the he-knows-when-you've-been-bad-or-good dept.

The Wall Street Journal is reporting on a tech start-up that proposes to offer the ultimate in assurance for content owners. Attributor Corporation is going to offer clients the ability to scan the web for their own intellectual property. The article touches on previous use of techniques like DRM and in-house staff searches, and the limited usefulness of both. They specifically cite the pending legal actions against companies like YouTube, and wonder about what their attitude will be towards initiatives like this. From the article: "Attributor analyzes the content of clients, who could range from individuals to big media companies, using a technique known as 'digital fingerprinting,' which determines unique and identifying characteristics of content. It uses these digital fingerprints to search its index of the Web for the content. The company claims to be able to spot a customer's content based on the appearance of as little as a few sentences of text or a few seconds of audio or video. It will provide customers with alerts and a dashboard of identified uses of their content on the Web and the context in which it is used. The content owners can then try to negotiate revenue from whoever is using it or request that it be taken down. In some cases, they may decide the content is being used fairly or to acceptable promotional ends. Attributor plans to help automate the interaction between content owners and those using their content on the Web, though it declines to specify how."
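The article does not say how Attributor computes its fingerprints, so the following is only a rough sketch of one common way to fingerprint text: hash every run of k consecutive words ("shingles") and measure how many of the original's hashes turn up in another page. The function names and the choice of k=5 are assumptions for illustration, not anything taken from the company.

```python
import hashlib
import re


def shingle_hashes(text: str, k: int = 5) -> set[str]:
    """Return the SHA-1 hashes of every run of k consecutive words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {
        hashlib.sha1(" ".join(words[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(len(words) - k + 1)
    }


def containment(original: str, candidate: str, k: int = 5) -> float:
    """Fraction of the original's shingles that also appear in candidate."""
    source = shingle_hashes(original, k)
    if not source:
        return 0.0  # original is shorter than k words, nothing to match
    return len(source & shingle_hashes(candidate, k)) / len(source)


if __name__ == "__main__":
    owned = "It will provide customers with alerts and a dashboard of identified uses"
    page = "Quoting the story: " + owned + " of their content on the Web."
    print(containment(owned, page))
```

A score near 1.0 only says the text appears on the page. It says nothing about whether the use is a quotation, a review, or a wholesale copy.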

buh by lucky130 · 2006-12-19 04:36 · Score: 5, Insightful

"as little as a few sentences of text or a few seconds of audio or video"

Like quotations in a paper, or video snippets in an educational presentation?
1. Re:buh by NeutronCowboy · 2006-12-19 05:38 · Score: 4, Insightful
  
  You're assuming anyone is going to manually verify any of the results. In my experience with people using monitoring software (especially non-techies who are simply consumers of the technology, but who provided the money for it), the vast majority of them are simply going to call their lawyers when they see the dashboard light up. I expect vast letter-writing campaigns to come from this, with little actual infringement being prosecuted.
  
  This is a scary product. Not so much because of the technology behind it, but because of how it is going to be implemented and (ab)used.
  
  --
  Those who can, do. Those who can't, sue.
Fighting an avalanche with a snow shovel by TheWoozle · 2006-12-19 04:42 · Score: 4, Insightful

Doesn't this merely serve to point out the absurdity of "Intellectual Property"?

--
Insisting on "correct" English is like saying that there is only one, definitive recipe for chili.
Some interesting questions... by PingSpike · 2006-12-19 04:46 · Score: 4, Insightful

Great, now all the torrent sites will require captcha verification too! ;P

Actually, can they even scan torrents without downloading the entire file? And what's to stop everyone from just blocking them from accessing their websites? Are they going to go in covertly, pretending to be actual users? I can see every legit website blocking their access as well; why pay for bandwidth to supply that?

Sure, YouTube can be more efficiently attacked... but YouTube has been dancing in front of the cannons since its inception; we all knew it was going to get shot eventually.
Re:i don't like robots.txt anyway. by FooAtWFU · 2006-12-19 04:57 · Score: 5, Informative

You're absolutely right that "if you don't want it on the public Web, don't put it there in the first place" -- but there are still times when you have a legitimate reason that you don't want a page indexed, downloaded, or otherwise visited by a robot. Dynamically generated content is one example reason; sometimes certain pages can be a big drain on your website, and you'd prefer not to have every spider in the world hitting them up every few minutes.
Let's take a fun legitimate site like, oh... Wikipedia:
# Folks get annoyed when VfD discussions end up the number 1 google hit for
# their name. See bugzilla bug #4776
# en:
Disallow: /wiki/Wikipedia:Articles_for_deletion/
Disallow: /wiki/Wikipedia%3AArticles_for_deletion/
Disallow: /wiki/Wikipedia:Votes_for_deletion/
Disallow: /wiki/Wikipedia%3AVotes_for_deletion/
Disallow: /wiki/Wikipedia:Pages_for_deletion/
Disallow: /wiki/Wikipedia%3APages_for_deletion/
Disallow: /wiki/Wikipedia:Miscellany_for_deletion/
Disallow: /wiki/Wikipedia%3AMiscellany_for_deletion/
Disallow: /wiki/Wikipedia:Miscellaneous_deletion/
Disallow: /wiki/Wikipedia%3AMiscellaneous_deletion/
Disallow: /wiki/Wikipedia:Copyright_problems
Disallow: /wiki/Wikipedia%3ACopyright_problems
(They also disallow certain specially generated pages like Special:Random, and any of the pages which actually let you edit the site).
Let's see, what are some other sites? Ooh. Take a look at Slashdot's robots.txt! (disallows a variety of fun pages.) Microsoft's? How about whitehouse.gov? Google?
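For anyone who wants to see how a well-behaved spider would read rules like these, here is a minimal sketch using Python's standard urllib.robotparser. The rules, the example.org URLs, and the "ExampleBot" user agent are all made up for illustration. Nothing here forces a crawler to obey the file; it only shows what a compliant one would conclude.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules keeping every robot away from expensive dynamic pages.
RULES = """\
User-agent: *
Disallow: /search
Disallow: /edit/
"""


def may_fetch(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a compliant robot would be allowed to fetch url."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())  # parse from memory, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch("http://example.org/about.html"))
    print(may_fetch("http://example.org/search/results"))
    print(may_fetch("http://example.org/edit/FrontPage"))
```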

--
The World Wide Web is dying. Soon, we shall have only the Internet.
Re:Raise. by Mayhem178 · 2006-12-19 05:12 · Score: 5, Funny

127.0.0.1: $ cat robots.txt
# robots.txt for 127.0.0.1
# This file is copyright 2006 by me.
User-agent: AttributorCorporationDMCABot
Disallow: /

Hahaha! You screwed up! I have your IP address now! I will send 127.0.0.1 to every company that uses the sniffer and tell them the person at that IP is an evil, evil person who exploits innocent people for their own profit and power!

--
"You will pay for your lack of vision..." - Emperor Palpatine to Ray Charles
Re:Yeah by AdamKG · 2006-12-19 05:17 · Score: 4, Interesting

and again in the minority in that I can separate out the purpose of copyrights and the evil actions of the legal arms of **AA companies.
Let's make one thing clear: the RIAA/MPAA lawsuits are not, in any way, shape, or form, an abuse, negative side of, misapplication or malicious use of Copyrights. They fulfill the role of Copyrights in the first place; they are the logical end result of a system that says citizens are allowed to distribute ideas (or expressions of ideas), then stop any further distribution of them.

The **AA lawsuits are ridiculous, yes. But the ridiculous part is not the litigation itself; it's the laws under which the lawsuits are brought.

--
groupthink: It's good for self-esteem.
Re:search by hash? by Johann+Lau · 2006-12-19 05:45 · Score: 4, Informative

"Unaltered media files" are the exception, not the rule. Changing even a bit of metadata (stripping exif from an image, changing an mp3 tag) would change the checksum, not to mention things like putting things into an archive, resizing images, (re)recompressing music.

But yeah, it might make sense for Google to become "aware" of unique content and variations of it, but I doubt they'd ever use that openly for (aiding in) hunting down copyright infringement, simply for PR reasons.
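The checksum point is easy to demonstrate. In this small sketch the "file" is just a made-up byte string standing in for a tagged media file, and flip_one_bit is a hypothetical helper; the only claim is that changing a single bit gives a completely different SHA-256 digest.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def flip_one_bit(data: bytes, index: int) -> bytes:
    """Return a copy of data with the lowest bit of one byte flipped."""
    changed = bytearray(data)
    changed[index] ^= 0x01
    return bytes(changed)


# Stand-in for a media file: a fake tag area followed by a fake payload.
original = b"TAG:artist=unknown;" + b"\x00\x01\x02\x03" * 64
retagged = flip_one_bit(original, 4)  # touch a single bit inside the "tag"

if __name__ == "__main__":
    print(sha256_hex(original))
    print(sha256_hex(retagged))
    print(sha256_hex(original) == sha256_hex(retagged))  # False
```

This is why exact-hash matching only catches byte-for-byte copies, and why anything meant to survive re-encoding has to look at the content rather than the file.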
Re:i don't like robots.txt anyway. by mandelbr0t · 2006-12-19 06:32 · Score: 5, Informative
Dynamically generated content is one example reason; sometimes certain pages can be a big drain on your website
And dynamic content is, of course, the answer. If I'm going to put up copyrighted content in the future, I'd use one of a dozen schemes that regenerate the download link on a per-session basis. Obviously they're not going to honour robots.txt, but why are your links readable by such a basic spider? You need to:
1. Disallow anonymous downloads. You need to be logged onto the site to download anything, torrent or otherwise
2. Use a CAPTCHA to prevent spiders from signing up for said accounts
3. Use the session id to generate unique download links on a per-session basis
4. Change the key on your BitTorrent tracker every 12-24 hours. This will require that a downloader get the latest torrent from the original website (which requires login), reducing the impact of a leaked torrent
5. Compress and possibly encrypt the content so that it's less obvious what it is
Anyone who follows the above steps (and most sites already do most or all of this) won't be found by the spider. Period.
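Step 3 can be sketched roughly like this, assuming the server keeps a secret key and signs the session id together with the file id using HMAC. The key, the /download/ path layout, and the function names are all hypothetical, and the ids are assumed not to contain a colon.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice load it from configuration.
SECRET_KEY = b"replace-with-a-random-server-side-secret"


def make_token(session_id: str, file_id: str, key: bytes = SECRET_KEY) -> str:
    """Sign the (session, file) pair so the token is useless elsewhere."""
    message = f"{session_id}:{file_id}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def make_download_link(session_id: str, file_id: str, key: bytes = SECRET_KEY) -> str:
    """Build a link that only verifies for the session it was issued to."""
    return f"/download/{file_id}?token={make_token(session_id, file_id, key)}"


def is_valid(session_id: str, file_id: str, token: str, key: bytes = SECRET_KEY) -> bool:
    """Check a presented token against the requesting session."""
    expected = make_token(session_id, file_id, key)
    return hmac.compare_digest(expected, token)  # constant-time comparison


if __name__ == "__main__":
    print(make_download_link("session-abc", "file-42"))
```

A link copied out of one session fails the is_valid check in any other session, and rotating the key invalidates every outstanding link at once.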

The only thing I can think of that this product would be useful for is to find people who have blatantly copied my website, but I'm sure you could find those people equally easily with Google.

mandelbr0t
--
"Please describe the scientific nature of the 'whammy'" - Agent Scully