Copyright Tool Scans Web For Violations

Posted by Zonk on Tuesday December 19, 2006 @04:21AM from the he-knows-when-you've-been-bad-or-good dept.

The Wall Street Journal is reporting on a tech start-up that proposes to offer the ultimate in assurance for content owners. Attributor Corporation is going to offer clients the ability to scan the web for their own intellectual property. The article touches on previous use of techniques like DRM and in-house staff searches, and the limited usefulness of both. They specifically cite the pending legal actions against companies like YouTube and wonder what those companies' attitude toward initiatives like this will be. From the article: "Attributor analyzes the content of clients, who could range from individuals to big media companies, using a technique known as 'digital fingerprinting,' which determines unique and identifying characteristics of content. It uses these digital fingerprints to search its index of the Web for the content. The company claims to be able to spot a customer's content based on the appearance of as little as a few sentences of text or a few seconds of audio or video. It will provide customers with alerts and a dashboard of identified uses of their content on the Web and the context in which it is used. The content owners can then try to negotiate revenue from whoever is using it or request that it be taken down. In some cases, they may decide the content is being used fairly or to acceptable promotional ends. Attributor plans to help automate the interaction between content owners and those using their content on the Web, though it declines to specify how."
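The article does not say how Attributor's "digital fingerprinting" works. One widely used, generic way to catch reuse of "a few sentences of text" is shingling: hash every run of k consecutive words and count how many hashes two documents share. The Python sketch below assumes 5-word shingles and a match threshold of 10 shared shingles; `shingles`, `shared_shingles`, and `looks_copied` are hypothetical names, and none of it should be read as Attributor's actual method. Audio and video fingerprinting would need different, signal-level features.

```python
from __future__ import annotations

import hashlib
import re

SHINGLE_SIZE = 5  # assumption: 5-word shingles; real systems tune this


def shingles(text: str, k: int = SHINGLE_SIZE) -> set[int]:
    """Hash every run of k consecutive words in a lower-cased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    prints = set()
    for i in range(len(words) - k + 1):
        gram = " ".join(words[i:i + k])
        # First 8 bytes of SHA-1 used as a compact 64-bit fingerprint of this shingle.
        digest = hashlib.sha1(gram.encode("utf-8")).digest()[:8]
        prints.add(int.from_bytes(digest, "big"))
    return prints


def shared_shingles(original: str, candidate: str, k: int = SHINGLE_SIZE) -> int:
    """Count distinct shingles the candidate page has in common with the original."""
    return len(shingles(original, k) & shingles(candidate, k))


def looks_copied(original: str, candidate: str, min_shared: int = 10) -> bool:
    """Flag a page once enough shingles match (the threshold is an arbitrary assumption)."""
    return shared_shingles(original, candidate) >= min_shared
```

Counting shared shingles, rather than dividing by the candidate's length, lets a short quotation buried in a long page still register. By the same token, a raw match count cannot tell a short, legitimate quotation from wholesale copying; that judgment has to be made separately.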

18 of 185 comments

Wager by Baricom · 2006-12-19 04:33 · Score: 3, Insightful

Anybody care to place a friendly wager that they're not going to honor robots.txt?
Can't they just use google or torrent sites? by LiquidCoooled · 2006-12-19 04:33 · Score: 3, Informative

Can't they just use google or torrent sites?
If users can find items they want, presumably the copyright holders could use the same methods...

--
liqbase :: faster than paper
buh by lucky130 · 2006-12-19 04:36 · Score: 5, Insightful

"as little as a few sentences of text or a few seconds of audio or video"

Like quotations in a paper, or video snippets in an educational presentation?
1. Re:buh by NeutronCowboy · 2006-12-19 05:38 · Score: 4, Insightful
  
  You're assuming anyone is going to manually verify any of the results. In my experience with people using monitoring software (especially non-techies who are simply consumers of the technology, but who provided the money for it), the vast majority of them are simply going to call their lawyers when they see the dashboard light up. I see vast letter-writing campaigns coming from this, with little actual infringement being prosecuted.
  
  This is a scary product. Not so much because of the technology behind it, but because of how it is going to be implemented and (ab)used.
  
  --
  Those who can, do. Those who can't, sue.
Fighting an avalanche with a snow shovel by TheWoozle · 2006-12-19 04:42 · Score: 4, Insightful

Doesn't this merely serve to point out the absurdity of "Intellectual Property"?

--
Insisting on "correct" English is like saying that there is only one, definitive recipe for chili.
Raise. by Tackhead · 2006-12-19 04:44 · Score: 3, Funny

> Anybody care to place a friendly wager that they're not going to honor robots.txt?
127.0.0.1: $ cat robots.txt
# robots.txt for 127.0.0.1
# This file is copyright 2006 by me.
User-agent: AttributorCorporationDMCABot
Disallow: *
And if they do honor robots.txt, I'll be able to sue the fuckers for infringing on my copyright, because they must have read it in order to honor it.
1. Re:Raise. by Mayhem178 · 2006-12-19 05:12 · Score: 5, Funny
  
  127.0.0.1: $ cat robots.txt
  # robots.txt for 127.0.0.1
  # This file is copyright 2006 by me.
  User-agent: AttributorCorporationDMCABot
  Disallow: *
  
  Hahaha! You screwed up! I have your IP address now! I will send 127.0.0.1 to every company that uses the sniffer and tell them the person at that IP is an evil, evil person who exploits innocent people for their own profit and power!
  
  --
  "You will pay for your lack of vision..." - Emperor Palpatine to Ray Charles
2. Re:Raise. by FooAtWFU · 2006-12-19 05:32 · Score: 3, Interesting
  
  You joke, of course, but there are tools out there to detect when a bot is abusing your site and not following robots.txt. The usual technique is to hide a few links in your page, and also have these links blocked by robots.txt. When a user visits the link, they're banned from viewing the site. (Sometimes, a CAPTCHA-like utility for unblocking yourself is presented along with the 403 page, in the event that a particularly curious user manages to find the link and activate it manually.)
  
  --
  The World Wide Web is dying. Soon, we shall have only the Internet.
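The honeypot described in this reply can be sketched in a few lines of Python. Assumptions: the trap lives under a hypothetical `/do-not-follow/` path that is hidden in the page markup and disallowed in robots.txt, bans are kept in memory per IP address, and the CAPTCHA unblock page is represented only by an `unban` hook. `HoneypotGuard` is a hypothetical name, not an existing tool.

```python
from __future__ import annotations

# Hypothetical trap path: linked invisibly from pages and disallowed in robots.txt.
TRAP_PATH = "/do-not-follow/"

ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"


class HoneypotGuard:
    """Ban any client that requests a URL robots.txt told crawlers to avoid."""

    def __init__(self, trap_path: str = TRAP_PATH) -> None:
        self.trap_path = trap_path
        self.banned: set[str] = set()

    def handle(self, client_ip: str, path: str) -> int:
        """Return the HTTP status code to send for this request."""
        if path == "/robots.txt":
            return 200  # everyone, banned or not, may read the rules
        if client_ip in self.banned:
            return 403
        if path.startswith(self.trap_path):
            # Only a crawler ignoring robots.txt (or a very curious human) ends up here.
            self.banned.add(client_ip)
            return 403
        return 200

    def unban(self, client_ip: str) -> None:
        """Hook for a CAPTCHA-style 'unblock me' page (the CAPTCHA itself is not shown)."""
        self.banned.discard(client_ip)
```

A real deployment would hook this into the web server or firewall rather than an in-process set, and might ban by something sturdier than a single IP address.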
Yeah by Hijacked Public · 2006-12-19 04:45 · Score: 3, Interesting

FTFA:
If it works, it's a fantastic invention

Its purpose aside, yes, it would be a fantastic thing to be able to scan the entire web and reliably identify the context and content of any specific media file type. Video, audio, image, etc. Particularly if it could identify purposely obfuscated content.
I'm in what is almost certainly a tiny minority of Slashdotters in that I actually create copyrightable material rather than only consume it. I'm again in the minority in that I think copyrights are a good thing and again in the minority in that I can separate out the purpose of copyrights and the evil actions of the legal arms of **AA companies.
Regardless, while scanning the internet for improperly used material sounds great on paper, it will probably end up being as effective as finding water with a divining rod. The current tactic of locking things down at the hardware and OS levels will get more support from the media companies, not that they seem all that good at choosing tactics when the internet is involved.

--
"Sacrifice for the good of The State" - The State
1. Re:Yeah by jedidiah · 2006-12-19 04:59 · Score: 3, Insightful
  
  There's a wide gulf between copyright being a good idea in concept and being sensibly implemented in its current form.
  
  Not everyone that creates content thinks that draconian enforcement attempts are a good idea, or even in the best interests of those that create content.
  
  If your work can't survive in the marketplace, which includes the prospect of everyone on the planet getting to use it for free, then perhaps you should get some sort of more conventional day job.
  
  The difference between a game that sells 50K and one that sells 5 Million has nothing to do with DRM.
  
  --
  A Pirate and a Puritan look the same on a balance sheet.
2. Re:Yeah by AdamKG · 2006-12-19 05:17 · Score: 4, Interesting
  
  and again in the minority in that I can separate out the purpose of copyrights and the evil actions of the legal arms of **AA companies.
  Let's make one thing clear: the RIAA/MPAA lawsuits are not, in any way, shape, or form, an abuse, a negative side, a misapplication, or a malicious use of copyrights. They fulfill exactly the role copyrights were designed for; they are the logical end result of a system that says citizens are allowed to distribute ideas (or expressions of ideas) and then stop any further distribution of them.
  
  The **AA lawsuits are ridiculous, yes. But the ridiculous part is not the litigation itself; it's the laws under which the lawsuits are brought.
  
  --
  groupthink: It's good for self-esteem.
3. Re:Yeah by kanweg · 2006-12-19 05:26 · Score: 3, Interesting
  
  I'm a patent attorney and no stranger to IP. Having said that, any IP law is, or at least should be, a balance between, on the one hand, freedom to operate (both for IP users and for IP creators) and, on the other hand, a means of compensation for IP creators. In patent law, that balance is missing for software patents. Patents, at least, last 20 years at most. For copyright, that balance is not there. And I'm curious to hear whether you think it is a good thing that whatever you create is still under copyright more than 40 years after you die.
  
  Bert
Some interesting questions... by PingSpike · 2006-12-19 04:46 · Score: 4, Insightful

Great, now all the torrent sites will require captcha verification too! ;P

Actually, can they even scan torrents without downloading the entire file? And what's to stop everyone from just blocking them from accessing their websites? Are they going to go in covertly, pretending to be actual users? I can see every legit website blocking their access as well; why pay for bandwidth to supply that?

Sure, YouTube can be more efficiently attacked... but YouTube has been dancing in front of the cannons since its inception; we all knew it was going to get shot eventually.
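On the first question: a BitTorrent v1 metainfo (.torrent) file already lists a SHA-1 hash for every fixed-size piece of the payload (the `piece length` and `pieces` keys of the info dictionary, per BEP 3), so someone holding the original file can compare hashes without downloading anything from peers. This Python sketch assumes the bencoded file has already been parsed and that the torrent was built from a byte-identical copy laid out the same way; a re-encoded or repackaged file will not match. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib

SHA1_LEN = 20  # each entry in a v1 torrent's "pieces" string is a 20-byte SHA-1 digest


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """Split data into consecutive pieces the way a v1 .torrent does and hash each one."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def split_pieces_field(pieces: bytes) -> list[bytes]:
    """Split the concatenated "pieces" value from a torrent's info dictionary."""
    if len(pieces) % SHA1_LEN:
        raise ValueError("pieces field is not a multiple of 20 bytes")
    return [pieces[i:i + SHA1_LEN] for i in range(0, len(pieces), SHA1_LEN)]


def matching_fraction(known: bytes, pieces: bytes, piece_length: int) -> float:
    """Fraction of the torrent's pieces whose hashes occur in the rights holder's copy."""
    torrent = split_pieces_field(pieces)
    if not torrent:
        return 0.0
    ours = set(piece_hashes(known, piece_length))
    return sum(h in ours for h in torrent) / len(torrent)
```

Note that this only identifies which torrent describes the file; it says nothing about who is actually sharing it, which would still require contacting the swarm.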
search by hash? by straponego · 2006-12-19 04:47 · Score: 3, Interesting

Does Google allow searching by md5sum or equivalent? I'm sure they have the capability. While not as impressive as what this company claims, it'd also be more reliable for unaltered media files.
But it looks like the real "innovation" these guys are pushing toward is fully automated filing of lawsuits. I think that was in Accelerando, which is fantastic, and which you can download for free.
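Searching by checksum amounts to an exact-match index keyed by digest. A minimal Python sketch, assuming a crawler has already fetched the files; `HashIndex` and the example URLs are hypothetical, and MD5 simply stands in for whatever digest such a service might use.

```python
from __future__ import annotations

import hashlib


class HashIndex:
    """Exact-match lookup of crawled files by MD5 digest (hypothetical crawler-side index)."""

    def __init__(self) -> None:
        self._by_digest: dict[str, list[str]] = {}

    def add(self, url: str, content: bytes) -> None:
        """Record that the bytes fetched from `url` have this digest."""
        digest = hashlib.md5(content).hexdigest()
        self._by_digest.setdefault(digest, []).append(url)

    def find(self, content: bytes) -> list[str]:
        """Return every indexed URL whose bytes are identical to `content`."""
        return list(self._by_digest.get(hashlib.md5(content).hexdigest(), []))
```

Lookup is a single dictionary access per query, which is why exact hashing is cheap and reliable, but only for verbatim copies.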
1. Re:search by hash? by Johann Lau · 2006-12-19 05:45 · Score: 4, Informative
  
  "Unaltered media files" are the exception, not the rule. Changing even a bit of metadata (stripping EXIF data from an image, changing an MP3 tag) would change the checksum, not to mention putting files into an archive, resizing images, or (re)compressing music.
  
  But yeah, it might make sense for Google to become "aware" of unique content and variations of it, but I doubt they'd ever use that openly for (aiding in) hunting down copyright infringement, simply for PR reasons.
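The fragility described in this reply is easy to demonstrate. This Python sketch, with a hypothetical 4 KiB block size, builds two copies of a fake MP3 that differ only in a simplified ID3v1 tag (the final 128 bytes of the file, starting with `TAG`; here only the 30-byte title field is filled in). The whole-file MD5 changes completely, while per-block digests confine the difference to one block. Function names are hypothetical.

```python
from __future__ import annotations

import hashlib

BLOCK = 4096  # assumption: 4 KiB blocks


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def with_id3v1_title(audio: bytes, title: str) -> bytes:
    """Append a simplified 128-byte ID3v1 tag: 'TAG', a 30-byte title, 95 zero bytes."""
    return audio + b"TAG" + title.encode("latin-1")[:30].ljust(30, b"\0") + bytes(95)


def block_digests(data: bytes, block: int = BLOCK) -> list[str]:
    """MD5 of each fixed-size block, so an in-place edit spoils only the blocks it touches."""
    return [md5_hex(data[i:i + block]) for i in range(0, len(data), block)]


def changed_blocks(a: bytes, b: bytes, block: int = BLOCK) -> int:
    """Count blocks whose digests differ, plus any difference in block count."""
    da, db = block_digests(a, block), block_digests(b, block)
    return sum(x != y for x, y in zip(da, db)) + abs(len(da) - len(db))
```

Block hashing only survives in-place edits like this one. Inserting or stripping bytes near the start of a file shifts every block boundary, and resizing or recompressing rewrites the bytes outright, so those cases defeat it as well.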
Re:Dupe by Maximum Prophet · 2006-12-19 04:53 · Score: 3, Interesting
Since copyright lasts a long time and doesn't depend on being defended like trademark, there will be some allowances "for promotional reasons" like this:
1. Leak copyrighted material in easy to copy format to places where it will be copied
2. Watch viral marketing campaign take over
3. Profit
4. Wait 'til revenue falls
5. Find infringers using new scan tools
6. Sue them
7. Profit more!!!
--
All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
Re:i don't like robots.txt anyway. by FooAtWFU · 2006-12-19 04:57 · Score: 5, Informative

You're absolutely right that "if you don't want it on the public Web, don't put it there in the first place" -- but there are still times when you have a legitimate reason that you don't want a page indexed, downloaded, or otherwise visited by a robot. Dynamically generated content is one example reason; sometimes certain pages can be a big drain on your website, and you'd prefer not to have every spider in the world hitting them up every few minutes.
Let's take a fun legitimate site like, oh... Wikipedia:
# Folks get annoyed when VfD discussions end up the number 1 google hit for
# their name. See bugzilla bug #4776
# en:
Disallow: /wiki/Wikipedia:Articles_for_deletion/
Disallow: /wiki/Wikipedia%3AArticles_for_deletion/
Disallow: /wiki/Wikipedia:Votes_for_deletion/
Disallow: /wiki/Wikipedia%3AVotes_for_deletion/
Disallow: /wiki/Wikipedia:Pages_for_deletion/
Disallow: /wiki/Wikipedia%3APages_for_deletion/
Disallow: /wiki/Wikipedia:Miscellany_for_deletion/
Disallow: /wiki/Wikipedia%3AMiscellany_for_deletion/
Disallow: /wiki/Wikipedia:Miscellaneous_deletion/
Disallow: /wiki/Wikipedia%3AMiscellaneous_deletion/
Disallow: /wiki/Wikipedia:Copyright_problems
Disallow: /wiki/Wikipedia%3ACopyright_problems
(They also disallow certain specially generated pages like Special:Random, and any of the pages which actually let you edit the site).
Let's see, what are some other sites? Ooh. Take a look at Slashdot's robots.txt! (It disallows a variety of fun pages.) Microsoft's? How about whitehouse.gov? Google?

--
The World Wide Web is dying. Soon, we shall have only the Internet.
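A crawler that does honor robots.txt checks each URL against it before fetching; Python's standard library ships `urllib.robotparser` for this. The robots.txt text below is a hypothetical example (reusing the joke user-agent from Tackhead's post, plus two rules in the spirit of the Wikipedia excerpt), and `make_parser` is a hypothetical helper. Compliance is entirely voluntary: nothing in the protocol stops a crawler that never runs this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the first group targets one crawler by name,
# the second group applies to every other crawler.
ROBOTS_TXT = """\
User-agent: AttributorCorporationDMCABot
Disallow: /

User-agent: *
Disallow: /w/
Disallow: /wiki/Special:Random
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = make_parser(ROBOTS_TXT)
    # A polite crawler asks before every request.
    for agent, url in [
        ("AttributorCorporationDMCABot", "http://example.com/index.html"),
        ("FriendlyBot", "http://example.com/wiki/Main_Page"),
        ("FriendlyBot", "http://example.com/w/index.php?title=Foo&action=edit"),
    ]:
        print(agent, url, rules.can_fetch(agent, url))
```

In practice a crawler would call `set_url()` and `read()` to fetch the live file instead of parsing a string.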
Re:i don't like robots.txt anyway. by mandelbr0t · 2006-12-19 06:32 · Score: 5, Informative
Dynamically generated content is one example reason; sometimes certain pages can be a big drain on your website
And dynamic content is, of course, the answer. If I were going to put up copyrighted content in the future, I'd use one of a dozen schemes that regenerate the download link on a per-session basis. Obviously they're not going to honour robots.txt, but why are your links readable by such a basic spider? You need to:
1. Disallow anonymous downloads. You need to be logged onto the site to download anything, torrent or otherwise
2. Use a CAPTCHA to prevent spiders from signing up for said accounts
3. Use the session id to generate unique download links on a per-session basis
4. Change the key on your BitTorrent tracker every 12-24 hours. This will require that a downloader get the latest torrent from the original website (which requires login), reducing the impact of a leaked torrent
5. Compress and possibly encrypt the content so that it's less obvious what it is
Anyone who follows the above steps (and most sites already do most or all of this) won't be found by the spider. Period.

The only thing I can think of that this product would be useful for is to find people who have blatantly copied my website, but I'm sure you could find those people equally easily with Google.

mandelbr0t
--
"Please describe the scientific nature of the 'whammy'" - Agent Scully
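Step 3 (and the key rotation in step 4) can be sketched with an HMAC-signed link: the server signs the session id, file id, and an expiry time, so a link copied out of one session fails for anyone else and stops working once it expires or the key is rotated. This Python sketch assumes the session id comes from the login cookie; `SECRET_KEY`, the one-hour lifetime, the `/download/` path, and the function names are all hypothetical. Login and CAPTCHA (steps 1 and 2) are not shown.

```python
from __future__ import annotations

import hashlib
import hmac
import time

# Hypothetical server-side secret; rotating it (e.g. every 12-24 hours, as in step 4)
# invalidates every link handed out before the rotation.
SECRET_KEY = b"replace-with-a-random-server-secret"
LINK_TTL_SECONDS = 3600  # assumption: links expire after one hour


def _sign(session_id: str, file_id: str, expires: int, key: bytes) -> str:
    msg = f"{session_id}|{file_id}|{expires}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def make_download_path(session_id: str, file_id: str, now: int | None = None,
                       key: bytes = SECRET_KEY) -> str:
    """Build a download path that only works for this session until it expires."""
    now = int(time.time()) if now is None else now
    expires = now + LINK_TTL_SECONDS
    sig = _sign(session_id, file_id, expires, key)
    return f"/download/{file_id}?expires={expires}&sig={sig}"


def verify_download(session_id: str, file_id: str, expires: int, sig: str,
                    now: int | None = None, key: bytes = SECRET_KEY) -> bool:
    """Accept the request only if the link was issued to this session and is unexpired."""
    now = int(time.time()) if now is None else now
    if now > expires:
        return False
    expected = _sign(session_id, file_id, expires, key)
    return hmac.compare_digest(expected, sig)  # constant-time comparison
```

Because the session id is never written into the URL, a spider that scrapes the link still cannot use it without also holding that session.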