Copyright Tool Scans Web For Violations
The Wall Street Journal is reporting on a tech start-up that proposes to offer the ultimate in assurance for content owners. Attributor Corporation plans to let clients scan the web for their own intellectual property. The article touches on earlier approaches such as DRM and in-house staff searches, and the limited usefulness of both. It specifically cites the pending legal actions against companies like YouTube and wonders what those companies' attitude will be toward initiatives like this. From the article: "Attributor analyzes the content of clients, who could range from individuals to big media companies, using a technique known as 'digital fingerprinting,' which determines unique and identifying characteristics of content. It uses these digital fingerprints to search its index of the Web for the content. The company claims to be able to spot a customer's content based on the appearance of as little as a few sentences of text or a few seconds of audio or video. It will provide customers with alerts and a dashboard of identified uses of their content on the Web and the context in which it is used. The content owners can then try to negotiate revenue from whoever is using it or request that it be taken down. In some cases, they may decide the content is being used fairly or to acceptable promotional ends. Attributor plans to help automate the interaction between content owners and those using their content on the Web, though it declines to specify how."
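Attributor doesn't say how its "digital fingerprinting" works. For text, one well-known generic technique is w-shingling: hash every run of k consecutive words, then check what fraction of the original's hashes also show up on a crawled page. The sketch below illustrates that generic technique only, not Attributor's method; the function names, the regex tokenizer and k=5 are all assumptions.

```python
import hashlib
import re


def shingles(text: str, k: int = 5) -> set:
    """Hashed k-word shingles: every run of k consecutive words in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return set()
    # Texts shorter than k words yield a single shingle made of all their words.
    count = max(len(words) - k + 1, 1)
    windows = (" ".join(words[i:i + k]) for i in range(count))
    # Keep 64 bits of each SHA-1 digest as a compact fingerprint.
    return {int.from_bytes(hashlib.sha1(w.encode()).digest()[:8], "big") for w in windows}


def containment(original: str, page: str, k: int = 5) -> float:
    """Fraction of the original's shingles that also appear on the page."""
    ours = shingles(original, k)
    if not ours:
        return 0.0
    return len(ours & shingles(page, k)) / len(ours)


original = "The quick brown fox jumps over the lazy dog near the river bank."
page = "Blog post: the quick brown fox jumps over the lazy dog near the river bank, apparently."
print(containment(original, page))  # 1.0
print(containment(original, "An unrelated post about the one true chili recipe."))  # 0.0
```

Containment (rather than symmetric Jaccard similarity) is used here because a quoted snippet sits inside a much larger page; the page's extra text shouldn't dilute the score.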
Anybody care to place a friendly wager that they're not going to honor robots.txt?
Can't they just use Google or torrent sites?
If users can find items they want, presumably the copyright holders could use the same methods...
liqbase
"as little as a few sentences of text or a few seconds of audio or video"
Like quotations in a paper, or video snippets in an educational presentation?
Doesn't this merely serve to point out the absurdity of "Intellectual Property"?
127.0.0.1: $ cat robots.txt
# robots.txt for 127.0.0.1
# This file is copyright 2006 by me.
User-agent: AttributorCorporationDMCABot
Disallow: /
And if they do honor robots.txt, I'll be able to sue the fuckers for infringing on my copyright, because they must have read it in order to honor it.
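Honoring robots.txt is entirely up to the crawler: a polite spider fetches the file and checks each URL against it before downloading. Python's standard-library `urllib.robotparser` performs that check. The sketch feeds it the robots.txt from the post above, using `Disallow: /`, the conventional way to block a whole site. The bot name is the poster's joke, not a real crawler, and the URL is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above, blocking one (hypothetical) agent entirely.
ROBOTS_TXT = """\
# robots.txt for 127.0.0.1
User-agent: AttributorCorporationDMCABot
Disallow: /
"""


def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed("AttributorCorporationDMCABot", "http://127.0.0.1/video.avi"))  # False
print(allowed("Googlebot", "http://127.0.0.1/video.avi"))  # True
```

Nothing enforces this: a crawler that never calls a check like `can_fetch` simply ignores the file.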
Its purpose aside, yes, it would be a fantastic thing to be able to scan the entire web and reliably identify the context and content of any specific media file type. Video, audio, image, etc. Particularly if it could identify purposely obfuscated content.
I'm in what is almost certainly a tiny minority of Slashdotters in that I actually create copyrightable material rather than only consume it. I'm again in the minority in that I think copyrights are a good thing, and again in the minority in that I can separate the purpose of copyrights from the evil actions of the legal arms of **AA companies.
Regardless, while scanning the internet for improperly used material sounds great on paper, this will probably end up being about as effective as finding water with a divining rod. The current tactic of locking things down at the hardware and OS levels will get more support from the media companies, not that they seem all that good at choosing tactics when the internet is involved.
Great, now all the torrent sites will require captcha verification too! ;P
Actually, can they even scan torrents without downloading the entire file? And what's to stop everyone from just blocking them from accessing their websites? Are they going to go in covertly, pretending to be actual users? I can see every legit website blocking their access as well; why pay for bandwidth to supply that?
Sure, YouTube can be attacked more efficiently... but YouTube has been dancing in front of the cannons since its inception; we all knew it was going to get shot eventually.
But it looks like the real "innovation" these guys are pushing toward is fully automated filing of lawsuits. I think that was in Accelerando, which is fantastic, and which you can download for free.
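On the question of scanning torrents without downloading them: a BitTorrent (v1) .torrent file lists a SHA-1 hash for every fixed-size piece of the payload. So someone who already has the original file can, in principle, hash their own copy with the same piece length and compare against that list without downloading anything. This only works for byte-identical copies (any re-encode changes every hash), and multi-file torrents complicate piece boundaries. It's a minimal sketch: it assumes the `pieces` value has already been extracted from the bencoded .torrent, and the helper names and piece length are hypothetical.

```python
import hashlib

PIECE_LENGTH = 16384  # hypothetical; each real torrent states its own piece length


def piece_hashes(data: bytes, piece_length: int) -> list:
    """SHA-1 digest of each fixed-size piece (the last piece may be shorter)."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]


def matching_pieces(known_file: bytes, piece_length: int, torrent_pieces: bytes) -> int:
    """Count pieces whose hash in the torrent equals the hash of our own copy.

    `torrent_pieces` is assumed to be the raw 'pieces' value from the torrent's
    info dictionary: 20-byte SHA-1 digests concatenated in piece order.
    """
    listed = [torrent_pieces[i:i + 20] for i in range(0, len(torrent_pieces), 20)]
    ours = piece_hashes(known_file, piece_length)
    return sum(1 for mine, theirs in zip(ours, listed) if mine == theirs)


# Stand-in data: our "original" file and a torrent built from an exact copy.
original_file = bytes(range(256)) * 1000
torrent_pieces = b"".join(piece_hashes(original_file, PIECE_LENGTH))
print(matching_pieces(original_file, PIECE_LENGTH, torrent_pieces))  # 16
# Changing a single byte changes the hash of the piece that contains it.
edited = b"x" + original_file[1:]
print(matching_pieces(edited, PIECE_LENGTH, torrent_pieces))  # 15
```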
Let's take a fun legitimate site like, oh... Wikipedia:
(They also disallow certain specially generated pages like Special:Random, and any of the pages which actually let you edit the site.) Let's see, what are some other sites? Ooh. Take a look at Slashdot's robots.txt! (It disallows a variety of fun pages.) Microsoft's? How about whitehouse.gov? Google?
And dynamic content is, of course, the answer. If I'm going to put up copyrighted content in the future, I'd use one of a dozen schemes that regenerate the download link on a per-session basis. Obviously they're not going to honour robots.txt, but why are your links readable by such a basic spider? You need to:
Anyone who follows the above steps (and most sites already do most or all of this) won't be found by the spider. Period.
The only thing I can think of that this product would be useful for is to find people who have blatantly copied my website, but I'm sure you could find those people equally easily with Google.
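One common way to regenerate a download link per session is to have the server sign the file path, the visitor's session ID and an expiry time with a secret key. A link copied out of one session then fails to validate in another session or after it expires. This is a minimal sketch using HMAC-SHA256 from Python's standard library. The key, path, session IDs and five-minute lifetime are hypothetical, and this is one generic scheme, not any particular site's.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit

SECRET_KEY = b"hypothetical-server-side-secret"  # keep real keys out of source code


def _sign(path: str, session_id: str, expires: int) -> str:
    message = f"{path}|{session_id}|{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_link(path: str, session_id: str, ttl: int = 300, now: Optional[float] = None) -> str:
    """Build a download link valid only for this session and for `ttl` seconds."""
    expires = int((time.time() if now is None else now) + ttl)
    return f"{path}?expires={expires}&sig={_sign(path, session_id, expires)}"


def check_link(link: str, session_id: str, now: Optional[float] = None) -> bool:
    """Accept the link only if it is unexpired and was signed for this session."""
    parts = urlsplit(link)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False
    # compare_digest does a constant-time comparison to avoid timing leaks.
    return hmac.compare_digest(_sign(parts.path, session_id, expires), sig)


link = make_link("/files/song.ogg", "session-abc", now=1000.0)
print(check_link(link, "session-abc", now=1100.0))  # True
print(check_link(link, "session-xyz", now=1100.0))  # False: different session
print(check_link(link, "session-abc", now=2000.0))  # False: expired
```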
mandelbr0t