Web Copyright Crackdown On the Way

Posted by kdawson on Friday March 5, 2010 @02:38AM from the eighty-percent-rule dept.

Hugh Pickens writes "Journalist Alan D. Mutter reports on his blog 'Reflections of a Newsosaur' that a coalition of traditional and digital publishers is launching the first-ever concerted crackdown on copyright pirates on the Web. Initially targeting violators who use large numbers of intact articles, the first offending sites to be targeted will be those using 80% or more of copyrighted stories more than 10 times per month. In the first stage of a multi-step process, online publishers identified by Silicon Valley startup Attributor will be sent a letter informing them of the violations and urging them to enter into license agreements with the publishers whose content appears on their sites. In the second stage, Attributor will ask hosting services to take down pirate sites. 'We are not going after past damages' from sites running unauthorized content, says Jim Pitkow, the chief executive of Attributor. The emphasis, Pitkow says, is 'to engage with publishers to bring them into compliance' by getting them to agree to pay license fees to copyright holders in the future. Offshore sites will not be immune from the crackdown: almost all of them depend on banner ads served by US-based services, and the DMCA requires the ad service to act against any violator. Attributor says it can interdict the revenue lifeline at any offending site in the world." One possible weakness in Attributor's business plan, unless they intend to violate the robots.txt convention: they find violators by crawling the Web.

21 of 224 comments

Re:Robots.txt by yincrash · 2010-03-05 02:43 · Score: 5, Insightful

Seriously. Following robots.txt is not law, only convention. I'm sure it doesn't take much to convince themselves to ignore it. Money, "doing the right thing", etc. If you view the copyright infringers as pirates, then why should Attributor follow their wishes?
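
To make the "convention, not law" point concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents and the bot name ExampleCopyBot are made up for illustration; nothing here reflects how Attributor's crawler actually behaves. The parser only reports what the site asks for, and it is entirely up to the crawler whether to call it at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site; the bot name is invented.
ROBOTS_TXT = """\
User-agent: ExampleCopyBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return what robots.txt *asks* of this crawler; nothing enforces it."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler checks first and skips disallowed URLs.
print(may_fetch("ExampleCopyBot", "http://example.com/story.html"))  # False
print(may_fetch("SomeOtherBot", "http://example.com/story.html"))    # True
print(may_fetch("SomeOtherBot", "http://example.com/private/x"))     # False
```

A crawler that never calls may_fetch, or that reports a different user agent, fetches the page just the same, which is why the server cannot rely on this file for enforcement.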
Re:Robots.txt by notgm · 2010-03-05 02:43 · Score: 4, Insightful

is there some written law that holds people to following robots.txt? if not, how is it even possible to call it a weakness?
Re:i'm a little clueless here by Tim+C · 2010-03-05 02:53 · Score: 4, Insightful

This one.
On the other hand, that's an utterly asinine comment to have made (the one you quote, not yours). Of course they'll ignore it, why on Earth wouldn't they? It is in no way binding, and robots are free to ignore it, just as site owners are free to block connections from specific incoming IP addresses, the owners of those IPs are free to switch to new ones, and so on, ad infinitum.

--
It's official. Most of you are morons.
Lesson learned from RIAA by KnownIssues · 2010-03-05 02:54 · Score: 4, Insightful

Sounds like they've learned their lesson from the RIAA. I'm not saying I agree with them and think they are right to do this. But, if you're going to try to enforce your interpretation of the law, this is at least a sane philosophy of doing so. Not going after damages is a smart move.
Will that ultimately include slashdot? by elrous0 · 2010-03-05 02:54 · Score: 5, Interesting

A lot of aggregator sites like this one base a lot of their topical content on articles printed elsewhere. While most (incl. /.) don't print whole articles intact, a lot of them do quote heavily (what used to be called "fair use," back when that phrase actually meant anything). So their first step is to go after the sites that reprint the articles whole-cloth. But will they stop there?

--
SJW: Someone who has run out of real oppression, and has to fake it.
1. Re:Will that ultimately include slashdot? by MtHuurne · 2010-03-05 03:30 · Score: 3, Insightful
  
  Unless an article is very short, quoting 80% of it is not fair use. So for now, I think they have every right to take steps against sites making money from their content without compensation.
  Yes, I am cynical enough to expect the reasonable 80% limit to be lowered over time until it reaches unreasonable levels. But let's hold the flames until they have actually crossed that line.
2. Re:Will that ultimately include slashdot? by natehoy · 2010-03-05 04:56 · Score: 3, Insightful
  
  80% is a reasonable starting point. If they start lowering it, we'll have to express our righteous indignation then. Fair use, when interpreted, is generally considered a LOT lower than routinely cutting-and-pasting 80% of articles, so they have a long way to lower it before we can honestly call our indignation righteous.
  Seriously, this really isn't a "slippery slope" situation. It seems to be a well-thought-out and sane set of guidelines. If anything, they are being a bit generous for now, and they can still tighten this quite a bit without coming close to busting "fair use" or even "reasonable use".
  Basically they are saying, "if you routinely use 80%+ of our articles as your own content, we're asking you to stop. We won't sue you for any past uses, we just want to make it clear that this isn't cool any more."
  A fair usage (note the lack of quotes, I am not talking about a legal doctrine) would be to use about 20% of the source article (properly attributed) with a link back to the original article. Give credit where it's due (and cite your sources). Then add your own thoughts, or don't. But don't take whole-cloth articles and post them on your own site with your own ads.
  Every discussion board I've ever participated in has pretty much recommended some really close variant to this anyway. It usually reads something like "cite a paragraph or two at most and have a link to the source article plainly visible nearby".
  
  --
  "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
Please do so by OzPeter · 2010-03-05 02:59 · Score: 5, Insightful

And in the process take down all those inane blogs whose sole purpose is to scrape and repost articles so they get an advertising hit.

--
I am Slashdot. Are you Slashdot as well?
1. Re:Please do so by Anonymous Coward · 2010-03-05 03:13 · Score: 3, Insightful
  
  While they're at it, can they take down forum/mailinglist mirrors too?
  It is extremely annoying when searching to find that the top 30 results all contain the exact same forum or blog post.
2. Re:Please do so by garcia · 2010-03-05 03:22 · Score: 4, Insightful
  
  And in the process find all the commercial sites using my copyrighted Flickr photos for their own purposes without my permission or payment. I'm tired of sending invoices and dealing with companies who tell you that your photo wasn't worth the $300 you charge and instead send you $50 thinking that it will clear up the matter.
  I love the hypocrisy of all of this. They are just as much at fault as any of those aggregation blogs. They just have more money to be a pain in the ass.
Re:i'm a little clueless here by KingSkippus · 2010-03-05 03:01 · Score: 4, Interesting

The Robots exclusion standard. Not that it will stop them; as others have pointed out, if they think they're "doing the right thing," I'm sure they will not be concerned about such a standard.
The worry here really isn't so much for the people who are hosting sites with infringing content. I'm sure a moral argument could be made that Attributor is well within its rights to disregard the wishes of those who are breaking copyright law. However, I run several sites that have no infringing content whatsoever, sites with content that, while not private, I don't particularly want spiders crawling. I'm not so naive as to think that they don't do it anyway; I have server logs proving that they do. However, in this case, we have a company that claims to be legitimate completely ignoring the wishes of someone like me, who is not infringing.
Put another way, by convention, my neighbors don't use binoculars to peer into my house windows to see what I'm doing although there's currently not really anything stopping them from doing so. Even though I don't particularly have anything to hide, if I find that they are violating our polite social contract, then I'll put up shades just because it's none of their damn business.
I don't think that the robots.txt convention will be the thing that stops Attributor. I think it won't take long for web site authors to figure out what user agents, IP addresses, etc. Attributor is using and to block access from Attributor to their sites. Like I said, I have no infringing content on my sites, but if Attributor is going to ignore me politely asking their robots not to scan my sites, then I'm fully in the right to take further steps to forcibly prevent them from doing so.
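
The "further steps" could look roughly like the following server-side check, sketched in Python. The user-agent substring and the address range are placeholders (203.0.113.0/24 is a range reserved for documentation); the crawler's real identifiers are not given anywhere in this thread, so treat every value as an assumption.

```python
from ipaddress import ip_address, ip_network

# Hypothetical values: the crawler's real user agent and address ranges
# are not known, so these are stand-ins for illustration only.
BLOCKED_AGENT_SUBSTRINGS = ("examplecopybot",)
BLOCKED_NETWORKS = (ip_network("203.0.113.0/24"),)  # documentation range


def should_block(remote_addr: str, user_agent: str) -> bool:
    """Return True if the request matches a blocked agent or network."""
    agent = user_agent.lower()
    if any(fragment in agent for fragment in BLOCKED_AGENT_SUBSTRINGS):
        return True
    addr = ip_address(remote_addr)
    return any(addr in network for network in BLOCKED_NETWORKS)


print(should_block("203.0.113.7", "Mozilla/5.0"))          # blocked by address
print(should_block("198.51.100.1", "ExampleCopyBot/1.0"))  # blocked by agent
print(should_block("198.51.100.1", "Mozilla/5.0"))         # allowed
```

Both checks are easy to evade: the user agent is whatever the client chooses to send, and a crawler can move to new addresses, so this only raises the cost rather than guaranteeing anything.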
Re: Offshore sites WILL be immune by Sockatume · 2010-03-05 03:03 · Score: 3, Insightful

Are you kidding? ACTA's going to harmonise everything so closely to the US that they'll be able to prosecute anyone.

--
No kidding!!! What do you say at this point?
the article, for your convenience by mdemonic · 2010-03-05 03:05 · Score: 5, Funny

A coalition of traditional and digital publishers this month will launch the first-ever concerted crackdown on copyright pirates on the web, initially targeting violators who use large numbers of intact articles.
Details of the crackdown were provided by Jim Pitkow, the chief executive of Attributor, a Silicon Valley start-up that has been selected as the agent for several publishers who want to be compensated by websites that are using their content without paying licensing fees.
In a telephone interview yesterday, Pitkow declined to identify the individual publishers in his coalition, but said they include “about a dozen” organizations representing wire services, traditional print publishers and “top-tier blog networks.”
The first offending sites to be targeted will be those using 80% or more of copyrighted stories more than 10 times per month.
In the first stage of a multi-step process aimed at encouraging copyright compliance instead of punishing scofflaws, Pitkow said online publishers identified by his company will be sent a letter informing them of the violations and urging them to enter into license agreements with the publishers whose content appears on their sites.
If copyright pirates refuse to pay, Attributor will request the major search engines to remove offending pages from search results and will ask banner services to stop serving ads to pages containing unauthorized content. The search engines and ad services are required to immediately honor such requests by the federal Digital Millennium Copyright Act (DMCA).
If the above efforts fail, Attributor will ask hosting services to take down pirate sites. Because hosting services face legal liability under the DMCA if they do not comply, they will act quickly, said Pitkow.
“We are not going after past damages” from sites running unauthorized content, said Pitkow. The emphasis, he said, is “to engage with publishers to bring them into compliance” by getting them to agree to pay license fees to copyright holders in the future.
License fees, which are set by each of the individual organizations producing content, may range from token sums for a small publisher to several hundred dollars for yearlong rights to a piece from a major publisher, said Pitkow.
Attributor identifies copyright violators by scraping the web to find copyrighted content on unauthorized sites. A team of investigators will contact violators in an effort to bring them into compliance or, alternatively, begin taking action under DMCA.
click the link to read the last 21%
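
For anyone wondering how an "80% of a story, more than 10 times per month" rule might be checked mechanically, here is a rough sketch in Python. The article does not say how Attributor measures overlap, so the word-level comparison with difflib.SequenceMatcher below is purely an assumption, as are the function names.

```python
from difflib import SequenceMatcher

COPY_THRESHOLD = 0.80  # "80% or more" of a story, per the article
MONTHLY_LIMIT = 10     # "more than 10 times per month"


def copied_fraction(original: str, candidate: str) -> float:
    """Fraction of the original's words found in runs shared with candidate."""
    src = original.split()
    dst = candidate.split()
    if not src:
        return 0.0
    matcher = SequenceMatcher(None, src, dst, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(src)


def is_targeted(pairs) -> bool:
    """pairs: (original, repost) texts found on one site within one month."""
    hits = sum(
        1 for original, repost in pairs
        if copied_fraction(original, repost) >= COPY_THRESHOLD
    )
    return hits > MONTHLY_LIMIT


story = "one two three four five six seven eight nine ten"
repost = "one two three four five six seven eight"
print(copied_fraction(story, repost))
print(is_targeted([(story, repost)] * 11))
```

Under this reading, a single heavy excerpt does not trip the rule; only a site that crosses the threshold more than ten times in the month would be flagged.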
1. Re:the article, for your convenience by natehoy · 2010-03-05 05:13 · Score: 3, Interesting
  
  No one.
  He posted the article, cited it as the original article (knowing there was a proper citation link above), and posted less than 80% of it. This is a completely legitimate use of the article as per Attributor's new rules. Two or three more words from the article would have made it an "80% rule" bust, but would still have been OK as long as he didn't make a habit of it. It's repeated use of more than 80% of source article text that Attributor wants to go after.
  Most discussion boards already limit direct citation to a paragraph or two, or approximately 20% of the article.
  So Attributor's 80% limit is making a clear statement that they are really only interested in pursuing people who make a routine habit of copying entire articles. And if the bulk of your content is coming from copying 100% of someone else's original news articles, you aren't exactly someone I want to waste my righteous indignation defending.
  
  --
  "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
Re:i'm a little clueless here by fuzzyfuzzyfungus · 2010-03-05 03:17 · Score: 3, Informative

Since, as you say, robots.txt will likely do nothing against them, the bigger question becomes "how do they plan to do their crawling?". Crawling from a well defined IP block, using software with user agent Attributor_copy_cop, will be laughably simple to block or present false noninfringing content to.

Spoofing the UA strings and (if necessary) some of the behavior of common web browsers is a simple software problem, so I assume that they'll do that (unless they are terminally incompetent). Out of curiosity, though, does anybody know how easy and cheap it would be (using legitimate methods, not botnet-style stuff) for such a commercial entity to obtain a reasonably large number of, ideally "residential looking", IPs that change fairly often? Do you just call Verizon and say "I want 500 residential DSL lines brought out to so-and-so location"? Would you obtain the services of one of the sleazy datacenter operators who caters to spammers and the like and knows how to switch IP blocks frequently? Do you pay to have second lines installed at your employees' houses, with company scanner boxes attached?
my experience with Attributor by bcrowell · 2010-03-05 03:47 · Score: 5, Informative

I've had an experience with Attributor myself, and it's given me a pretty low opinion of them. I'm the author of a CC-BY-SA-licensed calculus textbook, titled "Calculus." Someone posted a copy of the pdf on Scribd, as allowed by the license. So one day I got an email from one of the people who runs Scribd, saying that Attributor had sent them a takedown notice, which they were skeptical about. Attributor hadn't supplied any useful information about what they thought was a violation. I called Scribd, and they checked and said it was a mistake -- they were working for Macmillan, which publishes another book titled "Calculus." So here they were, serving a DMCA notice under penalty of perjury, and they hadn't even checked whether the name of the author was the same, or whether any of the text was the same. Their bot just found that the title, "Calculus," was the same as the title of one of their clients' books. Pretty scummy.

--
Find free books.
1. Re:my experience with Attributor by bcrowell · 2010-03-05 03:59 · Score: 3, Informative
  
  Oops, important correction to the parent post: "I called Attributor, and they checked and said it was a mistake -- they were working for Macmillan..."
  
  --
  Find free books.
Re:Robots.txt by Registered+Coward+v2 · 2010-03-05 03:49 · Score: 3, Interesting

Seriously. Following robots.txt is not law, only convention. I'm sure it doesn't take much to convince themselves to ignore it. Money, "doing the right thing", etc. If you view the copyright infringers as pirates, then why should Attributor follow their wishes?
I'd go even further to say that sites that use robots.txt to eliminate crawling are probably not major targets - if they don't show up in search engines then they probably don't generate enough traffic to be worth the effort. Sites that are high traffic are much better targets - their revenue stream from ads is probably significant enough that they don't want to risk losing it. Once enough fall into line they can worry about the ones that are not indexed - in fact they may just want to kill them off to preserve traffic to licensed sites.

--
I'm a consultant - I convert gibberish into cash-flow.
I hope their algorithm can keep up by aarenz · 2010-03-05 04:28 · Score: 3, Interesting

I suspect that many sites that are using this type of content will find ways of hiding that fact by using non-display characters, breaking the article into multiple pages and the like to cover the fact that they are using the content. Would love to see their system in action on some test sites to figure out how much you need to do to cover the content and make it not match the original.
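
As a rough illustration of the non-display-character trick and the obvious countermeasure, here is a small Python sketch. It assumes the matcher normalizes text before comparing; whether Attributor does anything like this is not stated anywhere, and the normalize helper is hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    """Drop Unicode format characters (category Cf) and collapse whitespace."""
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return " ".join(visible.split()).lower()


original = "Publishers launch crackdown on copyright pirates"
# Same sentence with a zero-width space, a zero-width joiner and a soft hyphen.
disguised = "Pub\u200blishers launch crack\u200ddown on copy\u00adright pirates"

print(original == disguised)                        # raw strings differ
print(normalize(original) == normalize(disguised))  # identical once cleaned
```

Splitting an article across several pages is a different problem: the crawler would have to stitch the pages back together before comparing, which this sketch does not attempt.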
Re:i'm a little clueless here by ASBands · 2010-03-05 05:21 · Score: 3, Interesting

One idea would be to use the many available cloud services like EC2, Google App Engine and Azure. The IP blocks those services come in are going to remain fairly regular, but they are so common that it might not be acceptable for a site to block everything from ghs.l.google.com (and whatever EC2 and Azure live on). It is still blockable, though, so it probably would have been better for them (from a technical standpoint) if they hadn't announced their existence and these sites had been slowly indexed by their service before anybody knew what was happening.
Another (better) idea would be to use a service like Tor. Sure, their latency is going to skyrocket, but that's not a big deal since interactivity isn't a primary concern of an indexing service. It's still blockable, if infringing site admins block Tor nodes. This may or may not be doable, as I would imagine many users of said infringing sites use anonymizing networks for their normal traffic.
Sure, either of the solutions I've come up with in five minutes can be circumvented, but the idea isn't to totally eliminate piracy, it's to make it inconvenient enough to make getting the legitimate version easier.

--
My UID is a prime number. Yeah, I planned that.
Re:Robots.txt by Joe+U · 2010-03-05 05:59 · Score: 3, Insightful

Right... because a judge will find that offer, consideration, and acceptance of a contract took place between a webserver and a bot? The court case you cite is irrelevant to an automated program that has no understanding and cannot accept conditions presented online.
Awesome, so anyone can DoS a server, send mass spam or distribute a virus as long as a bot does it, because a judge will rule that the bot acted on its own and wasn't developed or set loose by anyone at all.
If the software wrote itself you might have a point, otherwise the people who wrote it are the ones responsible for how it acts.