Web Copyright Crackdown On the Way
Hugh Pickens writes "Journalist Alan D. Mutter reports on his blog 'Reflections of a Newsosaur' that a coalition of traditional and digital publishers is launching the first-ever concerted crackdown on copyright pirates on the Web. Initially targeting violators who use large numbers of intact articles, the first offending sites to be targeted will be those using 80% or more of copyrighted stories more than 10 times per month. In the first stage of a multi-step process, online publishers identified by Silicon Valley startup Attributor will be sent a letter informing them of the violations and urging them to enter into license agreements with the publishers whose content appears on their sites. In the second stage, Attributor will ask hosting services to take down pirate sites. 'We are not going after past damages' from sites running unauthorized content, says Jim Pitkow, the chief executive of Attributor. The emphasis, Pitkow says, is 'to engage with publishers to bring them into compliance' by getting them to agree to pay license fees to copyright holders in the future. Offshore sites will not be immune from the crackdown: almost all of them depend on banner ads served by US-based services, and the DMCA requires the ad service to act against any violator. Attributor says it can interdict the revenue lifeline at any offending site in the world." One possible weakness in Attributor's business plan, unless they intend to violate the robots.txt convention: they find violators by crawling the Web.
I'm sure these guys have no compunction about ignoring robots.txt if doing so makes them money.
There is a war going on for your mind.
What on earth is the DMCA supposed to achieve, in the context of Ad-providers?
Sounds pretty scary to me.
Love over Gold.
This one.
On the other hand, that's an utterly asinine comment to have made (the one you quote, not yours). Of course they'll ignore it, why on Earth wouldn't they? It is in no way binding, and robots are free to ignore it, just as site owners are free to block connections from specific incoming IP addresses, the owners of those IPs are free to switch to new ones, and so on, ad infinitum.
It's official. Most of you are morons.
Sounds like they've learned their lesson from the RIAA. I'm not saying I agree with them and think they are right to do this. But, if you're going to try to enforce your interpretation of the law, this is at least a sane philosophy of doing so. Not going after damages is a smart move.
A lot of aggregator sites like this one base a lot of their topical content on articles printed elsewhere. While most (incl. /.) don't print whole articles intact, a lot of them do quote heavily (what used to be called "fair use," back when that phrase actually meant anything). So their first step is to go after the sites that reprint the articles whole-cloth. But will they stop there?
SJW: Someone who has run out of real oppression, and has to fake it.
All this harassment is going to do is push small internet publishers around the globe to services in other countries. Datacenters and ad services in the U.S. will lose customers. There are already strong companies serving those markets in the EU. The EU will be happy to receive that amount of business.
The stupor of American corporatism is overwhelming. They can even go to the extent of shooting themselves in the foot.
Read radical news here
And in the process take down all those inane blogs whose sole purpose is to scrape and repost articles so they get an advertising hit.
I am Slashdot. Are you Slashdot as well?
"Offshore sites will not be immune from the crackdown: almost all of them depend on banner ads served by US-based services, and the DMCA requires the ad service to act against any violator. "
Not sure this is such a great idea. When you're broke, you don't cut off the little income you're still getting... I'm inclined to think that in the near future, things will more likely go in the opposite direction: grey-legal stuff will be fully legalized to provide as much extra economic stimulus as possible.
"I bless every day that I continue to live, for every day is pure profit."
The Robots exclusion standard. Not that it will stop them; as others have pointed out, if they think they're "doing the right thing," I'm sure they will not be concerned about such a standard.
The worry here really isn't so much for the people who are hosting sites with infringing content. I'm sure a moral argument could be made that Attributor is well within its rights to disregard the wishes of those who are breaking copyright law. However, I run several sites that have no infringing content whatsoever, sites with content that, while not private, I don't particularly want spiders crawling. I'm not so naive as to think that they don't do it anyway; I have server logs proving that they do. However, in this case, we have a company that claims to be legitimate completely ignoring my wishes, the wishes of someone who is not infringing.
Put another way, by convention, my neighbors don't use binoculars to peer into my house windows to see what I'm doing although there's currently not really anything stopping them from doing so. Even though I don't particularly have anything to hide, if I find that they are violating our polite social contract, then I'll put up shades just because it's none of their damn business.
I don't think that the robots.txt convention will be the thing that stops Attributor. I think it won't take long for web site authors to figure out which user agents, IP addresses, etc. Attributor is using and to block Attributor's access to their sites. Like I said, I have no infringing content on my sites, but if Attributor is going to ignore me politely asking their robots not to scan my sites, then I'm fully in the right to take further steps to forcibly prevent them from doing so.
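To illustrate why robots.txt is only a polite request: the check happens entirely on the crawler's side, and only if the crawler chooses to run it. Here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents, the "ExampleCrawler" user agent, and the example.com URLs are all made up for illustration, and nothing here is known to reflect how Attributor's crawler works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (not from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a well-behaved crawler performs voluntarily;
    a crawler that skips it can still request the page.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch("ExampleCrawler", "http://example.com/private/page.html"))
    print(may_fetch("ExampleCrawler", "http://example.com/articles/1.html"))
```

Nothing on the server enforces the answer, which is why blocking by user agent or IP address is the natural next step if a crawler ignores it.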
Slashdot for its copy-pasted copies
News publishers using Attributor probably won't attack Slashdot for excerpting one paragraph from a ten-paragraph story any time soon. From the summary:
the first offending sites to be targeted will be those using 80% or more of copyrighted stories
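For what it's worth, the stated rule (80% or more of a story, more than 10 times per month) is easy to sketch. The summary does not say how Attributor measures "80% of a story", so the word-level matching below is purely an assumption, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def copied_fraction(original, candidate):
    """Fraction of the original's words that also appear, in order, in candidate.

    Assumption: word-level matching via difflib; the real measure is not
    described in the summary.
    """
    orig_words = original.split()
    cand_words = candidate.split()
    if not orig_words:
        return 0.0
    matcher = SequenceMatcher(None, orig_words, cand_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(orig_words)


def is_first_wave_target(fractions_this_month, threshold=0.8, min_uses=10):
    """True if a site used >= threshold of a story more than min_uses times."""
    heavy_uses = sum(1 for f in fractions_this_month if f >= threshold)
    return heavy_uses > min_uses
```

Under this reading, a site that excerpts one paragraph from a ten-paragraph story never counts toward the threshold, no matter how often it does so.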
Ok, here's an argument.
http://blog.internetcases.com/2010/01/05/browsewrap-website-terms-and-conditions-enforceable/
So, the terms of use of a website are binding, at least according to this court. If the terms spell out mandatory following of robots.txt, is robots.txt now binding?
As I understand it, advertisers targeting readers in the United States tend to choose ad networks that operate or at least have some sort of assets in the United States, not ad networks that operate in the European Union. Advertisers who target readers in the European Union probably will not want to pay to reach readers in the United States, especially for a product not available in the United States.
A coalition of traditional and digital publishers this month will launch the first-ever concerted crackdown on copyright pirates on the web, initially targeting violators who use large numbers of intact articles.
Details of the crackdown were provided by Jim Pitkow, the chief executive of Attributor, a Silicon Valley start-up that has been selected as the agent for several publishers who want to be compensated by websites that are using their content without paying licensing fees.
In a telephone interview yesterday, Pitkow declined to identify the individual publishers in his coalition, but said they include “about a dozen” organizations representing wire services, traditional print publishers and “top-tier blog networks.”
The first offending sites to be targeted will be those using 80% or more of copyrighted stories more than 10 times per month.
In the first stage of a multi-step process aimed at encouraging copyright compliance instead of punishing scofflaws, Pitkow said online publishers identified by his company will be sent a letter informing them of the violations and urging them to enter into license agreements with the publishers whose content appears on their sites.
If copyright pirates refuse to pay, Attributor will request the major search engines to remove offending pages from search results and will ask banner services to stop serving ads to pages containing unauthorized content. The search engines and ad services are required to immediately honor such requests by the federal Digital Millennium Copyright Act (DMCA).
If the above efforts fail, Attributor will ask hosting services to take down pirate sites. Because hosting services face legal liability under the DMCA if they do not comply, they will act quickly, said Pitkow.
“We are not going after past damages” from sites running unauthorized content, said Pitkow. The emphasis, he said, is “to engage with publishers to bring them into compliance” by getting them to agree to pay license fees to copyright holders in the future.
License fees, which are set by each of the individual organizations producing content, may range from token sums for a small publisher to several hundred dollars for yearlong rights to a piece from a major publisher, said Pitkow.
Attributor identifies copyright violators by scraping the web to find copyrighted content on unauthorized sites. A team of investigators will contact violators in an effort to bring them into compliance or, alternatively, begin taking action under DMCA.
click the link to read the last 21%
Easy enough to search Google's cache and bypass the robots.txt problem...
Heck, they SHOULD proclaim the spider name: drum up a lot of information
and focus on sites that mention it in robots.txt, to check from other sources.
every day http://en.wikipedia.org/wiki/Special:Random
Sometimes I really wish we could just go back to the early '90s, when big media thought the internet was a joke. We didn't need them then, and frankly I usually think we would be better off without them now.
Home Internet access in the early to mid 1990s was dial-up. Do you want to go back to that?
I think the key there is the visibility of the terms:
If I write a robot to crawl a site looking for certain keywords (e.g. Metallica), I will not necessarily ever have had visibility of those terms.
It's official. Most of you are morons.
this is the beginning of an arms race
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Put another way, by convention, my neighbors don't use binoculars to peer into my house windows to see what I'm doing although there's currently not really anything stopping them from doing so.
Curtains?
Since, as you say, robots.txt will likely do nothing against them, the bigger question becomes "how do they plan to do their crawling?". Crawling from a well defined IP block, using software with user agent Attributor_copy_cop, will be laughably simple to block or present false noninfringing content to.
Spoofing the UA strings and (if necessary) some of the behavior of common web browsers is a simple software problem, so I assume that they'll do that (unless they are terminally incompetent). Out of curiosity, though, does anybody know how easy and cheap it would be (using legitimate methods, not botnet-style stuff) for such a commercial entity to obtain a reasonably large number of, ideally "residential looking", IPs that change fairly often? Do you just call Verizon and say "I want 500 residential DSL lines brought out to so-and-so location"? Would you obtain the services of one of the sleazy datacenter operators who cater to spammers and the like and know how to switch IP blocks frequently? Do you pay to have second lines installed at your employees' houses, with company scanner boxes attached?
If a site posts articles yet has them excluded by robots.txt doesn't that defeat the purpose of posting the article where it can be indexed and found?
In other words, if an article is posted but robots.txt says not to index it, that article isn't going to show up in a search. It's a bit like rebroadcasting an NFL game in a movie theatre with no one in the theatre to watch it.
Prosser, in both his article and in the Restatement (Second) of Torts at 652A-652I, classifies four basic kinds of privacy rights:
1. unreasonable intrusion upon the seclusion of another, for example, physical invasion of a person's home (e.g., unwanted entry, looking into windows with binoculars or camera, tapping telephone), searching wallet or purse, repeated and persistent telephone calls, obtaining financial data (e.g., bank balance) without person's consent, etc.
http://www.rbs2.com/privacy.htm
My turnips listen for the soft cry of your love
pasted too soon
"Only the second of these four rights is widely accepted in the USA. In addition to these four pure privacy torts, a victim might recover under other torts, such as intentional infliction of emotional distress, assault, or trespass.
Unreasonable intrusion upon seclusion only applies to secret or surreptitious invasions of privacy. An open and notorious invasion of privacy would be public, not private, and the victim could then choose not to reveal private or confidential information. For example, recording of telephone conversations is not wrong if both participants are notified before speaking that the conversation is, or may be, recorded. There certainly are offensive events in public, but these are properly classified as assaults, not invasions of privacy."
My turnips listen for the soft cry of your love
I welcome them to crawl my sites and ignore my robots.txt files. They won't get very far though. When my server detects that behavior it passes the IP to my firewall which adds it to the "drop these packets into a black hole" list.
I have quite a large table of IP addresses of idiots that violated robots.txt.
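The detection half of a setup like this can be sketched in a few lines. This is only an illustration under stated assumptions: the class name, the disallowed prefix, and the IP address (taken from a documentation range) are hypothetical, and the actual hand-off to a firewall is left out.

```python
class RobotsViolationTracker:
    """Track clients that request paths a robots.txt file disallows."""

    def __init__(self, disallowed_prefixes):
        # Path prefixes listed under Disallow: in the site's robots.txt.
        self.disallowed_prefixes = tuple(disallowed_prefixes)
        # Stand-in for the firewall's "drop these packets" table.
        self.blocked_ips = set()

    def observe(self, ip, path):
        """Record one request; return True if the client is (now) blocked."""
        if ip in self.blocked_ips:
            return True
        if path.startswith(self.disallowed_prefixes):
            self.blocked_ips.add(ip)
            return True
        return False


if __name__ == "__main__":
    tracker = RobotsViolationTracker(["/private/"])
    for request_path in ["/index.html", "/private/notes.html", "/index.html"]:
        print(request_path, tracker.observe("203.0.113.7", request_path))
```

One caveat with any such sketch: a human following a stray link into a disallowed path would be blocked too, so the disallowed paths are best ones that nothing legitimate links to.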
1) Put up a file sharing site with lots of music and movie files.
...
2) Craft a robots.txt to keep out the RIAA and MPAA.
Profit!!!
Robots.txt is a convention that was never intended to restrict checking for illegal content. The idea behind robots.txt is only to keep site indexers such as Google, Yahoo, etc. out of certain directories.
Cheers,
Dave
They that can give up essential liberty to obtain a little temporary safety deserve neither safety nor liberty.
Ben
Except that the recent RIAA case ruled that you don't need to have actually seen a copyright notice in order to be bound by it, due to the ubiquity of the notice. ToS are similarly ubiquitous, so you should be bound by them as well, whether you've seen them or not.
Canada: The US's more awesome sibling.
A robot can't enter into a contract though, I would imagine.
I've had an experience with Attributor myself, and it's given me a pretty low opinion of them. I'm the author of a CC-BY-SA-licensed calculus textbook, titled "Calculus." Someone posted a copy of the pdf on Scribd, as allowed by the license. So one day I got an email from one of the people who runs Scribd, saying that Attributor had sent them a takedown notice, which they were skeptical about. Attributor hadn't supplied any useful information about what they thought was a violation. I called Scribd, and they checked and said it was a mistake -- they were working for Macmillan, which publishes another book titled "Calculus." So here they were, serving a DMCA notice under penalty of perjury, and they hadn't even checked whether the name of the author was the same, or whether any of the text was the same. Their bot just found that the title, "Calculus," was the same as the title of one of their client's books. Pretty scummy.
Find free books.
I don't think that France, Germany, Spain, the Scandinavian countries and the rest of the EU will just sit and accept the U.S. as dominator of the world's information.
Read radical news here
No, but the person who deployed the robot can implicitly do so by its actions.
Same argument used by the guy who had his cat click the EULA confirmations, and same flaw. He’s still liable.
Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
Copyright treaty and "Cybersecurity" ruses are massive police state measures which will fundamentally alter life as we know it. We need as many minds as we can muster working out Articles of Impeachment just as fast as we can.
A robot can't enter into a contract though, I would imagine.
If I cannot be liable for my robot breaking a contract, why couldn't I just make a robot which spiders the net and copies everything, violating copyright? Somehow I think your logic is flawed.
To me this is a natural culmination of larger traditional media outlets who are still on the whole managed and run by people who simply don't understand what the internet is, or how it works, nor how people engage with and use this information. I'm increasingly surrounded by people who have little or no background in media or internet publishing (they call themselves professional managers) who are telling us how things will work in the future without so much as a week's worth of shop-floor experience (I've worked for a large UK media owner for 11+ years).
Look at Murdoch's utter inability to understand what the web represents and his reactionary walled garden approach to media delivery and consumption. What you've got are senior managers all desperately trying to create mechanisms to restrict access to their content because they believe that scarcity will somehow shore-up revenues. What they've failed to understand is that the fundamental rules of engagement have changed. It's time they stood aside/down and let those people with the understanding and foresight to get on with building their company's future.
If publishers want to build a future for themselves, then listen up. Open up your content to developers, engage with your audience & readership, partner with well selected commercial entities to extend your markets, limit the amount of advertising you provide but make that advertising relevant and engaging for you and your audience (because otherwise everyone will start using Adblock), offer unique content, know where your content is being consumed and what revenues you're generating on the back of it and, crucially, understand that as a publisher you are no longer able to call the shots as you once did on how people access your content. Technological innovation is something they should be embracing in all its scary, unfettered raw glory, not something to hide away from and build walls to defend against.
It's not just copyright. The slow but steady alignment of copyright holders, oppressive governments, legal changes, media pressure and surveillance technology has wound itself around the internet worldwide, and now the real pressure is being applied. This is a secular change, largely unobservable over smaller intervals, but the end result is that the web in 10 and 20 years time will be a noticeably less free place than it is today. Everything you do online will be monitored, everything will be logged, everything will be legally defined and controlled, and every infringement will be subject to criminal penalties.
The parties responsible have the support of the politicians, the censors, the press, the money men and most of the public. We used to have the support of the geeks and their creativity in bypassing censorship. But let's face it; geeks have not created a truly disruptive technology since BitTorrent almost ten years ago. While Geekdom slept, the likes of Cisco and the major Telcos have constructed a frightening array of technologies for surveillance and control of the internet, and the fruit of their efforts can be seen in China, Iran and now even countries like Australia. Soon it will be seen all over the world.
The Web has changed. Governments are no longer going to tolerate the freedom and anarchy that it grants to the population at large. They now have the means, method and opportunity to put this genie back in the bottle. This crackdown is the first offensive on what is going to be a wide front. Expect the free net to lose.
May the Maths Be with you!
Just a few days ago, I found that a real magazine had copied an article of mine in its entirety, without permission and without attribution. I wrote this: http://stop.zona-m.net/node/112 then asked them to please remove their copy, and they immediately did.
I suspect that many sites that are using this type of content will find ways of hiding that fact by using non-display characters, breaking the article into multiple pages and the like to cover the fact that they are using the content. Would love to see their system in action on some test sites to figure out how much you need to do to cover the content and make it not match the original.
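As a rough illustration of how cheap the non-display-character trick would be to undo, here is a minimal sketch of normalizing text before comparing it. It assumes the hidden characters are Unicode format characters (category "Cf", which includes the zero-width space and the soft hyphen); whether Attributor's matcher does anything like this is unknown, and the function name is hypothetical.

```python
import unicodedata


def normalize_for_matching(text):
    """Strip invisible format characters and collapse whitespace and case."""
    # Fold compatibility forms (e.g. full-width letters) to plain ones.
    text = unicodedata.normalize("NFKC", text)
    # Drop format characters such as U+200B (zero-width space) and U+00AD.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Collapse runs of whitespace and ignore case.
    return " ".join(text.split()).lower()


if __name__ == "__main__":
    disguised = "Copy\u200bright  Pirates\u00ad"
    print(normalize_for_matching(disguised) == normalize_for_matching("copyright pirates"))
```

Splitting an article across multiple pages is a different problem: the matcher would have to stitch the pages back together before comparing, which this sketch does not attempt.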
Wrong. robots.txt asks you to not index certain pages. It does not give permission to index the rest of the pages.
Permission to read the pages is implicit in the fact that you’re serving them freely to whoever or whatever makes an HTTP request for them.
Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
without explicit permission, copyright states that you cannot copy.
It's only convention (not law) that says you can.
Actually, you’re wrong. The law defines when you can copy. It’s called fair use.
Asking the server for a copy of a page so that you can read it is considered fair use, and there’s nothing in the law that says a robot is any different than a human in this regard.
Alexander Peter Kristopeit bought his basement from his mommy for one dollar.
Offshore sites will not be immune from the crackdown, said Pitkow, because almost all of them depend on banner ads served by U.S.-based services. Because the DMCA requires the ad service to act against any violator, Attributor says it can interdict the revenue lifeline at any offending site in the world.
Attributor already has been engaged by several major book publishers to get unauthorized eBooks off unauthorized sites. "And we have 99% success rate," he said.
:(
Since when did not toeing a corporate line with regard to intangibles turn otherwise upstanding citizens into criminals?
This BS began with the EULA, gained traction with the DMCA, and will be solidified with ACTA.
Stop the madness! Vote the bums out!
Huh?
If they are RSS syndicating their content, that constitutes a legitimate use of their work. The 100% copy is done with the permission of the author. You are allowed to read that all you want.
If YOU are RSS syndicating complete articles that someone else wrote, this is not a legitimate use of their work. You could, however, RSS syndicate 79.9% of each and every one of their articles and put a link to the original article for your readers to read the other 20.1%. Common courtesy would actually dictate that you post a few paragraphs only, then link back to the source for the rest of the article. Credit where credit is due.
Seriously, if you are routinely using complete articles written by someone else, and you aren't compensating them for that use, you are violating copyright.
"This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
One idea would be to use the many available cloud services like EC2, Google App Engine and Azure. The IP blocks those services come in are going to remain fairly regular, but they are so common that it might not be acceptable for a site to block everything from ghs.l.google.com (and whatever EC2 and Azure live on). It is still blockable, though, so it probably would have been better for them (from a technical standpoint) if they hadn't announced their existence and these sites had been slowly indexed by their service before anybody knew what was happening.
Another (better) idea would be to use a service like Tor. Sure, their latency is going to skyrocket, but that's not a big deal since interactivity isn't a primary concern of an indexing service. It's still blockable, if infringing site admins block Tor nodes. This may or may not be doable, as I would imagine many users of said infringing sites use anonymizing networks for their normal traffic.
Sure, either of the solutions I've come up with in five minutes can be circumvented, but the idea isn't to totally eliminate piracy; it's to make it inconvenient enough that getting the legitimate version is easier.
My UID is a prime number. Yeah, I planned that.
Will they be writing angry letters to Google too? You know... those that index all their content, so it can be found in the first place?
Let them kill themselves with their delusional business model. The space and jobs will quickly be filled with something else.
Any sufficiently advanced intelligence is indistinguishable from stupidity.
IT. IS. LAW!!! You need to make a copy of a copyrighted work to view that work on the internet. You cannot make a copy unless you have permission. robots.txt gives that permission BY CONVENTION on the internet.
Let me guess: you check robots.txt every time you browse to a new website and if that website has no robots.txt, you leave because you don't have permission to view anything?
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
On the other hand, that's an utterly asinine comment to have made (the one you quote, not yours).
It's a kdawson quote... what more can you expect?
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
They expect people to provide them with free information (they call it "interviews" and "fact gathering" and all that) and then turn around and try to sell it. Oh, but they do add something: overpaid upper-middle-class bias and political favoritism in return for being allowed to hobnob with the important people and getting invited to all the right parties.
We can only hope that the big players in this industry will go bankrupt.
Wonder if they will apply this to sites that feature FAQ-type writeups. I remember reading a small strategy guide for MW2 Multiplayer mode on a website that I Googled. It was nearly verbatim to the original one on a competitor's site, just without the pictures and the same formatting. Hell, they even tried to use slightly different sentence structure in some places, but still used the same adjectives and adverbs in many places (much like how someone plagiarizing a term paper would "re-write" it in their own words). All with zero attribution to the original source.
These folks don't seem to have thought their cunning plan all the way through. No matter how they try to dress it up, they're vigilantes in pursuit of their idea of justice and there are some legal issues that they are going to have to deal with.
I'll let someone like NYCL describe those issues in detail - and I don't have any of anyone else's material online but it might be fun to do so, collect a DMCA complaint from these clowns - then sue them and watch them try to dance for the judge.
I was in the bathroom. You got a problem with that?
Try posting with your account, you dickless wonder.
At least have the decency to do so yourself.
Anything can be found funny, from a certain point of view.
Hosting providers shouldn't just take down a site based on a letter from Attributor. There's something called "Fair Use". Also, why should the hosting provider take the risk of taking down a site? Who's to say that Attributor is not making a mistake and accusing the wrong publisher? Let a court decide, not some stupid start-up trying to make a buck.
The day Microsoft creates a product that doesn't suck, it will be known as the Microsoft Vaccuum Cleaner!
Is it just me, or have none of these business execs ever read "the goose that lays the golden eggs" as a kid? The fact of the matter is that restricting information actually discourages people from wanting to read it. It doesn't in any way encourage them to pay for it. People will in all likelihood just watch the news or look for stories that are free somewhere else, which means both more disinformation and more bias being treated as news.
I think you're mixing up two distinct issues here. Copyright notice is not necessary to somebody owning a copyright and enforcing the associated rights. It has nothing to do with "the ubiquity of the notice."
I guess you missed this story:
http://www.wired.com/threatlevel/2010/02/former-teen-cheerleader-dinged-27750-for-infringing-37-songs/
Most specifically this part: " the Copyright Act precludes such a defense if the legitimate CDs of the music in question provide copyright notices."
This, despite her claim that she never had the actual CDs to see the notice.
Canada: The US's more awesome sibling.
While I had not read that story, it doesn't materially change my post. I just read the case (2010 WL 653322 - it's not available in a federal reporter yet), and here is the gist of why my original post stands and your analogy is a false analogy.
To start, you said that "in order to be bound by [the copyright notice]", "you don't need to have actually seen a copyright notice." This is always true. It has nothing to do with the ubiquity of notice. After the US signed Berne, we eliminated formalities like notice, and - as I said - "Copyright notice is not necessary to somebody owning a copyright and enforcing the associated rights."
This is different from the ToS because ToS are contractual. Copyright is statutory. That is why ubiquity matters to ToS. We should know better that many sites have ToS, so we are bound. That said, I don't think every jurisdiction is so liberal with ToS application.
In any case, for copyright, it is not that "we should know better, so we're bound by copyright." This is why your analogy ("except . . .") to Maverick Recording Co. v. Harper is a non-starter.
And even false analogy aside, you misunderstood what happened in the case. Harper was trying to assert the innocent infringer defense in order to lower damages. But "§ 402(d) . . . gives publishers the option to trade the extra burden of providing copyright notice for absolute protection against the innocent infringer defense."
So what happened was that Harper made out a prima facie case for innocent infringement according to the district court, thus (as an issue of fact) the matter would be left to the jury. However, the court of appeals said, "hold on, even if you make the prima facie case (§ 405(b)) the publisher has an absolute defense."
Harper lost because of 402(d), not because notice is ubiquitous. 402(d) would apply even if nobody knew that CDs had copyright notices on them. Once the notice is on the original, it's good to go.