Archive.org Sued By Colorado Woman
An anonymous reader writes "The Internet Archive is being sued by a Colorado woman for spidering her site. Suzanne Shell posted a notice on her site saying she wasn't allowing it to be crawled. When it was, she sued for civil theft, breach of contract, and violations of the Racketeer Influenced and Corrupt Organizations Act and the Colorado Organized Crime Control Act. A court ruling last month granted the Internet Archive's motion to dismiss the charges, except for the breach of contract claim. If Shell prevails on that count, sites like Google will have to get online publishers to 'opt in' before they can be crawled, radically changing the nature of Web search."
robots.txt? NOARCHIVE?
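For anyone who hasn't met NOARCHIVE: it is a value a page can put in a robots <meta> tag to ask well-behaved crawlers not to keep a cached copy. A minimal sketch of how a crawler might look for it, using only Python's standard html.parser. The class name, function name, and sample page are made up for illustration, and nothing here says how any real crawler actually handles the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None (e.g. <meta name>), so guard with "or".
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def forbids_archiving(html: str) -> bool:
    """Return True if the page asks crawlers not to archive it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


# Hypothetical page used only for illustration.
SAMPLE_PAGE = '<html><head><meta name="robots" content="NOARCHIVE, nofollow"></head></html>'
```

Honoring the tag is entirely up to the crawler. The page can only ask.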
People should have to take an exam or at least an IQ test before being allowed near a computer, Women doubly so
OK, since we've got here already, let me preempt the next 500 factually incorrect "moral high ground" type posts.
Fallacy: By putting your content on the web, you're giving permission for archive sites to duplicate it.
Reality: By putting your content on the web, you're giving permission for visitors to read it. Under the law in many jurisdictions, they are also allowed to make personal copies of the work under "fair use" style legislation. However, nothing about this gives any permission to republish it in any jurisdiction I know of, and indeed it's hard to see how it could do for any nation that is a signatory to the major WIPO treaties. Even if this were the case, such permission would be implicit, and there was an explicit notice on the web site in this case making her wishes clear.
Fallacy: She should have just used robots.txt/<meta> tags/whatever instead.
Reality: This argument fails for several reasons. Firstly, these protocols are optional; they have no special legal weight. Secondly, not everyone is aware of these conventions, so while using them might count for something, failure to do so is unlikely to mean anything in law unless knowledge of them and the ability to use them effectively are demonstrated. Thirdly, copyright protection is automatic: an author does not have to opt in to it, so failing to post a robots.txt file is not the same as granting permission to copy.
Fallacy: This isn't fair: software can't read arbitrary contracts!
Reality: This is not her problem. If someone wants to use software to copy stuff that isn't theirs, it is their responsibility to make sure that doing so is legal.
Fallacy: What archive.org is doing is just like keeping information in a browser or ISP cache.
Reality: Again, this analogy is flawed for several reasons. Browser caches are for personal use, and do not republish work to others. ISP caches are part of the Internet infrastructure, and their use is transient, while the use of archive.org is not necessary to normal web browsing. ISP caches will typically update or remove pages fairly quickly after the original is updated or removed from the web, while archive.org is intended to preserve sites in perpetuity and redisplay them to others.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Oops! Looks like someone doesn't understand the law or the Internet.
robots.txt is an optional protocol. It has no special standing in law, nor does it force spiders not to index or copy a site if they choose to ignore it.
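To make the "optional" part concrete, here is a rough sketch of the check a polite crawler runs before fetching a page, using Python's standard urllib.robotparser. The robots.txt contents, the "archivebot" user-agent name, and the example.com URLs are all hypothetical. The point is that the check lives in the crawler's own code, so a crawler that skips it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: bans one (made-up) archiving bot entirely and
# keeps every other bot out of /private/.
ROBOTS_TXT = """\
User-agent: archivebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def crawl(user_agent: str, url: str) -> str:
    """A polite crawler asks first; the protocol cannot force it to."""
    if not may_fetch(ROBOTS_TXT, user_agent, url):
        return "skipped"
    return "fetched"  # a real crawler would download the page here
```

Delete the may_fetch call from crawl and the site owner's robots.txt changes nothing, which is exactly why it carries no weight on its own.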
Oh, spare us your feeble attempt at hyperbole. I doubt that there are 1,000,000 people in the world who know about robots.txt, never mind 1,000,000,000s. The vast majority of Internet users today are not geeks, and have never heard of these obscure protocols, yet they still publish information on the web.
If you can't appreciate the difference between a private copy of information made on a single computer during routine browsing and a public copy of information that is being republished without permission by a third party then I'm afraid you're woefully unqualified to be in this discussion. For all the bluster in your post, nothing in it has the slightest legal or ethical merit.
Are you really that much of a loser? You talk as if robots.txt is in the same ballpark as gravity. It's not "just the way it fucking works." It's how a few businesses have condescendingly dictated to you how you might be able to control some aspects of their automated software as it repeatedly hammers your website, using your resources for their profit.
If you want to talk about how the internet works, why don't you butch the fuck up and show me the RFC. Oh wait, you won't be able to because the only goddamned document regarding robots.txt that ever started the path towards being an internet standard is entitled "draft-koster-robots-00.txt" and it expired in June 1997. It never made it to being an RFC. You've heard of RFCs, right? The actual standards documents that define the internet. If anything, people who use robots.txt are specifically not within standards since the draft document was allowed to expire without ever being adopted.
Really? Billions of people? Billions? Lie much? I bet you tell the girls you have a fifty-six inch cock. There aren't billions of people who can tell you what HTTP stands for, let alone the meaning of an afterthought of a specification that didn't even make it as an add-on.
And that's the way it's worked from the very beginning, really? Truly you are showing depths of cluelessness that rival members of congress. The draft document specifying the robots.txt format was written in November 1996, and published in December 1996. About seven years after "the very beginning." We'll not even get into your claim about that being the way it was designed. Talk about not even doing the most basic research.
Yeah, and going back to how the internet actually "fucking works" you can download and read the HTTP RFCs and see that caching is explicitly included in the standards. There are headers that specifically dictate whether a page should be cached or not and, if cached, for how long. Having something in your browser cache doesn't give you the right to republish, you ignorant turd.
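For reference, the header in question is Cache-Control, and its no-store and max-age directives are the ones that say "don't keep this" and "keep this for N seconds". A simplified sketch of how a cache might read them. The function names are made up, and the comma splitting is an assumption that ignores quoted values containing commas, so treat it as an illustration rather than a conforming parser.

```python
from typing import Dict, Optional


def parse_cache_control(header_value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into {directive: value or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        value = value.strip().strip('"')
        directives[name.strip().lower()] = value or None
    return directives


def may_store(directives: Dict[str, Optional[str]]) -> bool:
    """no-store tells a cache not to keep any copy of the response."""
    return "no-store" not in directives


def freshness_lifetime(directives: Dict[str, Optional[str]]) -> Optional[int]:
    """max-age gives the number of seconds a stored copy stays fresh."""
    value = directives.get("max-age")
    if value is not None and value.isdigit():
        return int(value)
    return None
```

Note that everything here is about how long a cache may hold and reuse a copy. None of it says anything about republishing.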
So after parsing your barely literate double negative, you're claiming that because people do not know about browser caches and robots.txt such things exist? That is some real solid logic. I bet you ax people questions too.
In light of the fact that robots.txt is actually not an internet standard, are you really equating the dictates of a few dozen companies as being the carved-in-stone, laws-of-physics, "way the internet works?" As opposed to the thousands of actual RFCs that define the "way the internet works."
ps - Before you start painting your cardboard placards with "I heart robots.txt" you migh
Secondly, you don't have a right to not be disagreed with, regardless of the validity of the argument used in explaining the disagreement. If I were a malicious sort your threats of legal action could get you in trouble.
Thirdly, surprisingly enough, you hit on my exact point without understanding it at all. Municipalities and states have used the processes and methods required by statute to specify the nature of the notices for no trespassing signs. I, as a trespasser, cannot require that you provide me with notice written in very small print on the back of a hundred dollar bill. Robots.txt is not law. It's not even an RFC. It's just an expired draft document, portions of which are used by various companies. Archive.org saying "you didn't have a robots.txt so we're going to republish your content" is like me telling a cop "the road isn't painted green, so I'm going to drive 140 MPH."
Finally, you know what's really ironic? The first paragraph of Archive.org's Terms of Use is this:
Wait! They're putting up an agreement page. But... but... what about robots.txt? If I was this lady, I'd just print that out and take it to court. Seems like a very clear statement that archive.org believes that such agreements are binding.
Crap. Unless you have a complete dick for legal representation.
"Your Honor, we submit that because the plaintiff chose not to provide a file based on a ten-year-old, expired draft document that was once considered, and never anointed as an accepted standard, they clearly aren't interested in protecting their intellectual property, and ask for a motion of summary judgment."
Hahahaha! No.