Slashdot Mirror


Archive.org Sued By Colorado Woman

An anonymous reader writes "The Internet Archive is being sued by a Colorado woman for spidering her site. Suzanne Shell posted a notice on her site saying she wasn't allowing it to be crawled. When it was, she sued for civil theft, breach of contract, and violations of the Racketeer Influenced and Corrupt Organizations Act and the Colorado Organized Crime Control Act. A court ruling last month granted the Internet Archive's motion to dismiss the charges, except for the breach of contract claim. If Shell prevails on that count, sites like Google will have to get online publishers to 'opt in' before they can be crawled, radically changing the nature of Web search."

7 of 797 comments

  1. GRRRRRRRRRR by ico2 · · Score: 0, Flamebait

    robots.txt? NOARCHIVE?
    People should have to take an exam or at least an IQ test before being allowed near a computer, Women doubly so

    1. Re:GRRRRRRRRRR by Anonymous Coward · · Score: -1, Flamebait

      "Please. Let's try not to live up to the "geeks are frustrated male misogynists" stereotypes, shall we?"

      One word too many in the stereotype.

      Lady geeks are also misogynists.

  2. Allow me to preempt the next 500 posts by Anonymous Brave Guy · · Score: -1, Flamebait

    OK, since we've got here already, let me preempt the next 500 factually incorrect "moral high ground" type posts.

    Fallacy: By putting your content on the web, you're giving permission for archive sites to duplicate it.
    Reality: By putting your content on the web, you're giving permission for visitors to read it. Under the law in many jurisdictions, they are also allowed to make personal copies of the work under "fair use" style legislation. However, nothing about this gives any permission to republish it in any jurisdiction I know of, and indeed it's hard to see how it could do for any nation that is a signatory to the major WIPO treaties. Even if this were the case, such permission would be implicit, and there was an explicit notice on the web site in this case making her wishes clear.

    Fallacy: She should have just used robots.txt/<meta> tags/whatever instead.
    Reality: This argument fails for several reasons. Firstly, these protocols are optional; they have no special legal weight. Secondly, not everyone is aware of these conventions, so while using them might count for something, failure to do so is unlikely to mean anything in law unless knowledge of them and ability to use them effectively is demonstrated. Thirdly, copyright is not opt-in, it is opt-out.

    Fallacy: This isn't fair: software can't read arbitrary contracts!
    Reality: This is not her problem. If someone wants to use software to copy stuff that isn't theirs, it is their responsibility to make sure that doing so is legal.

    Fallacy: What archive.org is doing is just like keeping information in a browser or ISP cache.
    Reality: Again, this analogy is flawed for several reasons. Browser caches are for personal use, and do not republish work to others. ISP caches are part of the Internet infrastructure, and their use is transient, while the use of archive.org is not necessary to normal web browsing. ISP caches will typically update or remove pages fairly quickly after the original is updated or removed from the web, while archive.org is intended to preserve sites in perpetuity and redisplay them to others.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  3. Re:Posted notice? by Anonymous Brave Guy · · Score: -1, Flamebait

    Oops! Looks like somebody doesn't understand the internet.

    Oops! Looks like someone doesn't understand the law or the Internet.

    Robots.txt is the way to block web spiders from your site.

    robots.txt is an optional protocol. It has no special standing in law, nor does it force spiders not to index or copy a site if they choose to ignore it.
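
    To make the "optional protocol" point concrete, here is a minimal sketch of what honoring robots.txt looks like from the crawler's side, using Python's standard urllib.robotparser. The helper name may_fetch, the sample rules, the bot names and the example.com URLs are all illustrative assumptions, not anything taken from the case. Nothing in the protocol enforces the answer: a crawler that never asks the question is not stopped by it.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    This only reports what the file asks for. Obeying the answer is
    entirely up to the crawler; the file itself blocks nothing.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: shut out one named crawler entirely,
# and keep every other crawler out of /private/ only.
EXAMPLE_ROBOTS = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for agent, url in [
        ("ia_archiver", "http://example.com/index.html"),
        ("SomeOtherBot", "http://example.com/index.html"),
        ("SomeOtherBot", "http://example.com/private/notes.html"),
    ]:
        print(agent, url, may_fetch(EXAMPLE_ROBOTS, agent, url))
    # An empty file contains no rules, so the parser reports "allowed".
    print("empty file:", may_fetch("", "ia_archiver", "http://example.com/"))
```

    Note that an empty file is reported as "allowed" by the parser. That is a convention among crawlers, which is exactly the behaviour being argued over in this thread.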

    When you have a blank or nonexistent robots.txt, it's understood by billions of people on the internet that you don't mind if web spiders crawl your site and add it to their index, make cached copies, etc.

    Oh, spare us your feeble attempt at hyperbole. I doubt that there are 1,000,000 people in the world who know about robots.txt, never mind 1,000,000,000s. The vast majority of Internet users today are not geeks, and have never heard of these obscure protocols, yet they still publish information on the web.

    Also, every person who visits your site gets a complete copy of the pages they visit in their browser cache. Once your page is cached in my browser, I have that information forever. I can delete it, view it, save it to CD, make a PDF, etc.

    If you can't appreciate the difference between a private copy of information made on a single computer during routine browsing and a public copy of information that is being republished without permission by a third party then I'm afraid you're woefully unqualified to be in this discussion. For all the bluster in your post, nothing in it has the slightest legal or ethical merit.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  4. Re:Posted notice? by Score Whore · · Score: -1, Flamebait

    Oops! Looks like somebody doesn't understand the internet.

    Robots.txt is the way to block web spiders from your site. That's not somebody "dictating your rights", that's just the way it fucking works.

    Are you really that much of a loser? You talk as if robots.txt is in the same ballpark as gravity. It's not "just the way it fucking works." It's how a few businesses have condescendingly dictated to you how you might be able to control some aspects of their automated software as it repeatedly hammers your website, using your resources for their profit.

    If you want to talk about how the internet works, why don't you butch the fuck up and show me the RFC. Oh wait, you won't be able to because the only goddamned document regarding robots.txt that ever started the path towards being an internet standard is entitled "draft-koster-robots-00.txt" and it expired in June 1997. It never made it to being an RFC. You've heard of RFCs, right? The actual standards documents that define the internet. If anything, people who use robots.txt are specifically not within standards since the draft document was allowed to expire without ever being adopted.

    When you have a blank or nonexistent robots.txt, it's understood by billions of people on the internet that you don't mind if web spiders crawl your site and add it to their index, make cached copies, etc. That's the way it was designed, and that's the way it's worked from the very beginning. It's not rocket science.

    Really? Billions of people? Billions? Lie much? I bet you tell the girls you have a fifty-six inch cock. There aren't billions of people who can tell you what HTTP stands for, let alone the meaning of an afterthought, didn't-even-make-it-as-an-add-on specification.

    And that's the way it's worked from the very beginning, really? Truly you are showing depths of cluelessness that rival members of congress. The draft document specifying the robots.txt format was written in November 1996, and published in December 1996. About seven years after "the very beginning." We'll not even get into your claim about that being the way it was designed. Talk about not even doing the most basic research.

    Also, every person who visits your site gets a complete copy of the pages they visit in their browser cache. Once your page is cached in my browser, I have that information forever. I can delete it, view it, save it to CD, make a PDF, etc. Just like the person who owns a book that's no longer published. There's not some magic "delete fairy" who goes around deleting everyone's browser cache when you decide to delete a page.

    Yeah, and going back to how the internet actually "fucking works" you can download and read the HTTP RFCs and see that caching is explicitly included in the standards. There are headers that specifically dictate whether a page should be cached or not and if cached, for how long. Having something in your browser cache doesn't give you the right to republish, you ignorant turd.
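
    As a rough illustration of the caching headers mentioned above, the sketch below reads a Cache-Control header value and reports how long a cache may reuse the response without rechecking. The function names parse_cache_control and cache_lifetime are made up for this example, and the logic is deliberately simplified: it looks only at no-store, no-cache and max-age, and ignores Expires, s-maxage, heuristic freshness and the rest of the HTTP caching rules.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        # Directive names are case-insensitive; arguments may be quoted.
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


def cache_lifetime(header: str) -> Optional[int]:
    """Seconds a cache may reuse the response without revalidating.

    Returns 0 when the header forbids storing (no-store) or requires
    revalidation on every use (no-cache), the max-age value when one is
    given, and None when the header states no explicit lifetime.
    """
    directives = parse_cache_control(header)
    if "no-store" in directives or "no-cache" in directives:
        return 0
    max_age = directives.get("max-age")
    if max_age is not None and max_age.isdigit():
        return int(max_age)
    return None


if __name__ == "__main__":
    for value in ["max-age=3600, public", "no-store", "private, no-cache", ""]:
        print(repr(value), "->", cache_lifetime(value))
```

    These directives tell a cache how long it may keep and reuse a copy. They say nothing about republishing that copy to anyone else.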

    Maybe not everyone knows about their browser cache or robots.txt, but that doesn't mean they don't exist.

    So after parsing your barely literate double negative, you're claiming that because people do not know about browser caches and robots.txt such things exist? That is some real solid logic. I bet you ax people questions too.

    You can't change the way the internet works because a bunch of morons failed to do even the most basic research before throwing their crap on the web.

    In light of the fact that robots.txt is actually not an internet standard, are you really equating the dictates of a few dozen companies as being the carved-in-stone, laws-of-physics, "way the internet works?" As opposed to the thousands of actual RFCs that define the "way the internet works."

    ps - Before you start painting your cardboard placards with "I heart robots.txt" you migh

  5. Re:Posted notice? by Score Whore · · Score: 0, Flamebait

    Since we know precisely who "they" is in the original statement your attempt to generalize your way out of looking like an idiot fails. If you wanted to support your claim that archive.org is not engaging in copyright violations, you would have to identify why they in particular are not. Pointing out that under specific conditions republishing is not copyright violation is not enough.

    Secondly, you don't have a right to not be disagreed with, regardless of the validity of the argument used in explaining the disagreement. If I were a malicious sort your threats of legal action could get you in trouble.

    Thirdly, surprisingly enough, you hit on my exact point without understanding it at all. Municipalities and states have used the processes and methods required by statute to specify the nature of the notices for no trespassing signs. I, as a trespasser, cannot require that you provide me with notice written in very small print on the back of a hundred dollar bill. Robots.txt is not law. It's not even an RFC. It's just an expired draft document, portions of which are used by various companies. Archive.org saying "you didn't have a robots.txt so we're going to republish your content" is like me telling a cop "the road isn't painted green, so I'm going to drive 140 MPH."

    Finally, you know what's really ironic? The first paragraph of Archive.org's Terms of Use is this:

    This terms of use agreement (the "Agreement") governs your use of the collection of Web pages and other digital content (the "Collections") available through the Internet Archive (the "Archive"). When accessing an archived page, you will be presented with the terms of use agreement. If you do not agree to these terms, please do not use the Archive's Collections or its Web site (the "Site").


    Wait! They're putting up an agreement page. But... but... what about robots.txt? If I was this lady, I'd just print that out and take it to court. Seems like a very clear statement that archive.org believes that such agreements are binding.
  6. Re:Posted notice? by Achromatic1978 · · Score: 0, Flamebait

    Failure to provide a simple robots.txt file evidences a lack of reasonable precaution and undermines plaintiff's claims to redress in a court of law.

    Crap. Unless you have a complete dick for legal representation.

    "Your Honor, we submit that because the plaintiff chose not to provide a file based on a ten-year-old, expired draft document that was once considered, and never anointed as an accepted standard, they clearly aren't interested in protecting their intellectual property, and ask for a motion of summary judgment."

    Hahahaha! No.