Slashdot Mirror


Archiving Web Pages - Legal or Illegal?

Dyer asks: "I used to run several high-traffic anonymous surfing sites, and if I wasn't getting emailed by a lawyer telling me to block access to someone's site, I was being woken up at 2am by a phone call from some crazy person yelling, and sometimes swearing, at me, convinced that my site had copied theirs and that the copy resided on my server. In actuality, my server was fetching their site at that instant and relaying it to the user. So here is my question: how do services like Archive.org and Google's cache get away with what they're doing? You can call their services whatever you like, but it doesn't change the fact that they are copying people's websites, saving them onto their own servers, and making them available to everyone."

7 of 102 comments

  1. It SHOULD be legal by Anonymous Coward · · Score: 4, Interesting

    Well, it should be legal/allowed. If you don't want it read and archived, don't put it on the Web.

    Everything should be allowed, except for things like malicious alteration and theft (taking someone else's stuff and claiming it is yours).

    1. Re:It SHOULD be legal by lightspawn · · Score: 5, Interesting

      Well, it should be legal/allowed. If you don't want it read and archived, don't put it on the Web.

      You know, I've been wondering about Java/Shockwave games. Most kids would certainly love a CD full of those games, and many companies put lots of different games online, most of which disappear a few months later.

      Is anybody archiving these? Do we need to start?

      Would the companies object?

      You can play The Hitchhiker's Guide to the Galaxy on Douglas Adams' web site. As it happens, if you know what you're doing, you can also download the .z5 file and play it offline in any Z-machine interpreter. Would the copyright owners object to that? I own that 33-game Infocom collection and all 5 books; the game was left out of the collection because of copyright hassles. Am I "entitled" to play it offline?

      This ties in to today's "is ROM collecting wrong" story, except in this case you're actually offered the games, under mostly unclear terms.

  2. RTFF by kalidasa · · Score: 5, Informative

    Archive.org FAQ

    How can I remove my site's pages from the Wayback Machine?
    The Internet Archive is not interested in preserving or offering access to Web sites or other Internet documents of persons who do not want their materials in the collection. By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled as well as exclude any historical pages from the Wayback Machine.
    See our exclusion policy.
    You can find exclusion directions at exclude.php. If you cannot place the robots.txt file, opt not to, or have further questions, email wayback2@archive.org.

    In other words, by NOT including a robots.txt file, you are implicitly granting them permission to cache your content. Also, the content is cached as it was published, complete with the appropriate markings, and only publicly accessible content is cached, so you'd be hard pressed to argue that the caching causes any economic harm. That means a successful copyright suit would likely yield no damages, which in turn makes a copyright suit pretty damned unlikely.

    IANAL.
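The robots.txt exclusion described in the FAQ can be illustrated with Python's standard `urllib.robotparser`. This is a minimal sketch, not the archive's actual crawler logic: the robots.txt contents are hypothetical, and the user-agent name `ia_archiver` is an assumption about what the archive's crawler calls itself, so check the archive's current exclusion directions before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that opts out of the archive's crawler
# while leaving every other crawler unrestricted. "ia_archiver" is an
# ASSUMED user-agent name for the archive's crawler.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/index.html"
    print(is_allowed(ROBOTS_TXT, "ia_archiver", page))   # False
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))  # True
```

A well-behaved crawler runs a check like `is_allowed` before each fetch. Hiding pages that were already archived, which the FAQ says the Wayback Machine also does, is a policy on the archive's side that this check alone does not show.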

  3. My 9/11 Archive by limekiller4 · · Score: 4, Interesting

    On the day of 9/11, I began to think that a lot of things online might disappear with the next update, forever. We tend to think of 1880 newspaper clippings as perishable and online media as permanent, but the opposite is true. So all day on 9/11 I archived news sites and about two hundred blogs using "wget -p".

    Over the next week I archived some 4,600 blogs. They've kind of been sitting around waiting for me to weed through and organize them. I've also been wgetting the front pages of 30 or so large news sites every 15 minutes or so, on the hunch that I'll grab something as it emerges even if I'm AFK. So... what can I do with this data?

    The answer(s) to this question will definitely be of use to me. Thanks for asking it. Slash, thanks for posting it.

    --
    My .02,
    Limekiller
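The routine described above, re-fetching a set of front pages with "wget -p" every 15 minutes, could be automated along these lines. This is a hedged sketch, not Limekiller's actual setup: the URL list, the `archive` directory, and the helper names are hypothetical, and it assumes GNU wget is on the PATH. `-p` (`--page-requisites`) pulls in the images and stylesheets a page needs, and `-P` (`--directory-prefix`) is used here to give each run its own timestamped directory.

```python
import subprocess
import time
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical front pages to snapshot; substitute the sites you follow.
FRONT_PAGES = [
    "http://www.example.com/",
    "http://news.example.org/",
]
INTERVAL_SECONDS = 15 * 60  # "every 15 minutes or so"


def snapshot_command(url: str, root: Path, when: datetime) -> list[str]:
    """Build a wget command that saves `url` and its page requisites
    under a directory named after the UTC timestamp `when`."""
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    # -p: also fetch the images, stylesheets, etc. needed to display the page
    # -P: save everything under this directory prefix
    return ["wget", "-p", "-P", str(root / stamp), url]


def snapshot_all(root: Path) -> None:
    """Take one snapshot of every page in FRONT_PAGES (needs wget on PATH)."""
    when = datetime.now(timezone.utc)
    for url in FRONT_PAGES:
        # check=False: a nonzero wget exit does not raise, so the
        # remaining sites are still fetched
        subprocess.run(snapshot_command(url, root, when), check=False)


def run_forever(root: Path) -> None:
    """Snapshot all pages, sleep for the interval, and repeat."""
    while True:
        snapshot_all(root)
        time.sleep(INTERVAL_SECONDS)
```

Calling `run_forever(Path("archive"))` starts the loop. Naming each run's directory after a UTC timestamp keeps successive snapshots from overwriting each other and makes them easy to sort later; a scheduler such as cron calling `snapshot_all` would be an alternative to the `while` loop.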
  4. An idea by revmoo · · Score: 4, Insightful

    Here's a thought, a rather complicated one, but I think it just might do the trick...

    DON'T POST THINGS YOU DON'T WANT PEOPLE TO SEE ON A PUBLIC NETWORK.

    It's quite simple really.

    --
    I would expect such blatant racism on Fark, but on Slashdot? Mods please ban this asshole.
  5. *copy* right by ccady · · Score: 4, Interesting

    (FWIW, IANAL) Web site content is copyrighted. Therefore, you have a right to make your own personal copy, and backup copies, but it is not legal to redistribute those copies without the site owner's permission. I cannot imagine that the Wayback machine or the Google cache is legal. They are blatantly disregarding the site owners' copyright.

    That said, I think the law should be changed or at least clarified, because it is patently (pun intended) obvious that those services are doing a vast social good, and should be encouraged.

    --
    I prefer the wicked to the foolish, because the wicked take a rest now and then. -- Alexandre Dumas
    1. Re:*copy* right by SeanAhern · · Score: 4, Informative

      Mod parent up! This link to the US Code (Title 17, Sec. 512) is very useful in this context.

      Heck, it's so useful that I'm going to quote some of it here:

      TITLE 17 > CHAPTER 5 > Sec. 512.

      Sec. 512. - Limitations on liability relating to material online

      (a) Transitory Digital Network Communications. -

      A service provider shall not be liable for monetary relief, or, except as provided in subsection (j), for injunctive or other equitable relief, for infringement of copyright by reason of the provider's transmitting, routing, or providing connections for, material through a system or network controlled or operated by or for the service provider, or by reason of the intermediate and transient storage of that material in the course of such transmitting, routing, or providing connections, if -

      (1) the transmission of the material was initiated by or at the direction of a person other than the service provider;

      (2) the transmission, routing, provision of connections, or storage is carried out through an automatic technical process without selection of the material by the service provider;

      (3) the service provider does not select the recipients of the material except as an automatic response to the request of another person;

      (4) no copy of the material made by the service provider in the course of such intermediate or transient storage is maintained on the system or network in a manner ordinarily accessible to anyone other than anticipated recipients, and no such copy is maintained on the system or network in a manner ordinarily accessible to such anticipated recipients for a longer period than is reasonably necessary for the transmission, routing, or provision of connections; and

      (5) the material is transmitted through the system or network without modification of its content.

      (b) System Caching. -

      (1) Limitation on liability. -

      A service provider shall not be liable for monetary relief, or, except as provided in subsection (j), for injunctive or other equitable relief, for infringement of copyright by reason of the intermediate and temporary storage of material on a system or network controlled or operated by or for the service provider in a case in which -

      (A) the material is made available online by a person other than the service provider;

      (B) the material is transmitted from the person described in subparagraph (A) through the system or network to a person other than the person described in subparagraph (A) at the direction of that other person; and

      (C) the storage is carried out through an automatic technical process for the purpose of making the material available to users of the system or network who, after the material is transmitted as described in subparagraph (B), request access to the material from the person described in subparagraph (A),

      if the conditions set forth in paragraph (2) are met.

      (2) Conditions. -

      The conditions referred to in paragraph (1) are that -

      (A) the material described in paragraph (1) is transmitted to the subsequent users described in paragraph (1)(C) without modification to its content from the manner in which the material was transmitted from the person described in paragraph (1)(A);

      (B) the service provider described in paragraph (1) complies with rules concerning the refreshing, reloading, or other updating of the material when specified by the person making the material available online in accordance with a generally accepted industry standard data communications protocol for the system or network through which that person makes the material available, except that this subparagraph applies only if those rules are not used by the person described in paragraph (1)(A) to prevent or unreasonably impair the intermediate storage to which this subsection applies;