Slashdot Mirror


Inside the Internet Archives

blackbearnh writes "O'Reilly Media is running an interview with Gordon Mohr, Chief Technologist for the Internet Archive (archive.org). If you've ever wondered how pages are selected for archiving, or just how they manage such a huge quantity of data, the answers are here. The interview also touches on the problems of intellectual property in archives, archiving the Internet in a post-Web 2.0 world, and the potential vulnerabilities exposed by archiving web sites that may include security exploits."

1 of 85 comments

  1. Re:Wayback by SydShamino · Score: 5, Insightful

    If it wasn't true, then a site owner would have no way to remove his content from the Wayback Machine retrospectively. I don't necessarily disagree with their policy, but this is the wrong argument for it.

    If you publish something, you lose the right to withdraw it from the public archives retrospectively. That's part of the "contract" (term used figuratively) with the public that establishes the foundation of copyright law.

    If you don't want it to appear on the Wayback Machine, you have a tool called robots.txt. That's already more than you have if you publish a book and want to keep it out of libraries. In neither case, though, do you have the right to demand or expect the content to be removed from the archive on your request.

    I see what the archive does as a courtesy service, not something that site owners should expect.
    --
    It doesn't hurt to be nice.
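
For readers unfamiliar with the robots.txt opt-out mentioned in the comment above, here is a minimal sketch of how a crawler might evaluate such a file, using Python's standard `urllib.robotparser`. The file contents, the `example.com` URLs, and the helper `is_allowed` are hypothetical, and the `ia_archiver` user-agent token is an assumption about what the archive's crawler honors, not something stated in the discussion.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the (assumed) archive crawler entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "SomeOtherBot"):
        for path in ("/page.html", "/private/notes.html"):
            url = "http://example.com" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

Note that this only shows how a well-behaved crawler reads the file; whether an archive also hides previously captured pages in response is a policy choice on the archive's side, which is the point the comment is arguing about.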