Slashdot Mirror


Interview with Brewster Kahle

Netmonger writes "A fascinating interview with the man behind The Wayback Machine. Some specs from the article: 'It's 150-odd standard PC cases, with four drives in each... Over 100 terabytes... As plain text in book form, that'd be over 3000 miles of shelf space.' All I can say is... wow!"

14 of 195 comments

  1. A lot of internet information is crap... by nofx_3 · · Score: 2, Interesting

    So why would you want to preserve all of it? Why not just get the good stuff? Then maybe he won't need so many computers. I understand that choosing the good stuff would be very subjective, but do we really need archives of pr0n sites and popups?

    --
    Visualize Whirled Peas
    1. Re:A lot of internet information is crap... by Anonymous Coward · · Score: 5, Interesting

      We're not qualified to judge what "good stuff" is.

      For example, a couple of centuries ago old household accounts would have been considered valueless. But today's historians find a wealth of social data in them: what did people eat? How much did they get paid? Did families tend to enter service together? How often did servants get new clothes?

      Disc space is cheap. Keep everything, let future historians sort it out.

  2. Transient Moments by szyzyg · · Score: 5, Interesting

    It's a shame that some of the more interesting moments in Internet history are so transient that the Wayback Machine can't catch them.

    E.g., the Ded Kitty picture we put up when Napster shut down at the start of September. It was only there for a few hours, and it will be lost.

    Of course, some of the more interesting transient events are websites that are hacked, but there exist dedicated archives for this kind of event, so you can relive the hilarity of RIAA.org being repeatedly defaced.

  3. Move over Borges by doogieh · · Score: 2, Interesting

    As Borges once wrote about the Library of Babel (or, now, the Wayback Machine)...

    The universe (which others call the Library) is composed of an indefinite and perhaps infinite number of hexagonal galleries, with vast air shafts between, surrounded by very low railings.
    Looks like he wasn't too far off...

    ...The Library is a sphere whose exact center is any one of its hexagons and whose circumference is inaccessible.

    Well, maybe not...

  4. Maybe we can help them to get this info for us? by Ninja Programmer · · Score: 2, Interesting

    Perhaps we need to propose an extension to the robots.txt file to tell certain classes of search crawler to visit more frequently or at specific times?
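    Some crawlers already read per-crawler hints in a similar spirit: `Crawl-delay` and `Request-rate` are non-standard robots.txt directives, and Python's standard `urllib.robotparser` module parses both. A minimal sketch, assuming a made-up robots.txt and using `ia_archiver` only as an example crawler name:

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt. "Crawl-delay" and "Request-rate" are non-standard
# extensions; "ia_archiver" is used only as an example crawler name.
ROBOTS_TXT = """\
User-agent: ia_archiver
Crawl-delay: 10
Request-rate: 1/5
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
"""


def crawler_hints(robots_txt, agent, path="/private/page.html"):
    """Return the per-crawler hints a polite crawler would read."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # Record a fetch time; without one, crawl_delay/request_rate return
    # None and can_fetch refuses everything.
    rp.modified()
    return {
        "crawl_delay": rp.crawl_delay(agent),    # seconds, or None
        "request_rate": rp.request_rate(agent),  # RequestRate(requests, seconds) or None
        "can_fetch": rp.can_fetch(agent, path),
    }


if __name__ == "__main__":
    print(crawler_hints(ROBOTS_TXT, "ia_archiver"))
    print(crawler_hints(ROBOTS_TXT, "SomeOtherBot"))
```

    Note that these existing directives only slow a crawler down. A "visit more often" or "visit at these times" directive, as proposed above, would be a new extension that crawlers would have to agree to honor.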

  5. On a related note, look up the Long Now Foundation by JJAnon · · Score: 3, Interesting

    Here. They seek to create physical items (clocks and libraries are two items they name) that will last for very, very long periods of time. This diagram shows what is meant by the "long now", and this is a link to their first prototype clock that is on display in the Science Museum in the UK (the second clock on the page).

  6. Robots.txt - That was how the RIAA was hacked by szyzyg · · Score: 3, Interesting

    Hint: don't list security pages that aren't supposed to be linked in your robots.txt... or at least protect them with a password.

    http://www.zone-h.org/en/news/read/id=894/
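    Why this leaks: robots.txt is an ordinary public file at the site root, and its Disallow lines are requests to well-behaved crawlers, not access control. A minimal sketch of what any visitor can do with one, assuming a made-up robots.txt (the paths are hypothetical):

```python
def disallowed_paths(robots_txt):
    """List every non-empty Disallow path in a robots.txt file."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        # An empty Disallow means "allow everything", so skip it.
        if field.strip().lower() == "disallow" and value:
            paths.append(value)
    return paths


# Hypothetical robots.txt; the paths are made up for illustration.
EXAMPLE_ROBOTS_TXT = """\
# example file
User-agent: *
Disallow: /cgi-bin/
Disallow: /staff-only-backup/   # meant to stay "hidden"

User-agent: ia_archiver
Disallow:
"""

if __name__ == "__main__":
    print(disallowed_paths(EXAMPLE_ROBOTS_TXT))
```

    The "hidden" page is now advertised to everyone who reads the file, which is the point of the hint above.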

  7. True story and a small thanks.... by Anonymous Coward · · Score: 3, Interesting

    Small personal thanks from me. I had put an online exhibit of my artwork up a few years ago, but unfortunately lost all of it to a hard drive failure. Much to my surprise, I was able to find nearly all of my site, http://www.gpapassavas.com, online and backed up on the WBM.

  8. "That's X Pages!" analogies are silly. by tambo · · Score: 2, Interesting

    I always have to chuckle when I see these analogies. "If you printed all of the data on a CD-ROM, it would reach Mars!"... that's super.

    There are at least two problems with such analogies:

    1) People use them to comment on the marvelous efficiency of technology - but in reality, it's only a comment on the hideous inefficiency of print. It doesn't say much at all about technology. It might be useful to convince people to digitize/OCR their printed matter - but is anyone *not* doing this? Even the Library of Congress is scanning its texts now.

    2) In this case it's a particularly bad analogy, because it assumes that all data is printed as hex. Example: images, which are obviously a huge, huge chunk of the Wayback archive. Virtually all website images are small enough to fit on a single printed page at full resolution. But consider a 500x500-pixel image at 16 bits (2 bytes per pixel, 2 characters to represent each byte)... that's 1,000,000 characters, or 1,000 pages!
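    The arithmetic in point 2 can be checked with a short sketch. It assumes about 1,000 characters per printed page, which is implied by the comment's own figures rather than stated, and it counts raw, uncompressed pixel data; a compressed GIF or JPEG file of the same image would print to far fewer hex characters.

```python
import math


def hex_print_size(width, height, bytes_per_pixel=2, chars_per_byte=2,
                   chars_per_page=1000):
    """Characters and printed pages needed to dump raw pixels as hex.

    chars_per_page=1000 is an assumption implied by the comment's figures.
    """
    chars = width * height * bytes_per_pixel * chars_per_byte
    pages = math.ceil(chars / chars_per_page)  # a partial page still uses a sheet
    return chars, pages


if __name__ == "__main__":
    print(hex_print_size(500, 500))                       # (1000000, 1000)
    print(hex_print_size(500, 500, chars_per_page=3000))  # (1000000, 334)
```

    Even with a denser 3,000 characters per page, one small image still fills over 300 pages.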

    Basically the analogy is good for wildly inflating some numbers to stun the 0.00001% of the population that doesn't already realize these things.

    - David Stein

    --
    Computer over. Virus = very yes.
  9. Re:Odd, no copyright questions by Wesley Felter · · Score: 3, Interesting

    In presentations, Brewster says his policy is to take out the complainers. So if you think having your site in the Wayback Machine is a copyright infringement, he'll just take it out. Meanwhile he's taking the Napster approach: assume what you're doing is legal until someone tells you to stop. Hopefully that day won't arrive.

  10. Vaguely uncomfortable by Anonymous Coward · · Score: 1, Interesting

    I understand what the Internet Archive is meant to do, and in a lot of ways I admire Brewster Kahle. But... they are archiving and republishing millions of pages that were never intended to last forever. And without permission at that. I don't mean this from a legal perspective, as I have no idea what the laws are on this, but something seems at least slightly wrong about that.

    If there is a way to permanently erase pages from the archive, I would be a little less worried. But I can never tell if they let you delete stuff, or just "block" it. "Blocking" is crap; we all know what that will be worth if somebody really wants the info someday and knows the Archive has it.

    1. Re:Vaguely uncomfortable by Maul · · Score: 3, Interesting

      I disagree completely.

      If you put something on the web, you have put it up for the world to see. The whole point of putting information on the web is making that information available to lots of people.

      What the Internet Archive is doing is no different than libraries storing old copies of newspapers and magazines. With an increasing amount of things being published online, we need an archive of those things.

      Years from now archives of web pages will be quite useful for those doing research on the events of today.

      Say you are a student in the year 2050 and are doing a report on the "history of the web." Wouldn't it be nice to have copies of the web pages from the 1990s to show what the "early" web looked like?

      --

      "You spoony bard!" -Tellah

  11. Any bets.... by MDX-F1 · · Score: 5, Interesting

    on how long before a politician has to resign because of some over the top statements he/she made in a flamewar back in college? Or maybe that webpage of ethnic jokes that seemed so hilarious back in high school.

    I have a feeling we are either going to have to become way more forgiving, or we're going to be stuck with only faceless boring types with no opinions as our leaders (no wisecracks, it could be much worse than it is now).

  12. Re:Odd, no copyright questions by Obiwan Kenobi · · Score: 3, Interesting

    Or, as the Buddhists say:

    "It is easier to ask for forgiveness than permission."