Slashdot Mirror


Interview with Brewster Kahle

Netmonger writes "A fascinating interview with the man behind The Wayback Machine. Some specs from the article: "It's 150-odd standard PC cases, with four drives in each.. 'Over 100 terabytes.. As plain text in book form, that'd be over 3000 miles of shelf space.." All I can say is.. Wow!"

18 of 195 comments

  1. How many by FunkSoulBrother · · Score: 4, Funny

    How many miles of shelf space equal one Library of Congress? Let's use standard units here, people!

  2. Transient Moments by szyzyg · · Score: 5, Interesting

    It's a shame that some of the more interesting moments in Internet history are so transient that the Wayback Machine can't catch them.

    e.g. the Ded Kitty picture we put up when Napster shut down at the start of September: it was only there for a few hours, so it will be lost.

    Of course, some of the more interesting transient events are websites that are hacked, but there exist dedicated archives for this kind of event, so you can relive the hilarity of RIAA.org being repeatedly defaced.

  3. stupid Joe Six-Pack metaphors by p_rotator · · Score: 4, Funny


    "As plain text in book form, that'd be over 3000 miles of shelf space.."

    Huh? How about "If all data was spoken at once, it would be as loud as 674 jet engines!" Or "If this archive were a planet, it would be as large as Jupiter!"

  4. According to... by Cyclopedian · · Score: 4, Informative
    this, the LOC pales in comparison to "3000 miles of shelf space".

    -Cyc

  5. I don't understand terabytes.... by nebenfun · · Score: 4, Funny

    "Over 100 terabytes.. As plain text in book form, that'd be over 3000 miles of shelf space.."

    I don't understand terabytes or the shelf space analogy...
    I need to know how many bananas.

    nbfn

    1. Re:I don't understand terabytes.... by gid · · Score: 4, Funny

      Well, since bananas can't directly hold data that well (they rot so quickly), we'll have to use those bananas to store data by some other indirect means.

      So, how many bananas would it take to feed all the monkeys needed to store the data? Monkeys aren't that smart, so let's approximate that each monkey can hold 4 KB worth of data.

      100 TB = 100 * 1024 * 1024 * 1024 KB = 107374182400 KB

      107374182400 KB / 4 = 26843545600 monkeys

      Now we'd want redundancy, so let's have triplicate monkeys for all our data, in case one dies, or runs away, or simply forgets.

      26843545600 * 3 = 80530636800 monkeys

      But now we want to figure out how many bananas they're going to eat. Let's say 5 bananas a day per monkey?

      80530636800 * 5 = 402653184000 bananas to feed all monkeys per day

      402653184000 * 365 = 146968412160000 bananas to feed all monkeys per year

      146,968,412,160,000, or roughly 147 trillion bananas per year, which is probably just slightly over the national debt.

      Overall, I think your method of using bananas to store all this data is quite ridiculous. The latency and data loss would be unbearable. Plus, think of all the poop these monkeys would create, and you'd NEVER be able to get PETA off your back.
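
For anyone who wants to check the monkey math, here is a minimal Python sketch of the estimate above. The figures (100 TB, 4 KB per monkey, triplicate redundancy, 5 bananas a day) come straight from the comment; the function and constant names are made up for illustration.

```python
# Sketch of the banana estimate from the comment above.
# All constants are the commenter's figures; names are hypothetical.
KB_PER_TB = 1024 ** 3               # binary units, as used in the comment
KB_PER_MONKEY = 4                   # "each monkey can hold 4k worth of data"
REDUNDANCY = 3                      # triplicate monkeys
BANANAS_PER_MONKEY_PER_DAY = 5
DAYS_PER_YEAR = 365


def banana_budget(archive_tb=100):
    """Return each intermediate figure of the estimate as a dict."""
    total_kb = archive_tb * KB_PER_TB
    monkeys = total_kb // KB_PER_MONKEY
    monkeys_redundant = monkeys * REDUNDANCY
    bananas_per_day = monkeys_redundant * BANANAS_PER_MONKEY_PER_DAY
    bananas_per_year = bananas_per_day * DAYS_PER_YEAR
    return {
        "total_kb": total_kb,
        "monkeys": monkeys,
        "monkeys_redundant": monkeys_redundant,
        "bananas_per_day": bananas_per_day,
        "bananas_per_year": bananas_per_year,
    }


if __name__ == "__main__":
    for name, value in banana_budget().items():
        print(f"{name:>18}: {value:,}")
```

Changing `archive_tb` rescales the whole estimate linearly, so a 200 TB archive needs exactly twice the bananas.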

  6. Wayback technology by watchful.babbler · · Score: 5, Informative

    There's an excellent interview with Kahle on technical details at O'Reilly's own archive -- here.

    --
    "Freedom is kind of a hobby with me, and I have disposable income that I'll spend to find out how to get people more."

  7. Re:A lot of internet information is crap... by Anonymous Coward · · Score: 5, Interesting

    We're not qualified to judge what "good stuff" is.

    For example, a couple of centuries ago, old household accounts would have been considered valueless. But today's historians find a wealth of social data in them - what did people eat? how much did they get paid? did families tend to enter service together? how often did servants get new clothes?

    Disc space is cheap. Keep everything, let future historians sort it out.

  8. Re:A lot of internet information is crap... by 0xdeadbeef · · Score: 4, Funny

    You don't consider the archiving of pr0n a noble cause? Don't be so selfish, man, think of future generations!

    I mean, hell, forget pr0n, just imagine the blackmail value for the kids of 2020, to be able to dig up pictures of their parents on amihotornot.

  9. Another site, with pics by RhBaby · · Score: 5, Informative

    http://www.mindjack.com/feature/archive.html

    In the interest of full disclosure, I wrote it, so be gentle.

  10. Picture of a Picture by paughsw · · Score: 4, Funny

    I put www.archive.org into the Wayback Machine and my computer exploded!

  11. See also by danlyke · · Score: 4, Informative

    For other Brewster Kahle interviews, see also the Slashdot story that pointed to the O'Reilly interview and the Slashdot story that pointed to the Feed magazine interview (which is currently inaccessible from my machine).

  12. Odd, no copyright questions by dsanfte · · Score: 5, Insightful

    I was curious as to how the Wayback Machine's operators view its legal status... I mean, it's not really a search engine in the broadly accepted meaning of the term. It doesn't just search what's out there, it archives entire pages of old information; and while search engine sites do this too (Google), this is ALL the Wayback Machine site does.

    Surely they must know they're treading on untested legal ground. All it might take is one offended copyright holder to bring the whole thing to its knees. Basing it in a country other than the USA might have been smarter, then, given the existence of laws like the DMCA which could serve to shut the site down.

    --
    occultae nullus est respectus musicae - originally a Greek proverb

  13. Re:A lot of internet information is crap... by garcia · · Score: 4, Insightful

    I think that storing everything on computers will make historians' jobs MUCH less difficult but a lot less fun.
    Doing historical research is fun b/c you get to get your hands dirty (literally). I spent 6 hours a day for three weeks researching crime rates in Toledo, OH during Prohibition (before, during, and after) and b/c the books were all handwritten and they were so old, my hands turned black for days at a time...
    It would have been MUCH easier if all the information was sorted and easily found. I guess it would make future historians' jobs easier, but what fun would that be?

    Just my worthless .02

  14. Why only four? by pla · · Score: 4, Insightful

    Out of curiosity, why only four drives per PC?

    With a simple $10 PCI IDE card (per additional 4), you could have gotten at *least* 8 drives, possibly as many as 16, per case. Granted, not many cases will let you *mount* that many, but I would expect paying a few bucks extra for the IDE cards and a better case would save quite a bit of money (and physical space) by halving or quartering the number of PCs you need ($100 extra to save $1500 per $2000, not counting the drives themselves?).

    88 linear feet of machines vs. 22. One requires an entire room, one would fit on a standard sized 3-or-4-tier storage rack. Of course, speaking of racks (of a different sort)... What on earth made you go with an array of standard PCs rather than a RAID-in-a-rack?
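
The halving-or-quartering argument above can be sketched in a few lines of Python. This is a rough illustration only. It assumes exactly 150 cases (the article says "150-odd") with four drives each, and that floor space scales linearly with machine count, which ignores the larger cases the comment itself says would be needed. The names are hypothetical.

```python
import math

# ASSUMPTION: exactly 150 cases with four drives each ("150-odd" in the article).
TOTAL_DRIVES = 150 * 4
# Figure quoted in the comment for the current four-drive layout.
CURRENT_LINEAR_FEET = 88


def machines_needed(drives_per_case, total_drives=TOTAL_DRIVES):
    """Number of PCs required to host every drive."""
    return math.ceil(total_drives / drives_per_case)


def linear_feet(drives_per_case):
    """Shelf length, ASSUMING it scales linearly with the number of machines."""
    baseline = machines_needed(4)
    return CURRENT_LINEAR_FEET * machines_needed(drives_per_case) / baseline


if __name__ == "__main__":
    for drives in (4, 8, 16):
        print(drives, machines_needed(drives), round(linear_feet(drives), 1))
```

Under those assumptions, 16 drives per case comes out at 38 machines and about 22 linear feet, which lines up with the 88-vs-22 figure in the comment.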

    1. Re:Why only four? by jandrese · · Score: 5, Informative

      Probably the limiting factor there is the PCI bus. Modern ATA HDDs tend to saturate vanilla PCI busses (which is why most chipsets have custom busses between the north and southbridge these days). Add ATA cards and your PCI bus quickly becomes saturated and not very good for serving webpages. Worse, since the NIC probably sits on the PCI bus as well, you can easily starve your NIC with too many ATA devices on PCI ATA controllers.

      I know, I have a fileserver at home that has this exact problem, but I don't care if my fileserver is slow so it's not a problem.
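
A back-of-the-envelope sketch of the bus budget described above, in Python. The only hard figure is the theoretical peak of plain 32-bit, 33 MHz PCI (about 133 MB/s). The per-drive throughput and the 100 Mbit/s NIC are assumptions picked for illustration, not measurements, and the model is a worst case where every drive on an add-in PCI controller streams at once.

```python
# Theoretical peak of 32-bit PCI at 33.33 MHz: 4 bytes per clock, about 133 MB/s.
PCI_BUS_MB_S = 4 * 33.33
# ASSUMPTION: sustained throughput of one ATA drive, in MB/s (illustrative only).
DRIVE_MB_S = 40.0
# ASSUMPTION: a 100 Mbit/s NIC sharing the same PCI bus (12.5 MB/s).
NIC_MB_S = 100 / 8


def bus_demand(drives_on_pci, drive_mb_s=DRIVE_MB_S, nic_mb_s=NIC_MB_S):
    """Worst-case bandwidth wanted by drives on PCI ATA cards plus the NIC."""
    return drives_on_pci * drive_mb_s + nic_mb_s


def is_saturated(drives_on_pci, drive_mb_s=DRIVE_MB_S, nic_mb_s=NIC_MB_S):
    """True when the combined demand exceeds the theoretical PCI peak."""
    return bus_demand(drives_on_pci, drive_mb_s, nic_mb_s) > PCI_BUS_MB_S


if __name__ == "__main__":
    for n in range(1, 9):
        status = "saturated" if is_saturated(n) else "ok"
        print(f"{n} drive(s): {bus_demand(n):6.1f} MB/s wanted, {status}")
```

Real sustained PCI throughput is lower than the theoretical peak, so if anything this sketch is optimistic about how many drives fit.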

      --

      I read the internet for the articles.

  15. The Wayback machine is a lie by corebreech · · Score: 5, Insightful

    Try accessing news stories immediately prior to and after the September 11 attack and you'll see just how valuable this website is... or rather, isn't.

    I have also personally run a website which contained fairly controversial material (based on this story) that I saw listed on their website and then removed shortly thereafter. Tell me, why would a service like this ever have occasion to remove material once it's been archived, especially if there are *NO* copyright issues and the webmaster of the archived site never asked them to remove it?

    The answer is simple: the powers-that-be saw how dangerous it was to make all this information available to anyone on demand so they took control. It would be a great service were it allowed to operate unfettered, but the reality is quite different.

    And I'm the first to mention this here so far? You should all be modded down -1 for naiveté.

  16. Any bets.... by MDX-F1 · · Score: 5, Interesting

    on how long before a politician has to resign because of some over the top statements he/she made in a flamewar back in college? Or maybe that webpage of ethnic jokes that seemed so hilarious back in high school.

    I have a feeling we are either going to have to become way more forgiving, or we're going to be stuck with only faceless boring types with no opinions as our leaders (no wisecracks, it could be much worse than it is now).