Interview with Brewster Kahle
Netmonger writes "A fascinating interview with the man behind The Wayback Machine. Some specs from the article: "It's 150-odd standard PC cases, with four drives in each.. 'Over 100 terabytes.. As plain text in book form, that'd be over 3000 miles of shelf space.." All I can say is.. Wow!"
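A rough sanity check of the shelf-space figure, as a minimal sketch. The 100 terabytes comes from the quote above; the size of a book as plain text and the shelf width per book are assumptions picked for illustration, not numbers from the article.

```python
# Back-of-the-envelope check of "100 terabytes ~ 3000 miles of shelf space".
# Only ARCHIVE_BYTES comes from the quoted article; the per-book figures
# below are assumptions chosen for illustration.

BYTES_PER_TB = 10**12
ARCHIVE_BYTES = 100 * BYTES_PER_TB   # "over 100 terabytes"
BYTES_PER_BOOK = 10**6               # assumption: about 1 MB of plain text per book
SPINE_WIDTH_M = 0.048                # assumption: about 4.8 cm of shelf per book
METERS_PER_MILE = 1609.344


def shelf_miles(archive_bytes, bytes_per_book=BYTES_PER_BOOK,
                spine_width_m=SPINE_WIDTH_M):
    """Return the shelf length in miles needed to hold the archive as books."""
    books = archive_bytes / bytes_per_book
    return books * spine_width_m / METERS_PER_MILE


if __name__ == "__main__":
    print(f"{shelf_miles(ARCHIVE_BYTES):.0f} miles of shelf")
```

Under these assumptions the result lands close to the quoted 3000 miles; halving the assumed book size or spine width moves the answer proportionally.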
How many miles of shelf space equal one Library of Congress? Let's use standard units here, people!
It's a shame that some of the more interesting moments in Internet history are so transient the Wayback Machine can't catch them.
E.g., the Ded Kitty picture we put up when Napster shut down at the start of September: it was only there for a few hours, but it will be lost.
Of course, some of the more interesting transient events are websites that are hacked, but there exist dedicated archives for this kind of event, so you can relive the hilarity of RIAA.org being repeatedly defaced.
"As plain text in book form, that'd be over 3000 miles of shelf space.."
Huh? How about "If all data was spoken at once, it would be as loud as 674 jet engines!" Or "If this archive were a planet, it would be as large as Jupiter!"
-Cyc
/.'s 10 Millionth
"Over 100 terabytes.. As plain text in book form, that'd be over 3000 miles of shelf space.."
I don't understand terabyte or the shelf space analogy...
I need to know how many bananas.
nbfn
There's an excellent interview with Kahle on technical details at O'Reilly's own archive -- here.
"Freedom is kind of a hobby with me, and I have disposable income that I'll spend to find out how to get people more."
We're not qualified to judge what "good stuff" is.
For example, a couple of centuries ago old household accounts would have been considered valueless. But today's historians find a wealth of social data in them - what did people eat? how much did they get paid? did families tend to enter service together? how often did servants get new clothes?
Disc space is cheap. Keep everything, let future historians sort it out.
You don't consider the archiving of pr0n a noble cause? Don't be so selfish, man, think of future generations!
I mean, hell, forget pr0n, just imagine the blackmail value for the kids of 2020, to be able to dig up pictures of their parents on amihotornot.
http://www.mindjack.com/feature/archive.html
In the interest of full disclosure, I wrote it, so be gentle.
I put in www.archive.org into the wayback machine and my computer exploded!
For other Brewster Kahle interviews, see also the Slashdot story that pointed to the O'Reilly interview and the Slashdot story that pointed to the Feed magazine interview (which is currently inaccessible from my machine).
I was curious as to how the Wayback Machine's operators view its legal status... I mean, it's not really a search engine in the broadly accepted meaning of the term. It doesn't just search what's out there, it archives entire pages of old information; and while search engine sites do this (Google), this is ALL the Wayback Machine site does.
Surely they must know they're treading on untested legal ground. All it might take is one offended copyright holder to bring the whole thing to its knees. Basing it in a country other than the USA might have been smarter, then, given the existence of laws like the DMCA which could serve to shut the site down.
occultae nullus est respectus musicae - originally a Greek proverb
I think that storing everything on computers will make historians' jobs MUCH less difficult but a lot less fun.
Doing historical research is fun b/c you get to get your hands dirty (literally). I spent 6 hours a day for three weeks researching crime rates in Toledo, OH during Prohibition (before, during, and after) and b/c the books were all handwritten and they were so old my hands turned black for days at a time...
It would have been MUCH easier if all the information was sorted and easily found. I guess it would make future historians' jobs easier, but what fun would that be?
Just my worthless .02
Out of curiosity, why only four drives per PC?
With a simple $10 PCI IDE card (per additional 4), you could have gotten at *least* 8 drives, possibly as many as 16, per case. Granted, not many cases will let you *mount* that many, but I would expect paying a few bucks extra for the IDE cards and a better case would save quite a bit of money (and physical space) by halving or quartering the number of PCs you need ($100 extra to save $1500 per $2000, not counting the drives themselves?).
88 lf of machines vs 22 lf. One requires an entire room, one would fit on a standard sized 3-or-4-tier storage rack. Of course, speaking of racks (of a different sort)... What on earth made you go with an array of standard PCs rather than a RAID-in-a-rack?
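The machine counts and the 88 versus 22 linear feet above can be reproduced with a quick sketch. It assumes the "150-odd" cases are exactly 150 and that each case takes about 7 inches of shelf; the 7-inch width is a guess chosen because it matches the 88 lf figure, not something stated in the thread.

```python
import math

# Assumptions (not stated in the thread): exactly 150 cases today,
# and roughly 7 inches of shelf width per case.
TOTAL_DRIVES = 150 * 4
CASE_WIDTH_IN = 7.0


def machines_needed(drives_per_case):
    """Number of PCs required to hold every drive at the given density."""
    return math.ceil(TOTAL_DRIVES / drives_per_case)


def linear_feet(drives_per_case):
    """Shelf length in feet if the cases stand side by side."""
    return machines_needed(drives_per_case) * CASE_WIDTH_IN / 12


if __name__ == "__main__":
    for drives in (4, 8, 16):
        print(f"{drives:2d} drives/case: {machines_needed(drives):3d} PCs, "
              f"{linear_feet(drives):5.1f} linear feet")
```

With these numbers, four drives per case comes out at roughly 88 linear feet and sixteen per case at roughly 22, which is the quartering described above.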
Try accessing news stories immediately prior to and after the September 11 attack and you'll see just how valuable this website is... or rather, isn't.
I have also personally run a website which contained fairly controversial material (based on this story) that I saw listed on their website and then removed shortly thereafter. Tell me, why would a service like this ever have occasion to remove material once it's been archived, especially if there are *NO* copyright issues and the webmaster of the archived site never asked them to remove it?
The answer is simple: the powers-that-be saw how dangerous it was to make all this information available to anyone on demand so they took control. It would be a great service were it allowed to operate unfettered, but the reality is quite different.
And I'm the first to mention this here so far? You should all be modded down -1 for naiveté.
Is this truly the only Earth I can live on?
on how long before a politician has to resign because of some over the top statements he/she made in a flamewar back in college? Or maybe that webpage of ethnic jokes that seemed so hilarious back in high school.
I have a feeling we are either going to have to become way more forgiving, or we're going to be stuck with only faceless boring types with no opinions as our leaders (no wisecracks, it could be much worse than it is now).