How the Wayback Machine Works
tregoweth writes: "O'Reilly has an interview with Brewster Kahle about how The Internet Archive's Wayback Machine works, with lots of juicy details about how the biggest database ever built works."
They don't seem to think the history of their site would be interesting: http://web.archive.org/web/*/http://web.archive.org/ redirects you to their index.html! boring!
Now, that would really be a test for their apps. Same as if Google indexed www.google.com (entirely).
100 TB does not make the biggest DB ever. I am personally working on a 60-70TB ERP system that's also writeable; I am sure there are bigger systems out there (e.g. Wal-Mart's or GM's ERP systems come to mind).
A read-only DB containing highly-compressible text does not really make for a very challenging datamine. Just because it's on and about the Web and sexier than a stodgy ERP system should not make you overlook the real technology.
I just visited some sites that I had hoped had disappeared completely from cyberspace. The only defense I've got now is the old cryptic URLs of these monstrosities... Indexing that database would be a disaster, especially with an unusual name like mine...
(Yes, I was stupid enough to use my real name.)
Damn you, wayback
Okay... I'll do the stupid things first, then you shy people follow.
[Zappa]