Interview with Brewster Kahle
Netmonger writes "A fascinating interview with the man behind The Wayback Machine. Some specs from the article: 'It's 150-odd standard PC cases, with four drives in each... Over 100 terabytes... As plain text in book form, that'd be over 3000 miles of shelf space...' All I can say is... Wow!"
Read the article: it is backed up in two separate locations, as well as all their old disks.
I did a quick price check and for 100 terabytes of data on 80GB drives (Best price/size ratio I could find), that's about $111,250 worth of storage. Of course, I guess they would get bulk discounts :).
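For anyone who wants to redo that price check with other drive sizes, here is a rough sketch of the arithmetic. The per-drive price of $89 is not quoted anywhere above; it is only what falls out of dividing $111,250 by the 1250 drives needed, so treat it as an assumption. The helper name `drives_needed` is made up for illustration.

```python
import math

def drives_needed(total_gb: float, drive_gb: float) -> int:
    """Return the number of whole drives needed to hold total_gb."""
    return math.ceil(total_gb / drive_gb)

TOTAL_GB = 100 * 1000     # 100 TB in decimal units, the way drive vendors count
DRIVE_GB = 80             # drive size used in the price check
PRICE_PER_DRIVE = 89.00   # ASSUMPTION: inferred from $111,250 / 1250 drives

count = drives_needed(TOTAL_GB, DRIVE_GB)
total_cost = count * PRICE_PER_DRIVE
print(f"{count} drives, ${total_cost:,.2f}")  # 1250 drives, $111,250.00
```

This ignores cases, controllers, power, and any redundancy, so it is a floor on raw disk cost rather than a system price.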
Just because I doubt myself does not mean I find your position compelling.
-Cyc
/.'s 10 Millionth
There's an excellent interview with Kahle on technical details at O'Reilly's own archive -- here.
"Freedom is kind of a hobby with me, and I have disposable income that I'll spend to find out how to get people more."
http://www.mindjack.com/feature/archive.html
In the interest of full disclosure, I wrote it, so be gentle.
For other Brewster Kahle interviews, see also the Slashdot story that pointed to the O'Reilly interview and the Slashdot story that pointed to the Feed magazine interview (which is currently inaccessible from my machine).
Probably the limiting factor there is the PCI bus. Modern ATA HDDs tend to saturate vanilla PCI busses (which is why most chipsets have custom busses between the north and southbridge these days). Add ATA cards and your PCI bus quickly becomes saturated and not very good for serving webpages. Worse, since the NIC probably sits on the PCI bus as well, you can easily starve your NIC with too many ATA devices on PCI ATA controllers.
I know because I have a fileserver at home with this exact problem, but I don't care if my fileserver is slow, so it doesn't bother me.
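A back-of-the-envelope sketch of the saturation argument, for the curious. The only hard number here is the theoretical peak of plain 32-bit, 33 MHz PCI (about 133 MB/s). The per-drive streaming rate and the NIC speed are assumptions picked for illustration, not measurements, and real PCI throughput is lower than the theoretical peak, so the picture in practice is probably worse.

```python
import math

# Theoretical peak of vanilla PCI: 32 bits (4 bytes) per cycle at 33.33 MHz.
PCI_PEAK_MB_S = 33.33e6 * 4 / 1e6   # roughly 133 MB/s

DRIVE_MB_S = 40.0    # ASSUMPTION: sustained rate of one ATA drive
NIC_MB_S = 100 / 8   # ASSUMPTION: 100 Mbit/s Ethernet, i.e. 12.5 MB/s

def bus_utilization(n_drives: int,
                    drive_mb_s: float = DRIVE_MB_S,
                    nic_mb_s: float = NIC_MB_S) -> float:
    """Fraction of peak PCI bandwidth demanded by n drives plus the NIC."""
    return (n_drives * drive_mb_s + nic_mb_s) / PCI_PEAK_MB_S

def max_streaming_drives(drive_mb_s: float = DRIVE_MB_S,
                         nic_mb_s: float = NIC_MB_S) -> int:
    """Most drives that can stream at full rate while leaving the NIC its share."""
    return math.floor((PCI_PEAK_MB_S - nic_mb_s) / drive_mb_s)

for n in (1, 2, 4, 8):
    print(f"{n} drives: {bus_utilization(n):.0%} of PCI peak")
print("drives before saturation:", max_streaming_drives())
```

Under these assumed rates, anything past a handful of drives streaming at once asks for more than the bus can deliver, and the NIC has to fight for whatever is left.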
I read the internet for the articles.
Technologists have promised the digital library for decades. In 1945, Vannevar Bush, who was technology adviser to several US presidents, wrote an article in The Atlantic magazine outlining how computers might one day augment libraries.
Those who find this subject interesting, but who may not be familiar with Vannevar Bush's work, might want to read the paper to which Brewster Kahle refers.
Please donate your spare CPU cycles to help fight cancer and other diseases
And I'm the first to mention this here so far? You should all be modded down -1 for naiveté.
Hm. And yet the WayBack Machine has the Project Censored page here, and even the AlterNet story linked therein. Ah, but yes, it must be a conspiracy by the Big Eye In The Pyramid -- someone call Hagbard Celine. Fnord.
-1, Delusional.
"Freedom is kind of a hobby with me, and I have disposable income that I'll spend to find out how to get people more."
Sigh.
Didn't you mean this?
I worked on some projects with the Internet Archive from 1998 - 2000.
The Archive's first storage device (circa 1996) was a large StorageTek tape robot with a multi-gigabyte disk cache to handle user requests for archived pages. As drives and processors became cheaper, it became more interesting to use them instead of tape. The cost penalty of using drives over tape is only 2x - 3x, with the enormous win of increased bandwidth and decreased latency. When the request queue for the tape robot got large, the wait time for a page could be 16 hours; with disk, it's a fraction of a second.
The first hard-drive based Archive storage used multiple 4U and 5U 12-20 drive Linux/FreeBSD boxes with ~80G IDE drives and Promise cards.
Drive density is greater now - you can get 200G IDE drives and 320G IDEs are on the way, so you can use regular PCs as opposed to custom or niche-market (rackable server) boxes.
--Pat / zippy@cs.brandeis.edu