Slashdot Mirror


Interview with Brewster Kahle

Netmonger writes "A fascinating interview with the man behind The Wayback Machine. Some specs from the article: 'It's 150-odd standard PC cases, with four drives in each... Over 100 terabytes... As plain text in book form, that'd be over 3000 miles of shelf space...' All I can say is... Wow!"

13 of 195 comments

  1. Re:Is this thing backed up? by Anonymous Coward · · Score: 1, Informative

    Read the article: it is backed up in two separate locations, and they keep all their old disks as well.

  2. 100 Terabytes! by insanecarbonbasedlif · · Score: 3, Informative

    I did a quick price check and for 100 terabytes of data on 80GB drives (Best price/size ratio I could find), that's about $111,250 worth of storage. Of course, I guess they would get bulk discounts :).

    --
    Just because I doubt myself does not mean I find your position compelling.
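
    The price check above can be reproduced as a back-of-the-envelope calculation. This is only a sketch: the $89 per-drive price is not stated in the comment; it is inferred by dividing $111,250 by the 1,250 drives needed when 1 TB is counted as 1000 GB. The function names are made up for illustration.

```python
import math


def drives_needed(total_tb, drive_gb, gb_per_tb=1000):
    """Number of whole drives required to hold total_tb terabytes."""
    return math.ceil(total_tb * gb_per_tb / drive_gb)


def storage_cost(total_tb, drive_gb, price_per_drive, gb_per_tb=1000):
    """Total drive cost, ignoring cases, controllers, and bulk discounts."""
    return drives_needed(total_tb, drive_gb, gb_per_tb) * price_per_drive


if __name__ == "__main__":
    # Assumed inputs: 100 TB total, 80 GB drives, $89 per drive (inferred).
    print(drives_needed(100, 80))
    print(storage_cost(100, 80, 89))
```

    With those assumed inputs the sketch gives 1,250 drives, which matches the $111,250 total quoted above.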
    1. Re:100 Terabytes! by dougmc · · Score: 3, Informative
      The math (100 terabytes, 150 computers, 4 drives per computer) works out to an average of 171 GB/drive, counting 1 TB as 1024 GB. Of course, they said 'over 100 TB' so it's actually higher than that.

      Obviously they're using IDE drives. Modern ones. And they must have replaced almost everything at once -- there could be a mixture of 200 GB and 120 GB drives, but it would have to be mostly 200 GB drives.

      Pretty neat, but still doesn't hold a candle to Google's massive setup :)

      (Google must have a *team* of people whose sole job is finding failed computers/drives and replacing them :)
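
      The per-drive average, and the claim that the mix must be mostly 200 GB drives, can be checked with a short sketch. It assumes exactly 150 machines and exactly 100 TB (the article says "150-odd" and "over 100"), and the function names are hypothetical.

```python
def average_drive_size_gb(total_tb, machines, drives_per_machine, gb_per_tb=1024):
    """Average capacity per drive needed to reach total_tb across all machines."""
    drives = machines * drives_per_machine
    return total_tb * gb_per_tb / drives


def min_large_drive_fraction(target_avg_gb, small_gb, large_gb):
    """Fraction of drives that must be large_gb for the average to reach target_avg_gb."""
    return (target_avg_gb - small_gb) / (large_gb - small_gb)


if __name__ == "__main__":
    avg = average_drive_size_gb(100, 150, 4)           # 1 TB = 1024 GB
    avg_decimal = average_drive_size_gb(100, 150, 4, gb_per_tb=1000)
    print(round(avg), round(avg_decimal))
    # Share of 200 GB drives needed in a 120 GB / 200 GB mix.
    print(min_large_drive_fraction(avg, 120, 200))
```

      Under these assumptions the average is about 171 GB per drive with binary terabytes (about 167 GB with decimal ones), and a little under two thirds of the drives would have to be 200 GB models.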

  3. According to... by Cyclopedian · · Score: 4, Informative
    this, the LOC pales in comparison to "3000 miles of shelf space".

    -Cyc

  4. Wayback technology by watchful.babbler · · Score: 5, Informative

    There's an excellent interview with Kahle on technical details at O'Reilly's own archive -- here.

    --
    "Freedom is kind of a hobby with me, and I have disposable income that I'll spend to find out how to get people more."
  5. Another site, with pics by RhBaby · · Score: 5, Informative

    http://www.mindjack.com/feature/archive.html

    In the interest of full disclosure, I wrote it, so be gentle.

  6. See also by danlyke · · Score: 4, Informative

    For other Brewster Kahle interviews, see also the Slashdot story that pointed to the O'Reilly interview and the Slashdot story that pointed to the Feed magazine interview (which is currently inaccessible from my machine).

    1. Re:See also by Orne · · Score: 3, Informative

      Hehe, that's what the Wayback Machine is for!

      Feed magazine interview, back from the grave...

  7. Re:Why only four? by jandrese · · Score: 5, Informative

    Probably the limiting factor there is the PCI bus. Modern ATA HDDs tend to saturate vanilla PCI busses (which is why most chipsets have custom busses between the north and southbridge these days). Add ATA cards and your PCI bus quickly becomes saturated and not very good for serving webpages. Worse, since the NIC probably sits on the PCI bus as well, you can easily starve your NIC with too many ATA devices on PCI ATA controllers.

    I know, I have a fileserver at home that has this exact problem, but I don't care if my fileserver is slow so it's not a problem.
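
    A rough sketch of the bus arithmetic behind this. The 133 MB/s figure is the theoretical peak of classic 32-bit/33 MHz PCI; the 50 MB/s sustained rate per drive and the NIC rates are assumptions chosen for illustration, not measurements, and the function name is made up.

```python
import math

# Theoretical peak of a 32-bit/33 MHz PCI bus, shared by every device on it.
PCI_32BIT_33MHZ_MB_S = 133


def drives_to_saturate(bus_mb_s, per_drive_mb_s, nic_mb_s=0.0):
    """Smallest number of streaming drives whose combined rate fills the
    bandwidth left over after the NIC's share is reserved."""
    available = bus_mb_s - nic_mb_s
    if available <= 0:
        return 0
    return math.ceil(available / per_drive_mb_s)


if __name__ == "__main__":
    # Assumed: 50 MB/s sustained per ATA drive.
    print(drives_to_saturate(PCI_32BIT_33MHZ_MB_S, 50))
    # Reserving 12.5 MB/s for a 100 Mbit/s NIC on the same bus.
    print(drives_to_saturate(PCI_32BIT_33MHZ_MB_S, 50, nic_mb_s=12.5))
    # Reserving 125 MB/s for a gigabit NIC on the same bus.
    print(drives_to_saturate(PCI_32BIT_33MHZ_MB_S, 50, nic_mb_s=125))
```

    With those assumed numbers, three drives streaming at once already fill the bus, and a gigabit NIC on the same bus leaves room for barely a fraction of one drive.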

    --

    I read the internet for the articles.
  8. Vannevar Bush by Mannerism · · Score: 3, Informative

    Technologists have promised the digital library for decades. In 1945, Vannevar Bush, who was technology adviser to several US presidents, wrote an article in The Atlantic magazine outlining how computers might one day augment libraries.

    Those who find this subject interesting, but who may not be familiar with Vannevar Bush's work, might want to read the paper to which Brewster Kahle refers.

  9. Re:The Wayback machine is a lie by watchful.babbler · · Score: 2, Informative
    I have also personally run a website which contained fairly controversial material (based on this story) that I saw listed on their website and then removed shortly thereafter. ...

    And I'm the first to mention this here so far? You should all be modded down -1 for naiveté.

    Hm. And yet the WayBack Machine has the Project Censored page here, and even the AlterNet story linked therein. Ah, but yes, it must be a conspiracy by the Big Eye In The Pyramid -- someone call Hagbard Celine. Fnord.

    -1, Delusional.

    --
    "Freedom is kind of a hobby with me, and I have disposable income that I'll spend to find out how to get people more."
  10. Link by Anonymous Coward · · Score: 1, Informative

    Sigh.

    Didn't you mean this?

  11. Archive architecture by yppiz · · Score: 3, Informative

    I worked on some projects with the Internet Archive from 1998 - 2000.

    The Archive's first storage device (circa 1996) was a large StorageTek tape robot with a multi-gigabyte disk cache to handle user requests for archived pages. As drives and processors became cheaper, it became more interesting to use them instead of tape. The cost penalty of using drives over tape is only 2x - 3x, with the enormous win of increased bandwidth and decreased latency (when the request queue for the robot got large, the wait time for a page could be 16 hours; with disk, it's a fraction of a second).

    The first hard-drive based Archive storage used multiple 4U and 5U 12-20 drive Linux/FreeBSD boxes with ~80G IDE drives and Promise cards.

    Drive density is greater now - you can get 200G IDE drives and 320G IDEs are on the way, so you can use regular PCs as opposed to custom or niche-market (rackable server) boxes.

    --Pat / zippy@cs.brandeis.edu