Slashdot Mirror


How the Wayback Machine Works

tregoweth writes: "O'Reilly has an interview with Brewster Kahle about how The Internet Archive's Wayback Machine works, with lots of juicy details about how the biggest database ever built works."

8 of 134 comments

  1. Interesting Thoughts by nurightshu · · Score: 3, Insightful

    I was glad to see the interviewee was brutally honest about free software -- both its benefits and its drawbacks. Discussions among my friends usually degenerate into holy wars, with all of us spouting cliches at one another until we storm off in huffs.

    Free software can save the world, I think. We just need to realize that it needs a lot more work to get there.

    --
    They that would sacrifice their .sig space for that cliched Franklin quote deserve neither.
  2. Quite a lofty goal... by NOT-2-QUICK · · Score: 3, Insightful
    As per the article, Brewster Kahle states that:

    "The idea is to build a library of everything, and the opportunity is to build a great library that offers universal access to all of human knowledge."

    Not only does this sound like a rather far-fetched plot from an old Star Trek episode, but it also seems to be a physical and theoretical impossibility. Even if adequate storage space did exist for such a task (a 10 TB database would be but a small start), I do not foresee any type of technology that could ever capture new data fast enough to keep pace with human innovation and creativity.

    It is a nice thought, however, and I certainly wish him all the best in his pursuits...
    --
    Beer is proof that God loves us and wants us to be happy. -- Benjamin Franklin
  3. sigh by __aahlyu4518 · · Score: 2, Insightful

    This world needs more people like that... driven to make this world a better place... and having fun doing it, being proud of what they're trying to accomplish... This interview sent shivers down my spine... These are the kind of interviews that inspire people... It makes me a little less sceptical about humanity... There's still hope.

  4. good article, lesson on human spirit by f00zbll · · Score: 2, Insightful

    The article was good with all its warts and gems. I take it more as a testament to the human spirit. I seriously doubt it's the largest database, though it might be the largest publicly accessible database. I'm sure the NSA could easily dwarf their database considering how much data they collect from around the world every day.

  5. Distributed Computing solution... by Tazzy531 · · Score: 3, Insightful

    The interview talked a little about throwing more machines on when demand makes it necessary. I wonder if it is possible to do this over the internet? I mean, I'm seeing something along the lines of SETI, where millions of people worldwide donate their unused processor power. Would it be possible to distribute the searches to remote computers over the internet in real time?
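
    A rough sketch of what "distributing the searches" could look like, purely as an illustration: the query is sent to every shard in parallel and the partial results are merged. Everything here is an assumption (the shard contents, the names `search_shard` and `scatter_gather`, and threads standing in for remote volunteer machines); nothing in the interview says the Archive works this way, and real volunteers would add network latency and trust problems that this ignores.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "shards"; in the scenario above, each one would live
# on a separate volunteer machine reached over the network.
SHARDS = [
    {"http://example.com/a": "wayback machine interview"},
    {"http://example.com/b": "distributed search over the internet"},
    {"http://example.com/c": "search the archive by keyword"},
]


def search_shard(shard, term):
    """Return the URLs in one shard whose text contains the term."""
    return [url for url, text in shard.items() if term in text]


def scatter_gather(term, shards=SHARDS):
    """Send the query to every shard in parallel, then merge the hits."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # One task per shard; list() collects every partial result.
        partial = list(pool.map(search_shard, shards, [term] * len(shards)))
    # Flatten the per-shard lists and sort for a stable, merged answer.
    return sorted(url for hits in partial for url in hits)


if __name__ == "__main__":
    print(scatter_gather("search"))
```

    The query only finishes when the slowest shard answers, which is probably the hard part of doing this "in real time" with donated machines.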

    --


    _______________________________
    "I'm not Conceited...I'm just a realist..."
  6. 200 transactions/second? by selan · · Score: 4, Insightful

    Having so few transactions for a database of this size probably helps them run without needing large expensive machines. Many VLDBs support thousands of transactions per second. I found a list here of top ten winners of a very large database scalability contest. The winner for peak performance was something like 20,000+ TPS.

  7. Re:Not the biggest DB by costas · · Score: 3, Insightful

    I find the claim dubious. Bigger than what kind of database? Wal-Mart is famous for tracking every single little thing about their supply chain. Most grocers or hypermart chains do the same. I can easily see, say, A&P or Tesco or Carrefour having multi-TB DBs, even petabyte DBs.

    Also, size is not the only thing that defines a database installation: the number of simultaneous users or concurrent transactions, read or write access, the ability to roll back, and quality-of-service standards are way more important in my book (and also for most big companies). Part of the reason DBs in that size range are rare is exactly that current technology does not scale up to those levels while maintaining rollbacks, read-write access and fast user response.

    I like the Wayback Machine, but to compare it to a proper database is ludicrous. EMC or Veritas will give you much more for their 100 TB of storage than 400 x86 PCs... instant backups, for one, and a much larger MTBF.

  8. Isn't this illegal? by russ-smith · · Score: 3, Insightful
    The majority of information being collected by Archive.org is covered by copyright law. It is up to Archive.org to get permission before they republish the information. If you look at the Archive web site they run banner ads for the Alexa toolbar. This Alexa service provides marketers with information somewhat similar to the Nielsen ratings for TV. Archive.org has received complaints about their service, contrary to the statements made in the published article. Archive.org has refused to respond in any meaningful way to these issues. Archive.org is trying to put the burden on the publisher to determine that The Archive is publishing it, find it within The Archive web site and then provide them a notarized statement. See their FAQs at

    http://www.archive.org/exec/faqsidos/about/faqs.html?index=2 and
    http://www.archive.org/exec/faqsidos/about/faqs.html?index=26

    The claims made in these faqs are just not consistent with the law. Are they going to repost everything that was available on Napster?

    They also have some problems with their algorithm: some redirected domains fool it into associating content with a site that the content never actually belonged to. Trying to find copyrighted works would be a nightmare. Archive.org has refused to respond to any of these issues and, in fact, is lying about it if the quotes in the article are factual.

    Russ Smith