Slashdot Mirror


Internet Archive Opens Crawler Code Under LGPL

ramakant writes: "It looks like the Internet Archive, which hosts the infamous Wayback Machine, has opened its newest in-development crawler code under the LGPL. From the announcement: 'Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix / heritix / heretix / heratix) is an archaic word for heiress. Since our crawler seeks to collect the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.'"

4 of 186 comments

  1. Oldest /. entry by Anonymous Coward · · Score: 5, Interesting
  2. Infamous? by BitchAss · · Score: 4, Interesting

    the infamous Wayback Machine

    Why is it infamous? I haven't heard anything bad about it.

    --
    Like sex? Read and write about it! Indecent Blogging
  3. Old slashdot news by AyeFly · · Score: 5, Interesting

    Here is a Slashdot story from wayback I just found.

    "IBM announces a 25 gigger

    Posted by Hemos on Wednesday November 11, @10:11AM
    from the why-i-could-put-3/4-my-cd-collection dept.
    Booker writes "So IBM announces a 25 gig hard drive... does the world need this yet? Unless this is in a RAID, would you really want to trust 25 gigs on a single drive? What would you use this for? 400+ hours of MP3s comes to mind... "
    Read More...
    64 comments"

    Just thought it was interesting to see, since we now have 200 gig HDs.

    --
    Sig- http://www.dreamhost.com/rewards.cgi?ayefly
  4. Re:score by corebreech · · Score: 4, Interesting

    I'll use it if you promise not to delete shit that doesn't hew to your ideology.

    That's what really sucks about the Wayback Machine.

    Ever try reading articles from the aftermath of 9/11? It's a great big hole, so much stuff has been deleted.