Slashdot Mirror


Internet Archive Opens Crawler Code Under LGPL

ramakant writes: "It looks like the Internet Archive, which hosts the infamous Wayback Machine, has opened its newest in-development crawler code under the LGPL. From the announcement: 'Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix / heritix / heretix / heratix) is an archaic word for inheritress. Since our crawler seeks to collect the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.'"

2 of 186 comments

  1. Oldest /. entry by Anonymous Coward · Score: 5, Interesting
  2. Old slashdot news by AyeFly · Score: 5, Interesting

    Here is a Slashdot story from the Wayback Machine that I just found.

    "IBM announces a 25 gigger

    Posted by Hemos on Wednesday November 11, @10:11AM
    from the why-i-could-put-3/4-my-cd-collection dept.
    Booker writes "So IBM announces a 25 gig hard drive... does the world need this yet? Unless this is in a RAID, would you really want to trust 25 gigs on a single drive? What would you use this for? 400+ hours of MP3s comes to mind... "
    Read More...
    64 comments"

    Just thought it was interesting to see, since we now have 200-gig HDs.

    --
    Sig- http://www.dreamhost.com/rewards.cgi?ayefly