Slashdot Mirror


Internet Archive Opens Crawler Code Under LGPL

ramakant writes: "It looks like the Internet Archive, which hosts the infamous Wayback Machine, has opened its newest in-development crawler code under the LGPL. From the announcement: 'Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix / heritix / heretix / heratix) is an archaic word for inheritess. Since our crawler seeks to collect the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.'"

2 of 186 comments

  1. no articles for 4 hours on a weekday morning? by zontroll · Score: 0, Offtopic

    Did Taco die or something?

  2. Re:Mr peabody! by reub2000 · Score: 0, Offtopic

    Mod parent funny.