Slashdot Mirror


Internet Archive Opens Crawler Code Under LGPL

ramakant writes: "It looks like the Internet Archive, which hosts the infamous Wayback Machine, has opened its newest in-development crawler code under the LGPL. From the announcement: 'Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix / heritix / heretix / heratix) is an archaic word for inheritess. Since our crawler seeks to collect the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.'"

9 of 186 comments

  1. Mr peabody! by Anonymous Coward · · Score: 5, Funny

    They've open sourced your wayback machine! Now you've lost the monopoly!

  2. Oldest /. entry by Anonymous Coward · · Score: 5, Interesting
  3. score by TedCheshireAcad · · Score: 5, Funny

    Score! Now I can run my own wayback machine!

    I only have a 30G hard drive though, what do you guys think, bzip should take care of it?

    1. Re:score by bamf · · Score: 5, Funny

      If you limit yourself to only archiving the useful parts of the interweb, you should be able to fit it all on a floppy disk or two.

  4. That sounds like a good working app. by DeKoNiNG · · Score: 5, Funny

    From their FAQ: "Only if you are comfortable grabbing code directly from CVS, wrestling with incomplete documentation, and running into undocumented limitations would you want to use the current software."
    Undocumented limitations? That sounds like a lot of fun!

    --
    Troll: Large Giant, 63 hp, AC 16, Usually chaotic evil.
  5. Maaaaamories... by Dorf on Perl · · Score: 5, Funny

    This is a great step forward, I welcome our archiving overlords, etc. Right now when I want to share some of my history (the good stuff, natch) with my kids, I have to dig out an old, musty shoebox full of junk. When they want to share theirs with their kids, they'll just beam a URL into my grandkids' in-skull HUDs. While in their flying cars. "Oh look, here's another stupid post to Slashdot by Grandpa..."

  6. Old slashdot news by AyeFly · · Score: 5, Interesting

    Here is a Slashdot story from way back that I just found.

    "IBM announces a 25 gigger

    Posted by Hemos on Wednesday November 11, @10:11AM
    from the why-i-could-put-3/4-my-cd-collection dept.
    Booker writes "So IBM announces a 25 gig hard drive... does the world need this yet? Unless this is in a RAID, would you really want to trust 25 gigs on a single drive? What would you use this for? 400+ hours of MP3s comes to mind... "
    Read More...
    64 comments"

    Just thought it was interesting to see, since we now have 200gig HDs

    --
    Sig- http://www.dreamhost.com/rewards.cgi?ayefly
  7. Slashdot wayback then... by OpCode42 · · Score: 5, Funny

    Just been looking at some slashdot pages from 1997... quote from the "Post your comments here!" form : "If you don't have anything worthwhile to say, don't say it. If people continue to abuse this feature, I will have to remove it."

    Oh how different things could have been... ;-)

    If the trolls had time machines...

  8. Unless the Archive caves in... by turambar386 · · Score: 5, Informative


    "Since our crawler seeks to collect the digital artifacts of our culture for the benefit of future researchers and generations..."

    That is, unless the digital artifacts in question are, like Operation Clambake, opposed to rich and powerful sects, in which case they are blocked by the Wayback Machine after the Archive caves in to DMCA notices.