Slashdot Mirror


Internet Archive Opens Crawler Code Under LGPL

ramakant writes: "It looks like the Internet Archive, which hosts the infamous Wayback Machine, has opened its newest in-development crawler code under the LGPL. From the announcement: 'Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix / heritix / heretix / heratix) is an archaic word for inheritess. Since our crawler seeks to collect the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.'"

27 of 186 comments

  1. Mr peabody! by Anonymous Coward · · Score: 5, Funny

    They've open sourced your wayback machine! Now you've lost the monopoly!

    1. Re:Mr peabody! by Anonymous Coward · · Score: 2, Funny

      I don't know about you but I have no problem traveling forward in time. It is getting back that is the real trick.

  2. Cultural artifacts? by SexyKellyOsbourne · · Score: 2, Funny

    You mean works of art like this?

    B1FF#S K3WL H0M3 PAG3!!!

    1. Re:Cultural artifacts? by Lev13than · · Score: 3, Funny

      What I want to know is, how do they keep it from crashing when it reaches here?

      --
      When you have nothing left to burn you must set yourself on fire
    2. Re:Cultural artifacts? by Anonymous Coward · · Score: 0, Funny

      >You mean works of art like this?

      The goggles! They do NOTHING!

    3. Re:Cultural artifacts? by JPelorat · · Score: 1, Funny

      Holy buckets. More like a cultural fartifact.

      --
      Hokey statistics and ancient misconceptions are no match for a good thought in your head, kid!
  3. I thought it sounded like... by Anonymous Coward · · Score: 0, Funny

    ...Heretics or yet another dumb Matrix reference. Or possibly both.

  4. score by TedCheshireAcad · · Score: 5, Funny

    Score! Now I can run my own wayback machine!

    I only have a 30G hard drive though, what do you guys think, bzip should take care of it?

    1. Re:score by bamf · · Score: 5, Funny

      If you limit yourself to only archiving the useful parts of the interweb, you should be able to fit it all on a floppy disk or two.

  5. That sounds like a good working app. by DeKoNiNG · · Score: 5, Funny

    From their FAQ: "Only if you are comfortable grabbing code directly from CVS, wrestling with incomplete documentation, and running into undocumented limitations would you want to use the current software."
    Undocumented limitations? That sounds like a lot of fun!

    --
    Troll: Large Giant, 63 hp, AC 16, Usually chaotic evil.
  6. old torrents by kyoko21 · · Score: 3, Funny

    Nothing like crawling for old, recycled, and dead torrents.

  7. Maaaaamories... by Dorf+on+Perl · · Score: 5, Funny

    This is a great step forward, I welcome our archiving overlords, etc. Right now when I want to share some of my history (the good stuff, natch) with my kids, I have to dig out an old, musty shoebox full of junk. When they want to share theirs with their kids, they'll just beam a URL into my grandkids' in-skull HUDs. While in their flying cars. "Oh look, here's another stupid post to Slashdot by Grandpa..."

  8. Heritrix? by elgrinner · · Score: 3, Funny

    Sounds a bit like Asterix's grandfather.

    --
    But my Mom says I'm cool! -Milhouse
  9. Uh? by Zog+The+Undeniable · · Score: 4, Funny
    Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix / heritix / heretix / heratix) is an archaic word for inheritess.

    WTF is inheritess? I think we have recursive typos here...my head is going to explode!

    --
    When I am king, you will be first against the wall.
  10. Slashdot wayback then... by OpCode42 · · Score: 5, Funny

    Just been looking at some Slashdot pages from 1997... quote from the "Post your comments here!" form: "If you don't have anything worthwhile to say, don't say it. If people continue to abuse this feature, I will have to remove it."

    Oh how different things could have been... ;-)

    If the trolls had time machines...

  11. Re:Infamous? by hey · · Score: 3, Funny

    Just wait 20 years when you are trying to get a CEO job and somebody produces your embarrassing old weblog.

  12. Re:Heritrix by hplasm · · Score: 3, Funny
    And what, pray tell, is "inheritess"?

    A Heritrix.

    --
    ...and he grinned, like a fox eating shit out of a wire brush.
  13. Re:Infamous? by Lester67 · · Score: 3, Funny

    The batting cage that I frequent with the kids hates the fact that their web coupon (with no expiration date) is still stored in the Wayback Machine.

    I think they might agree with "infamous". :-)

  14. Re:In case of /.ing... by Anonymous Coward · · Score: 4, Funny

    Don't you mean: I doubt it'll get slashdotted, but I needed the Karma.

  15. Re:no articles for 4 hours on a weekday morning? by skidoo2 · · Score: 2, Funny

    I was wondering the same thing. Last night I posted a cool article about weird slime on Mars, and it hasn't even been rejected yet.

  16. Re:gpl vs. lgpl? by Anonymous Coward · · Score: 2, Funny

    One is communist, the other is socialist.

  17. What if there's another archive.org by British · · Score: 3, Funny

    ...and archive.org tries to archive it? Will it go into an infinite loop, or just have 2 copies of the interweb?

  18. Re:Oldest /. entry by Anonymous Coward · · Score: 1, Funny
    But anti-MS comments in da hizzouse!!

    Yea, Slashdot was great before the Microsoft fanboys showed up. Those were the days.

  19. finally! by badansible · · Score: 3, Funny

    I will be able to look at that exciting gopher site everybody was talking about! Yes?

  20. What will happen if... by balbord · · Score: 2, Funny

    ...wayback inadvertently archives itself?!?!

    That reminds me... once I thought of googling for "google"... but I didn't, since it would no doubt create a black hole or something!

    --
    "If I have been able to see so far, it is because I went out and bought a damn pair of binoculars" - Ze da Esquina
  21. Even better! by Inoshiro · · Score: 3, Funny

    " Ooopsies...
    Tim
    Sat Dec 20 at 6:37PM EST

    Guess I should read the article before I post. I was under the impression that the next release of IE4 *would* support HTML 4.0...Oh well.
    "

    Guess I should read the article before I post? What a crazy, upside-down world it was back then!

    --
    Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
  22. Re:Infamous? by marnanel · · Score: 2, Funny
    --
    GROGGS: alive and well and living in