Internet Archive Opens Crawler Code Under LGPL
ramakant writes: "It looks like the Internet Archive, which hosts the infamous Wayback Machine, has opened its newest in-development crawler code under the LGPL. From the announcement: 'Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix / heritix / heretix / heratix) is an archaic word for inheritess. Since our crawler seeks to collect the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.'"
They've open sourced your wayback machine! Now you've lost the monopoly!
You mean works of art like this?
B1FF#S K3WL H0M3 PAG3!!!
...Heretics or yet another dumb Matrix reference. Or possibly both.
Score! Now I can run my own wayback machine!
I only have a 30G hard drive though, what do you guys think, bzip should take care of it?
From their FAQ: Only if you are comfortable grabbing code directly from CVS, wrestling with incomplete documentation, and running into undocumented limitations would you want to use the current software.
Undocumented limitations? That sounds like a lot of fun!
Troll: Large Giant, 63 hp, AC 16, Usually chaotic evil.
Nothing like crawling for old, recycled, and dead torrents.
This is a great step forward, I welcome our archiving overlords, etc. Right now when I want to share some of my history (the good stuff, natch) with my kids, I have to dig out an old, musty shoebox full of junk. When they want to share theirs with their kids, they'll just beam a URL into my grandkids' in-skull HUDs. While in their flying cars. "Oh look, here's another stupid post to Slashdot by Grandpa..."
Sounds a bit like Asterix' grandfather.
But my Mom says I'm cool! -Milhouse
WTF is inheritess? I think we have recursive typos here...my head is going to explode!
When I am king, you will be first against the wall.
Just been looking at some Slashdot pages from 1997... quote from the "Post your comments here!" form: "If you don't have anything worthwhile to say, don't say it. If people continue to abuse this feature, I will have to remove it."
;-)
Oh how different things could have been...
If the trolls had time machines...
Just wait 20 years when you are trying to get a CEO job and somebody produces your embarrassing old weblog.
A Heritrix.
...and he grinned, like a fox eating shit out of a wire brush.
The batting cage that I frequent with the kids hates the fact their web-coupon (with no expiration date) is still stored in the Wayback.
:-)
I think they might agree with "infamous".
Don't you mean: I doubt it'll get slashdotted, but I needed the Karma.
I was wondering the same thing. Last night I posted a cool article about weird slime on Mars, and it hasn't even been rejected yet.
One is communist, the other is socialist.
...and archive.org tries to archive it? Will it go into an infinite loop, or just have 2 copies of the interweb?
Yea, Slashdot was great before the Microsoft fanboys showed up. Those were the days.
I will be able to look at that exciting gopher site everybody was talking about! Yes?
...wayback inadvertently archives itself?!?!
That reminds me... once I thought of googling for "google"... but I didn't since it would, no doubt, create a black hole or something!
"If I have been able to see so far, It is because I went out and bought a damn binoculars" - Ze da Esquina
"Ooopsies...
Tim
Sat Dec 20 at 6:37PM EST
Guess I should read the article before I post. I was under the impression that the next release of IE4 *would* support HTML 4.0...Oh well."
Guess I should read the article before I post? What a crazy, upside-down world it was back then!
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
Beware the Ghost of Usenet^H^H^H^H^HBlog Postings Past!</gratuitous>
GROGGS: alive and well and living in