Slashdot Mirror


Internet Archive Opens Crawler Code Under LGPL

ramakant writes: "It looks like the Internet Archive, which hosts the infamous Wayback Machine, has opened its newest in-development crawler code under the LGPL. From the announcement: 'Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix / heritix / heretix / heratix) is an archaic word for inheritess. Since our crawler seeks to collect the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.'"

186 comments

  1. Mr peabody! by Anonymous Coward · · Score: 5, Funny

    They've open sourced your wayback machine! Now you've lost the monopoly!

    1. Re:Mr peabody! by dukeluke · · Score: 1

      The Way Back machine - one way that we as a species of tech savvy gurus can travel back in time...now, if only they could figure out how to reverse the technology and travel forward ;-)

      no sig needed to make this message unique

    2. Re:Mr peabody! by Anonymous Coward · · Score: 0

      Quiet you!

    3. Re:Mr peabody! by Anonymous Coward · · Score: 2, Funny

      I don't know about you but I have no problem traveling forward in time. It is getting back that is the real trick.

    4. Re:Mr peabody! by Anonymous Coward · · Score: 0

      Good 2 Whack

    5. Re:Mr peabody! by ackthpt · · Score: 1
      They've open sourced your wayback machine! Now you've lost the monopoly!

      Mr. Peabody never makes a mistake. Didn't you learn anything, Sherman? It was the right thing to do.

      Trivia: I bought the season 1 DVD of Rocky and Bullwinkle and saw the original spelling was 'WAYBAC'

      --

      A feeling of having made the same mistake before: Deja Foobar
    6. Re:Mr peabody! by Anonymous Coward · · Score: 0

      Hell, I wrote this comment and didn't even think it deserved +4. Hell, if I'd known that, I wouldn't have checked 'post anonymously'.

    7. Re:Mr peabody! by dukeluke · · Score: 1

      (LOL) - true, travelling forward in time is actually quite feasible. It would take a man travelling at insane velocities - and then he would have to stop at a calculated time. Time would slow down for him, while on Earth everything would carry on as normal. He could travel years ahead - yet the technology is not there for us to achieve and sustain such speeds.

      I've always wondered why we'd want to travel forward in time if it is as yet unknown how to come back.

      Any ideas on that?

    8. Re:Mr peabody! by Issue9mm · · Score: 1

      I travel forward in time all day long. I'm doing it now!!!

      -9mm-

    9. Re:Mr peabody! by reub2000 · · Score: 0, Offtopic

      Mod parent funny.

    10. Re:Mr peabody! by nutsy · · Score: 1

      Yes, but did you figure out that WAYBAC was a riff on ENIAC, EDVAC, and UNIVAC? (The Rocky and Bullwinkle shows: still going over viewers' heads after all these years!)

    11. Re:Mr peabody! by ackthpt · · Score: 1
      Yes, but did you figure out that WAYBAC was a riff on ENIAC, EDVAC, and UNIVAC?

      Yeah, I knew that as a _kid_ back in the 70's. It's still classic humor, and the Cold War overtones would probably work under Bush's regime these days.

      "Look Muhammad, is Moose and Squirrel!"

      --

      A feeling of having made the same mistake before: Deja Foobar
  2. gpl vs. lgpl? by Anonymous Coward · · Score: 3, Interesting

    could someone summarize the differences?

    fp?

    1. Re:gpl vs. lgpl? by Anonymous Coward · · Score: 2, Insightful

      this ain't OT. The guy asked what the difference was between the GPL and LGPL. LGPL being the license the wayback code is being placed under, the opening of the code being the topic of discussion. Therefore, the post couldn't be any more on-topic.

      For chrissakes moderators! It says that the code is LGPL in the freakin' article HEADLINE!! We already have enough trouble with people not RTFA, an occasional someone who didn't read the submitter's post, and now we have moderators not RTFH to deal with too!!

    2. Re:gpl vs. lgpl? by Anonymous Coward · · Score: 2, Funny

      One is communist, the other is socialist.

    3. Re:gpl vs. lgpl? by TheSpoom · · Score: 1
      From the GNU LGPL Preamble:

      Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs.

      When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library.

      We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances.

      For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License.

      In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system.

      Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library.
      --
      It's better to vote for what you want and not get it than to vote for what you don't want and get it.
      - E. Debs
  3. Cultural artifacts? by SexyKellyOsbourne · · Score: 2, Funny

    You mean works of art like this?

    B1FF#S K3WL H0M3 PAG3!!!

    1. Re:Cultural artifacts? by Lev13than · · Score: 3, Funny

      What I want to know is, how do they keep it from crashing when it reaches here?

      --
      When you have nothing left to burn you must set yourself on fire
    2. Re:Cultural artifacts? by Anonymous Coward · · Score: 0, Funny

      >You mean works of art like this?

      The goggles! They do NOTHING!

    3. Re:Cultural artifacts? by JPelorat · · Score: 1, Funny

      Holy buckets. More like a cultural fartifact.

      --
      Hokey statistics and ancient misconceptions are no match for a good thought in your head, kid!
    4. Re:Cultural artifacts? by bgarcia · · Score: 1
      Take a look at the source HTML for that page. It's actually very organized & easy to read.

      It's a shame that the resulting page hurts my eyes so much!

      --
      I'm a leaf on the wind. Watch how I soar.
  4. In case of /.ing... by Dave2+Wickham · · Score: 4, Informative

    The source download is available on sourceforge.

    I doubt it'll get slashdotted, but you never know...

    1. Re:In case of /.ing... by Anonymous Coward · · Score: 4, Funny

      Don't you mean: I doubt it'll get slashdotted, but I needed the Karma.

    2. Re:In case of /.ing... by Dave2+Wickham · · Score: 1

      Ah, typical, it's SourceForge which has decided to slow down, not crawler.archive.org.

      *sigh*

    3. Re:In case of /.ing... by Dave2+Wickham · · Score: 1

      Not really, I already have excellent karma, and even if I didn't, who cares about it?

    4. Re:In case of /.ing... by Anonymous Coward · · Score: 0

      Don't you mean: I doubt it'll get slashdotted, but I needed the Karma.


      Not everybody thinks like you.

  5. I thought it sounded like... by Anonymous Coward · · Score: 0, Funny

    ...Heretics or yet another dumb Matrix reference. Or possibly both.

  6. Then maybe by caston · · Score: 4, Insightful

    OSDN can decide to open source SourceForge...

    --
    Beings aspergers AND pulling chicks... I enjoy the challenge!
    1. Re:Then maybe by Anonymous Coward · · Score: 0

      The formerly open source SourceForge stuff is now managed under a project called GForge. They've made many advances to the code and generally cleaned it up.

    2. Re:Then maybe by moeffju · · Score: 1

      You mean, as in http://alexandria.sf.net/ ?

      --
      follow me on Twitter: http://twitter.com/moeffju
  7. Oldest /. emtry by Anonymous Coward · · Score: 5, Interesting
    1. Re:Oldest /. emtry by CmdrTostado · · Score: 1

      The oldest /. entry has a link to older articles? That's too weird, even for me.

    2. Re:Oldest /. emtry by eraserewind · · Score: 1

      Wow, Slashdot used to look much nicer than its current ugly bloated mess.

    3. Re:Oldest /. emtry by Anonymous Coward · · Score: 1, Funny
      But anti-MS comments in da hizzouse!!

      Yea, Slashdot was great before the Microsoft fanboys showed up. Those were the days.

    4. Re:Oldest /. emtry by Anonymous Coward · · Score: 0

      And here's a nice little prophetic quip that would foreshadow much of future Slashdot content!!

      Tim wrote:
      >"Guess I should read the article before I post."

    5. Re:Oldest /. emtry by Anonymous Coward · · Score: 0

      Wow. Slashdot specially formatted for 640x480 resolution. Wow.

  8. score by TedCheshireAcad · · Score: 5, Funny

    Score! Now I can run my own wayback machine!

    I only have a 30G hard drive though, what do you guys think, bzip should take care of it?

    1. Re:score by bamf · · Score: 5, Funny

      If you limit yourself to only archiving the useful parts of the interweb, you should be able to fit it all on a floppy disk or two.

    2. Re:score by mahdi13 · · Score: 1

      That's a great idea!
      I'm sure you can find 4-5 terabytes of drive space lying around somewhere!
      I have about 60GB I can donate! =P

      --
      "Some things have to be believed to be seen." - Ralph Hodgson
    3. Re:score by Anonymous Coward · · Score: 0

      No... I think you're going to need lzip for this one.

    4. Re:score by Elendil · · Score: 1

      You can't. SCO now claims ownership of every line of GPL code. Barely stretching it, the Internet Archive (and thus Internet itself) can be seen as SCO's IP as "derivative work". You'll send a $699 check to the order of D. McBride, Salt Lake City UT 84101 every time you connect to your ISP. Ka-ching!

    5. Re:score by corebreech · · Score: 4, Interesting

      I'll use it if you promise not to delete shit that doesn't hew to your ideology.

      That's what really sucks about the Wayback Machine.

      Ever try reading articles from the aftermath of 9/11? It's a great big hole, so much stuff has been deleted.

    6. Re:score by OverclockedMind · · Score: 0

      um, dude, i think there is a lot more pron than that!!

      --
      if you can read this, good, because i sure cant
    7. Re:score by netsharc · · Score: 1

      I'll take your offer of donation. :)

      --
      What time is it/will be over there? Check with my iPhone app!
    8. Re:score by Anonymous Coward · · Score: 0

      Ooh? Examples, or has that been discussed/reported with examples anywhere?
      A quick search for articles, 9-11, deleted did not reveal anything, but if you have a link or two....

      Please, check out "Uncovered: The Whole Truth About the Iraq War", a movie by Robert Greenwald. He interviews "(...) Joseph Wilson, the retired diplomat who investigated claims that Iraq was shopping for uranium in Niger and found them baseless; Patrick Lang, former chief Middle East analyst for the Pentagon's Defense Intelligence Agency; Chas Freeman, former ambassador to Saudi Arabia; CIA veteran Robert Baer; and more than a dozen others." (Salon quote.) It also shows clips from speeches made (by Bush, Rumsfeld, etc.) before the Iraq "incident" (liberation, invasion, you pick one). Seeing them today really is a strange feeling, as the tune has changed so much, and few in the mainstream media have asked critical questions about the claims that were put forward in those speeches.

      I would like to believe it was not picked up again because of the load of NEW news, but if someone is deleting... (Picking up the hat of you-know-what.)

      The movie can legally be downloaded with BitTorrent. Check out the documentary section of suprnova.org and look for "UNCOVERED (Hi-Q) Why Bush REALLY (...)".
      Note: Most of the other downloads there are not so legal.

  9. The code is pretty clean, too... by tcopeland · · Score: 4, Informative

    ...some unused variables and such-like in there, though, as reported by PMD.

  10. That sounds like a good working app. by DeKoNiNG · · Score: 5, Funny

    From their FAQ: Only if you are comfortable grabbing code directly from CVS, wrestling with incomplete documentation, and running into undocumented limitations, would you want to use the current software.
    Undocumented limitations? That sounds like a lot of fun!

    --
    Troll: Large Giant, 63 hp, AC 16, Usually chaotic evil.
  11. old torrents by kyoko21 · · Score: 3, Funny

    Nothing like crawling for old, recycled, and dead torrents.

  12. This is great news by CompWerks · · Score: 2, Informative

    Open source that handles over 300 TB of data!

    --
    If you can read this sig - the bitch fell off.
  13. Gordon Mohr by Orasis · · Score: 3, Informative

    Congrats Gojomo!

    This project was written by the brains behind Bitzi and some really cool P2P stuff.

    He's one of those guys who's going to be working on important stuff for years to come.

  14. What about... by herrvinny · · Score: 3, Insightful

    Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix / heritix / heretix / heratix) is an archaic word for inheritess.

    I know some grammar nazi is going to see this, so I might as well get it first. What about heretic: one who dissents from an accepted belief or doctrine.

    1. Re:What about... by Anonymous Coward · · Score: 0


      s/grammar/(lexicography|semantics|philology|anything but not grammar)/
      </nazi>

    2. Re:What about... by FrankoBoy · · Score: 1

      I've heard it will be available on Unique-based systems soon, stay tuned.

    3. Re:What about... by Anonymous Coward · · Score: 0

      I don't get it...what about "heretic"? It's a different word entirely (and derived from Greek, not Latin).

  15. Fortune cookie by __aahlyu4518 · · Score: 1

    Beneath this article I noticed this fortune cookie:

    "Insanity is hereditary. You get it from your kids. "

  16. Heritrix by Anonymous Coward · · Score: 0

    > Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix / heritix / heretix / heratix) is an archaic word for inheritess.

    And what, pray tell, is "inheritess" ?

    1. Re:Heritrix by hplasm · · Score: 3, Funny
      And what, pray tell, is "inheritess" ?

      A Heritrix.

      --
      ...and he grinned, like a fox eating shit out of a wire brush.
    2. Re:Heritrix by Anonymous Coward · · Score: 0

      A female who inherits your goods when you die, oftentimes known as wife, or daughter.

      -- vranash

  17. Maaaaamories... by Dorf+on+Perl · · Score: 5, Funny

    This is a great step forward, I welcome our archiving overlords, etc. Right now when I want to share some of my history (the good stuff, natch) with my kids, I have to dig out an old, musty shoebox full of junk. When they want to share theirs with their kids, they'll just beam a URL into my grandkids' in-skull HUDs. While in their flying cars. "Oh look, here's another stupid post to Slashdot by Grandpa..."

    1. Re:Maaaaamories... by Pseudonym · · Score: 1

      You misspelled "maaaaammaries". This is the web we're talking about, you know.

      Hope this helps.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    2. Re:Maaaaamories... by TheSpoom · · Score: 1

      With the amount of pr0n out there, I think he's hit it head on ;^)

      --
      It's better to vote for what you want and not get it than to vote for what you don't want and get it.
      - E. Debs
  18. Infamous? by BitchAss · · Score: 4, Interesting

    the infamous Wayback Machine

    Why is it infamous? I haven't heard anything bad about it.

    --
    Like sex? Read and write about it! Indecent Blogging
    1. Re:Infamous? by hey · · Score: 3, Funny

      Just wait 20 years when you are trying to get a CEO job and somebody produces your embarrassing old weblog.

    2. Re:Infamous? by Lester67 · · Score: 3, Funny

      The batting cage that I frequent with the kids hates the fact that their web coupon (with no expiration date) is still stored in the Wayback.

      I think they might agree with "infamous". :-)

    3. Re:Infamous? by BitchAss · · Score: 1

      So, don't use your real name :)

      --
      Like sex? Read and write about it! Indecent Blogging
    4. Re:Infamous? by powlow · · Score: 1

      ha haha :)

      just checked it out and it is kind of scary that it's all there...my old site versions...

      new slogan :

      way back machine : your permanent record, online, all day, everyday!

    5. Re:Infamous? by glaHHg · · Score: 1

      Infamous is when you're more than famous. This wayback machine is not just famous, it's INfamous.

    6. Re:Infamous? by BitchAss · · Score: 1

      Not so much...here's some dictionary.com action:

      - Having an exceedingly bad reputation; notorious.

      - Causing or deserving infamy; heinous: an infamous deed.

      Don't mean to be all geeky, but, this *IS* slashdot :)

      --
      Like sex? Read and write about it! Indecent Blogging
    7. Re:Infamous? by Anonymous Coward · · Score: 0

      You're an idiot.

    8. Re:Infamous? by acceleriter · · Score: 1

      Then you just DMCA them, like the (few) savvy companies that had embarrassing information in archive.org. They'll take it down.

      --

      CEE5210S The signal SIGHUP was received.

    9. Re:Infamous? by MushMouth · · Score: 1

      It's easier than that, you just ask them nicely and they take it down.

    10. Re:Infamous? by Anonymous Coward · · Score: 0

      It's easier still - you add a robots.txt, and their server will take it down.
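      For the curious, here is how a well-behaved crawler might read such an exclusion, sketched with Python's standard `urllib.robotparser`. The `ia_archiver` user-agent token is an assumption, based on the exclusion instructions archive.org has historically given site owners, and the URL is hypothetical; the sketch only shows how the rule is interpreted at crawl time, not how the archive hides pages it has already collected.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "ia_archiver" is the user-agent token the archive's crawler honors.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() works on in-memory lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/coupon.html"  # hypothetical URL
    print(allowed(ROBOTS_TXT, "ia_archiver", page))   # False
    print(allowed(ROBOTS_TXT, "SomeOtherBot", page))  # True
```

      Other crawlers are unaffected here because the file has no `User-agent: *` entry; only the named agent is shut out of every path.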

    11. Re:Infamous? by kevcol · · Score: 1

      Shouldn't they be happy that it is still driving business to them or does the coupon offer totally free service?

    12. Re:Infamous? by Lester67 · · Score: 1

      Nah... The old coupon was 5 for $5. The new coupon is 5 for $7.

      You'd think they'd look at it like that, but they don't. Instead they look at it all scowling and pissed... across the counter... and the poor guy holding the coupon...

    13. Re:Infamous? by marnanel · · Score: 2, Funny
      --
      GROGGS: alive and well and living in
    14. Re:Infamous? by kevcol · · Score: 1

      That's too funny- "Hi- I'm an archive.org customer- I'd like my usual, please! And easy on the scowl, if you don't mind." I'm going to have to scour for other old coupons just to be a pain in the ass. :-)

      Is the '5' five minutes in the cage? Not that it matters- I just haven't gone to a batting cage in more years than I care to admit and I was just curious what they get.

    15. Re:Infamous? by base3 · · Score: 1

      That's not an option for most corporations, unfortunately :).

      --
      One CPU cycle wasted on digital restrictions management is ONE TOO MANY.
  19. Re:Google's IPO by agentforsythe · · Score: 1

    in english?

  20. Uh Oh by ResQuad · · Score: 1

    I think we /.'d sf.net...either that or it's conveniently not accessible right after I see it linked from Slashdot.

  21. Heritrix? by elgrinner · · Score: 3, Funny

    Sounds a bit like Asterix' grandfather.

    --
    But my Mom says I'm cool! -Milhouse
  22. Uh? by Zog+The+Undeniable · · Score: 4, Funny
    Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix / heritix / heretix / heratix) is an archaic word for inheritess.

    WTF is inheritess? I think we have recursive typos here...my head is going to explode!

    --
    When I am king, you will be first against the wall.
    1. Re:Uh? by gojomo · · Score: 2, Informative

      'Inheritess' is the female form of 'inheritor' -- 'someone who inherits' (female). AKA 'heiress'.

    2. Re:Uh? by phiala · · Score: 2, Informative
      The OED online is my friend!

      As a confirmed sesquipedalian, and obsessive research-addict, how could I overlook the opportunity to learn new words? And of course, share my newfound knowledge with you all...

      The OED would like us all to know:
      heritrix, heretrix: A female heir or heritor; an heiress.
      heritress: An heiress, an inheritress.
      inheritress: A female inheritor; an heiress. (Less technical than inheritrix.)
      inheritrix: Latinized fem. of INHERITOR

      inheritess: not a word

      And there you have it, courtesy of madmen and murderers. Well, one anyway, plus a whole collection of fellow logophiles.

      --
      I prefer to be called Evil Scientist.
    3. Re:Uh? by jdavidb · · Score: 1

      Inheritess is not a typo for inheritance. It means a female who inherits.

    4. Re:Uh? by corrie · · Score: 1

      There is no such word as "inheritess".

      The nearest existing English word is inheritress, the archaic form of which is inheritrix

  23. Old slashdot news by AyeFly · · Score: 5, Interesting

    here is a slashdot story from wayback i just found.

    "IBM announces a 25 gigger

    Posted by Hemos on Wednesday November 11, @10:11AM
    from the why-i-could-put-3/4-my-cd-collection dept.
    Booker writes "So IBM announces a 25 gig hard drive... does the world need this yet? Unless this is in a RAID, would you really want to trust 25 gigs on a single drive? What would you use this for? 400+ hours of MP3s comes to mind... "
    Read More...
    64 comments"

    Just thought it was interesting to see, since we now have 200gig HDs

    --
    Sig- http://www.dreamhost.com/rewards.cgi?ayefly
    1. Re:Old slashdot news by Anonymous Coward · · Score: 0

      640 gigs oughta be enough for anyone

    2. Re:Old slashdot news by Anonymous Coward · · Score: 0

      In 1992, at a computer expo there was a HUGE sale going on, $1 per meg of space, to a max of 140 megs (who the hell would ever use a whole 140 megs anyway?). 1997- What do you mean you have a file system that requires at least 100 megs of free space for file tables?!!!

    3. Re:Old slashdot news by WuphonsReach · · Score: 1

      Just thought it was interesting to see, since we now have 200gig HDs

      Check your rear-view mirrors more closely... that's a 300GB drive passing you by (Maxtor 300GB Ultra ATA/133 for only ~$275-$290). Price is falling pretty nicely for them too (when they came out in September they were $350).

      Of course, we saw the same arguments that you quoted there when the 300GB drives came out... does the world need this yet? Unless this is in a RAID, would you really want to trust 300 gigs on a single drive? What would you use this for?...

      --
      Wolde you bothe eate your cake, and have your cake?
  24. Slashdot wayback then... by OpCode42 · · Score: 5, Funny

    Just been looking at some slashdot pages from 1997... quote from the "Post your comments here!" form : "If you don't have anything worthwhile to say, don't say it. If people continue to abuse this feature, I will have to remove it."

    Oh how different things could have been... ;-)

    If the trolls had time machines...

    1. Re:Slashdot wayback then... by stevesliva · · Score: 1
      Oh how different things could have been
      Notice the unattributed slashdot quote of the day today, "I'm not proud."
      --
      Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
    2. Re:Slashdot wayback then... by adpowers · · Score: 1

      And right now it is "Spelling is a lossed art."

      Maybe later today it'll become "Duplicates are unavoidable."

  25. I probably would have done this differently... by Rahga · · Score: 4, Insightful

    Ever since the wayback machine started making waves, I'd guess about 2 years ago, I've noticed 2 things: There are far fewer updates of the archives, and it seems that the archive is regularly unable to keep up with the client load we impose on it.

    To be honest, I don't have a great answer for the second problem. The only thing that could help there is the passage of time and advancement of technology, really. For the first problem, though, perhaps a SETI-ish distributed "Heritrix" could help make regularly archiving all of these sites a manageable affair. IA sends marching orders out to the distributed volunteer network; each client downloads the pages, compares their MD5 sums with other clients', compresses them, and sends them back to a master archive. Sounds great in theory, at least at first, to me...
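    The agreement step in that proposal (several volunteer clients fetch the same page and compare MD5 sums before anything is accepted) can be sketched in Python. This is a hypothetical illustration, not Heritrix code: the function name `verify_by_quorum`, the client IDs, and the default quorum of 2 are all assumptions.

```python
import hashlib
from collections import Counter
from typing import Dict, Optional


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a fetched page body."""
    return hashlib.md5(data).hexdigest()


def verify_by_quorum(submissions: Dict[str, bytes], quorum: int = 2) -> Optional[bytes]:
    """Hypothetical check: accept a page only if at least `quorum` clients
    submitted byte-identical copies (same MD5); otherwise return None."""
    digests = {client: md5_hex(body) for client, body in submissions.items()}
    if not digests:
        return None
    # Most frequently reported digest and its vote count (ties go to the first seen).
    best_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes < quorum:
        return None
    # Hand back the body from any client whose copy matched the winning digest.
    for client, digest in digests.items():
        if digest == best_digest:
            return submissions[client]
    return None


if __name__ == "__main__":
    subs = {
        "client-a": b"<html>hello</html>",
        "client-b": b"<html>hello</html>",
        "client-c": b"<html>tampered</html>",
    }
    print(verify_by_quorum(subs))  # b'<html>hello</html>'
```

    One catch with byte-exact hashing: a page carrying rotating ads or timestamps would never reach agreement, so a scheme like this only really works for static content.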

    Then again, would I do this, or even continue the project if I was in charge? No, I wouldn't. While, ideally, every page on the internet would be in XHTML, striking a major blow against signal:noise (hey, my own page is XHTML validated, how about yours?), the vast majority of time spidering is undoubtedly wasted on re-downloading several dozen kilobytes of dynamically generated junk surrounding the content on sites such as CNN.com... While it's a noble cause, it's also a futile one.

    1. Re:I probably would have done this differently... by benja · · Score: 2, Interesting

      Ever since the wayback machine started making waves, I'd guess about 2 years ago, I've noticed 2 things: There are far fewer updates of the archives, and it seems that the archive is regularly unable to keep up with the client load we impose on it.

      I think that they possibly intentionally limit their bandwidth, so that it's faster to browse the real Web than them (because they don't want to become Google cache when a site is slashdotted, for example).

      (Although they only would if the page in question is old enough... they have a policy of pages going in only 6 months after they have been spidered, probably for the same reason as above.)

    2. Re:I probably would have done this differently... by adpowers · · Score: 1

      I thought the reason they don't get the pages for 6 months is because Alexa (in exchange for sponsorship) gets the exclusive rights to the archive for the first 6 months. I'm too lazy to look it up now, but I think I read that.

    3. Re:I probably would have done this differently... by Anonymous Coward · · Score: 0

      While, ideally, every page on the internet would be in XHTML, striking a major blow against signal:noise

      XHTML is no different to HTML in this regard; it contains all the deprecated presentational cruft that is in HTML 4.01.

      The difference is between Transitional and Strict - the differences between these two versions of XHTML are the same differences you will find between HTML 4.01 Transitional and Strict.

    4. Re:I probably would have done this differently... by Anonymous Coward · · Score: 0

      (hey, my own page is XHTML validated, how about yours?)

      The Hyper Text Transfer Protocol is used to transfer Hyper Text Markup Language documents. If you wanna fuck everything up, then host your "own page" over some XHTTP protocol so that I have no chance to stumble across it and see a pitiful rant enjoining people to take the road towards the bastardization of documents.

    5. Re:I probably would have done this differently... by benja · · Score: 1

      Could well be, then I would've jumped to conclusions on that point, sorry :)

    6. Re:I probably would have done this differently... by adpowers · · Score: 1

      I found this in the FAQ:

      Why are there no recent archives in Wayback?

      Wayback does not add pages less than 6 months after they are collected. Updates can take up to 12 months in some cases.

      There is no access to files before they appear in Wayback.
      -------------------

      I couldn't find exactly what I was looking for, but I am pretty sure that is how it works. However, this quote is interesting:

      "The Internet Archive contains over 100 Terabytes of compressed data. This data is collected in collaboration with Alexa Internet. Alexa sends its crawlers out into the web roughly once every 2 months, retrieves copies of virtually everything it encounters, and donates a copy of this data to the Internet Archive. During periods of particular interest, such as a presidential election or extraordinary breaking news, relevent sites will be crawled more frequently, roughly every 2 to 8 hours.

      The Internet Archive began archiving data in 1996. The archive grows at a rate of approximately 70 megabytes per second. A data pool of this magnitude offers a myriad of research ideas worth exploring and we encourage you to do so!"

      from http://www.archive.org/web/researcher/data_available.php

    7. Re:I probably would have done this differently... by Afromelonhead · · Score: 1
      For the first problem, though, perhaps a SETI-ish distributed "Heritrix" could help make regularly archiving all of these sites a manageable affair. IA sends marching orders out to the distributed volunteer network; each client downloads the pages, compares their MD5 sums with other clients', compresses them, and sends them back to a master archive. Sounds great in theory, at least at first, to me...

      There actually is a program out there called Grub that tries to follow this concept. I had contributed to the project in its infancy, but once it was bought out by LookSmart, I kinda moved away from it. A lot of people were complaining about Grub's utter lack of respect for no-crawl sections of sites and robots.txt. It might have changed a little bit since then to actually support robots.txt, so it might be worth a try.

      --
      Procrastination sucks.
  26. Wayback = Genealogy of AI Minds by Mentifex · · Score: 3, Interesting


    The Internet Archive serves the hidden purpose of preserving the AI source-code DNA of artificial Minds.
    Each AI Mind leaves a source code trace of itself as it evolves and proliferates across the 'Net and the parsecs of nearby meatspace.
    Robot Minds will be able to look up their ancestors in the Internet archive, just as we humans do. However, when the Joint Stewardship of Earth by man and cyborg has arrived in the form of the Technological Singularity, robots will be able to resurrect their AI Mind ancestors and bring them back to alife from the Internet Archive.

  27. Clone by RoC+MasterMind · · Score: 1

    I wonder how long it will be till we see a new site open using the code...

    1. Re:Clone by v_1_r_u_5 · · Score: 1

      All hell would break loose if the two competing sites started archiving each other!

  28. Re:[OT] Gnome 2 question by fatwreckfan · · Score: 1
  29. Redundancy? by Anonymous Coward · · Score: 3, Interesting

    The Internet is huge. But get rid of all the redundancy and the size goes down by a huge factor. How many copies of the Linux kernel and distros are there? How many copies of Matrix Reloaded? Do an MD5 sum and store pointers in order to recreate the structure of the net, keeping only one copy of what is unique. Terabyte servers are cheap these days. Wouldn't need more than a few at the most to archive everything.
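    The dedup-by-hash idea can be sketched in a few lines of Python: each payload is stored once under its MD5 digest, and each URL keeps only a pointer (the digest), so the link structure can be rebuilt from the index. The class name and URLs are hypothetical. MD5 is used only because the comment suggests it; since MD5 collisions can be produced deliberately, a real archive would be safer with SHA-256.

```python
import hashlib


class DedupStore:
    """Keep one copy of each distinct payload, keyed by its MD5 digest.

    URLs map to digests, so the structure can be rebuilt while
    byte-identical files are stored only once.
    """

    def __init__(self):
        self.blobs = {}   # digest -> payload
        self.index = {}   # url -> digest

    def put(self, url, payload):
        digest = hashlib.md5(payload).hexdigest()
        self.blobs.setdefault(digest, payload)  # keep only the first copy
        self.index[url] = digest
        return digest

    def get(self, url):
        return self.blobs[self.index[url]]


if __name__ == "__main__":
    store = DedupStore()
    tarball = b"pretend these are the bytes of a kernel tarball"
    store.put("http://mirror-a.example/linux.tar.gz", tarball)
    store.put("http://mirror-b.example/linux.tar.gz", tarball)
    store.put("http://example.com/index.html", b"<html>hello</html>")
    print(len(store.index), "URLs,", len(store.blobs), "stored copies")
```

    Note that this only collapses byte-identical copies: a re-encode, or even a single flipped bit, produces a new digest and a new stored copy.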

  30. no articles for 4 hours on a weekday morning? by zontroll · · Score: 0, Offtopic

    Did Taco die or something?

    1. Re:no articles for 4 hours on a weekday morning? by skidoo2 · · Score: 2, Funny

      I was wondering the same thing. Last night I posted a cool article about weird slime on Mars, and it hasn't even been rejected yet.

    2. Re:no articles for 4 hours on a weekday morning? by Anonymous Coward · · Score: 0

      Don't worry. Give it a few minutes, and it will be.

  31. Infamous? by Anonymous Coward · · Score: 0, Interesting

    which hosts the infamous Wayback Machine has opened?

    What exactly is infamous about the Wayback Machine? I did not know it was generally hated.

  32. Disgusting... Ban this freak. by Anonymous Coward · · Score: 0

    Pedophilia too. Ugh!

  33. Re:Eddie Gentry, Sad Victim of Slashdot by Anonymous Coward · · Score: 0


    entertaining fiction - but sort of unproductive.

  34. Inflammable by Anonymous Coward · · Score: 0

    "Did you know that flammable and inflammable mean the same thing?
    Boy, did I find that out the hard way" - Woody from Cheers

  35. Biff by rs79 · · Score: 1

    I know BIFF. BIFF is my friend. SexyKellyOsbourne you are no BIFF.

    (BIFF never used numbers)

    --
    Need Mercedes parts ?
  36. Kinda scary.... by imsabbel · · Score: 1

    Slashdot without comments would have around the same information density as a book without letters...

    --
    HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
  37. Asterix for a New Generation by Anonymous Coward · · Score: 0

    "Matrix" - The guy in the village you see in the background, and notice that there are about 30 of the same guy.
    "Unix" - The guy sued by Roman IP attorneys
    "Asterick" - Because no one can pronounce "Asterisk" any more.
    "Xenix" - To make the Asterix comics more inclusive, they have now added a female warrior defender for the village.

  38. Hmm by mobby_6kl · · Score: 0

    Why is archive.org archived :) so many times on 18 Sept 2001? There are actually more: the page notes "Note some duplicates are not shown. See all.", and then there are about 7500 entries, mostly in the same year. I opened about 10 of them and they seem to be the same.

  39. Re:[OT] Gnome 2 question by Anonymous Coward · · Score: 0

    Is someone standing next to you, holding a gun to your head? Why are you forced to use Gnome?

  40. Cause it doesn't work half the time? by rs79 · · Score: 1

    It's a great (cough) offsite backup, but very frustrating when you can't get all the pieces.

    --
    Need Mercedes parts ?
    1. Re:Cause it doesn't work half the time? by smitty45 · · Score: 1

      the web frontend is not so great, but rest assured once you get ssh access, everything works excellently, actually.

  41. Unless the Archive caves in... by turambar386 · · Score: 5, Informative


    "Since our crawler seeks to collect the digital artifacts of our culture for the benefit of future researchers and generations..."

    That is, unless the digital artifacts in question, like Operation Clambake, are opposed to rich and powerful sects. In that case, they are blocked by the Wayback Machine after the Archive caves in to DMCA notices.



    1. Re:Unless the Archive caves in... by burtonator · · Score: 1

      Not true... they are just dark archives.

      The content is still there it's just not available to the CURRENT generation.

      Future researchers and generations will still have this data.

      If you want the latest just go to xenu.net..

      For the record, I support Brewster's and the Archive's position on this. It's hard to know who is more evil... the CoS or the anti-CoS folks ;)

      (quick answer... the CoS is pure evil! ;)

      I've had a few fights with the CoS myself:

      http://www.peerfear.org/rss/permalink/2002/12/14/1039905788-Scientology_and_the_Blocked_Internet_Archive.shtml

      http://www.peerfear.org/download/scientology-takedown.html

      Kevin

    2. Re:Unless the Archive caves in... by jesterzog · · Score: 1

      In which case, they are blocked by the Wayback machine after the Archive caves in to DMCA notices.

      As upsetting as this is, I don't think it's fair to blame the Wayback Machine for this. They have to protect their own interests first to keep the service going at all. Becoming a martyr in a costly legal battle for political ideals may not fit into that. Companies don't have the freedom or flexibility of individuals, and this is the same reaction that nearly every other business and organisation would probably have, short of those whose primary objective is to fight silly legislation. The only difference is that most aren't directly affected by the DMCA.

      The problem (temporarily ignoring the Church of Scientology) is the flawed legislation that forces businesses and organisations to do this. The DMCA is the reason that the Wayback machine has to do this.

    3. Re:Unless the Archive caves in... by Anonymous Coward · · Score: 0

      If they're not willing to fight for BEING AN ARCHIVE then they should drop the pretense and instead call it "the mirror of stuff no one has complained about".

  42. What if there's another archive.org by British · · Score: 3, Funny

    ...and archive.org tries to archive it? Will it go into an infinite loop, or just have 2 copies of the interweb?

  43. "Heritrix" explained by skidoo2 · · Score: 2, Informative

    Sheesh. Let me put this one to bed before it snowballs into a big cloud of impenetrable Times New Roman.

    I'm tempted to shout, but I won't. Don't make me shout!

    "Heretrix" is a term most often seen in a genealogy context. It denotes a chick who is designated to inherit (or has already inherited) the estate of someone. Example sentence: "Captain Dork married Jack Dipstick's heretrix Gassy Lucy."

    In most cases the word "heretrix" connotes that there was something significant about the inherited estate, e.g. lots of cash.

    Now shut up already! :-)

  44. Re:No it isn't, ignore the AC. by Anonymous Coward · · Score: 0

    Which one?

  45. finally! by badansible · · Score: 3, Funny

    I will be able to look at that exciting gopher site everybody was talking about! Yes?

  46. Do it yourself archiving? by TheRedHorse · · Score: 1

    Guess this solves this guy's problem.

  47. Re:[OT] Gnome 2 question by Anonymous Coward · · Score: 0
    Sekrit Gnome version generation script:

    s/K/Gn/g

  48. How long? by Raven42rac · · Score: 0, Redundant

    How long until SCO claims that the code is theirs?

    --
    I hate sigs.
  49. Why use this crawler? by glinden · · Score: 1

    There's a huge number of open source web crawlers available already on SourceForge and elsewhere. Anyone know the advantages and disadvantages of this one over the others?

  50. Re:Wayback = Eternal life for geeks by Dusabre · · Score: 1

    And many a geek without a RL will achieve eternal life when their personality (as expressed through pointed comments), experiences (as expressed through pointless anecdotes) and knowledge (as expressed through worthless advice) and thus their consciousness and LIVING MIND ITSELF, is painstakingly put back together by the same future race which will unfreeze the richer geeks from their cryogenic deathsleeps, from the myriad holographic shreds on the archived internet.

    Think about it...

    Everything you've ever said... it all came from you...

  51. hope they GPL their parallel processing code too by Anonymous Coward · · Score: 0

    I hope the Archive does the same thing with their parallel programming system called P2.

    It's a script execution environment that they use for processing the archive data.

    http://www.archive.org/web/researcher/parallel.php

  52. Not at all. by Anonymous Coward · · Score: 0
    Slashdot without comments would have around the same information density as a book without letters.


    More as a bridge without trolls...
    1. Re:Not at all. by cyt0plas · · Score: 1

      When I'm in need of directions, I find the trolls (slightly) more useful than the bridge. Not that I come to slashdot for directions. Talk about the blind leading the blind.

      --
      Contact Me (got tired of viruses emailing me).
    2. Re:Not at all. by Anonymous Coward · · Score: 0

      Slashdot without comments would have around the same information density as a book without letters.

      More as a bridge without trolls...

      ...and without the bridge.

  53. LGPL from Wikipedia (GFDL typo?) by Famatra · · Score: 1

    I went to the GNU main site to try and figure out what the LGPL was about, and no luck at all getting a coherent explanation.

    Wikipedia has a good explanation (below), although I am confused as to why the Wayback Machine people chose this particular licence, since it seems to be specifically for software libraries. Perhaps they meant the GFDL (GNU Free Documentation License).

    P.S. You're allowed to copy all the stuff you want from Wikipedia; it's copylefted with the GFDL itself! :)

    --- Wikipedia Article on LGPL ---
    http://en2.wikipedia.org/wiki/GNU_Lesser_General_Public_License

    GNU Lesser General Public License
    From Wikipedia, the free encyclopedia.


    The GNU Lesser General Public License is a software license designed as a compromise between the GNU General Public License and simple permissive licenses such as the BSD license and the MIT License.

    It places a copyleft restriction on individual source code files but does not copyleft the program as a whole. The license is useful for software libraries; it was once called the GNU Library General Public License.

    1. Re:LGPL from Wikipedia (GFDL typo?) by spydir31 · · Score: 1

      Are you trolling?
      why would their crawler not be code?

  54. Re:1nf4m0u5? by Deraj+DeZine · · Score: 1

    Actually, the more-than-famous thing is a reference to the movie Three Amigos where they didn't understand the exact meaning of the word...

    --
    True story.
  55. What will happen if... by balbord · · Score: 2, Funny

    ...wayback inadvertently archives itself?!?!

    That reminds me... once I thought of googling for "google"... but I didn't, since it would no doubt create a black hole or something!

    --
    "If I have been able to see so far, It is because I went out and bought a damn binoculars" - Ze da Esquina
  56. Even better! by Inoshiro · · Score: 3, Funny

    " Ooopsies...
    Tim
    Sat Dec 20 at 6:37PM EST

    Guess I should read the article before I post. I was under the impression that the next release of IE4 *would* support HTML 4.0...Oh well.
    "

    Guess I should read the article before I post? What a crazy, upside-down world it was back then!

    --
    --
    Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
  57. Important clarifications (!!!) by gojomo · · Score: 4, Informative

    Heritrix is just a crawler for collecting web resources recursively, within some defined parameters -- it doesn't offer Internet Archive Wayback Machine (IA WM) functionality.

    FYI, there is a GPL'd web access tool that's very much like the IA WM, and even surpasses it in some ways: the NWA (Nordic Web Archive) Toolset 1.0. It doesn't do crawling, but if you can coerce what you've crawled into its input format, it offers URL-based, date-based, and full-text search plus "back-in-time" viewing of an archive. (Check out their demo, but remember it's only got a small number of pages from www.nb.no, so confine your searches to things like "Norway".)

    Heritrix release 0.2.0 was mainly a test of our new release procedure; we would not recommend the code for outside use yet. We use it for crawls of up to hundreds of sites, taking a week or more to complete, but it still requires expert attention to crawl well.

    We intend to improve its stability and scalability until it is capable of web-scale crawls -- billions of pages -- but that requires many incremental improvements, including extension to run on networks of cooperating crawling machines -- not planned until later in the year. (Heritrix currently crawls from a single machine.)

    We are eager for contributors who would like to extend Heritrix in various ways, especially ways that would make it more valuable to researchers, librarians, and archivists. Optional modules for new fetch protocols, new media format link-extractors, or on-the-fly content-analysis to help direct further crawling would all be very interesting to us.
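    To illustrate what "collecting web resources recursively, within some defined parameters" means, here is a toy breadth-first crawler in Python with a pluggable link extractor and three limits: allowed hosts, link depth, and a page budget. It is an illustrative sketch, not Heritrix's architecture or API (Heritrix is written in Java); all names and default limits are made up, and the fetch function is passed in so the example runs offline against a fake site.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags; one possible link extractor."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    parser = LinkExtractor()
    parser.feed(html)
    # Resolve relative links and drop #fragments so each page is queued once.
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def crawl(seeds, fetch, allowed_hosts, max_depth=2, max_pages=100):
    """Breadth-first crawl bounded by host scope, link depth and page budget.

    fetch(url) returns HTML text, or None if the page is unavailable.
    """
    frontier = deque((url, 0) for url in seeds)
    seen = set(seeds)
    collected = {}
    while frontier and len(collected) < max_pages:
        url, depth = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        collected[url] = html
        if depth >= max_depth:
            continue  # keep the page but do not follow its links
        for link in extract_links(url, html):
            if urlparse(link).hostname in allowed_hosts and link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return collected


if __name__ == "__main__":
    # A made-up three-page site standing in for real HTTP fetches.
    site = {
        "http://a.example/": '<a href="/b">b</a> <a href="http://elsewhere.example/">x</a>',
        "http://a.example/b": '<a href="c#top">c</a>',
        "http://a.example/c": '<a href="/">home</a>',
    }
    for depth in (1, 2):
        pages = crawl(["http://a.example/"], site.get, {"a.example"}, max_depth=depth)
        print(depth, sorted(pages))
```

    The "seen" set is what keeps a crawler from looping forever when pages link back to each other; swapping in a different extract_links is the kind of place where a new media-format link extractor would plug in.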

    IA currently receives almost all of its full-web collection via an agreement with Alexa Internet, who have been crawling the web for the Internet Archive since 1996.

    (P.S.: Yes, 'inheritess' should be 'inheritRess'/'heiress'. Oops.)

    1. Re:Important clarifications (!!!) by WampagingWabbits · · Score: 1

      I've been working on an open source web crawler, which is part of Mobilemaps, an open source alternative to Google's new "Search by Location" demo.

      The Mobilemaps spider acts like a traditional crawler except that it also locates US/UK street addresses on Web pages.
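      A rough idea of how address spotting on crawled pages could work, sketched in Python with regular expressions. The patterns are simplified assumptions (a loose UK postcode shape and a US "City, ST 12345" shape), not Mobilemaps' actual Perl code, and they will both miss valid addresses and match some non-addresses.

```python
import re

# Simplified UK postcode shape, e.g. "SW1A 1AA" or "M1 1AE"; the real
# format has more rules and exceptions than this pattern captures.
UK_POSTCODE = re.compile(r"\b[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}\b")

# US "City, ST 12345" or "City, ST 12345-6789"; only the shape is checked,
# not whether ST is a real state abbreviation.
US_CITY_STATE_ZIP = re.compile(
    r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*, [A-Z]{2} [0-9]{5}(?:-[0-9]{4})?\b")


def find_locations(text):
    """Return (kind, matched_text) pairs for address-like strings."""
    found = [("uk", m.group(0)) for m in UK_POSTCODE.finditer(text)]
    found += [("us", m.group(0)) for m in US_CITY_STATE_ZIP.finditer(text)]
    return found


if __name__ == "__main__":
    page = "Write to 12 High Street, London SW1A 1AA or 100 Main St, Springfield, IL 62701."
    print(find_locations(page))
```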

      I'm sure the code can be improved on significantly, but it may be worth looking over. It uses MySQL to store the data. It relies on processes rather than threads because of the old LWP Perl library I used. It can be hooked up over multiple machines, but the code that conforms to the robots exclusion standard is perhaps too processor-intensive. We've indexed tens of millions of pages with it, so we've had to work around most of the obscure issues you run into when spidering.

      We would also like to move in the directions you plan, namely collaborative spidering - which makes a lot of sense for a location based search like ours because each search engine only needs their local Web pages indexed on a small machine.

      Philip Abrahamson
      Mobilemaps Development Team

  58. This is not the Wayback Machine code. by InvisiBill · · Score: 2, Interesting
    A friend from another messageboard is working on this project, and just posted to let us know that he's been /.ed (which is sort of a cool thing in the geek world).
    And of course they got it all wrong. Heritrix != WayBackMachine.

    Heritrix gathers web pages (harvests)
    The WayBackMachine gives access to harvested material.

    Also Heritrix is a new web crawler meant to replace the one that IA has been using (which is owned by Alexa Internet).

    That's what he had to say about it. The post and the article both say it's the crawler, but the title states that it's the Wayback Machine. The two parts are separate though, and this is only the crawler part.

  59. Ah, but the thing is... by Kjella · · Score: 1

    ...while there may be unique content, there's certainly not unique versions. I'm sure there's many different rips of Matrix Reloaded. First off, there's all the various screener / preview dvd / telesync / DVD releases.

    Then there are all the corrupted versions (a single unnoticeable bit error = different MD5). Different rips (Macrovision removed/not removed, inverse telecine, PAL/NTSC versions, different resizing (bicubic/bilinear/Lanczos3)).

    Some made using XviD, some DivX, some WMV, different versions of the codec, different settings of the codec, different audio codecs, different audio/video bitrate mix.

    And even if none of that were true, you really don't seem to get how mind-bogglingly big the Internet is. I read somewhere that they estimated there was well over 100 years of commercial cinema film (screen time) produced.

    Let's assume that there exists one, and only one, copy of each film (in return, we'll assume that every movie exists online, which might not be entirely accurate, but it's not far from it). Say 1h45 and 700mb DivX rips (to be kind), and those are at least 350,000 Tb alone. Multiply by a factor of 8 or so for original DVDs (which are also online in many places now).

    Then there are the hundreds of thousands of albums around. www.allmusic.com lists 5,155,636 tracks in their database. At an average of 3mb/song (a low estimate) you're talking another 15,000 Tb.

    Then there's all the CD-ROM titles (applications, games, encyclopedias, whatever), books, databases, statistics and all sorts of other data that are available online. Not to mention all the homemade content that is available online, if only to a limited audience (like e.g. on homepages and such). Even though a couple pics don't do much, they add up when millions of people do it.

    So, on a guesstimate I think you'd rather need a server on the order of 1 exabyte (1,000,000 tb) rather than "a few (terabyte servers) at the most". Already my personal network (desktop+server) has 500gb+ of data alone, so if I share all that you're already past a quarter of your 2 tb server...

    Kjella

    --
    Live today, because you never know what tomorrow brings
  60. Gr. 350 Tb and 15 Tb, respectively. And 1 petabyte by Kjella · · Score: 1

    Did the math using mb, when I thought I was operating in gb. So I was off by a factor of 1000. So the correct guesstimate would be 1 petabyte (1000 Tb).
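    The corrected figures can be checked with a few lines of Python. The inputs are the numbers from the previous post (100 years of screen time, 1h45 and 700 MB per film, 5,155,636 tracks at 3 MB each); the sketch assumes 365.25-day years, reads "Tb" as terabytes, and uses decimal units (1 TB = 1,000,000 MB).

```python
# Inputs are the figures quoted in the post; everything else is arithmetic.
HOURS_OF_FILM = 100 * 365.25 * 24   # "well over 100 years" of screen time
HOURS_PER_FILM = 1.75               # 1h45 per film
MB_PER_FILM = 700                   # one DivX rip per film
TRACKS = 5_155_636                  # allmusic.com track count
MB_PER_TRACK = 3                    # "a low estimate"

films = HOURS_OF_FILM / HOURS_PER_FILM
film_tb = films * MB_PER_FILM / 1_000_000      # MB -> TB (decimal)
music_tb = TRACKS * MB_PER_TRACK / 1_000_000

print(f"{films:,.0f} films -> {film_tb:,.0f} TB")     # 500,914 films -> 351 TB
print(f"{TRACKS:,} tracks -> {music_tb:,.1f} TB")     # 5,155,636 tracks -> 15.5 TB
```

    That matches the corrected 350 Tb and 15 Tb figures; since the film number is a lower bound ("well over 100 years"), the real total would be higher.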

    Kjella

    --
    Live today, because you never know what tomorrow brings
  61. Re:Wayback = Eternal life for geeks by Anonymous Coward · · Score: 0
    FUCK YOU!<comment> As I was reading your pointless comment<anecdote>, I thought that you should STFU!<advice>

    /welcome to slashdot

  62. spam by krokodil · · Score: 2, Insightful

    I am afraid spammers may use this code
    to harvest web pages for email addresses.

    1. Re:spam by elemental23 · · Score: 2, Informative

      Don't lose any sleep over it, spammers have had tools to harvest the web for e-mail addresses for years.

      Insightful?

      --
      I like my women like my coffee... pale and bitter.
  63. Damn that internet archive! by Lord+Bitman · · Score: 1

    Maybe with the code released, I can find out why it constantly taunts me by having a cache of everything EXCEPT what I want!

    --
    -- 'The' Lord and Master Bitman On High, Master Of All
  64. Stop giving open source movement undeserved credit by jbn-o · · Score: 2, Insightful

    Open source that handles over 300tb of data!

    Please don't be like Mark Webbink, Red Hat's general counsel, and give the open source movement undeserved credit. Adding a license to a list of approved licenses is trivial compared to writing the license and creating a community. The Lesser General Public License (formerly the Library General Public License) was written by the Free Software Foundation well before the open source movement was formed. The LGPL was written as a compromise in order to spread free software but strategically give up the ability to preserve software freedom in derivative works.

  65. Re:gpl vs. lgpl? (answered) by DonGar · · Score: 3, Informative

    I'm quite certain that people will correct me (at length) if I'm wrong, but here goes.

    The GPL says that you can use the source and code any way you want, but if you release modified versions, you must release the modified source under the GPL.

    The LGPL is intended for libraries that would otherwise be released under the GPL. It says that commercial and other non-GPL projects can use the library without becoming GPL, but that changes to the library itself must be released under the LGPL.

    LGPL is generally considered a lighter-weight version of the GPL, and it is normally used for things like system libraries. Without the LGPL, it wouldn't be possible to (legally) write closed source software for Linux, since the license for glibc (the standard system library) would require all apps linked against it be GPL.

    --
    plus-good, double-plus-good
  66. Quote from article by mkro · · Score: 1
    From the linked Salon article:
    The administration wasn't talking about finding actual weapons anymore. Now the rhetoric was about weapons programs, which might mean little more than sheets of paper. "I had no faith or confidence that the media would catch them on their moving of their goal," he says. "Suddenly, I could see the headline in a month where they're going to announce victory because they found programs. I flashed back on all those news conferences where they said Iraq is a danger and invoked Armageddon.
    If it IS true a lot of post-9/11 stuff has been deleted, and there is a connection to the White House's changes of focus, I feel... like this must be fiction. The editing job Winston Smith had comes to mind. Uh, yeah, but about the deleted stuff from the Wayback machine? URL?
    --
    I shall go and tell the indestructible man that someone plans to murder him.
    1. Re:Quote from article by Anonymous Coward · · Score: 0

      You wanna explain to me how somebody can post a URL to something that no longer exists?

    2. Re:Quote from article by mkro · · Score: 1

      No, but he might be able to provide a link to an article or a discussion giving examples showing that this has happened, as the parent post asked for.

      --
      I shall go and tell the indestructible man that someone plans to murder him.
    3. Re:Quote from article by Anonymous Coward · · Score: 0

      Given that you can't link to something that isn't there, why not review the articles archived around 9/11 and see for yourself?

  67. Re:Eddie Gentry, Sad Victim of Slashdot by Anonymous Coward · · Score: 0

    Since when does a 2 bedroom condo have a basement?

  68. It could reveal a few things... by Skiron · · Score: 1
  69. Re:Gr. 350 Tb and 15 Tb, respectively. And 1 petab by Anonymous Coward · · Score: 0

    gb/mb = grambit/millibit = 1000 grams.

    So either you don't know what you're doing, or you were off by a factor of 1 kg.

  70. Pedantic point of language... by Anonymous Coward · · Score: 0

    "Heritrix" would not mean "inheritness" - I'm not sure that's the word you are seeking either. Either inheritEDness, or for a lot of uses, "heredity" would work better.

    However, neither of these would apply to the Latin word in question. "Heritrix" is the female form of "heritor", in this case meaning "she who inherits".
    It still crops up as a legal term in some places.

    Nothing to do with the original post, really, but are we not all committed to the spreading of knowledge?

    Good news about the Heritrix code, any way you read it!

  71. Java... why not COBOL ? by Anonymous Coward · · Score: 0

    I was wondering whether it was written in Perl, Python, C, C++, Ruby ... but no, it's Java... I hope this can run on a free VM ;-)

  72. "Thanks for the mammaries, ..." by Anonymous Coward · · Score: 0

    I've always wanted to say that.

  73. Because it's top notch by JohnQPublic · · Score: 1

    Brewster Kahle and Alexa Internet are the real deal. This isn't some undergrad's CS-101 project, it's a tool designed from the very start to archive the entire web. And it does it on a regular basis. Even if there's a really good SourceForge project (you didn't cite any of them), Alexa's should be a first stop for anyone interested in the task.

    1. Re:Because it's top notch by glinden · · Score: 1

      Ah, okay, so you're saying that it's more mature than other options out there. Better code base, higher performance, more robust parsing routines, better error handling, designed to run on a large server farm, etc. Thanks, that's what I wanted to know.

  74. Objective Ministries will hear about this by Laconian · · Score: 1

    Heretics? This will join the FreeBSD devil and "Darwin" on Objective's list as to why Open Source is the spawn of the Devil.

  75. i thought i saw... by burns210 · · Score: 1

    I first read the headline and i thought it said the Internet Archive would be archiving L/GPL code.

    That would be cool actually, like a one-stop shop for all the open source CVS servers... get to see the Linux kernel from 0.01 to 2.6.0 and a couple thousand other applications too. Oh well, the real story is neat too.

  76. SourceForge *IS* open source by TheSpoom · · Score: 1

    As said above, OSDN *HAS* open sourced SourceForge. You can obtain it at the Alexandria Development Project on SourceForge. Please try to do some research prior to saying things like this. That said, it is true that like many open source projects, SourceForge can only be used for open source software development. For commercial, closed source development using the SourceForge system, try SourceForge Enterprise Edition from VA Software, the original developers of SourceForge.

    --
    It's better to vote for what you want and not get it than to vote for what you don't want and get it.
    - E. Debs
  77. Re:Heritrix, inheritess. by maysonl · · Score: 1

    Archaic for heiress, i.e. a female who inherits.

  78. Heritix written in Java by Anonymous Coward · · Score: 0
    Interesting, Heritrix is written in Java.

    It turns out that there are other open source crawlers that also have been written in Java. For a comprehensive listing go here:

    Crawlers in Java