Slashdot Mirror


Juggernaut GPLd Search Engine

real bio pointed us to Juggernautsearch, which actually looks interesting. It's GPLd. It can index 800 million pages every 3 months and deliver 10 million pages a day on a Pentium II. So I guess if you want to run your own Altavista, you can.
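
For a sense of scale, the quoted figures can be turned into per-second rates. This is only back-of-the-envelope arithmetic: it assumes "3 months" means roughly 90 days and that the load is spread evenly around the clock, neither of which the announcement states.

```python
# Rough rates implied by the quoted figures.
# Assumptions (not from the announcement): 3 months ~= 90 days, even load.
SECONDS_PER_DAY = 24 * 60 * 60

indexed_pages = 800_000_000      # pages indexed per crawl cycle
cycle_days = 90                  # assumed length of "3 months"
served_pages_per_day = 10_000_000

index_pages_per_day = indexed_pages / cycle_days
index_pages_per_sec = index_pages_per_day / SECONDS_PER_DAY
served_pages_per_sec = served_pages_per_day / SECONDS_PER_DAY

print(f"indexing: {index_pages_per_day:,.0f} pages/day, "
      f"{index_pages_per_sec:.1f} pages/sec")
print(f"serving:  {served_pages_per_sec:.1f} pages/sec")
```

So the claim works out to roughly a hundred pages per second sustained on each side, under those assumptions.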

3 of 86 comments

  1. Search Engine: GPL - Database/Crawler: $$$$ by turg · · Score: 4

    From what I read here and here, the "Juggernaut Search Engine" and the "Juggernaut Search Engine Crawler" are two separate pieces of software. The former is GPLed. The latter is not for sale, but you can purchase the database it creates (or get a demo/sampler subset of the database for free).





    -
    <SIG>
    "I am not trying to prove that I am right... I am only trying to find out whether." -Bertolt Brecht

    --
    <sig>Guvf vf abg n frperg zrffntr
  2. Distributed effort ? by Dilbert_ · · Score: 4

    I have been wondering for a while now: couldn't building the index for such a search engine be distributed (like SETI@HOME or RC5)? The server would do the actual page serving, querying, etc., but the spidering would be done by the clients. They'd each receive a batch of URLs from the server and start indexing them, collecting lists of the URLs they find and sending those back to the server. The server weeds out the duplicates and assigns the new URLs to the clients again. The more people participate, the bigger the index would grow, as the available bandwidth would increase too.
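
The loop described above can be sketched roughly as follows. This is a toy, single-process simulation and not anything Juggernaut ships: `Coordinator`, `client_crawl`, `run`, and the `TOY_WEB` link graph are all hypothetical names made up for illustration, the "clients" are plain function calls with no networking, and the indexing step itself is left out so that only the batch/report/de-duplicate cycle is shown.

```python
from collections import deque


class Coordinator:
    """Hypothetical server: hands out URL batches and weeds out duplicates."""

    def __init__(self, seed_urls, batch_size=2):
        self.seen = set()        # every URL ever queued
        self.pending = deque()   # URLs not yet assigned to a client
        self.batch_size = batch_size
        self.report(seed_urls)

    def next_batch(self):
        """Pop up to batch_size unassigned URLs for one client."""
        batch = []
        while self.pending and len(batch) < self.batch_size:
            batch.append(self.pending.popleft())
        return batch

    def report(self, discovered):
        """Accept URLs found by a client, queueing only the new ones."""
        for url in discovered:
            if url not in self.seen:
                self.seen.add(url)
                self.pending.append(url)


def client_crawl(batch, fetch_links):
    """Hypothetical client: fetch each URL and collect its outgoing links."""
    found = []
    for url in batch:
        found.extend(fetch_links(url))
    return found


# Stand-in for the web: page -> links found on that page (made-up data).
TOY_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}


def run(seed_urls, fetch_links, n_clients=2):
    """Run rounds until the coordinator has no unassigned URLs left."""
    coord = Coordinator(seed_urls)
    while coord.pending:
        batches = [coord.next_batch() for _ in range(n_clients)]
        for batch in batches:
            coord.report(client_crawl(batch, fetch_links))
    return coord.seen


crawled = run(["a"], lambda url: TOY_WEB.get(url, []))
print(sorted(crawled))
```

A real version would also have to deal with clients that never report back, clients that report bogus links, and politeness toward the crawled sites, none of which this sketch attempts.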

    Hmmm... maybe I should patent this...

    --
    superblog.org: all your favourite blogs on o
  3. Make wild claims; get free /. publicity by gbnewby · · Score: 4
    Need a free search engine? Try ht://dig. It's been around a while, and is stable and highly configurable. It includes a spider, but is more suitable for medium-sized collections, not the whole Web.

    Examination of their ftp distribution site reveals this is an early work in progress...most docs are "under construction," and even their helpers.txt (supposedly giving credit to others) is basically empty.

    I'll post more if/when their src tarball ever finishes downloading (54M - whew!...and the site is getting /.'ed right now). My guess is they drew heavily from ht://dig, WAIS, SMART and other public-source search engines and spiders.

    For those who can't get through to the site: they hope to sell subscriptions to their database, so that you can run their search engine internally. It's not clear whether they intend to license the spider/crawler or just the database.

    Meanwhile, to those who have complained that easy searches turn up nil results: read the page, dudes! It says clearly that you're searching a minimal test collection, but can search the whole thing (on your local system, seems like) for a subscription fee.

    Credibility break: I'm an information science professor and design/evaluate alternate information retrieval systems.