Slashdot Mirror


Open Source Analog to Microsoft's Index Server?

An Anonymous Coward asks: "I have been tasked by my noble employer with finding a better way of accessing the 4,000-odd management documents and procedures we have. Currently MS Index Server is being used to provide a fairly good search system. Index Server (for those who don't know) trawls through files and indexes their content; ASP is then used to search the resulting database. My question is: isn't there a way to do this with nice open source software? Does anyone know of any competitors to Index Server that can index Microsoft Office documents? Thanks!" Might ht://Dig not be a good foundation on which to build such a system?
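
To make the "trawl and index" idea concrete, here is a minimal sketch of an inverted index in Python. It assumes the documents have already been converted to plain text (pulling text out of Word or PowerPoint files needs a separate converter, which is not shown), and the names `build_index` and `search` are purely illustrative. Index Server, ht://Dig and the other engines in this thread are far more elaborate; this only shows the core data structure.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the documents containing every word of the query (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # Hypothetical sample documents, already reduced to plain text.
    docs = {
        "procedures/expenses.txt": "Expense claims must be approved by a manager.",
        "procedures/leave.txt": "Annual leave requests go to your manager.",
    }
    idx = build_index(docs)
    print(sorted(search(idx, "manager approved")))  # prints ['procedures/expenses.txt']
```

The index is built once, up front, so each query only touches the posting sets for its own words instead of rescanning every file; that is the same trade-off Index Server makes.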

9 of 38 comments

  1. Not Open Source, but... by Fifster · Score: 2, Interesting

    It's not open source, but Sherlock for Mac OS (part of the OS) has always offered hard drive and folder indexing that can scan the contents of documents fairly quickly and efficiently. I haven't seen how it performs on a /huge/ archive, though.
    --Fifster

  2. Google? by isorox · Score: 5, Interesting

    Doesn't Google license their engine (which reads Word, PowerPoint, etc.)?

    1. Re:Google? by Naikrovek · Score: 4, Interesting

      Yes: http://www.google.com/appliance/

      However, it is not open source, it is not free, and it doesn't meet the goals of the person asking the question at all.

  3. two I tried by epine · Score: 3, Interesting


    I tried mnogosearch and swish-e. Each has its own pluses and minuses. Later on I discovered that mnogosearch has a PHP front end and can be installed from a Debian package.

    My advice is to set up two entirely separate search databases, one per engine. Otherwise it's very difficult to compare hits, ranking performance, or any differences you discover in each engine's lexeme policy (how it splits text into indexable words).
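
    Once both databases exist, a small script can run the same query against each and diff the results. The sketch below assumes you have already collected each engine's ranked hit list as a list of document names (how you query mnogosearch or swish-e is not shown); `compare_hits` is a hypothetical helper, and Jaccard overlap of the top-k results is just one simple metric among many.

```python
def compare_hits(hits_a: list[str], hits_b: list[str], k: int = 10) -> dict:
    """Compare the top-k ranked results two search engines returned for one query."""
    top_a, top_b = hits_a[:k], hits_b[:k]
    set_a, set_b = set(top_a), set(top_b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "only_a": sorted(set_a - set_b),  # found only by engine A
        "only_b": sorted(set_b - set_a),  # found only by engine B
        "shared": sorted(shared),
        # Jaccard similarity of the two top-k sets: 1.0 means identical sets.
        "jaccard": len(shared) / len(union) if union else 1.0,
        # Negative value: the document ranks higher in B than in A.
        "rank_shift": {doc: top_b.index(doc) - top_a.index(doc) for doc in shared},
    }


if __name__ == "__main__":
    report = compare_hits(["x.doc", "y.doc", "z.doc"], ["y.doc", "w.doc"])
    print(report)
```

    Running a fixed list of typical staff queries through both engines and looking at the "only_a"/"only_b" lists is usually where tokenization differences show up first.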

  4. Namazu and Bool by rhkramer · Score: 2, Interesting

    Check out this page on twiki.org: http://twiki.org/cgi-bin/view/Codev/SearchEngineVsGrepSearch. It discusses some search engines that have been or are being considered to replace the grep-based search in TWiki.

    To me, Namazu and Bool sound promising, but some others are discussed there as well.

    TWiki is a Perl- and CGI-based wiki, and Namazu seems to integrate quite well into a CGI-based environment and can index Word documents.

    Hope this helps!

  5. Zope and DocumentLibrary by mwr · Score: 2, Interesting

    Haven't tried the latter, but it may fit the bill. See the DocumentLibrary home page.

  6. glimpse by stor · Score: 2, Interesting

    Hello there.

    I have done all of this before in a commercial environment using Glimpse and Perl.

    I'd recommend you check out Glimpse and WebGlimpse. They ought to do what you are after, for free.

    Cheers
    Stor

    --
    "Yeah well there's a lot of stuff that should be, but isn't"

  7. Some Perl Engines by agentZ · Score: 3, Interesting

    I don't know if you'd consider using Perl, but I've had some good luck with the Fluid Dynamics Search Engine. By default it can search text and PDF documents, and after some work I was able to get it to search the text of Microsoft Word documents too.

  8. Try Xapian (was commercial) by samjam · Score: 2, Interesting

    Try Xapian (www.xapian.org), which is about to have its first release.

    It is based on a temporary open-source release of one of SmartLogik's products.

    I swear by it and find it highly flexible.

    I guess, though, that unless you are a hacker (say, capable of using … to actually index your documents), you might want to wait for the next release.

    I use it in preference to htdig, swish++, and the others I have looked at and reluctantly left behind. Xapian is very fast, easily passes the 2 GB limit that systems such as swish++ suffer from, and supports dynamically aggregating multiple indexes into one search!

    Sam
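
    As a rough illustration of what searching several indexes at once means, here is a naive Python sketch. It is not Xapian's API or its weighting scheme; it assumes a hypothetical index format that maps each word to per-document occurrence counts, and that each document lives in exactly one index. `search_many` queries every index and pools the scores into one ranked list.

```python
import heapq
from collections import Counter

# Hypothetical index format: word -> {document name: occurrence count}.
Index = dict[str, dict[str, int]]


def search_many(indexes: list[Index], query: str, limit: int = 10) -> list[tuple[str, int]]:
    """Run one query over several independent indexes and merge the hits.

    Each document is scored by the total number of query-word occurrences
    (a deliberately crude measure); scores from all indexes are pooled,
    which assumes no document appears in more than one index.
    """
    words = query.lower().split()
    scores: Counter[str] = Counter()
    for index in indexes:
        for word in words:
            for doc, count in index.get(word, {}).items():
                scores[doc] += count
    # Highest score first; ties broken alphabetically for stable output.
    return heapq.nsmallest(limit, scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    hr = {"holiday": {"hr/leave.txt": 3}, "policy": {"hr/leave.txt": 1, "hr/conduct.txt": 2}}
    it = {"policy": {"it/passwords.txt": 4}}
    print(search_many([hr, it], "holiday policy"))
    # prints [('hr/leave.txt', 4), ('it/passwords.txt', 4), ('hr/conduct.txt', 2)]
```

    The appeal of aggregation is that each department's documents can be indexed (and re-indexed) separately while users still see a single result list.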