Open Source Analog to Microsoft's Index Server?
An Anonymous Coward asks: "I have been tasked by my noble employer to find a better way of accessing the 4,000-odd management documents and procedures we have. Currently MS Index Server is being used to provide a fairly good searching system. Index Server (for those who don't know) trawls through files and indexes their content. ASP is then used to search the resulting database. My question is: surely there has to be a way to do this with nice open source software? Does anyone know of any competitors to Index Server that can index Microsoft Office documents? Thanks!" Might not HT://dig be a good foundation on which to build such a system?
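For anyone unfamiliar with the "trawl the files, index their content, then search the result" idea described in the question, here is a rough Python sketch of it. It is only an illustration under stated assumptions, not how Index Server or any of the tools named in this thread actually work: the function names are made up, it reads plain `.txt` files only, and Office documents would first need their text extracted by some other tool.

```python
import os
import re
from collections import defaultdict

# Terms are runs of lowercase letters and digits (a simplifying assumption).
TOKEN = re.compile(r"[a-z0-9]+")

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())

def build_index(docs):
    """Map each term to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for term in tokenize(text):
            index[term].add(name)
    return index

def read_text_files(root):
    """Collect the contents of every .txt file under root (hypothetical layout)."""
    docs = {}
    for folder, _dirs, files in os.walk(root):
        for filename in files:
            if filename.endswith(".txt"):
                path = os.path.join(folder, filename)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    docs[path] = handle.read()
    return docs

def search(index, query):
    """Return the documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

Something like `search(build_index(read_text_files("docs")), "fire procedure")` would then return the paths of the files containing both words. A real engine adds ranking, stemming, and document-format filters on top of this.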
Doesn't Google license their engine (which reads Word, PowerPoint, etc.)?
I tried mnogosearch and swish-e. Each has its pluses and minuses. Later on I discovered that mnogosearch has a PHP front end and can be installed from a Debian package.
My advice is to set up two entirely separate search databases. Otherwise it's very difficult to compare hits, ranking performance, or any differences you find in the lexeme policy.
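If it helps, comparing hits between two setups can be done with a few lines of Python. This is just a sketch, assuming each setup gives you an ordered list of document names for the same query; `compare_hits` and its field names are hypothetical, not part of any search package.

```python
def compare_hits(hits_a, hits_b, k=10):
    """Compare the top-k results that two search setups return for one query."""
    top_a, top_b = hits_a[:k], hits_b[:k]
    set_a, set_b = set(top_a), set(top_b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        # Documents both setups found, in alphabetical order.
        "shared": sorted(shared),
        # Documents found by only one setup, kept in that setup's rank order.
        "only_a": [doc for doc in top_a if doc not in set_b],
        "only_b": [doc for doc in top_b if doc not in set_a],
        # Overlap of the two result sets (1.0 means identical sets).
        "jaccard": len(shared) / len(union) if union else 1.0,
        # Positive values mean the document ranks lower in setup B than in A.
        "rank_shift": {doc: top_b.index(doc) - top_a.index(doc) for doc in shared},
    }
```

Running the same handful of queries against both databases and looking at `only_a`, `only_b`, and `rank_shift` makes the differences in coverage and ranking easy to spot.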
I don't know if you'd consider using Perl, but I've had some good luck with the Fluid Dynamics Search Engine. By default it can search text and PDF documents, and after some work I was able to get it to search the text of Microsoft Word documents too.