Domain: htdig.org
Stories and comments across the archive that link to htdig.org.
Stories · 3
Search Engines for Your Intranet or Small Business?
coreboarder asks: "Google recently revamped their nifty little Google Mini. It now does 100,000 documents of 220 different formats, makes your bed, and pours your beer. Where I work we have a reasonably large amount of technical data files (~80,000) of varying formats stored on a number of Windows 2000 and 2003 servers. File access is handled by permissions on the containing folder(s). Over time duplication has crept in because people cannot find what they need where they expect it to be. The $3,000 price point on the Google Mini is very attractive, but is there a better way of making files and their content easily findable on a 1000-node network while still retaining their security? We also use ht://Dig, but it cannot handle all the file formats that would be involved here." In that same vein, Gneral Tsao asks: "As an IT worker for a small research business, I'm trying to find a good text search engine for our subscriber-facing publications. After much searching, I've found a few prospects such as Mnogosearch (which we currently use), Nutch, and Swish-e, but really no discussion about or comparison between them. This seems like a job for the Slashdot community. An ideal solution for me would be able to handle 20,000 or so pages, have a customizable PHP frontend, and allow for some amount of control over categorization." Any suggestions?
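One common way to make files findable "while still retaining their security" is security trimming: record each document's read permissions when it is crawled, then drop any hit the searching user could not open anyway. The sketch below assumes the permissions have already been reduced to a set of allowed group names per document; `Document`, `allowed_groups` and `search_trimmed` are hypothetical names, and actually reading Windows folder ACLs is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    path: str
    text: str
    # Groups allowed to read the containing folder, captured at crawl time.
    # Hypothetical: a real crawler would derive this from the folder's ACL.
    allowed_groups: frozenset = field(default_factory=frozenset)


def search_trimmed(docs, term, user_groups):
    """Return paths of documents containing `term` that the user may read.

    A document is visible only if the user shares at least one group
    with its allowed_groups; everything else is silently dropped.
    """
    term = term.lower()
    return [
        d.path
        for d in docs
        if term in d.text.lower() and d.allowed_groups & user_groups
    ]
```

Trimming against a crawl-time snapshot is fast but can go stale when folder permissions change; checking permissions at query time avoids that at the cost of one lookup per hit.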
A Grep-like Utility That Works on More than Text?
Nutria writes "This article got me thinking: What's a poor Unix-using guy to do, when he needs to grep text, compressed tarballs, OO.o documents, Debian archives, mime-encoded files, Evil Microsoft documents, PDF files, compressed AbiWord files, etc.?" Is there an extensible searching program for Unix that can handle a variety of different file types? Search engines like ht://Dig can accomplish part of this task; however, ht://Dig currently doesn't index the whole file (just portions of the metadata). If you had to perform a substring search on a set of documents of different types, what tools would you use to accomplish this task?
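A minimal sketch of the "extensible grep" idea, using only Python's standard library: detect container formats by content, unpack them recursively, and run a plain byte-substring search on whatever is left. It assumes only gzip streams, zip containers (OO.o/ODF documents are zip files holding XML) and uncompressed tar archives need unpacking; PDF, legacy Microsoft formats, MIME-encoded mail and Debian `.deb` files (which are `ar` archives) are not handled and would each need their own decoder. `iter_members` and `grep_any` are hypothetical names.

```python
import gzip
import io
import tarfile
import zipfile
from pathlib import Path


def iter_members(name, data):
    """Yield (label, bytes) for every searchable blob inside `data`.

    Gzip streams, zip containers (which include OpenOffice.org/ODF
    documents) and tar archives are unpacked recursively; anything
    else is yielded unchanged so it can be searched as raw bytes.
    """
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        yield from iter_members(name + "!gunzip", gzip.decompress(data))
        return
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield from iter_members(f"{name}!{info.filename}", zf.read(info))
        return
    buf.seek(0)
    try:
        tf = tarfile.open(fileobj=buf, mode="r:")  # uncompressed tar only
    except tarfile.TarError:
        tf = None  # not a tar archive either
    if tf is not None:
        with tf:
            for member in tf.getmembers():
                if member.isfile():
                    blob = tf.extractfile(member).read()
                    yield from iter_members(f"{name}!{member.name}", blob)
        return
    yield name, data  # leaf: plain text or an unsupported binary format


def grep_any(path, needle):
    """Return labels of every blob under `path` that contains `needle`."""
    data = Path(path).read_bytes()
    pattern = needle.encode("utf-8")
    return [label for label, blob in iter_members(str(path), data) if pattern in blob]
```

For example, `grep_any("backup.tar.gz", "invoice")` reports matches as labels like `backup.tar.gz!gunzip!docs/readme.txt`, showing the nesting path. Because the search is on raw bytes, text split by XML markup or stored in compressed PDF streams can still be missed.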
Open Source Analog to Microsoft's Index Server?
An Anonymous Coward asks: "I have been tasked by my noble employer to find a better way of accessing the 4,000-odd management documents and procedures we have. Currently MS Index Server is being used to provide a fairly good searching system. Index Server (for those that don't know) trawls through files and indexes their content. ASP is then used to search the resulting database. My question is: there has to be a way to do this with nice open source software? Does anyone know of any competitors to Index Server that can index Microsoft Office documents? Thanks!" Might not ht://Dig be a good foundation on which to build such a system?
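The core of what Index Server does, trawling files and recording which words occur in which documents so a front end can query them, is an inverted index. Below is a minimal Python sketch that assumes plain-text files only; Office documents would first need a separate text-extraction step, which is left out. `build_index` and `search` are hypothetical names, not the API of Index Server, ht://Dig or any other tool mentioned here.

```python
import re
from collections import defaultdict
from pathlib import Path

WORD = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return WORD.findall(text.lower())


def build_index(root, suffixes=(".txt",)):
    """Map each word to the set of file paths under `root` containing it."""
    index = defaultdict(set)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for word in tokenize(text):
                index[word].add(str(path))
    return index


def search(index, query):
    """Return sorted paths containing every word in `query` (boolean AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))  # copy so the index is not mutated
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

The crawl is the expensive step and is done once (or on a schedule); each query then touches only the posting sets for its own words, which is what lets a web front end answer quickly.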