Slashdot Mirror


Search Engines for Your Intranet or Small Business?

coreboarder asks: "Google recently revamped their nifty little Google Mini. It now does 100,000 documents of 220 different formats, makes your bed, and pours your beer. Where I work we have a reasonably large amount of technical data files (~80,000) of varying formats stored on a number of Windows 2000 and 2003 servers. File access is handled by permissions on the containing folder(s). Over time duplication has crept in because people cannot find what they need where they expect it to be. The $3,000 price point on the Google Mini is very attractive, but is there a better way of making files and their content easily findable on a 1000 node network while still retaining their security? We also use ht://dig but it cannot handle all the file formats that would be involved here." In that same vein, Gneral Tsao asks: "As an IT worker for a small research business, I'm trying to find a good text search engine for our subscriber-facing publications. After much searching, I've found a few prospects such as Mnogosearch (which we currently use), Nutch, and Swish-e, but really no discussion about or comparison between them. This seems like a job for the Slashdot community. An ideal solution for me would be able to handle 20,000 or so pages, have a customizable PHP frontend, and allow for some amount of control over categorization." Any suggestions?

8 of 29 comments

  1. Boutell by Intron · · Score: 3, Interesting

    I run the Boutell search engine on my company's internal website.

    --
    Intron: the portion of DNA which expresses nothing useful.
    1. Re:Boutell by Roadkills-R-Us · · Score: 2, Informative

      We run ht://dig at work. I use swish and w4ais at home (I maintain them) and on some customer sites.

      I've looked into the Google Mini for work but have some concerns.

      1) The Mini doesn't handle access controls.
      2) The yearly costs for all the Google search appliances are, IMO, too high.
      3) Google will only sell you one extra year of maintenance. In effect, you're supposed to pitch this appliance after two years.

      I really, really like the Google appliance concepts, but I really, really dislike their price structure. I think Barracuda did much better with their spam firewall line on price structures, even though I have some minor gripes with them, too.

  2. Wimp. by Anonymous Coward · · Score: 3, Funny


    Give the users a shell and tell them to read the grep manpage.

  3. The Mini is great by jacumba · · Score: 2, Informative

    We picked up a Mini about 2 weeks ago. The thing is amazing. From the time I cut open the box it was delivered in to when I had our entire intranet & internet sites indexed and serving results, it took only 90 minutes. It's very easy to configure. Overall, it's a steal for any organization needing search.

  4. Nutch by zmarty · · Score: 2, Interesting

    I run Nutch, a project which is now part of the Apache Incubator. I'm indexing a few tens of gaming-related websites on www.playfuls.com. There is a lack of documentation, but if you read and play with the config files, you'll do fine.

    --
    If you can't find a way, make one!
  5. maybe namazu by Shaleh · · Score: 2, Interesting

    Has filters for lots of doc types, and you can write more.

    http://www.namazu.org/

  6. Google Desktop Enterprise Edition by La+Camiseta · · Score: 2, Informative

    Why don't you use the recently released Google Desktop Enterprise Edition? It has access controls, the ability to be pushed out to all of the client computers seamlessly, filters for a huge amount of files, the option of plugins to read more files, and is completely free.

  7. EnterFind Appliance by BigGerman · · Score: 2, Interesting

    http://www.enterfind.com/
    Supports indexing docs on Windows shares directly (as well as HTTP crawling), supports hundreds of document formats (including exotic ones like dwg files), allows precise control over the indexing process, and allows access via a Web Services API as well as a browser.
    No limitations on number of users or documents and fully customizable search page.
    Disclaimer: I participated in the development of this product. They (the company) are good people and take care of their customers.