Slashdot Mirror


Search Engines for Your Intranet or Small Business?

coreboarder asks: "Google recently revamped their nifty little Google Mini. It now does 100,000 documents of 220 different formats, makes your bed, and pours your beer. Where I work we have a reasonably large amount of technical data files (~80,000) of varying formats stored on a number of Windows 2000 and 2003 servers. File access is handled by permissions on the containing folder(s). Over time, duplication has crept in because people cannot find what they need where they expect it to be. The $3,000 price point on the Google Mini is very attractive, but is there a better way of making files and their content easily findable on a 1000 node network while still retaining their security? We also use ht://dig but it cannot handle all the file formats that would be involved here." In that same vein, Gneral Tsao asks: "As an IT worker for a small research business, I'm trying to find a good text search engine for our subscriber-facing publications. After much searching, I've found a few prospects such as Mnogosearch (which we currently use), Nutch, and Swish-e, but really no discussion about or comparison between them. This seems like a job for the Slashdot community. An ideal solution for me would be able to handle 20,000 or so pages, have a customizable PHP frontend, and allow for some amount of control over categorization." Any suggestions?
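Whichever search engine is chosen, the duplication mentioned in the question can be measured separately. The following is a minimal sketch, not part of the original submission, that groups files by content hash. The function names `file_digest` and `find_duplicates` are made up for illustration, and it assumes the shares are reachable as an ordinary directory tree by the account running the script.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under `root` by content hash; return groups with more than one file."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                # Unreadable file (permissions, locked by another process): skip it.
                continue
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

This only catches byte-identical copies; the same document saved in two formats or with small edits would not be grouped.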

29 comments

  1. Boutell by Intron · · Score: 3, Interesting

    I run the Boutell search engine on my company's internal website.

    --
    Intron: the portion of DNA which expresses nothing useful.
    1. Re:Boutell by Roadkills-R-Us · · Score: 2, Informative

      We run ht://dig at work. I use swish and w4ais at home (I maintain them) and on some customer sites.

      I've looked into the Google Mini for work but have some concerns.

      1) The Mini doesn't handle access controls.
      2) The yearly costs for all the Google search appliances are, IMO, too high.
      3) Google will only sell you one extra year of maintenance. In effect, you're supposed to pitch this appliance after two years.

      I really, really like the Google appliance concepts, but I really, really dislike their price structure. I think Barracuda did much better with their spam firewall line on price structures, even though I have some minor gripes with them, too.

  2. Wimp. by Anonymous Coward · · Score: 3, Funny


    Give the users a shell and tell them to read the grep manpage.

    1. Re:Wimp. by Issue9mm · · Score: 0

      I'm sure this is little more than a glib remark, but you might have noted that he specifically mentioned the platforms are Windows 2000 and Windows 2003, which, in a default install, lack grep as well as manpages.

      -9mm-

    2. Re:Wimp. by saintp · · Score: 1

      Not even Windows is a totally insurmountable problem.

    3. Re:Wimp. by Jjeff1 · · Score: 1

      Yes, but Windows has Find and find /?, which works ok.

      oh yea and Grep for windows.

    4. Re:Wimp. by Anonymous Coward · · Score: 0

      What about the dfu factor? Where I work if we gave people shell access they'd faint with lack of a gui and its niceties. God the horror.

    5. Re:Wimp. by i.r.id10t · · Score: 1

      Actually, quite a few of the gnu utils (textutils, fileutils, etc) are available in win32 format. Used to be linked to from the gnu.org page (probably still is somewhere) but it is hosted on sf.

      --
      Don't blame me, I voted for Kodos
    6. Re:Wimp. by eUdudx · · Score: 1

      Was it a joke that AC titled his suggestion to use text-based tools "Wimp"? Everybody knows WIMP stands for Windows, Icons, Mice, and Pointers right?

    7. Re:Wimp. by Mprx · · Score: 1

      Windows, Icons, Menus, Pointers

    8. Re:Wimp. by eUdudx · · Score: 1

      and now we have Windows, IE, and Moronic PopUPs but Mr. Mprx was right, my bad.
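For readers weighing the grep suggestion in this thread, here is a rough sketch of what a grep-style search over a directory tree amounts to, written in Python so it also runs on Windows. The name `grep_tree` and its parameters are hypothetical, and it assumes the files are plain text.

```python
import os
import re


def grep_tree(root, pattern, encoding="utf-8"):
    """Yield (path, line_number, line) for each line under `root` matching `pattern`."""
    regex = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps the scan going on undecodable bytes.
                with open(path, "r", encoding=encoding, errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable file: skip it rather than abort the whole search.
                continue
```

A scan like this rereads every file on every query and sees only raw text, so it does not address the binary document formats the submitter mentions. That is the gap an indexing engine with format filters is meant to fill.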

  3. Yeesh by Anonymous Coward · · Score: 0
    People will complain about anything, huh? Google makes something familiar and known, that perfectly suits his needs at a great price. So buy it!

    The guy deserves all the "Yuo should switch to Lunix beacuase whats wrong with find and locate? i can always find hello.c." replies he's gonna get.

    1. Re:Yeesh by Anonymous Coward · · Score: 0
      who said he was complaining? besides you being the intellectual giant that you are... he stated relatively clearly that he wanted to make a sh!tload of files and their content EASILY accessible to his users. from that you should be able to gather that he kinda wants his users to get results relatively quickly too. it also DOES not appear to meet his needs completely so that's why he's asking. it's called using your resources.


      yeesh did you get past the red hat stage yet? you neophyte.

  4. The Mini is great by jacumba · · Score: 2, Informative

    We picked up a mini about 2 weeks ago. The thing is amazing. From the time I cut open the box it was delivered in, to when I had our entire intranet & internet sites indexed and serving results was only 90 minutes. It's very easy to configure. Overall, it's a steal for any organization needing search.

  5. Nutch by afabbro · · Score: 1

    One project in this area I've been playing with is Nutch.

    --
    Advice: on VPS providers
  6. Nutch by zmarty · · Score: 2, Interesting

    I run Nutch, a project which is now part of Apache Incubator. I'm indexing a few tens of gaming-related websites, on www.playfuls.com. There is a lack of documentation, but if you read and play with the config files, you'll do fine.

    --
    If you can't find a way, make one!
  7. Permision / Security issues by frangipani · · Score: 1

    I can't figure out from the original post how you expect the Google Mini to crawl your content. The mini is limited to only stuff accessible via a website interface. Also, the Google Mini doesn't have any way for you to securely restrict search access to your various content.

    1. Re:Permision / Security issues by Anonymous Coward · · Score: 0

      IIS can mount folders virtually, so mount the doc root as the web root and point the Mini its way. The thing that I don't get is how it can handle permissions. IIS would be accessing the files via IUSR.
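The permissions concern raised in this thread is usually handled by filtering search hits against the querying user's rights before displaying them. The sketch below illustrates that general idea only; it is not how the Google Mini or IIS works. The folder table `FOLDER_ACL`, the group names, and the functions `allowed_groups` and `trim_results` are all hypothetical, and a real deployment would read permissions from the file servers instead of a hard-coded dictionary.

```python
from pathlib import PureWindowsPath

# Hypothetical folder permissions: folder -> groups allowed to read it.
FOLDER_ACL = {
    PureWindowsPath(r"D:\docs\engineering"): {"engineers"},
    PureWindowsPath(r"D:\docs\public"): {"engineers", "sales"},
}


def allowed_groups(path, folder_acl):
    """Return the groups of the deepest listed folder containing `path` (empty set if none)."""
    path = PureWindowsPath(path)
    best = None
    for folder in folder_acl:
        if folder in path.parents and (best is None or len(folder.parts) > len(best.parts)):
            best = folder
    return folder_acl[best] if best is not None else set()


def trim_results(hits, user_groups, folder_acl=FOLDER_ACL):
    """Keep only the hits whose containing folder grants access to one of `user_groups`."""
    groups = set(user_groups)
    return [hit for hit in hits if allowed_groups(hit, folder_acl) & groups]
```

Files outside every listed folder are hidden by default, which is the safer failure mode when the index is built by a single account that can read everything.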

  8. maybe namazu by Shaleh · · Score: 2, Interesting

    Has filters for lots of doc types, you can write more.

    http://www.namazu.org/

  9. long term by Deternal · · Score: 1

    The long term solution is to put your data into groupware - lotus workplace and domino/notes is the example of how this can and should be done.

    Of course workplace has limits on the number of formats you can import into it, but definitely not on the amount of data (well, of course HD space and whatever limit db2 has apply).

    1. Re:long term by Suppafly · · Score: 1

      I can't see how any solution using domino/notes could be considered a good solution.

    2. Re:long term by Deternal · · Score: 1

      Well, that's your problem, isn't it?
      Best tool for the job and all - the Lotus products are the best groupware tools thus far.

    3. Re:long term by Suppafly · · Score: 1

      I'm glad you think so, but I imagine you'd be hard pressed to find anyone that agrees with you.

    4. Re:long term by Deternal · · Score: 1

      Depends on what I ask them - lots of people only know lotus/domino as an e-mail/calendaring application to compete with outlook - and until the 6.x branch the outlook client definitely was easier to use in this regard.

      However, there is A LOT more to lotus/domino than just mail and calendaring.

  10. Google Desktop Enterprise Edition by La+Camiseta · · Score: 2, Informative

    Why don't you use the recently released Google Desktop Enterprise Edition? It has access controls, the ability to be pushed out to all of the client computers seamlessly, filters for a huge number of file types, the option of plugins to read more files, and is completely free.

    1. Re:Google Desktop Enterprise Edition by Vorondil28 · · Score: 1

      ...as free as beer can be.

      --
      This sig rocks the casbah.
    2. Re:Google Desktop Enterprise Edition by Joe5678 · · Score: 1

      Because they said the files were stored on servers while the Google Desktop tool only searches the local computer (You probably wouldn't want 100 computers indexing your network drive anyway).

      That being said, I remember a post by somebody who installed the Google Desktop tool on a single machine, and then hacked it up to index the network drives, and did some more tweaking to allow searches from other computers, essentially creating his own Google Mini (although I wouldn't be surprised if this were against something in the EULA).

    3. Re:Google Desktop Enterprise Edition by La+Camiseta · · Score: 1

      IIRC Google Desktop Search won't automatically index network drives on the first go-around, but if you open the file up while GDS is running it will index the file.

      But while reading the features page, it looks like you can run both the Mini or Search Appliance in tandem with GDS Enterprise to both index your intranet as well as let your employees index and search through their content.

      Looks like it could be quite the time saver if you ask me. Being able to type in something like "Oracle" and pop up all of your intranet docs containing the phrase, as well as all emails, chats, and the docs on your computer seems to just scream "improved efficiency."

  11. EnterFind Appliance by BigGerman · · Score: 2, Interesting

    http://www.enterfind.com/
    Supports indexing docs on Windows shares directly (as well as HTTP crawling), supports hundreds of document formats (including exotic ones like dwg files), allows precise control over indexing process and allows access via Web Services API as well as browser.
    No limitations on number of users or documents and fully customizable search page.
    Disclaimer: I participated in the development of this product. They (company) are good people, take care of their customers.