Slashdot Mirror


What Desktop Search Engine For a Shared Volume?

kriston writes 'Searching data on a shared volume is tedious. If I try to use a Windows desktop search engine on a volume with hundreds of gigabytes, the indexing process takes days and the search results are slow and unsatisfying. I'm thinking of an agent that runs on the server, regularly indexes the volume, and talks to the desktop machines running the search interface. How do you integrate your desktop search application with your remote file server without forcing each desktop to index the hundred-gigabyte volume on its own?'
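The server-side agent the question describes boils down to an inverted index that is built once on the file server and queried by the desktops over the network. A minimal Python sketch, assuming plain-text files, a naive word tokenizer, and a hypothetical `/search?q=` HTTP endpoint (none of these names come from any product mentioned in this thread):

```python
import json
import os
import re
import sys
from collections import defaultdict
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the text and split it into a set of alphanumeric words."""
    return set(TOKEN_RE.findall(text.lower()))


def build_index(root):
    """Walk the shared volume once, mapping each word to the files containing it."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Sketch only: reads whole files as text; a real indexer would
                # need format-aware extraction and incremental updates.
                with open(path, encoding="utf-8", errors="ignore") as f:
                    words = tokenize(f.read())
            except OSError:
                continue  # unreadable file: skip it rather than abort the crawl
            for word in words:
                index[word].add(path)
    return index


def search(index, query):
    """Return the sorted paths containing every word in the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in terms)))


def serve(index, host="0.0.0.0", port=8080):
    """Answer GET /search?q=... with a JSON list of matching paths."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            url = urlparse(self.path)
            if url.path != "/search":
                self.send_error(404)
                return
            query = parse_qs(url.query).get("q", [""])[0]
            body = json.dumps(search(index, query)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    HTTPServer((host, port), Handler).serve_forever()


if __name__ == "__main__":
    # Usage: python indexer.py /path/to/shared/volume
    serve(build_index(sys.argv[1]))
```

The expensive crawl happens once, on the server, next to the disks; each desktop only sends a short query and receives a short list of paths.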

18 of 232 comments

  1. Call the NSA by Anonymous Coward · · Score: 5, Funny

    They already have it indexed for you.

  2. wow! by tivoKlr · · Score: 5, Funny
    It's been an hour since this story was posted.

    You've stumped Slashdot. Bravo!

    --
    Ocean is land, covered with water.
  3. Google Enterprise Search by HeavyD14 · · Score: 5, Interesting

    Not that I've ever used it before, but it sounds like it does what you want: http://www.google.com/enterprise/search/gsa.html

  4. solution to hundreds of terabytes of docs by Anonymous Coward · · Score: 3, Interesting

    How about using a program like Documentum? We generate several thousand technical documents and drawings a month and use it for all our document management needs.

  5. A couple of options by Unhandled · · Score: 4, Informative

    Here are a few options you might want to consider:
    1) Use Office SharePoint Server 2007 to index the share.
    2) Upgrade to Windows Server 2008 (or above) and Windows Vista (or above) and use the Federated Search feature: http://trycatch.be/blogs/roggenk/archive/2007/11/05/windows-vista-amp-windows-server-2008-federated-search.aspx

    1. Re:A couple of options by cyber-dragon.net · · Score: 5, Funny

      You are on /. and actually recommending an upgrade to Vista?
      Brave man.

    2. Re:A couple of options by popeyecu · · Score: 3, Interesting

      I got your back. I love Federated Search, and so do my clients. It's way easier than any other solution, because it's in Windows and it "Just Works." Try it before you bash it, /.

    3. Re:A couple of options by blowdart · · Score: 3, Informative

      Oh well, if we're recommending MS solutions on Slashdot (ah, karma suicide), then good old Windows Desktop Search works just as well. Since v4.0 came out, you can have WDS running on other machines, indexing away, and it's the remote index that is queried, so there's no need for local machines to index remote shares. Plus, like SharePoint (spit) indexing, and Index Server before that, it uses iFilters, so format-aware indexing is available for most of the common formats a business uses.

  6. Federated Search by Anonymous Coward · · Score: 5, Informative

    MS does have a solution: it's called Windows Federated Search. Windows 7 with Windows Server 2008 R2 has it, and there might be a way to do it with Windows Desktop Search 4.0. Here's some info on it: http://geekswithblogs.net/sdorman/archive/2009/05/14/windows-7-federated-search.aspx

  7. Enterprise Content Management with Alfresco by RicRoc · · Score: 5, Informative

    Yes, Google's Search Appliance (GSA) could be used; I have seen it used with limited success. The main problem was how to respect access control on documents: either you index them or you don't, and if you index them with GSA, sensitive data may show up in search results. Also, we had a lot of trouble "taming" GSA: it would regularly take down servers that were sized for light loads.

    I would suggest using Alfresco http://www.alfresco.com/ as a CIFS (Common Internet File System) or WebDAV store for all those documents. This would give you the simplicity of a shared folder and the opportunity to enrich the documents with searchable metadata such as tags. Each folder (or any item, in fact) could have the correct access control, which would be respected by the search engine, Lucene. http://lucene.apache.org/java/docs/

    Alfresco comes in both Enterprise and Community Edition, it's very easy to try out -- even our non-techie project manager could install it on his PC within 10 minutes. Try that with Documentum, FileNet or IBM DB2 Content Manager!

    --
    Who?
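    The access-control problem described above is commonly solved with "security trimming": the index records who may read each document, and matches the searching user is not allowed to see are dropped before results are returned. A minimal Python sketch, assuming a hypothetical in-memory index where each document carries a set of allowed group names (this is not Alfresco's, Lucene's, or GSA's actual API):

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    path: str
    words: set  # lower-cased words extracted from the document
    allowed_groups: set = field(default_factory=set)  # ACL captured at index time


def search_trimmed(docs, query, user_groups):
    """Return sorted paths that match every query word AND that the user may read."""
    terms = set(query.lower().split())
    hits = []
    for doc in docs:
        if not terms <= doc.words:
            continue  # not a match
        if doc.allowed_groups.isdisjoint(user_groups):
            continue  # a match, but hidden from this user: never return it
        hits.append(doc.path)
    return sorted(hits)
```

    Storing the ACL in the index keeps the check cheap, but it goes stale if permissions change between crawls; re-checking the live ACL for each hit is safer but slower.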
  8. Mirror it. by palegray.net · · Score: 4, Funny

    You could just rsync the shared volume to a local drive as frequently as needed and run the search engine on the local copy.

    1. Re:Mirror it. by Makoss · · Score: 5, Insightful

      Have you ever actually used rsync on a decent-sized file set? Determining the set of changed files requires significant disk activity.

      It's a certain win when compared to just blindly transferring everything. But if you think that rsyncing 20 changed files in a 100-file working set is the same as rsyncing 20 changed files out of a 2,000,000-file working set, you are very, very wrong.

      Completely aside from the absolute insanity of suggesting that you replicate the full contents of the fileserver to every desktop, which has been covered by others.

      --
      Building a better backup.
      Zettabyte Storage
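      The cost described above comes from change detection: even when only 20 files changed, finding them means examining the metadata of every file in the tree. A minimal Python sketch of a size-and-mtime comparison (an illustration of the general technique, similar in spirit to rsync's default quick check, not rsync's actual code; the function names are hypothetical):

```python
import os


def snapshot(root):
    """Record (size, mtime) for every file under root; this stats EVERY file."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # vanished or unreadable between listing and stat
            state[path] = (st.st_size, st.st_mtime_ns)
    return state


def changed_files(old, new):
    """Sorted paths that were added, modified (size or mtime differs), or deleted."""
    added_or_modified = {p for p, meta in new.items() if old.get(p) != meta}
    deleted = set(old) - set(new)
    return sorted(added_or_modified | deleted)
```

      Each `snapshot` call visits every file, so the work grows with the size of the tree (2,000,000 files) rather than with the number of changes (20); avoiding that requires change notifications from the server's own filesystem rather than a periodic full scan.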
  9. How about Spotlight? That works on shared volumes. by thedbp · · Score: 3, Informative

    *ducks*

  10. Use MSS 2008 Express, SharePoint, FAST by VTBlue · · Score: 3, Interesting

    Use Microsoft Search Server 2008 Express... it's free; all you need is a free server box. Also check out SharePoint Search and FAST enterprise search.
    http://www.microsoft.com/enterprisesearch

  11. Re:How about Spotlight? That works on shared volum by Henriok · · Score: 3, Informative

    Yeah, my thought exactly. I wasn't aware that searching hundreds of gigabytes on shared volumes was a problem. We have a couple of terabytes shared by our Mac servers, and I don't think I've had search times longer than ten seconds over a couple of million files: MS Office files, PDFs, movies, audio, pictures, photographs, text, HTML, source code, all indexed with metadata and contents.

    Even in the days before Spotlight, using AppleShare IP servers in the '90s, finding stuff on the servers was never an issue. It has always been so fast that I have never even reflected on how fast it was. Maybe I should use some other operating system once in a while to experience what the majority experiences. Or not. I'd rather stay carefree and productive.

    Don't call me when you figure this out.

    --

    - Henrik

    - when the Shadows descend -
  12. NO! Try Alfresco by thule · · Score: 4, Informative

    SharePoint is $$$$. Try Alfresco. Alfresco can look like a file share (it supports SMB, DAV, FTP, etc.). The indexing is built in and does not require a separate SQL Server license.

  13. Re:NO! Try Alfresco by Orion+Blastar · · Score: 4, Informative

    You mean the document management Alfresco and not the CMS software. The Community Edition is free but unsupported, and the Enterprise Edition has a free 30-day trial. It looks like it won a government award for document management, which is rare for open source document management software.

    --
    Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.
  14. Re:Use Windows Indexing Service by bhpaddock · · Score: 3, Informative

    For indexing files, you're better off using Windows Search 4, a free download for Windows Server 2003. The old content indexing service is deprecated and a much older technology. It's useful in some particular scenarios, but for a smaller corpus of file content (100,000 to 250,000 items*), WS4 will work much better. And for larger repositories, SharePoint and Microsoft Search Server are almost always better options.

    * = Server 2008 R2 / Win7 has a newer version of the Windows Search indexer that scales better to even larger corpora.