What Desktop Search Engine For a Shared Volume?
kriston writes 'Searching data on a shared volume is tedious. If I try to use a Windows desktop search engine on a volume with hundreds of gigabytes, the indexing process takes days and the search results are slow and unsatisfying. I'm thinking of an agent that runs on the server, indexes regularly, and talks to the desktop machines running the search interface. How do you integrate your desktop search application with your remote file server without forcing each desktop to index the hundred-gigabyte volume on its own?'
They already have it indexed for you.
You've stumped Slashdot. Bravo!
Ocean is land, covered with water.
Not that I've ever used it before, but it sounds like it does what you want: http://www.google.com/enterprise/search/gsa.html
How about using a program like Documentum? We generate several thousand technical documents and drawings a month, and use it for all our document management needs.
How about Everything (assuming the server is Windows & NTFS)? Works well for me (quickest desktop search I've found yet), and can either run locally or connect to an ETP server. The site seems to be down right now, but here's the original Lifehacker article where I found it. Incidentally, I never heard of ETP until I started using it. Anyone know if it's an Everything-specific protocol?
"Once in Hawaii I had sex with a 102 year old male turtle. It is difficult to argue that it was consensual." - Steve Ma
Here are a few options you might want to consider:
1) Use Office SharePoint Server 2007 to index the share.
2) Upgrade to Windows Server 2008 (or above) and Windows Vista (or above) and use the Federated Search feature: http://trycatch.be/blogs/roggenk/archive/2007/11/05/windows-vista-amp-windows-server-2008-federated-search.aspx
I guess it could work, although you can't index the files directly. You have to run a local copy and one on the server as an ETP server. www.voidtools.com, although it seems to be down at the moment, so here's a link to the FAQ on Google's Cache: http://74.125.113.132/search?q=cache:fcYHcEJKH3UJ:www.voidtools.com/faq.php
MS does have a solution; it's called Windows Federated Search. Windows 7 with 2008 R2 has it, and there might be a way to do it with Windows Desktop Search 4.0. Here's some info on it - http://geekswithblogs.net/sdorman/archive/2009/05/14/windows-7-federated-search.aspx
If you have a Windows server, you can tell SharePoint to index the file share. See: http://dotnetmafia.sys-con.com/node/1046930
Yes, Google's Search Appliance (GSA) could be used; I have seen it used with limited success. The main problem was how to respect access control on documents: either you index them or you don't, and if you index them with GSA, sensitive data may show up in search results. Also, we had a lot of trouble "taming" GSA: it would regularly take down servers that were sized for light loads.
I would suggest using Alfresco http://www.alfresco.com/ as a CIFS (Common Internet File System) or WebDAV store for all those documents. This would give you the simplicity of a shared folder and the opportunity to enrich the documents with searchable metadata such as tags. Each folder (or any item, in fact) could have the correct access control, which would be respected by the search engine, Lucene. http://lucene.apache.org/java/docs/
Alfresco comes in both Enterprise and Community editions, and it's very easy to try out -- even our non-techie project manager could install it on his PC within 10 minutes. Try that with Documentum, FileNet or IBM DB2 Content Manager!
Who?
You could just rsync the shared volume to a local drive as frequently as needed and run the search engine on the local copy.
*ducks*
One way is to set up Microsoft Indexing Service on the server with the shared drive. The MSC console app provides a search capability and one can also use the Indexing Service SDK for client apps.
Use Microsoft Search Server 2008 Express... it's free; all you need is a spare server box. Also check out SharePoint Search and FAST enterprise search.
http://www.microsoft.com/enterprisesearch
Yeah, my thought exactly. I wasn't aware that it was a problem searching hundreds of gigabytes on shared volumes. We have a couple of terabytes shared by our Mac servers and I don't think I've had search times longer than ten seconds over a couple of million files: MS Office files, PDFs, movies, audio, pictures, photographs, text, HTML, source code, all indexed with metadata and contents.
Even in the days before Spotlight, using AppleShare IP servers in the 90s, finding stuff on the servers was never an issue. It has always been so fast that I have never even reflected on the fact that it was fast. Maybe I should use some other operating system once in a while to experience what the majority experiences. Or not. I'd rather stay carefree and productive.
Don't call me when you figure this out.
- Henrik
- when the Shadows descend -
SharePoint is $$$$. Try Alfresco. Alfresco can look like a file share (it supports SMB, DAV, FTP, etc). The indexing is built in and does not require a separate SQL Server license.
You mean the document management Alfresco and not the CMS software. The Community Edition is free but unsupported, and the Enterprise edition has a free 30-day trial. It looks like it won a government award for document management, which is rare for open source document management software.
Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.
You don't allow every client to index. There have been several suggestions already, but most enterprises intentionally DISABLE desktop search. It absolutely slaughters the share. It's not a big deal when one user is doing it... but when 5,000 are, the I/O load becomes unsustainable.
"Earth allows you to find files across a large network of machines and track disk usage in real time. It consists of a daemon that indexes file systems in real time and reports all the changes back to a central database. This can then be queried through a simple, yet powerful, web interface. Think of it like Spotlight or Beagle but operating system independent with a central database for multiple machines with a web application that allows novel ways of exploring your data." http://open.rsp.com.au/projects/earth
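The central-index idea described above (one process walks the volume, records changes in a database, and clients query the database instead of the share) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Earth or any other product mentioned in this thread is implemented: the function names, the `files` table layout, and the use of SQLite are all hypothetical, it indexes file names and sizes only (no document contents), and it assumes a single root per database.

```python
import os
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the central index database (hypothetical schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime REAL)"
    )
    return conn


def refresh_index(conn, root):
    """Walk `root` once and sync the index with what is on disk.

    Returns (added_or_changed, removed). Assumes one root per database.
    """
    known = {
        path: (size, mtime)
        for path, size, mtime in conn.execute("SELECT path, size, mtime FROM files")
    }
    seen = set()
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            seen.add(path)
            # Only touch the database when size or mtime differs.
            if known.get(path) != (st.st_size, st.st_mtime):
                conn.execute(
                    "INSERT OR REPLACE INTO files (path, size, mtime) VALUES (?, ?, ?)",
                    (path, st.st_size, st.st_mtime),
                )
                changed += 1
    removed = [path for path in known if path not in seen]
    conn.executemany("DELETE FROM files WHERE path = ?", [(p,) for p in removed])
    conn.commit()
    return changed, len(removed)


def search(conn, fragment):
    """Return indexed paths whose file name contains `fragment`, ignoring case."""
    needle = fragment.lower()
    rows = conn.execute("SELECT path FROM files ORDER BY path")
    return [path for (path,) in rows if needle in os.path.basename(path).lower()]
```

Run on the server from a scheduler, `refresh_index` is the only thing that ever walks the volume; desktops would call something like `search` against the database, so 5,000 clients cost 5,000 cheap queries rather than 5,000 crawls of the share.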
I've tried ducks, but they tend to nibble the occasional one or zero, and they leave an awful mess on the platters when they poop. Try Spotlight instead -- not as cute, but easier on the data, hardware, and the nose.
"First things first, but not necessarily in that order."
- Doctor Who
(Disclaimer: I work for Extensis)
Portfolio Server can continuously index files on SMB/CIFS (and AFP) volumes using a feature called "AutoSync". Web and Desktop (Windows/Mac) clients then search by folder name, file name, document text, or other metadata. Indexing and thumbnail creation take place on the server, so clients are relieved of any cataloging workload and metadata is centralized.
http://www.extensis.com/en/products/portfolioserver9/overview.jsp
I see you are trying to write a funny post. Would you like me to help with that?
Bark less. Wag more.
FYI, GNU find has xargs built in these days:
find -name '*.php*' -exec grep func {} +
The + instead of ';' makes it collect multiple arguments for grep, like xargs, instead of the traditional find -exec behaviour, which is like xargs -n1. I use -exec {} + all the time because it's less typing and is safe with filenames containing punctuation or whitespace, so you don't have to type -print0 | xargs -0 either. (BTW, if you have a list of filenames that you process with something line-oriented, you can use xargs -d'\n'.)
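For anyone on a box without GNU find, here is a rough Python sketch of what that command does: walk the tree, pick file names matching '*.php*', and report lines containing "func". The function name `grep_tree` is made up for this example, it does plain substring matching rather than grep's regular expressions, and it reports line numbers, which grep only does with -n. Paths are handled as ordinary strings and never re-parsed by a shell, so whitespace and punctuation in file names are safe here too.

```python
import fnmatch
import os


def grep_tree(root, name_pattern, needle):
    """Yield (path, line_number, line) for lines containing `needle`
    in files under `root` whose name matches `name_pattern`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, name_pattern):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps going on files that are not valid text
                with open(path, errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file; skip it, roughly like grep -s
```

Something like `for path, number, line in grep_tree(".", "*.php*", "func"): print(f"{path}:{number}:{line}")` would give output close to `grep -n`.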
#define X(x,y) x##y
Peter Cordes ; e-mail: X(peter@cordes ,
Maybe search, but didn't know what to search for.
If this is a guy who's used to doing home and small business support, with a handful of machines at best, and kind of got thrown into the deep end by management because he's "good with them thar cmpooterz" then he may not be thinking "enterprise" search.
After all, why would anybody need a search engine to find a starship?
Probably searched for "shared drive search engine" or something like that.
"City hall" in German is "Rathaus". Kinda explains a few things......
Except then you have another terrible search solution which isn't meant for the amount of data you'd find on a large server. Worse, you have an operating system that is terrible as a server solution.
On the other hand, you could just use a unix/linux distro of your choice, and beagle (http://beagle-project.org) - which is meant for indexing large amounts of data and has many clients some of which can remotely access it.
BeauHD. Worst editor since kdawson.
For indexing files, you're better off using Windows Search 4, a free download for Windows Server 2003. The old content indexing service is deprecated and a much older technology. It's useful in some particular scenarios but for a smaller (100,000 - 250,000 items*) corpus of file content, WS4 will work much better. And for larger repositories, SharePoint and Microsoft Search Server are almost always better options.
* = Server 2008 R2 / Win7 has a newer version of the Windows Search indexer that scales better to even larger corpuses.
My understanding is that you only get the full document text search when the data is backed by a real SQL Server license. The person was looking for a full search solution. This is built into Alfresco.
SQL Server is licensed per CAL even though the app is a web app.
Spotlight is the obvious answer if you have OS X. Not everybody in the world is lucky enough to be in that position; most are stuck on one of the inferior platforms. Your rubbing it in is not helping. It just alienates people who have already been through enough and have it tough.
Full text search in Alfresco uses Lucene. Or at least it did when I deployed it on Debian with PostgreSQL.