Desktop Search Engines Compared


Posted by timothy on Tuesday January 4, 2005 @01:00PM from the operating-system-handicapped dept.

nutterButter writes "After Google created a stir with its desktop search engine, other engines gained more awareness in the public eye. Slate did a comparison of them and Google was not their top pick; Copernic was. I tried it - and am quite impressed."

Copernic... by tektek · 2005-01-04 13:04 · Score: 3, Interesting

Copernic is also the only one on TFA that can search Firefox.
Linux anyone? by ewanrg · 2005-01-04 13:05 · Score: 4, Interesting

Is it too much to hope someone might build a strong tool for doing this that will run on Linux? Having Copernic rated #1 is wonderful for folks still running Windows, and Google is wonderful for folks still running Windows, and...
I assume you get the picture :-)
---
Yeah, I'm like this on my blog too ;-)
1. Re:Linux anyone? by rusty0101 · 2005-01-04 15:20 · Score: 4, Interesting
  
  Actually, no they don't use a recursive grep on your hard drive.
  
  They use several filters to build an index of words in the various documents they have filters for.
  
  When you ask Google Desktop, Yahoo Desktop, or other search engines to find documents that might be relevant to your search string, they compare the words in your search string with the words in the index they created earlier. From that index, they then provide you with a list of files on your system ranked by whatever algorithm the developers came up with.
  
  If you happen to have a DVD ISO file on your system somewhere, copy it to a different partition to see how long just copying, not searching, that much material takes. It is not a trivial amount of time. Especially when you are looking to present a user with a list of matches in under a second.
  
  Indexing is not just running a variation of 'grep' against your files. It is collecting a list of words from each document, identifying those words that are not 'common' (if, and, but, the, or, a, I, etc.) and identifying where in the document those words exist.
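
A minimal Python sketch of the indexing step described above: collect the words in each document, skip the common ones, and record where the rest occur. The stop-word list comes from the examples in the comment; the function names and the two sample documents are made up for illustration, and a real desktop search tool would also need per-format filters and on-disk storage.

```python
import re
from collections import defaultdict

# Stop words taken from the examples in the comment; a real list would be longer.
STOP_WORDS = {"if", "and", "but", "the", "or", "a", "i"}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(documents):
    """Map each non-common word to {document name: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in documents.items():
        for position, word in enumerate(tokenize(text)):
            if word not in STOP_WORDS:
                index[word][name].append(position)
    return index

def lookup(index, word):
    """Return the documents containing a word without rescanning any file."""
    return dict(index.get(word.lower(), {}))

# Hypothetical documents, standing in for files on disk.
docs = {
    "gulf.txt": "President Bush and the first Gulf War",
    "safari.txt": "The president went hunting in the bush",
}
index = build_index(docs)
print(lookup(index, "bush"))  # {'gulf.txt': [1], 'safari.txt': [6]}
```

The index is built once, so each later query is a dictionary lookup rather than a pass over every file.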
  
  That way when you look for 'President Bush' on your hard drive, it can compare the proximity of the words 'president' and 'bush' and give a better match to those documents that contain both words, closer together. That way your dissertation on Teddy Roosevelt hunting in deepest Africa will be less likely to come up with a match than your discussion of the relevance of the first Gulf War to political dinners in Japan.
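
The proximity idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: documents are plain strings, the score is the smallest distance in words between the two terms, and the function names and sample texts are invented. A real tool would read the positions from its index instead of re-reading the text, and would combine proximity with other signals.

```python
import re

def word_positions(text, word):
    """Return the positions (counted in words) where `word` occurs in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [i for i, w in enumerate(words) if w == word]

def min_distance(text, first, second):
    """Smallest gap in words between two terms, or None if either is missing."""
    a = word_positions(text, first.lower())
    b = word_positions(text, second.lower())
    if not a or not b:
        return None
    return min(abs(i - j) for i in a for j in b)

def rank(documents, first, second):
    """List documents containing both terms, closest pair first."""
    scored = []
    for name, text in documents.items():
        gap = min_distance(text, first, second)
        if gap is not None:
            scored.append((gap, name))
    return [name for gap, name in sorted(scored)]

# Hypothetical documents for illustration.
docs = {
    "safari.txt": "The president spent a month hunting big game in the bush",
    "dinner.txt": "President Bush attended a political dinner in Japan",
    "notes.txt": "Grocery list and other notes",
}
print(rank(docs, "president", "bush"))  # ['dinner.txt', 'safari.txt']
```

Both documents contain both words, but the one where they sit next to each other is listed first.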
  
  There are a couple tools out there that provide some of these features for Linux. You can use ht://dig to build a web based interface. If you would rather be able to use either a command line search, or a web based search, you might want to look into Glimpse.
  
  Of course, this being Linux, dozens of people have taken a partial stab at doing this. You could probably work out a method from either the Learning Perl, or Learning Python books, as both are quite capable of building and maintaining indexes. The best part is that it would be optimized for your set of files, rather than just being a generic tool that you have to go out and find third party filters to make use of.
  
  Then again, what do I know. If you think running grep against /dev/hda is a good use of your time, more power to you.
  
  -Rusty
  
  --
  You never know...
2. Re:Linux anyone? by dAzED1 · 2005-01-04 15:49 · Score: 4, Interesting
  
  I was poking him since he didn't understand. You don't understand either, but you're closer. As a dba, I'm quite well aware of how indexes work.
  
  If you're organized, then your docs will be in one general area. As such, running an egrep in there for a phrase really doesn't take much time at all. 20 minutes? Hardly. A second, maybe 2. Try it some time.
  
  What it allows me to do is make my /own/ algorithm for what I want displayed.
  
  Is this practical, or even easily plausible, in Windows? No. Does everyone know regular expressions? No. Am I saying that no one should use these tools? No. I'm just commenting on the poster that said grep couldn't do what these tools do - they were wrong.
  
  "locate doesn't search your emails, nor let you know which files contain things. You could recursive grep, but that doesn't find stuff in pdf files, and takes up a ton of cpu."
  
  Locate - doesn't need to search my emails. gmail does that just fine. Egrep tells me what contains whatever I want. Can google's tool find files that have a line that starts with a number, has 2 words, then repeats the number again? No. Simple regex can blow away anything the google tool can do. I can most certainly find stuff in any binary or doc file, without taking up "a ton of cpu."
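
For anyone curious, the "starts with a number, has 2 words, then repeats the number again" search can be sketched with a backreference. This version uses Python's re module rather than egrep, and it assumes "words" means runs of letters separated by whitespace; the sample text is made up.

```python
import re

# (\d+) captures the leading number; \1 requires the same digits again
# after exactly two words. \b stops "42" from matching inside "421".
PATTERN = re.compile(r"^(\d+)\s+[A-Za-z]+\s+[A-Za-z]+\s+\1\b")

def matching_lines(text):
    """Return the lines that fit the pattern."""
    return [line for line in text.splitlines() if PATTERN.search(line)]

sample = """42 foo bar 42
42 foo bar 43
7 red apples 7 left
no number here
42 foo bar 421"""

print(matching_lines(sample))  # ['42 foo bar 42', '7 red apples 7 left']
```

A word index has no way to express "the same number again", which is the kind of query being described here.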
  
  See? Not saying my way is better for everyone else. Just saying someone who says my way doesn't work is wrong - my way not only works, it's more powerful.
Apple's coming out with something like this... by bennomatic · 2005-01-04 13:06 · Score: 5, Interesting

It's called Mac OS X Tiger. If you've used iTunes, you know how good and how fast searching can be. It's going to be pretty awesome when it comes out.

--
The CB App. What's your 20?
1. Re:Apple's coming out with something like this... by Ludraman · 2005-01-04 13:16 · Score: 3, Interesting
  
  Yeah, it's called Spotlight, and in Tiger will be in the top right corner of the screen. You can search your hard drive like you search your iTunes library, and it will even search in files for keywords. All in no time whatsoever. Rockin'.
  
  --
  
  -- Wanted dead or alive - Schrodinger's cat
the main problem i had with google by jeff+munkyfaces · 2005-01-04 13:10 · Score: 5, Interesting

is that i can only open the file i search for!

i planned to sort out my music collection - so i searched for an artist - 87 results.

can i select them all and move them to a folder in one go? no.

for this kind of thing it's useless - i wonder if i can with copernic..