Slashdot Mirror


Desktop Search Engines Compared

nutterButter writes "After Google created a stir with its desktop search engine, other engines gained more awareness in the public eye. Slate did a comparison of them and Google was not their top pick; Copernic was. I tried it - and am quite impressed."

21 of 361 comments

  1. Mac version? by Heftklammerdosierer! · · Score: 1, Interesting

    Any plans for one? Otherwise it'll be hard for me to form an opinion of my own.

  2. Copernic... by tektek · · Score: 3, Interesting

    Copernic is also the only one on TFA that can search Firefox.

  3. Linux anyone? by ewanrg · · Score: 4, Interesting
    Is it too much to hope someone might build a strong tool for doing this that will run on Linux? Having Copernic rated #1 is wonderful for folks still running Windows, and Google is wonderful for folks still running Windows, and...

    I assume you get the picture :-)

    ---

    Yeah, I'm like this on my blog too ;-)

    1. Re:Linux anyone? by rusty0101 · · Score: 4, Interesting

      Actually, no, they don't use a recursive grep on your hard drive.

      They use several filters to build an index of words in the various documents they have filters for.

      When you ask Google Desktop, Yahoo Desktop, or other search engines to find documents that might be relevant to your search string, they compare the words in your search string with the words in the index they created earlier. From that index, they then provide you with a list of files on your system ranked by whatever algorithm the developers came up with.

      If you happen to have a DVD ISO file on your system somewhere, copy it to a different partition to see how long just copying, not searching, that much material takes. It is not a trivial amount of time, especially when you are looking to present a user with a list of matches in under a second.

      Indexing is not just running a variation of 'grep' against your files. It is collecting a list of words from each document, identifying those words that are not 'common' (if, and, but, the, or, a, I, etc.) and identifying where in the document those words exist.
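
      A minimal sketch of that kind of index, in Python. The stop-word list is just the examples named above, and `tokenize` and `build_index` are hypothetical names for illustration, not how any of these products actually do it:

```python
import re
from collections import defaultdict

# Assumed stop-word list, taken from the examples in the comment above.
STOP_WORDS = {"if", "and", "but", "the", "or", "a", "i"}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(docs):
    """Map each non-common word to {document name: [word positions]}."""
    index = defaultdict(dict)
    for name, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            if word in STOP_WORDS:
                continue  # common words are not indexed
            index[word].setdefault(name, []).append(pos)
    return index

docs = {
    "roosevelt.txt": "The president went hunting in the bush",
    "gulfwar.txt": "President Bush and the first Gulf War",
}
index = build_index(docs)
print(index["bush"])
```

      Positions are counted before the common words are dropped, so the recorded offsets still reflect how far apart two words really are in the document.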

      That way when you look for 'President Bush' on your hard drive, it can compare the proximity of the words 'president' and 'bush' and give a better match to those documents that contain both words, closer together. That way your dissertation on Teddy Roosevelt hunting in deepest Africa will be less likely to come up as a match than your discussion of the relevance of the first Gulf War to political dinners in Japan.
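
      Roughly what the proximity idea could look like in Python. This is only a sketch under my own assumptions: `min_gap` and `rank` are made-up names, the scoring is simply "smallest distance in words between the two terms", and for brevity it rescans the text instead of reading positions out of a stored index:

```python
import re

def positions(text, word):
    """Return the word offsets at which `word` occurs in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [i for i, tok in enumerate(tokens) if tok == word]

def min_gap(text, first, second):
    """Smallest distance in words between the two terms, or None if either is missing."""
    a, b = positions(text, first), positions(text, second)
    if not a or not b:
        return None
    return min(abs(i - j) for i in a for j in b)

def rank(docs, first, second):
    """Names of documents containing both terms, closest pairing first."""
    scored = [(min_gap(text, first, second), name) for name, text in docs.items()]
    return [name for gap, name in sorted(s for s in scored if s[0] is not None)]

docs = {
    "roosevelt.txt": "As president he spent months hunting in the African bush",
    "gulfwar.txt": "President Bush discussed the first Gulf War over dinner in Japan",
    "recipes.txt": "Butter the pan before adding the eggs",
}
print(rank(docs, "president", "bush"))
```

      Here the Gulf War file has the two words adjacent, the Roosevelt file has them eight words apart, and the recipes file is left out because it contains neither.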

      There are a couple tools out there that provide some of these features for Linux. You can use ht://dig to build a web based interface. If you would rather be able to use either a command line search, or a web based search, you might want to look into Glimpse.

      Of course, this being Linux, dozens of people have taken a partial stab at doing this. You could probably work out a method from either the Learning Perl, or Learning Python books, as both are quite capable of building and maintaining indexes. The best part is that it would be optimized for your set of files, rather than just being a generic tool that you have to go out and find third party filters to make use of.

      Then again, what do I know? If you think running grep against /dev/hda is a good use of your time, more power to you.

      -Rusty

      --
      You never know...
    2. Re:Linux anyone? by dAzED1 · · Score: 4, Interesting
      I was poking him since he didn't understand. You don't understand either, but you're closer. As a DBA, I'm quite well aware of how indexes work.


      If you're organized, then your docs will be in one general area. As such, running an egrep in there for a phrase really doesn't take much time at all. 20 minutes? Hardly. A second, maybe 2. Try it some time.


      What it allows me to do is make my /own/ algorithm for what I want displayed.


      Is this practical, or even easily plausible, in Windows? No. Does everyone know regular expressions? No. Am I saying that no one should use these tools? No. I'm just commenting on the poster that said grep couldn't do what these tools do - they were wrong.


      locate doesn't search your emails, nor let you know which files contain things. You could recursive grep, but that doesn't find stuff in pdf files, and takes up a ton of cpu.


      Locate - doesn't need to search my emails. Gmail does that just fine. Egrep tells me what contains whatever I want. Can Google's tool find files that have a line that starts with a number, has 2 words, then repeats the number again? No. Simple regex can blow away anything the Google tool can do. I can most certainly find stuff in any binary or doc file, without taking up "a ton of cpu."
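
      For the curious, here is one way to read that "number, 2 words, same number again" search, sketched in Python rather than egrep. The exact pattern is my own assumption about what is meant; the key piece is the `\1` backreference, which has to match whatever the first group captured:

```python
import re

# Hypothetical pattern: a number at the start of the line, two words,
# then the same number again (\1 refers back to the first group).
PATTERN = re.compile(r"^(\d+)\s+\w+\s+\w+\s+\1\b")

def matching_lines(text):
    """Return the lines of `text` that match PATTERN."""
    return [line for line in text.splitlines() if PATTERN.search(line)]

sample = """42 hello world 42
42 hello world 43
7 only 7
13 lucky number 13 indeed"""
print(matching_lines(sample))
```

      The second line fails because the number changes, and the third because there is only one word between the numbers.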


      See? Not saying my way is better for everyone else. Just saying someone who says my way doesn't work is wrong - my way not only works, it's more powerful.

    3. Re:Linux anyone? by mcrbids · · Score: 2, Interesting

      Is it too much to hope someone might build a strong tool for doing this that will run on Linux?

      Some years ago, there was a product called "Excite for Web Servers" or "EWS". It was very good - I used it to index several hundred MB of text on my fire-breathing, 166 MHz Pentium back in the day.

      Unfortunately, it's getting real, real, real old and is almost impossible to get to work properly on a modern Linux install.

      It's an excellent product, distributed with sources. Unfortunately, without a sufficiently free license behind it, there's no active fork for it, anywhere.

      Anyway, to make it a "personal" tool, run it every night in a cron job against your home directory, then use a local copy of Apache to serve the said home directory.

      Kludgy, but workable. It'd be nice to see this resurrected and turned into something a bit more modern...

      --
      I have no problem with your religion until you decide it's reason to deprive others of the truth.
  4. Apple's coming out with something like this... by bennomatic · · Score: 5, Interesting

    It's called Mac OS X Tiger. If you've used iTunes, you know how good and how fast searching can be. It's going to be pretty awesome when it comes out.

    --
    The CB App. What's your 20?
    1. Re:Apple's coming out with something like this... by Ludraman · · Score: 3, Interesting

      Yeah, it's called Spotlight, and in Tiger will be in the top right corner of the screen. You can search your hard drive like you search your iTunes library, and it will even search in files for keywords. All in no time whatsoever. Rockin'.

      --

      -- Wanted dead or alive - Schrodinger's cat
    2. Re:Apple's coming out with something like this... by SirBeck · · Score: 2, Interesting

      No need to wait for Tiger... http://quicksilver.blacktree.com/ is nearly perfect. Type ahead find on any file, app, or the contents thereof, then run any number of actions on that object. Run it, pipe it, control iTunes with it, bind keys with it... no need for docks or menus ever again.

    3. Re:Apple's coming out with something like this... by blowdart · · Score: 2, Interesting
      All in no time whatsoever

      No time? Wow, is this due to the "faster than light" processor Apple were advertising a couple of years back?

      Please, I realise people swallow marketing speak, but saying a search will take no time at all has gone past marketing speak and into blatant lying. At a minimum there's the time to index your disk, then when you search the time to look through that index and the time to display results.

  5. Why is desktop search so hot? by mOoZik · · Score: 2, Interesting

    I can't understand why the regular search function isn't enough. No, I'm serious. What do these products offer that a regular search cannot afford? Seems everyone is on the desktop search bandwagon these days.

  6. Why would anyone trust this? by krbvroc1 · · Score: 2, Interesting

    What amazes me is why would anyone trust this sort of application? Other than a virus scanning program, I really don't want any application to have permission to scan, search, and index every file on my hard disk. I don't care what the privacy policies are; it's not something I'm willing to risk.

  7. the main problem i had with google by jeff+munkyfaces · · Score: 5, Interesting

    is that i can only open the file i search for!

    i planned to sort out my music collection - so i searched for an artist - 87 results.

    can i select them all and move them to a folder in one go? no.

    for this kind of thing it's useless - i wonder if i can with copernic..

  8. Re:Bias? by koreaman · · Score: 2, Interesting

    Slate is completely journalistically independent of their owner, Microsoft. For instance, I distinctly remember them recommending Firefox.

  9. Enfish by vivarin · · Score: 2, Interesting

    Yesterday marked the tenth anniversary of my first day at work at Enfish, one of the very first desktop search engines. You can try it yourself at enfish.com. I also wrote part of the indexing system for what eventually became X1 at idealab after I left Enfish in 1999.

    Enfish has the best Windows integration, and X1 has a very snappy search. Enfish uses less memory for a large index and supports more data types.

    Linux types can always use Glimpse or roll something themselves with Lucene (an Apache project).

    Nice to know that it only took a decade for the product category to heat up...

  10. I had a major problem with Copernic by geneing · · Score: 2, Interesting

    I tried Copernic for about a week and then removed it. A major "showstopper" for me was that Copernic would lock files at random (indexing?). When I would try to delete a directory I would get an error that files were in use. It was happening way too often even after I limited the directories I indexed. Another problem was random slowdowns and Explorer crashes. I don't have proof that Copernic was at fault - only circumstantial evidence.

  11. Re:How necessary is this for home users? by strider_starslayer · · Score: 2, Interesting

    I generally make a point of correctly labeling my files, and making strong directory structures, everything necessary for good organization;

    Yet I still desire a tool like this. Why? Because I forget things - I may remember that two years ago I worked on a programming project that displayed all the pictures in a directory - but I don't remember the filename, the project it's attached to, or the date I last used it.

    I can search my programming directory, my backup directory, etc; eventually I'll find it, but I'll have to open basically every project I have to do so. With a tool that searches the contents of files, I can look for notes I would have put into my properly documented pseudo code, or whatever else I can come up with, in an advanced search routine that uses a lot of AND/OR statements, and I'll find it.
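
    As a rough illustration of an AND/OR search over file contents, here is a small Python sketch. Everything in it is assumed for the example: the file names, the sample contents, and the query shape (a list of groups, where terms inside a group are ANDed and the groups are ORed):

```python
def matches(text, any_of_groups):
    """True if at least one group has all of its terms present in `text`."""
    lowered = text.lower()
    return any(
        all(term.lower() in lowered for term in group)
        for group in any_of_groups
    )

def search(files, any_of_groups):
    """Return the sorted names of files whose contents satisfy the query."""
    return sorted(name for name, text in files.items() if matches(text, any_of_groups))

# Hypothetical file contents standing in for a programming directory.
files = {
    "viewer/main.c": "/* display all the pictures in a directory */",
    "notes/todo.txt": "buy milk",
    "gallery/thumbs.py": "# build a thumbnail for each image",
}

# (directory AND pictures) OR (thumbnail)
query = [["directory", "pictures"], ["thumbnail"]]
print(search(files, query))
```

    This matches plain substrings only; a real desktop search tool would look terms up in its index instead of rereading every file.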

    --
    -Millions of Monkeys, Millions of typewriters, 6 hours of sorting through faeces encrusted pages to find: This post
  12. Re:How necessary is this for home users? by MrRTFM · · Score: 2, Interesting

    this is especially useful for home users.

    Considering that these are people who get lost when a desktop shortcut vanishes - "who deleted solitaire?"

    They don't have to think about where files get saved to anymore - they don't even have to think about what app they used to create it - the desktop tools find it for them and all they do is click the web link.

    I also use Google desktop search (and Lookout), but Google will be far better when they allow us to choose our own file extensions to search.

    --
    You can't expect to wield supreme executive power, just because some watery tart threw a sword at you
  13. Copernic - humm.. by E+IS+mC(Square) · · Score: 2, Interesting

    I guess it will take time to figure out the advanced and unique features of Copernic, but some obvious rants can be:
    1. No Thunderbird support
    2. Why would I need to allow cookies from Copernic if it is a *desktop* search?

    Good thing is that it has Firefox/Mozilla support, which takes care of your browsing. Default options are set non-aggressively (like searching history is checked off by default, which is insightful), and this is something really good: option of NOT searching images smaller than 16x16 pixels, music files of less than 10 seconds content (not configurable, though) - very thoughtful!!

  14. CDS vs Unicode by loyukfai · · Score: 2, Interesting

    Copernic Desktop Search doesn't seem to support Unicode, which is a major strength of Google's various offerings.

  15. How big are the databases these create? by iamwahoo2 · · Score: 2, Interesting

    I have been wondering what exactly these things index. If they index every single word of every document, I would assume that the overall database becomes enormous, not to mention it must take a while to create the index. Anybody have insight into what these databases are actually doing?