Slashdot Mirror


A Grep-like Utility That Works on More than Text?

Nutria writes "This article got me thinking: What's a poor Unix-using guy to do, when he needs to grep text, compressed tarballs, OO.o documents, Debian archives, mime-encoded files, Evil Microsoft documents, PDF files, compressed AbiWord files, etc." Is there an extensible searching program for Unix that can handle a variety of different file types? Search engines like ht://Dig can accomplish part of this task; however, ht://Dig currently doesn't index the whole file (just portions of the metadata). If you had to perform a substring search on a set of documents of different types, what tools would you use?

2 of 65 comments

  1. Are you sure you're a "unix guy"? by ivan256 · Score: 3, Insightful

    What's a poor Unix-using guy to do, when he needs to grep text, compressed tarballs, OO.o documents, Debian archives, mime-encoded files, Evil Microsoft documents, PDF files, compressed AbiWord files, etc.

    Um, why not pipe the output of your favorite program that interacts with the file type you're interested in to grep? Isn't that the "poor unix guy" way?

    catdoc Blah.doc | grep foo
    zcat compressed.txt.gz | grep foo
    apt-cache show package | grep foo
    pdftotext Blah.pdf - | grep foo

    etc...
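
    The pipe-to-grep idea above can be wrapped in a small dispatcher that picks a converter by file extension. This is only a rough sketch: the names anygrep and to_text are made up for illustration, catdoc and pdftotext are assumed to be installed, and "gzip -dc" stands in for zcat because it behaves the same on more systems.

```shell
#!/bin/sh
# Hypothetical helper: write a file's text to stdout, choosing a
# converter from the extension. Unknown types are passed through as-is.
to_text() {
    case $1 in
        *.gz)  gzip -dc -- "$1" ;;     # same effect as zcat
        *.doc) catdoc "$1" ;;          # assumes catdoc is installed
        *.pdf) pdftotext "$1" - ;;     # "-" sends the text to stdout
        *)     cat -- "$1" ;;
    esac
}

# Hypothetical wrapper. Usage: anygrep PATTERN FILE...
# Prints "file:line" for each match; returns 0 if anything matched.
# Note: plain sh has no local variables, so these names are global.
anygrep() {
    pattern=$1
    shift
    status=1
    for file in "$@"; do
        # Converter errors are silenced so one bad file does not stop the run.
        if matches=$(to_text "$file" 2>/dev/null | grep -e "$pattern"); then
            printf '%s\n' "$matches" | while IFS= read -r line; do
                printf '%s:%s\n' "$file" "$line"
            done
            status=0
        fi
    done
    return $status
}
```

    After sourcing the file, something like "anygrep foo Blah.doc Blah.pdf compressed.txt.gz notes.txt" searches all four in one go. Supporting another file type means adding one more case line with whatever tool turns that format into text.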

  2. Re:Full text search engine? by shufler · Score: 2, Insightful

    Just reindex at 9 in the morning, when you've stopped computing for the night.