A Grep-like Utility That Works on More than Text?
Nutria writes "This article got me thinking: What's a poor Unix-using guy to do when he needs to grep text, compressed tarballs, OO.o documents, Debian archives, MIME-encoded files, Evil Microsoft documents, PDF files, compressed AbiWord files, etc.?" Is there an extensible searching program for Unix that can handle a variety of different file types? Search engines like ht://Dig can accomplish part of this task, but currently it doesn't index the whole file (just portions of the metadata). If you had to perform a substring search on a set of documents of different types, what tools would you use?
This would make a great shell script project. You could use the file command to detect each file's type, then run it through the appropriate filter and grep the output. This sounds useful enough that I'll probably write this script this weekend. Thanks for the idea.
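
For what it's worth, here is a rough sketch of the detect-then-filter-then-grep idea. It is written in Python rather than shell so it stays self-contained, and it sniffs magic bytes itself instead of calling the file command, which is a swapped-in simplification. The names extract_text and grep_any are made up for illustration. It only unwraps gzip, zip (which covers OO.o documents, though their XML markup is searched raw, not stripped), and ustar-style tar; PDF, Microsoft documents and the rest would need external converters that are not shown here.

```python
import gzip
import io
import tarfile
import zipfile


def _join(outer, inner):
    """Build a label such as 'archive_member:nested_member'."""
    return f"{outer}:{inner}" if inner else outer


def extract_text(data):
    """Yield (label, text) for every searchable piece found inside data.

    Hypothetical helper: the type is guessed from magic bytes, and
    containers are unwrapped recursively so that tar.gz also works.
    """
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        yield from extract_text(gzip.decompress(data))
    elif data[:4] == b"PK\x03\x04":  # zip container, e.g. OO.o documents
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for name in zf.namelist():
                for label, text in extract_text(zf.read(name)):
                    yield _join(name, label), text
    elif data[257:262] == b"ustar":  # POSIX/GNU tar header magic
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tf:
            for member in tf:
                if member.isfile():
                    content = tf.extractfile(member).read()
                    for label, text in extract_text(content):
                        yield _join(member.name, label), text
    else:
        # Fallback assumption: treat anything unrecognized as plain text.
        yield "", data.decode("utf-8", errors="replace")


def grep_any(pattern, data):
    """Return (label, line_number, line) for each line containing pattern."""
    hits = []
    for label, text in extract_text(data):
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern in line:
                hits.append((label, number, line))
    return hits
```

You would call it as grep_any("needle", open(path, "rb").read()) and print the hits. Adding a new file type means adding one more branch that turns the bytes into text, which is about as extensible as the shell version would be.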
Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
Seth Nickell has a blog entry discussing solutions in this area, including Gnome Storage, WinFS, Dashboard, Medusa, Spotlight, and Beagle.
Haven't actually tried it out (everything I write seems to be text or TeX), but I remembered reading this article a while back: "How to Index Anything", Linux J., July 1 2003.
Christian Jones
Medicine. Mathematics. Mediocrity.