Internet Searching Using Regular Expressions?
/[Aa]non([aiy]mo?u?s)? [KkCc]ow[ae]rd/ asks: "Remarkably few people have a working understanding of regular expressions. But those who do know how useful they can be for searching text. Has anyone out there seen a large search engine (like Google) that will take regular expressions for queries? How about a newsgroup search engine?" Aside from the fact that many regular expressions read like snippets of line noise, they are the best thing I've seen for searches, and they're a lot easier than -adding +alot +of +search -terms.
somewhere.*over.*there
Across an entire Internet-sized search engine? I guess you could pre-select documents containing "somewhere", "over", and "there", and then screen those with a "standard" regexp search.
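Here's a rough sketch of that two-stage idea, assuming a toy in-memory inverted index (word to set of document ids) stands in for the search engine's real index. The names `build_index` and `search` are made up for illustration. Only documents that contain every required word get read and run through the full regexp.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index

def search(docs, index, required_words, pattern):
    """Pre-select candidates via the index, then confirm each with the regexp."""
    # Intersect posting sets: only documents containing every required word survive.
    candidates = set(docs)
    for word in required_words:
        candidates &= index.get(word.lower(), set())
    regex = re.compile(pattern)
    # Only the (hopefully few) surviving candidates are scanned with the regexp.
    return sorted(d for d in candidates if regex.search(docs[d]))

docs = {
    1: "somewhere over the rainbow, way up there",
    2: "over there, somewhere",   # has all three words, but in the wrong order
    3: "nothing to see here",
}
index = build_index(docs)
print(search(docs, index, ["somewhere", "over", "there"], r"somewhere.*over.*there"))
```

Document 2 passes the index step but fails the regexp, which is exactly the screening the second stage is for.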
However, doing the preliminary match with the regexp itself would definitely be resource-prohibitive. In the above example, you would have to read in the full text of every document just to run the regexp, not to mention the cost of keeping all that text around.
That said, I can see how you could implement a regexp-like front end to a search tool if you put some restrictions on what the regular expressions could do. However, I suspect the idea was more to be able to do advanced conditionals and other funky stuff within the regular expressions, and restricting that would probably limit the usefulness of the product.
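As a guess at what such a restricted front end might look like: suppose (hypothetically) queries may only be literal words joined by `.*`. Then the required words fall straight out of the pattern and can drive an index lookup, while anything fancier, such as character classes or alternation, is simply rejected. `required_words` is an illustrative name, not any real engine's API.

```python
import re

# Hypothetical restriction: a query may only be literal words joined by ".*".
# Character classes, alternation, backreferences, etc. are rejected up front.
_ALLOWED = re.compile(r"\w+(?:\.\*\w+)*")

def required_words(pattern):
    """Return the literal words an allowed pattern needs, or raise ValueError."""
    if not _ALLOWED.fullmatch(pattern):
        raise ValueError(f"unsupported pattern: {pattern!r}")
    return pattern.split(".*")

print(required_words("somewhere.*over.*there"))
```

This prints `['somewhere', 'over', 'there']`, while something like `[Kk]ow[ae]rd` raises `ValueError`. That rejection is the loss of usefulness described above.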
So, maybe to summarize my rambling: the initial hurdle would be to reinvent the way normal regexps work so that they run efficiently against a multi-gigabyte database.
Best to just download search results with a spider and hit them with grep, if you've got the time. [sigh].
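For the spider-plus-grep route, here's a minimal Python stand-in for `grep -n -E`. It assumes the spider has already saved the pages as `*.html` files in a single directory. `grep_dir` is a made-up name, and fetching the pages is left out entirely.

```python
import re
from pathlib import Path

def grep_dir(directory, pattern):
    """Yield (filename, line_number, line) for every saved page line matching pattern."""
    regex = re.compile(pattern)
    for path in sorted(Path(directory).glob("*.html")):
        # Pages from the wild often have bad encodings; don't die on them.
        with path.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if regex.search(line):
                    yield path.name, lineno, line.rstrip("\n")

# e.g.: for hit in grep_dir("pages", r"somewhere.*over.*there"): print(*hit)
```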
*whup* "Get along, little electrons. Heeyah!"