Slashdot Mirror


Any Interest in a Regexp-Based Web Search Engine?

K-Man asks: "From time to time, I've seen people comment that they would be interested in searching the web with regular expressions, but I've seen very little research in this area. Over many months (as part of a project I call 'grepple'), I've gradually assembled some background on the idea (also some work-in-progress not noted in the link), and the idea seems to be approaching the realm of technical possibility. However, my expertise is not in marketing, so I have no idea whether anybody would use this capability. So I ask, if you could search the web for any regular pattern, including html, partial words or wildcards, long phrases, or anything you might grep out of an html file, would you do it? What types of searches would you do?"

4 of 51 comments

  1. +1 Funny/insightful on the MQR standard by MarkusQ · · Score: 5, Informative

    You have a point, but I have no mod points at the moment, save the ones I coin myself. Any new ability will invite new abuses (or, at least, new forms of old ones).

    -- MarkusQ

    P.S. For the regexp-challenged, the parent poster was showing how easy it would be to use a regular expression search engine to harvest e-mail addresses, which the Bad Guy could then send spam to.

  2. It won't work by 0x0d0a · · Score: 2, Informative

    You can't scale it. Indexing systems that could be used as a foundation for regexes (CDAWG structures or similar) don't scale to the level of the Web.

    If you want to do searching of a small intranet, you might be able to get away with it. You might be able to do globbing, but currently using regexes won't work.

    The main regex-related features I suspect people might want are:

    * Phrases. Google and almost all other search engines can already do this, with quotes.

    * NEAR. foo NEAR bar in the document requests documents where foo occurs "near" bar. This is of somewhat more dubious utility, but there are some searches that it's convenient for.

    * Boolean NOT. Google and almost all other search engines can already do this.
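
    To make the NEAR idea concrete, here is a minimal sketch of how it could be expressed as a regex over a single document's text. The function name `near` and the `max_gap` parameter (how many intervening words still count as "near") are assumptions for illustration, not the syntax of any particular search engine.

```python
import re

def near(text, a, b, max_gap=5):
    """Return True if words a and b occur with at most max_gap words between them.

    Hypothetical helper: 'near' and 'max_gap' are illustrative names only.
    """
    a, b = re.escape(a), re.escape(b)
    # Up to max_gap intervening words, then the separator before the second term.
    gap = r"(?:\W+\w+){0,%d}?\W+" % max_gap
    # Accept either order: a ... b or b ... a.
    pattern = r"\b(?:%s%s%s|%s%s%s)\b" % (a, gap, b, b, gap, a)
    return re.search(pattern, text, re.IGNORECASE) is not None
```

    This only scans text that is already in hand; it says nothing about how to index the whole Web for such queries, which is the scaling problem described above.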

  3. Possibly... by WindBourne · · Score: 2, Informative

    an interesting use of this would be on top of the results from, say, Google. Google already seems to give the best results. Simply running an RE engine over those results would let a user narrow them down further.
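
    A rough sketch of that post-filtering step, assuming the keyword search has already returned its hits as a list of dicts with `url` and `snippet` keys. That shape, and the name `regex_filter`, are hypothetical; no real search API is being called here.

```python
import re

def regex_filter(results, pattern, flags=re.IGNORECASE):
    """Keep only the results whose snippet matches the regular expression.

    'results' is assumed to be a list of {"url": ..., "snippet": ...} dicts
    produced by an ordinary keyword search (hypothetical format).
    """
    rx = re.compile(pattern, flags)
    return [r for r in results if rx.search(r["snippet"])]
```

    For example, `regex_filter(results, r"\b\d{3}-\d{4}\b")` would keep only the hits whose snippet contains something shaped like a seven-digit phone number. The regex only ever sees what the keyword search returned, so pages the first search missed stay missed.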

    --
    I prefer the "u" in honour as it seems to be missing these days.

  4. Thunderstone Texis... by PDHoss · · Score: 2, Informative

    ...already supports this (you most often see it in a free search engine called Webinator). It's the search db behind Dogpile, some (all?) of eBay, parts of ZDNet, and a whole bunch of other stuff. Not cheap by any stretch, but solid.

    Check it out: http://www.thunderstone.com/

    --
    ======================================
    Writers get in shape by pumping irony.