Slashdot Mirror


Regex Golf, xkcd, and Peter Norvig

mikejuk writes "A recent xkcd strip has started some deep academic thinking. When AI expert Peter Norvig gets involved you know the algorithms are going to fly. Code Golf is a reasonably well known sport of trying to write an algorithm in the shortest possible code. Regex Golf is similar, but in general the aim is to create a regular expression that accepts the strings in one list and rejects the strings in a second list. This started Peter Norvig, the well-known computer scientist and director of research at Google, thinking about the problem. Is it possible to write a program that would create a regular expression to solve the xkcd problem? The result is an NP hard problem that needs AI-like techniques to get an approximate answer. To find out more, read the complete description, including Python code, on Peter Norvig's blog. It ends with this challenge: 'I hope you found this interesting, and perhaps you can find ways to improve my algorithm, or more interesting lists to apply it to. I found it was fun to play with, and I hope this page gives you an idea of how to address problems like this.'"

5 of 172 comments (clear)

  1. Re:FWIW, the Regex Golf game by Garridan · · Score: 3, Informative

    Or, if you want to try your hand at meta regex golf there's a place for that, too.

  2. Re:RegExps by Anonymous Coward · · Score: 2, Informative

    If regular expressions are a programming language, then they are not a very powerful one. The languages they recognize are the so-called regular languages, which are the least expressive category in the Chomsky hierarchy of languages (note the difference between the language of regular expressions and the languages recognized by regular expressions). See the Wikipedia articles on regular languages and the Chomsky hierarchy for details.

  3. This problem has been studied for decades by DogPhilosopher · · Score: 5, Informative

    There's a field called Grammar Induction, and the problem of learning regular languages, aka regular inference, can be considered a subfield. People have been working on this since the '50s. Applications include learning DTDs for XML/wrapper induction, and all kinds of problems in bioinformatics and natural language processing.

    There's a strong link with the graph coloring problem, see
    http://www.cs.ru.nl/~sicco/papers/alt12.pdf

    In this field, the focus is generally on learning FSAs, but these can easily be transformed into regexps. There's work on learning regexps directly, see
    http://www.informatik.uni-trier.de/~fernau/papers/Fer05c.pdf

    Enjoy.

  4. Re:ioccc 2013 US president matching code by hankwang · · Score: 4, Informative

    It takes the first four bytes of the president's name, converts them to int; then applies four modulo operations (%4796 %275 %4). How the author figured out that those four operations would do the job, who knows? Maybe brute-forced the parameter space.

  5. Re:Unless Pyhon has changed recently. by bunratty · · Score: 3, Informative

    Python is very similar to Perl, and also has most of the characteristics that make Lisp a good language to program AI algorithms.

    --
    What a fool believes, he sees, no wise man has the power to reason away.