Regex Golf, xkcd, and Peter Norvig

← Back to Stories (view on slashdot.org)

Regex Golf, xkcd, and Peter Norvig

Posted by samzenpus on Sunday January 12, 2014 @08:20AM from the problem-to-solve dept.

mikejuk writes "A recent xkcd strip has started some deep academic thinking. When AI expert Peter Norvig gets involved you know the algorithms are going to fly. Code Golf is a reasonably well known sport of trying to write an algorithm in the shortest possible code. Regex Golf is similar, but in general the aim is to create a regular expression that accepts the strings in one list and rejects the strings in a second list. This started Peter Norvig, the well-known computer scientist and director of research at Google, thinking about the problem. Is it possible to write a program that would create a regular expression to solve the xkcd problem? The result is an NP hard problem that needs AI-like techniques to get an approximate answer. To find out more, read the complete description, including Python code, on Peter Norvig's blog. It ends with this challenge: 'I hope you found this interesting, and perhaps you can find ways to improve my algorithm, or more interesting lists to apply it to. I found it was fun to play with, and I hope this page gives you an idea of how to address problems like this.'"

5 of 172 comments (clear)

Min score:

Reason:

Sort:

Re:FWIW, the Regex Golf game by Garridan · 2014-01-12 09:09 · Score: 3, Informative

Or, if you want to try your hand at meta regex golf there's a place for that, too.
Re:RegExps by Anonymous Coward · 2014-01-12 09:25 · Score: 2, Informative

If regular expressions are a programming language, then they are not a very powerful one. The languages they recognize are the so-called regular languages, which are the least expressive category in the Chomsky hierarchy of languages (note the difference between the language of regular expressions and the languages recognized by regular expressions). See the Wikipedia articles on regular languages and the Chomsky hierarchy for details.
This problem has been studied for decades by DogPhilosopher · 2014-01-12 09:34 · Score: 5, Informative

There's a field called Grammar Induction, and the problem of learning regular languages, aka regular inference, can be considered a subfield. People have been working on this since the '50s. Applications include learning DTDs for XML/wrapper induction, and all kinds of problems in bioinformatics and natural language processing.
There's a strong link with the graph coloring problem, see
http://www.cs.ru.nl/~sicco/papers/alt12.pdf
In this field, the focus is generally on learning FSAs, but these can easily be transformed into regexps. There's work on learning regexps directly, see
http://www.informatik.uni-trier.de/~fernau/papers/Fer05c.pdf
Enjoy.
Re:ioccc 2013 US president matching code by hankwang · 2014-01-12 10:16 · Score: 4, Informative

It takes the first four bytes of the president's name, converts them to int; then applies four modulo operations (%4796 %275 %4). How the author figured out that those four operations would do the job, who knows? Maybe brute-forced the parameter space.

--
Avantslash: low-bandwidth mobile slashdot.
Re:Unless Pyhon has changed recently. by bunratty · 2014-01-12 10:37 · Score: 3, Informative

Python is very similar to Perl, and also has most of the characteristics that make Lisp a good language to program AI algorithms.

--
What a fool believes, he sees, no wise man has the power to reason away.