Next Generation Regexp
prostoalex writes "Jeffrey E. F. Friedl, author of the newly published 2nd edition of Mastering Regular Expressions, wrote a feature article for O'Reilly Network on recent innovations in the regular expression world. You'd think that an area such as regular expressions would be fairly stable, but according to the author, 'when I started to work on the second edition of Mastering Regular Expressions and started refocusing on the field, I was rather shocked to find out how much had really changed'. The article's behind-the-scenes purpose is apparently to push a new book that O'Reilly published this month, but it has great educational value for anyone involved with practical extracting and reporting."
Perl6 is going to radically change regular expressions as well. I guess the term "regular expression" is pretty vague/useless these days. You have to identify the language _and_ its revision to get an accurate idea of the regexp feature set you're dealing with. Just throw some variables and control structures into regexps and we'll have a full-blown, extremely cryptic language. Maybe we need a RegExp Institute of Excellence with yearly meetings in Sweden or something.
He doesn't even mention the radical changes to regexps in Perl 6, as described in the recent Apocalypse 5 and Synopsis 5.
Perl and other languages should leave "good enough" alone when it comes to regular expressions and instead just make it easy to put chunks of grammars into programs.
Regular expressions haven't changed since the seventies, at the latest. Now if you want to say that implementations of regular expressions are advancing, fine. Let's be precise in our use of language, or not.
> Microsoft has the best doco of ANY software development company.
ROTFL! Clearly you've never seen any DEC software manuals. "ANY" is more than a little bit too strong.
Way back when there was a programming language called "Snobol". It still lives (www.snobol4.com for a good starting point).
Snobol is *THE* string pattern matching language. Nothing else beats it (and I've been playing around with string processing languages for over 20 years).
Yes, its syntax is different and the language hasn't changed in years (decades?). But it does the job exceedingly well.
You might also want to take a look at the Icon programming language (www.cs.arizona.edu/icon).
Icon was developed by some of the same folks that developed Snobol. While not quite as powerful as Snobol in terms of expressing patterns, Icon extended some concepts. You can build up your own pattern matching functions.
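To give a rough feel for what building up your own pattern matching functions means, here is a minimal sketch in Python rather than Icon or SNOBOL (it does not reproduce either language's actual semantics). Each matcher is an ordinary function that takes the subject string and a start position and yields every position where it could stop, so sequencing and alternation backtrack naturally, loosely in the spirit of Icon's generators. The helper names `lit`, `alt`, `seq`, `arbno` and `matches` are hypothetical; `arbno` only borrows its name from SNOBOL's ARBNO.

```python
from typing import Callable, Iterator

# A matcher takes (subject, start) and yields every end position it can reach.
Matcher = Callable[[str, int], Iterator[int]]

def lit(text: str) -> Matcher:
    """Match a literal string at the current position."""
    def m(s: str, i: int) -> Iterator[int]:
        if s.startswith(text, i):
            yield i + len(text)
    return m

def alt(*options: Matcher) -> Matcher:
    """Try each alternative in turn, yielding all of their end positions."""
    def m(s: str, i: int) -> Iterator[int]:
        for opt in options:
            yield from opt(s, i)
    return m

def seq(*parts: Matcher) -> Matcher:
    """Match parts one after another, backtracking through every combination."""
    def m(s: str, i: int) -> Iterator[int]:
        if not parts:
            yield i
            return
        first, rest = parts[0], seq(*parts[1:])
        for j in first(s, i):
            yield from rest(s, j)
    return m

def arbno(p: Matcher) -> Matcher:
    """Zero or more repetitions of p (name borrowed from SNOBOL's ARBNO)."""
    def m(s: str, i: int) -> Iterator[int]:
        yield i
        for j in p(s, i):
            if j > i:  # skip empty matches so repetition always makes progress
                yield from m(s, j)
    return m

def matches(p: Matcher, s: str) -> bool:
    """True if p can consume the entire subject string."""
    return any(end == len(s) for end in p(s, 0))
```

For example, `seq(alt(lit("a"), lit("ab")), arbno(lit("c")), lit("b"))` accepts "abccb" only after backtracking out of the shorter "a" alternative. Because patterns are plain functions, new ones can be named, passed around and combined like any other value.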
One of the best quotes I saw in a discussion concerning Icon and regular expressions (the discussion was about Icon lacking a built-in regular expression facility) was:
"Putting regular expressions into Icon would be like putting training wheels on a Harley" -- (I really wish I could remember who said that).
Anyway... just something you might want to check into.
Sure, I've used them in a couple of small scripts for parsing text, but if you find that most of your programming requires regexes, you definitely need to put your hammer down and pick up a Makita.
Well, I am certainly not advocating the broad use of regexps in application programming, even though it has been demonstrated to be possible. For me, regexps are an important tool in solving side issues/behind the scenes work, such as formatting a series of configuration files in a given manner, or making broad changes to a set of HTML files, and so forth. I don't do Perl, and don't really like to if I can avoid it, but I still use regular expressions on a daily basis, and have found them to be immensely helpful.
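A minimal Python sketch of that kind of behind-the-scenes job: apply a fixed list of regex substitutions to every `.html` file under a directory. The two rules (rewriting `<b>` tags as `<strong>` and pointing `http://old.example.com/` links at a new host) and the function names `rewrite` and `rewrite_tree` are hypothetical examples, and regexes like these are only safe for simple, predictable markup, not as a general HTML parser.

```python
import re
from pathlib import Path

# Hypothetical rewrite rules: (compiled pattern, replacement), applied in order.
RULES = [
    # <b> and </b> (any case) become <strong> and </strong>; <br>, <body> are untouched.
    (re.compile(r"<(/?)b>", re.IGNORECASE), r"<\1strong>"),
    # Point links at a (hypothetical) new host.
    (re.compile(r"http://old\.example\.com/"), "https://www.example.com/"),
]

def rewrite(text: str, rules=RULES) -> tuple[str, int]:
    """Apply every rule in order; return the new text and total substitutions."""
    total = 0
    for pattern, replacement in rules:
        text, n = pattern.subn(replacement, text)
        total += n
    return text, total

def rewrite_tree(root: str, rules=RULES) -> int:
    """Rewrite every *.html file under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated, n = rewrite(original, rules)
        if n:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Calling `rewrite_tree("site/")` (with `site/` a placeholder directory) edits files in place and returns how many changed, so it is worth running on a copy or under version control first.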
A Perl "regular expression" is more powerful than a mathematical "regular expression." Perl's can contain backreferences, which must remember arbitrary earlier text; Perl matches them by backtracking, and no finite automaton can recognize them.
The Perl "RE" "^(a+)b\1$" will match aba and aaabaaa, but not abaa or aaba.
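The backreference example can be checked with Python's `re` module, whose `\1` backreference behaves like Perl's for a simple pattern like this one. The sketch contrasts an anchored whole-string match (`fullmatch`) with an unanchored `search`, because whether strings like abaa or aaba "match" depends on which one you mean: `search` finds a match inside all five test strings, while `fullmatch` accepts only aba and aaabaaa. The function names are hypothetical.

```python
import re

# (a+)b\1: capture one or more a's, then b, then exactly the same text again.
# The backreference \1 is what takes this beyond a true regular language:
# recognizing a^n b a^n needs an unbounded count, which a finite automaton lacks.
PATTERN = re.compile(r"(a+)b\1")

def whole_match(s: str) -> bool:
    """True if the entire string has the form a^n b a^n (anchored)."""
    return PATTERN.fullmatch(s) is not None

def found_anywhere(s: str) -> bool:
    """True if some substring has that form (unanchored search)."""
    return PATTERN.search(s) is not None

for s in ["aba", "aaabaaa", "aaaabaaa", "abaa", "aaba"]:
    print(s, "fullmatch:", whole_match(s), "search:", found_anywhere(s))
```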
Thanks for your book. ...
Everybody here and there is going to say how informative it is. But what struck me the most is that it is well written.
It was very pleasant to read, quite apart from the knowledge I got from it. If only all manuals
While you later concede that form input and input from other programs might be good reasons to use a regex, it's strange that you would even pose this question. For 90% of regex fans, form input and screen scraping are exactly what they do. For almost any Web developer, this is the day-in, day-out norm. So your point seems to downplay the very uses that have made regexes so popular.
You realize this does not bolster your claim that regexes are "overrated" -- it merely points out that some developers are overrated. A bad developer does not make a language bad.
Same as above. You're complaining about human error and then blaming the regex system itself.
Of course. But the hastily written software is the other software we interact with, not our own. That's a broad generalization for many developers, so of course you can find exceptions. But you asked for other people's views, and in my view, regexes are sorely needed -- not so that bad developers can stay bad, but so that good developers can clean up the messes the bad developers leave behind. It's a nice bonus that good regex developers can pull in hostile data, screen scrape, and cleanse form input. That helped one of my employees get a raise last quarter.
My Greasemonkey scripts for Digg &