Next Generation Regexp
prostoalex writes "Jeffrey E. F. Friedl, author of the newly published 2nd edition of Mastering Regular Expressions, wrote a feature article for O'Reilly Network on the recent innovations in the regular expression world. You'd think that an area such as regular expressions would be fairly stable, but according to the author, 'when I started to work on the second edition of Mastering Regular Expressions and started refocusing on the field, I was rather shocked to find out how much had really changed'. The article's behind-the-scenes purpose is apparently to push a new book that O'Reilly published this month, but it has great educational value for anyone involved with practical extracting and reporting."
Regexps are interesting, sure. Every CS student enjoys (or suffers through!) the regexp section of their Intro to Computability (or equivalent) course. And it is pretty fun thinking about the expressive power of, say, (a|b)*a*b*.
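For anyone who wants to poke at that example: (a|b)*a*b* is a bit of a trick, since the leading (a|b)* already matches every string over {a, b} and the trailing a*b* can match nothing at all. A minimal sketch, assuming Python's `re` module as the engine (the helper names `all_strings` and `same_language` are made up for illustration), brute-forces that claim for short strings:

```python
import re
from itertools import product

# The pattern from the comment, and the simpler pattern it should be equivalent to.
PATTERN = re.compile(r"(a|b)*a*b*")
SIMPLER = re.compile(r"(a|b)*")


def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def same_language(max_len=6):
    """Check that both patterns accept exactly the same strings up to max_len."""
    return all(
        bool(PATTERN.fullmatch(s)) == bool(SIMPLER.fullmatch(s))
        for s in all_strings("ab", max_len)
    )


if __name__ == "__main__":
    print(same_language())
```

This only checks strings up to a fixed length, so it is a sanity check rather than a proof; the real argument is the one-liner above about a*b* matching the empty string.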
However, we have to face the fact that regexps, as good as they are from a mathematical standpoint at matching things, just aren't that helpful in sorting through the sea of data that is the Internet. The input data just aren't orderly enough for regexps to be of any use.
What has become useful is what Google taps into. And that is the human aspect. Data isn't important because it matches a*(b|c)a*. It's important because it is useful to people. Think about it: when you are looking for wares or porn, where do you go? Perl? Nope. IRC. Why? Because of the human element.
That is why research into regexps is doomed to failure. It is a dead end. From a theoretical standpoint, regexps are cute and interesting, but for serious data prowling, you need something with a brain and a heart.
Karma: Good (despite my invention of the Karma: sig)
Nice to see that things haven't changed much ;)
I don't know how you got modded Insightful; I think Troll is a more accurate description.
Microsoft has the best doco of ANY software development company.
MSDN Library is the best single reference for everything Microsoft.
Take a look at it some time.
The only computer books I've ever read which actually read well were "Upgrading and Repairing PCs" (so much so that I wrote to the author) and "The Practice of System and Network Administration".
If only all books could be written as well.. *sigh*...
In-depth... summary. In-depth... Summary.
Is there a regexp to validate XML?
--Giving to trolls for the benefit of us all
Perhaps if you are looking for Perl programmers who will need to be doing a lot of textual processing, but that's definitely not the case in other areas.
I prefer to work with people who don't do a lot of regex, because they're less likely to use them for everything. I haven't worked on a large project that used regular expressions in years. I feel pretty good about that.
Sure, I've used them in a couple small scripts for parsing text, but if you see the majority of programming requiring regex, you definitely need to put your hammer down and pick up a Makita.
-- The world is watching America, and America is watching TV.