Next Generation Regexp
prostoalex writes "Jeffrey E. F. Friedl, author of the newly published 2nd edition of Mastering Regular Expressions, wrote a feature article for O'Reilly Network on the recent innovations in the regular expression world. You'd think that an area such as regular expressions would be fairly stable, but according to the author, 'when I started to work on the second edition of Mastering Regular Expressions and started refocusing on the field, I was rather shocked to find out how much had really changed'. The article's behind-the-scenes purpose is apparently to push a new book that O'Reilly published this month, but it has great educational value for anyone involved with practical extracting and reporting."
"Think about it: when you are looking for wares or porn, where do you go? Perl? Nope. IRC. Why? Because of the human element ... but for serious data prowling, you need something with a brain and a heart."
A heart for porn?
Buying a Dell computer is equivalent to dropping the soap in a prison shower.
Yes, and that makes me want to use a decidedly irregular expression:
#@*$^&@#$&#!!!
Microsoft's documentation reads like a novel compared to IBM's. The typical IBM manual has the following format:
PAGE 1:
[COMMAND1] is executed by typing the word [command1] followed by the argument string, followed by enter. The argument string consists of a sequence of non-whitespace characters separated by whitespace characters.
[COMMAND2] is executed by typing the word [command2] followed by the argument string, followed by enter. The argument string consists of a sequence of non-whitespace characters separated by whitespace characters.
[COMMAND3] is executed by typing the word [command3] followed by the argument string, followed by enter. The argument string consists of a sequence of non-whitespace characters separated by whitespace characters.
PAGE 2:
THIS PAGE IS INTENTIONALLY LEFT BLANK
...and so on and so on.
Regarding this last IBM tradition (that others have tried to copy but few have truly mastered), the Spruce DVD Maestro manual has a page with the following text:
Blank page.
(mostly)
RMN
~~~
"Let's be precise in our use of language, or not."
Very compressed contentlessness.
"I see that you are writing a regular expression"
-- Knowing too much can get you killed, but knowing who knows too much can make you rich.
Regexps are interesting, sure.
Not really. I use them all the time and the only time they are interesting is when you're done and they look completely silly.
Every CS student enjoys (or suffers through!) the regexp section of their Intro to Computability (or equivalent) course.
Not really. I got a degree in Computer Engineering from the #2 private engineering school in the country and I was never taught regex. If you know how to program and not just crank out syntax, you can pick up regex on your own pretty fast.
And it is pretty fun thinking about the expressive power of, say (a|b)*a*b*
That is actually a really boring regex. Lots of a's or b's followed by lots of a's followed by lots of b's. Wow. My brain is fried.
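For what it's worth, the pattern is even more boring than that: `(a|b)*` already matches every string of a's and b's, so the trailing `a*b*` adds nothing, and `(a|b)*a*b*` describes the same language as plain `(a|b)*`. Here is a minimal sketch that brute-forces the comparison on short strings. It is in Python rather than Perl purely for illustration, the `same_language` helper is a made-up name, and a bounded check like this is evidence, not a proof.

```python
import re
from itertools import product

def same_language(pattern_a, pattern_b, alphabet="ab", max_len=8):
    """Return True if both patterns agree on every string up to max_len.

    Hypothetical helper: a bounded brute-force comparison, not a proof.
    """
    ra, rb = re.compile(pattern_a), re.compile(pattern_b)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            # fullmatch anchors the pattern to the whole string
            if bool(ra.fullmatch(s)) != bool(rb.fullmatch(s)):
                return False
    return True

# Same language as (a|b)* on all strings checked
print(same_language(r"(a|b)*a*b*", r"(a|b)*"))
# Not the same as a*b* alone ("ba" is a counterexample)
print(same_language(r"(a|b)*a*b*", r"a*b*"))
```

The first comparison agrees on every string tested. The second does not, because `ba` matches `(a|b)*a*b*` but not `a*b*`.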
However, we have to face the facts, that regexps, as good as they are from a mathematical standpoint at matching things, just aren't that helpful in sorting through the sea of data that is the Internet.
Wow. You're probably right. I'll bet nothing that searches for things on the internet, such as google.com, uses any regex internally in their code. Now that I'm facing the facts, you're right, regex is worthless when it comes to searching through any amount of data.
The input data just aren't orderly enough for regexps to be of any use.
Yeah, regex is best used for very very simple patterns. Anything more complex than your above example is best suited for some serious hand-parsing in Visual Basic.
Think about it: when you are looking for wares or porn, where do you go? Perl? Nope.
I don't know WTF you're talking about. I find ALL my porn at www.perlmonks.org
That is why research into regexps is doomed to failure.
Yeah, I should probably throw away all that Perl regex code I've written that's made my company lots (and I mean lots) of money in the market. It is doomed. I should be writing my pattern matching code in the google.com language.
Thank you for posting about something you apparently know very little about. Good for an afternoon giggle.
Mark me as a troll or whatever but, "What are regular expressions?"
Are they those /:+[^:]/ statements? What's the big deal then?
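Yes, that is one. Read left to right, `/:+[^:]/` means "one or more colons, followed by one character that is not a colon". A small sketch of what it picks out, written in Python for illustration (the `first_match` helper and the sample strings are made up; `re.search` scans the string much like an unanchored Perl match):

```python
import re

# Same pattern as /:+[^:]/ : one or more ':' then one non-':' character
pattern = re.compile(r":+[^:]")

def first_match(text):
    """Return the first matching substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None

for sample in ["key::value", "a:b", "trailing:", "no colons"]:
    print(repr(sample), "->", first_match(sample))
```

The string `trailing:` does not match, because nothing follows the colon to satisfy `[^:]`.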
I'm really, really new to Perl, studying it out of an O'Reilly book. What does this mean to me?
forget it.
Not true. Yet.
Perl 5 regexes can solve NP-hard problems, but they're not quite Turing complete. However, they require only four additional stack operators to do that.
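The comment doesn't say which feature it has in mind. The usual suspect is backreferences, which let a pattern demand "the same text again" and so match things no textbook regular expression can; matching with backreferences is generally cited as NP-hard. A minimal sketch of the idea, in Python rather than Perl and with a made-up helper name, recognizing strings that are some chunk repeated twice:

```python
import re

# (.+) captures a non-empty chunk; \1 requires that exact chunk again.
# The language { ww } is not regular, so this goes beyond classic regexps.
doubled = re.compile(r"(.+)\1")

def is_doubled(text):
    """Return True if text is some non-empty string written twice in a row."""
    return doubled.fullmatch(text) is not None

for sample in ["abcabc", "abcabd", "aa", "a"]:
    print(repr(sample), "->", is_doubled(sample))
```

This only illustrates the extra expressive power; it is not a demonstration of the NP-hardness claim itself.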
Personally, I'm waiting for the first Perl regex to become sentient.
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
Feb 20, 2042 - The day that the first true sentient artificial intelligence is created.
Feb 21, 2042 - The day it gets converted into a Perl one-liner.