Next Generation Regexp
prostoalex writes "Jeffrey E. F. Friedl, author of the newly published 2nd edition of Mastering Regular Expressions, wrote a feature article for O'Reilly Network on the recent innovations in the regular expression world. You'd think that an area such as regular expressions would be fairly stable, but according to the author, 'when I started to work on the second edition of Mastering Regular Expressions and started refocusing on the field, I was rather shocked to find out how much had really changed'. The article's behind-the-scenes purpose is apparently to push a new book that O'Reilly published this month, but it has great educational value for anyone involved with practical extracting and reporting."
I particularly like this bit:
Nice to see that things haven't changed much
Other than to tell us what is different between the two books. After reading the article I walked away with no general knowledge that was useful in using regular expressions, or what might be coming, or where we came from.
It is a slightly wordy advertisement for why you should upgrade. The fact that it was foisted on us as something else annoys me, as I spent time reading it.
I know, a Slashdot reader that actually reads linked stories is such a minority, but come on, quit stuffing articles with advertising. Aren't the ads in the middle of a page enough?
That is one of the most contentless articles I have seen in a long time.
A regex is a type 3 grammar. Type 3 grammars haven't really changed since Chomsky's time.
The smartarses will now proceed to point out that
a) Perl is actually limited type 2
b) Some change no one knows or cares about was made to some definition of the Chomsky hierarchy in nineteen dumdy-dum.
Foo.
Over the course of my career I have come to the rather firm opinion that you are not worth much as a coder if you do not know regular expressions. I don't care what language(s) you're proficient in, or if you've memorized every single design pattern the GoF has ever conceived, or do 4 foot by 6 foot UML diagrams in your head. If you can't do regexps then you're missing a basic skill. I bought Friedl's book a couple of years ago, and although I wound up not using many of the Perl related stuff the rest of the book helped me out immensely.
A programmer without knowledge of regular expressions is like a carpenter without a hammer.
Ummm. Are you a programmer? Sure, you don't need regexps to solve every problem (and probably don't need them for MOST problems), but there are many problems that are solved so much more elegantly WITH regexps than without that once you understand them, IF you are a programmer, you wouldn't give them up. They are an invaluable tool in a programmer's toolkit.
Beyond a certain degree of complexity, it really doesn't make much sense anymore to use regular expressions--a simple built-in parser generator with executable annotations is both clearer and more powerful. Parser generator syntax allows comments and whitespace, and it is simple and fairly standard.
Yes, regular expressions should be used to find particular patterns in text and perform basic manipulations on them. Beyond a certain point of complexity it really doesn't make sense to perform more complex manipulations. Get the information you want out of the string using a regular expression, then manipulate it in code.
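A minimal sketch of that split, in Python: the regular expression only pulls the fields out, and ordinary code does the rest. The log line format and the names LINE_RE and parse_login are made up for illustration.

```python
import re
from datetime import date

# Hypothetical line format, invented for this sketch:
#   "2002-09-04 Alice logged in"
LINE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2}) (\w+) logged in")


def parse_login(line):
    """Return (user, date) for a matching line, or None if it doesn't match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # The regex only extracts the raw strings...
    year, month, day, user = m.groups()
    # ...and plain code does the conversion and normalization.
    return user.lower(), date(int(year), int(month), int(day))
```

Keeping the pattern this small means the date handling and the lowercasing can be changed without touching the regex at all.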
One has a feeling that regexp engines are just becoming programming languages in and of themselves - the only difference being that the 'program' consists of a string of cryptic single character commands, and the input is limited to a single string.
-josh
It's not just a Perl book, but the language independent and Perl dependent parts are a godsend.
I was a full time Perl programmer (with a two hour commute by rail) when Friedl's book came out. I read it cover to cover, and then recommended it strongly to my co-workers.
Friedl shows how to write powerful, readable, efficient regular expressions that can do a lot of the work your program needs to do. It changed how my group wrote Perl (very much for the better). This is more than highly recommended; after the Blue Camel, and even before the Cookbook, this is a definitive book for all those who call themselves "Perl programmers."
(In the first edition of the book, Friedl discovered some problems with regular expressions in early versions of Perl 5. The very next release of Perl -- 5.003, I think -- immediately fixed these problems. When Larry & Co. pay attention to a Perl book, maybe you should, too?)
The problem with regular expressions is that there are so many constraints. For example:
- \<John.+Doe\>
should match:
- JohnClark
- ...text...JaneDoe
But this shouldn't match:
As you can see, even with a very simple regular expression like this, the text has to be processed a lot to get the results needed. A simple "John AND Doe" would match all of the results while the regular expression puts more restraints on the search, which takes longer to process. For complex regular expressions, the searching of text becomes too slow for large amounts of data, such as the internet.
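To make the difference between the two kinds of search concrete, here is a rough Python sketch. Python's re module has no \< and \> anchors, so \b stands in for them here; that substitution, the function names and the sample strings are assumptions of this sketch, and it says nothing about speed either way.

```python
import re

# \b is used in place of \< and \> (an assumption; Python's re has no
# start-of-word / end-of-word anchors of its own).
PATTERN = re.compile(r"\bJohn.+Doe\b")


def regex_match(text):
    """True if 'John', at least one character, then 'Doe' appear in order."""
    return PATTERN.search(text) is not None


def and_match(text):
    """True if both words appear anywhere, in any order ("John AND Doe")."""
    return "John" in text and "Doe" in text
```

With these definitions, "Doe, John" passes the AND search but not the regular expression, because the pattern also constrains order and word boundaries.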
...interesting if true.
I understand that Perl 6 isn't near being done, and that the "r" in "Perl" doesn't necessarily stand for "regex", depending on who you ask, but Perl will always have the greatest influence over what is called a regex. Or is that going to change with Perl 6?
Well, I don't find it fair that you were modded as a troll. You may be just misinformed.
I can tell you that _any_ decent *nix gives you complete knowledge of what is going on in your machine. Without having to look at source code, without having to go to some central repository of information.
Now, press Ctrl-Alt-Del in your favorite Windows and take a look at the name of the services. Try to enter any of them in the MSDN search. What do you see? Do they tell you what that service does? How is it started? How can you stop it?
Do you still praise MSDN so highly when you see that they don't even tell you the basics?
Sounds kind of like what the Regexp::English Perl module does.
You may also want to look at the YAPE::Regex series of modules that allow parsing/extracting/explaining of regexes.
Heh.. isn't the whole point of posting as AC to be just that, anonymous? Then why the hell did you sign it? ;) (assuming you did and not some impostor)