Slashdot Mirror


Next Generation Regexp

prostoalex writes "Jeffrey E. F. Friedl, author of the newly published 2nd edition of Mastering Regular Expressions, wrote a feature article for the O'Reilly Network on recent innovations in the regular expression world. You'd think that an area such as regular expressions would be fairly stable, but according to the author, 'when I started to work on the second edition of Mastering Regular Expressions and started refocusing on the field, I was rather shocked to find out how much had really changed'. The article's behind-the-scenes purpose is apparently to push the new book, which O'Reilly published this month, but it has great educational value for anyone involved with practical extracting and reporting."

7 of 248 comments

  1. .NET regexps and Microsoft's documentation by Jobe_br · Score: 4, Insightful

    I particularly like this bit:

    A full chapter on .NET-specific regex issues helps to clarify things, and helps to make up for the exceedingly poor documentation that Microsoft provides with the package.
    Nice to see that things haven't changed much ;)
  2. regexp and programmers by revscat · Score: 4, Insightful

    Over the course of my career I have come to the rather firm opinion that you are not worth much as a coder if you do not know regular expressions. I don't care what language(s) you're proficient in, whether you've memorized every single design pattern the GoF has ever conceived, or whether you can do 4-foot-by-6-foot UML diagrams in your head. If you can't do regexps, then you're missing a basic skill. I bought Friedl's book a couple of years ago, and although I wound up not using much of the Perl-related material, the rest of the book helped me out immensely.

    A programmer without knowledge of regular expressions is like a carpenter without a hammer.

    1. Re:regexp and programmers by Anonymous Coward · Score: 4, Insightful

      A programmer without knowledge of regular expressions is like a carpenter without a hammer.

      If ever there was an apt analogy for regular expressions, that's it! They make everything seem like a nail ;).

  3. Re:at some point... by joshv · Score: 4, Insightful

    Beyond a certain degree of complexity, it really doesn't make much sense to use regular expressions anymore: a simple built-in parser generator with executable annotations is both clearer and more powerful. Parser generator grammars allow comments and whitespace, and they use a simple, fairly standard syntax.

    Yes, regular expressions should be used to find particular patterns in text and perform basic manipulations on them, but beyond a certain point of complexity it doesn't make sense to do the more complex manipulations inside the regex itself. Get the information you want out of the string using a regular expression, then manipulate it in code.

    One gets the feeling that regexp engines are becoming programming languages in and of themselves, the only difference being that the 'program' consists of a string of cryptic single-character commands and the input is limited to a single string.

    -josh
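To make the "extract with a regex, then manipulate in code" advice concrete, here is a minimal Python sketch. The log-line format and the names LINE_RE, parse_line and count_by_level are hypothetical, chosen only for illustration: the regex does nothing but capture fields, and all of the interpretation (date conversion, case normalization, counting) happens in ordinary code.

```python
import re
from datetime import date

# Hypothetical log format, assumed for illustration: "YYYY-MM-DD LEVEL message"
LINE_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2}) (\w+) (.*)$")


def parse_line(line):
    """Use the regex only to pull out fields; return None if the line doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    year, month, day, level, message = m.groups()
    # From here on it's plain code, not regex trickery.
    # Note: date() raises ValueError for impossible dates such as month 13.
    return {
        "date": date(int(year), int(month), int(day)),
        "level": level.upper(),
        "message": message.strip(),
    }


def count_by_level(lines):
    """Count parsed records per (upper-cased) level, skipping unparseable lines."""
    counts = {}
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[rec["level"]] = counts.get(rec["level"], 0) + 1
    return counts


if __name__ == "__main__":
    sample = [
        "2003-07-15 error disk full",
        "2003-07-15 INFO backup started",
        "not a log line",
        "2003-07-16 ERROR disk still full",
    ]
    print(count_by_level(sample))  # {'ERROR': 2, 'INFO': 1}
```

Keeping the pattern this small means that changing what you do with the data never requires touching the regex.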

  4. Friedl's book is a must read for Perl folks by Lumpish Scholar · Score: 5, Insightful

    It's not just a Perl book, but both the language-independent and the Perl-dependent parts are a godsend.

    I was a full-time Perl programmer (with a two-hour commute by rail) when Friedl's book came out. I read it cover to cover, and then recommended it strongly to my co-workers.

    Friedl shows how to write powerful, readable, efficient regular expressions that can do a lot of the work your program needs to do. It changed how my group wrote Perl (very much for the better). This is more than highly recommended; after the Blue Camel, and even before the Cookbook, this is a definitive book for all those who call themselves "Perl programmers."

    (In the first edition of the book, Friedl discovered some problems with regular expressions in early versions of Perl 5. The very next release of Perl, 5.003 I think, immediately fixed these problems. When Larry & Co. pay attention to a Perl book, maybe you should, too?)
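One concrete technique behind "readable" regexes is free-spacing mode, which Perl spells as the /x modifier. Below is a rough Python analogue using re.VERBOSE; the number-matching pattern and the is_number helper are illustrative assumptions, not examples taken from Friedl's book.

```python
import re

# Free-spacing ("verbose") mode: unescaped whitespace outside character
# classes is ignored and '#' starts a comment, similar in spirit to Perl's /x.
NUMBER_RE = re.compile(r"""
    [+-]?                # optional sign
    (?:
        \d+ (?:\.\d*)?   # digits with optional fraction: 12, 12., 12.5
      | \.\d+            # or a bare fraction: .5
    )
    (?:[eE][+-]?\d+)?    # optional exponent: e10, E-3
""", re.VERBOSE)


def is_number(text):
    """True if the whole string is a decimal number as defined by NUMBER_RE."""
    # fullmatch requires the entire string to match, so no ^ or $ anchors are needed.
    return NUMBER_RE.fullmatch(text) is not None
```

Because whitespace and comments are ignored in this mode, each part of the pattern can carry its own explanation instead of being packed into one dense line.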

  5. Re:Contentless article by Get Behind the Mule · Score: 5, Insightful
    That is one of the most contentless articles I have seen in a long time.

    A regex is a type 3 grammar. Type 3 grammars haven't really changed since Chomsky's time.
    You get a B-, Bunky. And here's your cookie.

    After you've finished your undergrad CS theory class, you might go on to discover that regex implementations, across various paradigms and languages, vary enormously in syntax, semantics and efficiency. This isn't about the pristine theory of Prof. Chomsky, but about the actual use of regexes as programming constructs, and that's a tremendously complex subject. The first edition of Friedl's book is one of the best I've ever seen at tackling such complexity and making it accessible and useful for the everyday business of programming.

    The article indicates that the practical use of regexes, far from stagnating since Chomsky's time, continues to evolve and grow. That's only "contentless" if you're stuck in the ivory tower and don't intend to leave.
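Two small Python checks illustrate the gap between the textbook model and real engines. This is a sketch that uses Python's re module as one example of a backtracking engine; the helper names doubled_words and first_alternative are hypothetical. A backreference such as \1 lets a pattern demand an exact repeat of earlier text, which a pure type-3 (regular) grammar cannot express, and the same alternation can yield different matches depending on whether the engine reports the first alternative that succeeds or the POSIX leftmost-longest match.

```python
import re

# 1) Backreferences: \1 must repeat exactly what group 1 matched. Finding
#    doubled words this way goes beyond what a type-3 grammar can describe.
DOUBLED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def doubled_words(text):
    """Return the first word of each doubled-word pair found in text."""
    return [m.group(1) for m in DOUBLED_WORD.finditer(text)]


# 2) Match semantics differ between engines. Python's backtracking engine
#    tries alternatives left to right and stops at the first one that
#    succeeds, so "a|ab" matches only "a" in "ab"; a POSIX leftmost-longest
#    engine would report "ab" instead.
def first_alternative(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(doubled_words("Paris in the the spring"))  # ['the']
    print(first_alternative(r"a|ab", "ab"))          # a
    print(first_alternative(r"ab|a", "ab"))          # ab
```

Swapping the order of the alternatives changes the result here, which is exactly the kind of engine-specific behavior the theory alone doesn't tell you about.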
  6. perl 6 is gonna change all this by millette · Score: 4, Insightful
    Anyone here who read the latest Perl Apocalypse (#5) knows full well that regexes as we know and love them are going out the window. The Apocalypse is a large document, so I picked this page to give you a little idea of what's going to change. The pages before that mention all the warts that Larry wants to bury.

    I understand that Perl 6 isn't near being done, and that the "r" in "Perl" doesn't necessarily stand for "regex", depending on who you ask, but Perl will always have the greatest influence over what is called a regex. Or is that going to change with Perl 6?