Slashdot Mirror


Next Generation Regexp

prostoalex writes "Jeffrey E. F. Friedl, author of the newly published 2nd edition of Mastering Regular Expressions, wrote a feature article for O'Reilly Network on recent innovations in the regular expression world. You'd think that such an area as regular expressions would be fairly stable, but according to the author, 'when I started to work on the second edition of Mastering Regular Expressions and started refocusing on the field, I was rather shocked to find out how much had really changed'. The article's behind-the-scenes purpose is apparently to push a new book that O'Reilly published this month, but it has great educational value for anyone involved with practical extracting and reporting."

14 of 248 comments

  1. .NET regexps and Microsoft's documentation by Jobe_br · · Score: 4, Insightful

    I particularly like this bit:

    A full chapter on .NET-specific regex issues helps to clarify things, and helps to make up for the exceedingly poor documentation that Microsoft provides with the package.
    Nice to see that things haven't changed much ;)
  2. This has no educational purpose by Anonymous Coward · · Score: 3, Insightful

    Other than to tell us what is different between the two books. After reading the article I walked away with no general knowledge that was useful in using regular expressions, or what might be coming, or where we came from.

    It is a slightly wordy advertisement for why you should upgrade. The fact that it was foisted on us as something else annoys me, as I spent time reading it.

    I know, a slashdot reader that actually reads linked stories is such a minority, but come on, quit stuffing articles with advertising. Aren't the ads in the middle of a page enough?

  3. Contentless article by Shevek · · Score: 2, Insightful

    That is one of the most contentless articles I have seen in a long time.

    A regex is a type 3 grammar. Type 3 grammars haven't really changed since Chomsky's time.

    The smartarses will now proceed to point out that
    a) Perl is actually limited type 2
    b) Some change no one knows or cares about was made to some definition of the Chomsky hierarchy in nineteen dumdy-dum.

    Foo.
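
    Point (a) is easy to see in practice, and not only in Perl. A minimal sketch in Python (chosen here purely for illustration; the thread names no such code): a backreference lets a pattern recognize "some string followed by a copy of itself", a language that no Type 3 grammar can describe. The name is_square is made up for this example.

```python
import re

# (.+)\1 matches a non-empty string immediately followed by a copy of itself.
# The language { ww } is not regular, so a pure Type 3 grammar cannot describe it;
# the backreference \1 is what takes the engine beyond regular languages.
SQUARE = re.compile(r"(.+)\1")


def is_square(text: str) -> bool:
    """Return True when text is some non-empty string repeated exactly twice."""
    return SQUARE.fullmatch(text) is not None


print(is_square("abcabc"))  # True
print(is_square("abcabd"))  # False
```

    Whether a given engine supports backreferences, and how efficiently, varies from one implementation to the next.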

    1. Re:Contentless article by Get Behind the Mule · · Score: 5, Insightful
      That is one of the most contentless articles I have seen in a long time.

      A regex is a type 3 grammar. Type 3 grammars haven't really changed since Chomsky's time.
      You get a B-, Bunky. And here's your cookie.

      After you've finished your undergrad CS theory class, you might go on to discover that implementations of regexes under various paradigms and in the various languages show an extremely rich variety in syntax, semantics and efficiency. This isn't about the pristine theory of Prof. Chomsky, but about the actual use of regexes as programming constructs, and that's a tremendously complex subject. Friedl's book in the first edition is one of the best I've ever seen that has tackled such complexity and made it accessible and useful for the everyday business of programming.

      The article indicates that the practical use of regexes, far from stagnating since Chomsky's time, continues to evolve and grow. That's only "contentless" if you're stuck in the ivory tower and don't intend to leave.
  4. regexp and programmers by revscat · · Score: 4, Insightful

    Over the course of my career I have come to the rather firm opinion that you are not worth much as a coder if you do not know regular expressions. I don't care what language(s) you're proficient in, or if you've memorized every single design pattern the GoF has ever conceived, or do 4 foot by 6 foot UML diagrams in your head. If you can't do regexps then you're missing a basic skill. I bought Friedl's book a couple of years ago, and although I wound up not using many of the Perl-related parts, the rest of the book helped me out immensely.

    A programmer without knowledge of regular expressions is like a carpenter without a hammer.

    1. Re:regexp and programmers by Anonymous Coward · · Score: 4, Insightful

      A programmer without knowledge of regular expressions is like a carpenter without a hammer.

      If ever there was an apt analogy of regular expressions - that's it! They make everything seem like a nail ;).

  5. Re:indeed by kdorff · · Score: 2, Insightful

    Ummm. Are you a programmer? Sure, you don't need regexps to solve every problem (and probably don't need them for MOST problems), but there are many problems that are solved so much more elegantly WITH regexps than without that once you understand them, IF you are a programmer, you wouldn't give them up. They are an invaluable tool in a programmer's toolkit.

  6. Re:at some point... by joshv · · Score: 4, Insightful

    Beyond a certain degree of complexity, it really doesn't make much sense anymore to use regular expressions--a simple built-in parser generator with executable annotations is both clearer and more powerful. Parser generator syntax allows comments and whitespace, and it is simple and fairly standard.

    Yes, regular expressions should be used to find particular patterns in text and perform basic manipulations on them. Beyond a certain point of complexity it really doesn't make sense to perform more complex manipulations inside the expression itself. Get the information you want out of the string using a regular expression, then manipulate it in code.
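
    That division of labour might look something like the following. A minimal sketch in Python, assuming a made-up log-line format; the names LINE and parse_line are hypothetical and exist only for this example. The expression does nothing but pull out fields, and ordinary code does the conversion and validation.

```python
import re
from datetime import date

# Hypothetical log-line format, invented for this example:
#   YYYY-MM-DD LEVEL free-form message
LINE = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<message>.*)"
)


def parse_line(line: str):
    """Extract fields with the regex, then convert and validate them in code."""
    m = LINE.match(line)
    if m is None:
        return None
    try:
        # Calendar validation happens here rather than inside the pattern.
        when = date(int(m["year"]), int(m["month"]), int(m["day"]))
    except ValueError:
        return None
    return when, m["level"].lower(), m["message"].strip()


print(parse_line("2002-09-05 WARN disk almost full"))
print(parse_line("2002-13-40 WARN no such day"))  # None
```

    Rejecting the thirteenth month with the pattern alone is possible, but it makes the expression much harder to read than the three lines of code above.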

    One has a feeling that regexp engines are just becoming programming languages in and of themselves - the only difference being that the 'program' consists of a string of cryptic single character commands, and the input is limited to a single string.

    -josh

  7. Friedl's book is a must read for Perl folks by Lumpish Scholar · · Score: 5, Insightful

    It's not just a Perl book, but the language independent and Perl dependent parts are a godsend.

    I was a full time Perl programmer (with a two hour commute by rail) when Friedl's book came out. I read it cover to cover, and then recommended it strongly to my co-workers.

    Friedl shows how to write powerful, readable, efficient regular expressions that can do a lot of the work your program needs to do. It changed how my group wrote Perl (very much for the better). This is more than highly recommended; after the Blue Camel, and even before the Cookbook, this is a definitive book for all those who call themselves "Perl programmers."

    (In the first edition of the book, Friedl discovered some problems with regular expressions in early versions of Perl 5. The very next release of Perl -- 5.003, I think -- immediately fixed these problems. When Larry & Co. pay attention to a Perl book, maybe you should, too?)

    --
    Stupid job ads, weird spam, occasional insight at
  8. Re:Now, if only Google would support regexp search by quasi_steller · · Score: 2, Insightful

    The problem with regular expressions is that there are so many constraints. For example:

    \<John.+Doe\>
    should match:
    1. JohnBDoe
    2. JohnandDoe
    3. JohnDoe
    4. JohnClark
    5. ...text...JaneDoe
    But this shouldn't match:
    1. "Doe Re Me," sang John
    2. "Jane Doe and John
    3. "John Doe"

    As you can see, even with a very simple regular expression like this, the text has to be processed a lot to get the results needed. A simple "John AND Doe" would match all of the results, while the regular expression puts more constraints on the search, which takes longer to process. For complex regular expressions, the searching of text becomes too slow for large amounts of data, such as the internet.
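
    Lists like the ones above are easy to check by running the pattern. A minimal sketch in Python, with one assumption stated loudly: \b stands in for the \< and \> word anchors, because Python's re module treats \< and \> as literal angle brackets. Keep in mind that .+ requires at least one character between "John" and "Doe", and that . also matches a space, so the engine's verdict may differ from what one expects by eye.

```python
import re

# \b replaces the \< and \> word anchors (an assumption made for this sketch;
# Python's re treats \< and \> as literal angle brackets).
PATTERN = re.compile(r"\bJohn.+Doe\b")

# The sample strings from the comment, copied as written.
SAMPLES = [
    "JohnBDoe",
    "JohnandDoe",
    "JohnDoe",
    "JohnClark",
    "...text...JaneDoe",
    '"Doe Re Me," sang John',
    '"Jane Doe and John',
    '"John Doe"',
]


def matching(samples):
    """Return the samples in which the pattern is found anywhere."""
    return [s for s in samples if PATTERN.search(s)]


for sample in SAMPLES:
    verdict = "match" if PATTERN.search(sample) else "no match"
    print(f"{sample!r}: {verdict}")
```

    Under these assumptions only JohnBDoe, JohnandDoe and "John Doe" are reported as matches.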

    --
    ...interesting if true.
  9. perl 6 is gonna change all this by millette · · Score: 4, Insightful
    Anyone here who read the latest Perl Apocalypse (#5, it was) knows full well that regexes as we know and love them are out the window. The Apocalypse is a large document, so I picked this page to give you a little idea of what's going to change. The pages before that mention all the warts that Larry wants to bury.

    I understand that Perl 6 isn't near being done, and that the "r" in "Perl" doesn't necessarily stand for "regex", depending on who you ask, but Perl will always have the greatest influence over what is called a regex. Or is that going to change with Perl 6?

  10. Is that so? by ochinko · · Score: 2, Insightful
    MSDN Library [microsoft.com] is the best single reference for everything Microsoft.

    Well, I don't find it fair that you were modded as a troll. You may be just misinformed.

    I can tell you that _any_ decent *nix gives you complete knowledge of what is going on in your machine. Without having to look at source code, without having to go to some central repository of information.

    Now, press Ctrl-Alt-Del in your favorite Windows and take a look at the name of the services. Try to enter any of them in the MSDN search. What do you see? Do they tell you what that service does? How is it started? How can you stop it?

    Do you still praise MSDN so highly when you see that they don't even tell you the basics?

  11. Re:regexp criticism by thrig · · Score: 3, Insightful

    Sounds kind of like what the Regexp::English perl module does.

    You may also want to look at the YAPE::Regex series of modules that allow parsing/extracting/explaining of regex.

  12. Re:VB and Regexes by Tarpan · · Score: 2, Insightful

    Heh.. isn't the whole point of posting as AC to be just that, anonymous? Then why the hell did you sign it? ;) (assuming you did and not some impostor)