Slashdot Mirror


Mastering Regular Expressions

gianluca writes "Having always been a heedful guy, I always duly did my homework, going through the lengthy manual pages of a number of regular expression (regex) crunching tools. You name it: be it PERL, awk, emacs, sed or even one of the .NET framework languages -- any such program provides support for the same regex expressions (or at least, so they seem to the occasional observer). After some years of regex practice with these tools, I had the pretentious conviction that I knew my way through the intricacies of patterns, grouping, greediness, and the like. When I first stepped into Mastering Regular Expressions, looking at the nearly 500 pages which build up Friedl's book, I wondered what someone could ever have to say about regexes to fill so many pages." Gianluca ended up finding plenty of worthwhile content; read below for his review.

Mastering Regular Expressions, 2nd edition
author: Jeffrey E. Friedl
pages: 460
publisher: O'Reilly
rating: 9.5
reviewer: Gianluca Insolvibile
ISBN: 0596002890
summary: An in-depth guide to lead the apprentice to mastering regular expressions' wizardry

My first suspicion, I admit, was that I was facing one of the countless "man page reprints" that you find these days. It was only after reading the book that I eventually understood: before then, I had had no idea of what regexes were really about.

What it's about

The book is logically divided into three parts: the first one (Chapters 1, 2 and 3) introduces the reader to the basic concepts of regexes, building a common ground upon which the subsequent chapters will be based. The introduction is clear and straightforward, and lets the readers quickly grasp the key points in the regex business. This part is more or less a good summary, presenting information that can also be found in existing manual pages (albeit presented in a distilled form, which lets you perceive that the author has very clear ideas about the matter). If you already know something about regexes, you could skip this part entirely -- even if reading it turns out to be a nice occasion to brush up and overhaul your knowledge.

The second part (Chapters 4, 5 and 6) is the one that struck me most for the depth of the information provided and the richness of thought. Rather than throwing at the reader usage dictates on one or another regex flavour, the author explains in a wealth of detail the inner mechanisms which make regexes run and how you can exploit such knowledge to write better expressions.

Chapter 4 presents the different families of regex processing engines (namely, DFA, traditional and POSIX NFA), whose internal behavior differs so greatly that writing a regex in the appropriate way can make a substantial difference in both efficacy and efficiency. If you thought you knew it all about greedy and lazy regex operators, possessive quantifiers, backreferences and lookaround, you'd better think again: I was pleasantly surprised to discover how ignorant I was (to be honest, I had never heard of lookaround operators before!).
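
The operators named above are easy to confuse, so here is a minimal sketch of greedy versus lazy quantifiers and of lookahead. It uses Python's re module, which is a backtracking engine; the choice of Python and the sample strings are assumptions made for illustration and are not taken from the book.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* takes as much as it can, then gives back only what it must,
# so the match runs from the first '<' to the last '>'.
greedy = re.search(r'<.*>', text).group()

# Lazy: .*? takes as little as it can, so the match stops at the first '>'.
lazy = re.search(r'<.*?>', text).group()

# Lookahead: (?= USD) must match right after the digits,
# but it is not included in the matched text.
price = re.search(r'\d+(?= USD)', 'cost: 42 USD').group()

print(repr(greedy))
print(repr(lazy))
print(repr(price))
```

With this input the greedy pattern returns the whole string, the lazy one returns only the first tag, and the lookahead version returns the digits without the currency suffix.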

Chapter 5 slows down a little bit to let the reader absorb the massive previous chapter. Some simple (but still tricky) examples are presented, showing how to apply the techniques explained up to this point. A couple of examples are perhaps too contrived (ever needed to match aligned groups of 5 digits in an unspaced stream of characters?), but it is instructive anyway to follow the reasoning behind the construction of a complex regex.

Chapter 6 focuses on efficiency, considering how backtracking and matching can drive your regex engine to exponential complexities. Optimization techniques are then presented, first by explaining the automatic optimizations performed by the most common regex engines and then by giving a practical list of hints that you can follow to be sure that your expression will run as fast as possible. Again, I was quite surprised to find out how small changes in a regex can make such a big difference to the engine (and give rise to noticeable performance penalties if ignored).
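
To make the point about small changes concrete, here is a hedged sketch of the classic nested-quantifier case. It is written for Python's re module (a backtracking engine); the two patterns and the helper name time_failure are assumptions chosen for illustration, not examples quoted from the book. Both patterns accept the same strings, but on a run of a's with no trailing b the nested form has many ways to split the run and tries them all before giving up.

```python
import re
import time

# Nested quantifiers: the run of a's can be divided between the inner
# and outer '+' in many ways, and a failing match tries each of them.
nested = re.compile(r'(a+)+b')

# Same set of accepted strings, but only one way to consume the a's.
flat = re.compile(r'a+b')


def time_failure(pattern, n):
    """Return the seconds spent failing to match n a's with no final b."""
    subject = 'a' * n
    start = time.perf_counter()
    result = pattern.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None
    return elapsed


if __name__ == '__main__':
    # Keep n small: the nested pattern's cost grows very quickly with n.
    for n in (8, 12, 16):
        print(n, time_failure(nested, n), time_failure(flat, n))
```

The exact timings depend on the machine and the Python version, so none are quoted here; the thing to watch is how fast the first column grows compared with the second.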

What I absolutely liked most was that the author explains exactly why a certain optimization works, based on the information given in Chapter 4 (and provided that you have been able to assimilate it in the first pass). Finally, a paragraph entitled "Unrolling the loop" really put me in a good mood, reminding me of the past times of "old school" asm programming.

The third part of the book devotes three chapters to PERL, Java and .NET, respectively. Each chapter goes through the syntax and features of regexes for each language: while the information provided on Java and (VB).NET is quite commonplace, in the case of PERL the author deals with aspects rarely covered elsewhere, like dynamic regexes, embedded-code constructs, regex-literal overloading and specific optimization techniques.

What's to like

In one word: insight. The author is definitely knowledgeable about regular expressions and the whole book is filled with thoughtful suggestions and hints. Still, a friendly and straightforward writing style makes reading pleasant and seldom boring (well, you wanted details, didn't you?) while you learn internal regex mechanics rarely available elsewhere.

A further nice point is the broad view offered to the reader, starting from regexes in general and focusing on specific flavours only in the final part of the book. The second edition also offers up-to-date information, covering the .NET framework and the latest versions of PERL (5.8) and Java (1.4).

What's to consider

Despite the book's reassuring conversational tone, dealing with such a specific topic in so much depth might sometimes become boring, especially if you do not have a strong interest in getting the most out of regular expressions or in knowing how they work internally. If you are just an occasional regex user and dwell in manual pages, you can probably live without this book. Also, it is a pity that specific sections on Tcl, emacs and awk have disappeared in the second edition (maybe they were not as current as the .NET framework?) and that pcre (a C regex library) is barely mentioned.

The summary

Regular expressions are tied so strongly to the *nix culture that everyone who has been exposed to that culture has come to use them in a more or less conscious way. Still, most of the documentation around lingers on basic features and presents only the most common regex operators. Mastering Regular Expressions is the book to read if you want to go further and get serious about regexes: even if extreme optimization might not be a big concern today, understanding how regex engines work under the hood also helps greatly in creating small everyday expressions.

Table of Contents

Preface
Chapter 1. Introduction to Regular Expressions
Chapter 2. Extended Introductory Examples
Chapter 3. Overview of Regular Expression Features and Flavors
Chapter 4. The Mechanics of Expression Processing
Chapter 5. Practical Regex Techniques
Chapter 6. Crafting an Efficient Expression
Chapter 7. Perl
Chapter 8. Java
Chapter 9. .NET

You can purchase Mastering Regular Expressions, 2nd edition from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

252 comments

  1. i mastered regular expressions by Anonymous Coward · · Score: 5, Funny

    when figuring out the lameness filter

  2. Slashbot book review by rkz · · Score: 1, Informative

    This one is a great addition to the bookshelf. You all know how to do certain things with regular expressions, but this book clarifies nicely why you are actually doing them. It also introduces some nice advanced concepts which occasional regex users might not have come across before.

    1. Re:Slashbot book review by Anonymous Coward · · Score: 0

      What is to flamm?

    2. Re:Slashbot book review by Anonymous Coward · · Score: 0

      a 'flam' is a drum rudiment in which one stick hits a drum hard just after a quieter stroke, sounding almost like one single loud drum hit. HTH you fucking fruitbag

      --sa

    3. Re:Slashbot book review by Anonymous Coward · · Score: 0

      previous review
      here

  3. Perl, Java, .NET.. oh my! by Gortbusters.org · · Score: 3, Interesting

    This sounds like a nifty tool for those who have to switch programming environments quite often. I always find myself going back to the books when I either have to write a regex myself or decipher someone else's crazy looking expression.

    --
    --------
    Free your mind.
    1. Re:Perl, Java, .NET.. oh my! by ErikZ · · Score: 1

      I'm surprised PHP isn't in there. I guess you can just use perl compatible regular expression functions.

      --
      Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
    2. Re:Perl, Java, .NET.. oh my! by mackstann · · Score: 1

      Why use PHP's ereg* functions when preg* functions are faster and more powerful?

    3. Re:Perl, Java, .NET.. oh my! by sketerpot · · Score: 2, Informative
      The big part of regular expressions is learning how to read and write them well. After that, just find some documentation for your language of choice.
    4. Re:Perl, Java, .NET.. oh my! by melonman · · Score: 1

      The big problem with PHP and regexes is that the C-like syntax makes no concessions to the needs of regular expressions. I ported some regexes from Perl to PHP using preg a while back, and while the regexes themselves didn't change, the guff around them was a lot more opaque in PHP. I guess this is the price PHP users pay for a 'consistent' language: pity the syntax was designed for writing operating systems at quasi-assembler level, not applications...

      --
      Virtually serving coffee
  4. Don't go overboard by apsmith · · Score: 3, Interesting

    I read the first edition of this book - it was great, and completely changed the way I handled (and understood) perl regular expressions. It's tempting, after reading this book, to try to apply regex's to everything! Friedl had an example of a huge, horrible (but efficient) regex to parse mail headers in the first edition - my advice on that is, don't try that at home! Interspersing procedural logic with the regex's tends to make much cleaner and more readable code...

    --

    Energy: time to change the picture.

    1. Re:Don't go overboard by sharlskdy · · Score: 5, Insightful

      When all you have is a hammer, everything looks like a nail. And, REGEX is one HUGE hammer!

    2. Re:Don't go overboard by eidechse · · Score: 1

      Hehe...I liked showing that 6598 char monstrosity to people when they asked about email address validation...

      It's too bad it wasn't included in the 2nd edition, if only for the amusement value.

    3. Re:Don't go overboard by sigxcpu · · Score: 1

      On the other hand,
      If all you want to do is put in a nail, everything looks like a hammer.

      --
      As of Postgres v6.2, time travel is no longer supported.
    4. Re:Don't go overboard by willtsmith · · Score: 1

      I bought this book (expensive) because I was trying to do some relatively simple stuff and good explanations weren't forthcoming on various websites (or in the MS documentation, which is usually pretty good). Overall, it explained what I was after. The various chapters on regex for specific technologies are also helpful, though you feel a bit short-changed if you only use a COUPLE of these chapters.

      The ultimate testament to Regex is that it NEEDS a book like this to understand its complexities. The syntax is EXTREMELY context sensitive; some constructs mean completely different things in different contexts.

      I could compare this to C++ vs Java/C#, but it would be unfair because C++ is WAY easier than Regex to parse and understand. The constructs are often indistinguishable from text, and all in all it requires close scrutiny to figure out what's going on. It almost reminds me of the old "let's make the biggest program in 40 lines" contests that produced very clever programs with practically unreadable code.

      Regex is VERY cool and EXTREMELY functional. But I think a replacement is ultimately in order, one that is a bit LESS compact, has consistent construct usage (metacharacters), and is overall just plain easier to understand, write and read without having to go over statements with a fine-toothed comb.

      --
      -------- -------- Support Wesley Clark for president!!!
    5. Re:Don't go overboard by tshak · · Score: 4, Funny

      Friedl had an example of a huge, horrible (but efficient) regex to parse mail headers in the first edition

      And I'm pissed that it's NOT in the second edition (at least it couldn't easily be found). I was trying to impress this chick at B&N the other day by showing her how I understood that longass expression and low-and-behold, the back page where it's SUPPOSED to be is filled with a 3 line regex - not very impressive after you've made a huge deal about a full-page regex. Fortunately it all worked out since I had the original at home, and I was like "well, you'll just have to come over to MY place to check out the big regex". ;-)

      --

      There is no longer anything that can be done with computers that is nontrivial and clearly legal. -- Paul Phillips
    6. Re:Don't go overboard by kmellis · · Score: 2, Funny
      And I'm pissed that it's NOT in the second edition (at least it couldn't easily be found). I was trying to impress this chick at B&N the other day by showing her how I understood that longass expression and low-and-behold, the back page where it's SUPPOSED to be is filled with a 3 line regex - not very impressive after you've made a huge deal about a full-page regex. Fortunately it all worked out since I had the original at home, and I was like "well, you'll just have to come over to MY place to check out the big regex". ;-)
      When I read this book, I found myself in amazement at the enormous powers of regexes--you can do almost anything with them!

      However, it never occurred to me, oddly, to use regexes as a tool of seduction. I guess I just don't understand the ladies.

    7. Re:Don't go overboard by arkanes · · Score: 1

      The Activestate Perl IDE supposedly has a step through regexp debugger. Someday I'm going to have to get that thing.

    8. Re:Don't go overboard by Imperator · · Score: 1

      Yes, but every now and then you'll make a mistake and the hammer will smash not only the nail, but the entire project, your workbench, and most of Guatemala. Up to the end of the line, that is.

      --

      Gates' Law: Every 18 months, the speed of software halves.
    9. Re:Don't go overboard by Anonymous Coward · · Score: 0
      and I was like "well, you'll just have to come over to MY place to check out the big regex". ;-)

      And while she was playing with your big regex, did you play with her Perl?

  5. Regular Expressions by $calar · · Score: 2, Insightful

    I am so happy that this book is out. I love regular expressions (first saw them in Perl and JavaScript), and I considered buying the first edition from O'Reilly last year, but I thought that it would be best to wait and get the next edition (plus I had about 5 other O'Reilly titles to read at the time). I wish that there was better support for regular expressions in languages like C/C++. Does anyone know of a good library for it because there is no support for it in the language that I know of? Thanks.

    1. Re:Regular Expressions by qorkfiend · · Score: 3, Informative
    2. Re:Regular Expressions by rkz · · Score: 4, Informative

      try this

      Its caldera's c++ portable regex lib.

    3. Re:Regular Expressions by sw155kn1f3 · · Score: 1

      1. Use Caldera's (SCO's) regex lib
      2. Go to curt
      3. Loose money

      --
      - Arwen, I'm your father, Agent Smith.
      - Well, you're just Smith, but my father is Aerosmith!
    4. Re:Regular Expressions by rkz · · Score: 1

      4.??????

      5.Profit

    5. Re:Regular Expressions by pi_rules · · Score: 2, Funny
      Its caldera's c++ portable regex lib.


      Don't! It's probably got a Unix kernel in it. Beware the lawyers.
    6. Re:Regular Expressions by carlos_benj · · Score: 1

      Loose money? Dang! How much of it is loose? Somebody help me gather it up!

      --

      --

      As a matter of fact, I am a lawyer. But I play an actor on TV.

    7. Re:Regular Expressions by Anonymous Coward · · Score: 0
      try this

      Its caldera's c++ portable regex lib.

      "It's a trap!"

    8. Re:Regular Expressions by withak53 · · Score: 1

      RWC Strings have decent RegEx.

    9. Re:Regular Expressions by Anonymous Coward · · Score: 0
      Try Greta

      It's supposed to be faster than the boost regex library.

    10. Re:Regular Expressions by Marc2k · · Score: 1

      Really? Curt never took loose money from me before, he used to be such a nice boy..

      --
      --- What
    11. Re:Regular Expressions by Lost+Race · · Score: 1

      He obviously meant Curt is handing out loose money (i.e. spare change). So Curt is still a nice boy. You won't get rich, but hey, loose money!

  6. Different than 1st Edition? by khef · · Score: 3, Interesting

    Can anyone that's read this describe what's changed from the first edition? Is it worth shelling out the cash if you already have the first one?

    1. Re:Different than 1st Edition? by sharlskdy · · Score: 5, Informative

      You can read about the differences by clicking here, which is an article by the author outlining the differences.

    2. Re:Different than 1st Edition? by The+Clockwork+Troll · · Score: 0, Flamebait
      Can anyone that's read this describe what's changed from the first edition?
      By "this" did you mean the article, which discusses that "sections on Tcl, emacs, and awk" have disappeared and that they have been replaced by sections about .NET and the latest versions of perl and Java?

      I have a serious question for you. If you bought this book, would you actually read it?

      --

      There are no karma whores, only moderation johns
    3. Re:Different than 1st Edition? by Anonymous Coward · · Score: 0

      The main diffs as I see it are updates for new versions of Perl and expansion of coverage to Java and .Net. If you've read the first edition, I'd recommend just getting the freely available PDF's of Java and .Net features from the O'Reilly website. OTOH, if you are a regex fanatic, it might be worth it to pick up the new edition.

      http://www.oreilly.com/catalog/regex2/

  7. Useful? by dunston1212 · · Score: 0

    objRegExpr.Pattern = "I could use this book, I don't tend to get the most out of my regexs."

    strSearchOn = "I could use this book, I don't tend to get the most out of my regexs."

    Set colMatches = objRegExpr.Execute(strSearchOn)

    --
    Here
  8. Line endings with sed by Anonymous Coward · · Score: 0
    A bit off topic...

    Can anyone tell me:
    Is there a simple way to do perl -p -e 's/\r\n/\n/g; s/\r/\n/g' with sed?

    Do I need to buy this book?

    1. Re:Line endings with sed by Anonymous Coward · · Score: 1, Funny

      looks like it.

  9. I was going to read this by L.+VeGas · · Score: 4, Funny

    but instead I *

    1. Re:I was going to read this by nick_urbanik · · Score: 3, Funny
      but instead I *

      ...read spaces to the end of the line, or the next non-space character :-)

    2. Re:I was going to read this by mmol_6453 · · Score: 1

      No, I believe that would be

      but instead I +

      Or if you wanted to be more specific in Perl usage,

      /but\ instead\ I\ +/

      --
      What's this Submit thingy do?
    3. Re:I was going to read this by nick_urbanik · · Score: 2, Insightful

      In Perl, no need to escape spaces. You just added the requirement that there must be at least one space. If you want to be pedantic, at least please be correct!

    4. Re:I was going to read this by hackstraw · · Score: 1

      In Perl, no need to escape spaces. You just added the requirement that there must be at least one space. If you want to be pedantic, at least please be correct!

      OK, then please specify what version of Perl you are talking about. Version 6 regexps default to using the /x option, so you would need to escape the whitespace.
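
      The three claims in this subthread (a trailing * allows zero spaces, a trailing + demands at least one, and an /x-style mode changes whether spaces need escaping) can be checked with a short sketch. It is written in Python rather than Perl, which is an assumption made for illustration: Python's re.VERBOSE flag stands in for Perl's /x here, and the variable names are made up.

```python
import re

# Default mode: spaces are literal, so no escaping is needed.
star = re.compile(r'but instead I *')   # zero or more trailing spaces
plus = re.compile(r'but instead I +')   # at least one trailing space

# Verbose mode: unescaped whitespace in the pattern is ignored,
# so literal spaces must be escaped with a backslash.
verbose_escaped = re.compile(r'but\ instead\ I\ +', re.VERBOSE)
verbose_bare = re.compile(r'but instead I +', re.VERBOSE)  # means 'butinsteadI+'

print(bool(star.fullmatch('but instead I')))
print(bool(plus.fullmatch('but instead I')))
print(bool(verbose_escaped.fullmatch('but instead I ')))
print(bool(verbose_bare.fullmatch('but instead I ')))
```

      In verbose mode the unescaped version silently turns into a pattern for "butinsteadI" followed by more I's, which is why the escaping matters there and not in the default mode.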

    5. Re:I was going to read this by nick_urbanik · · Score: 1
      OK, then please specify what version of Perl you are talking about.

      The version of Perl discussed in the topic of this discussion, Mastering Regular Expressions, i.e., version 5.8.

    6. Re:I was going to read this by mmol_6453 · · Score: 1

      How can I be correct if I'm a beginner? I just learned Perl 5.005 (in a week) out of Programming Perl: 2nd Edition.

      Anyway, Perl allows you to do the same thing in just about as many ways as you can imagine. Yours is one way. Mine is another. :)

      --
      What's this Submit thingy do?
  10. Re:pff by Anonymous Coward · · Score: 0

    That's shell expansion, not a regex. * is a quantifier in regex, and needs to have something to its left to quantify (such as .* or [0-9a-f]*).
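
    A small sketch of that distinction, in Python for convenience (the choice of language, the sample word 'unix' and the helper name compiles are assumptions for illustration): the standard fnmatch module treats * as a shell-style wildcard, while the re module rejects a pattern that starts with a bare quantifier.

```python
import fnmatch
import re


def compiles(pattern):
    """Return True if the string is accepted as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# As a glob, '*' on its own means "any run of characters".
glob_ok = fnmatch.fnmatchcase('unix', '*nix')

# As a regex, '*' needs something to its left to repeat.
bare_star_ok = compiles('*nix')
dot_star_ok = compiles('.*nix')

# fnmatch.translate shows the regex that a glob pattern corresponds to.
translated = fnmatch.translate('*nix')

print(glob_ok, bare_star_ok, dot_star_ok)
print(translated)
```

    The exact text returned by fnmatch.translate differs between Python versions, so it is printed here rather than quoted.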

  11. Are there really standards? by Thinkit3 · · Score: 0

    Don't most languages just come up with their own, especially with annoying things like the LF/CR problem?

    --
    -Libertarian secular transhumanist
    1. Re:Are there really standards? by tarquin_fim_bim · · Score: 1

      That Sir, is a classic, "No problems, only opportunities", situation. It allows you to reject all mail composed on windows machines.

  12. Cheap prices on Half.com by cybermint · · Score: 5, Informative

    I just purchased an almost new copy on Half.com for under $15 including shipping. There are still a few left at prices far lower than amazon.com or bn.com. Here is the half/ebay link.

    1. Re:Cheap prices on Half.com by cybermint · · Score: 2, Informative

      DOH! I didn't notice. I wish slashdot would let you edit posts.

      At $15 compared to $30, I'm not going to cancel my order even if it is just the 1st edition. The only parts I'll miss are the extra info on new Perl 5.8 features, and maybe the unicode stuff. Guess I'll be reading perldoc.com for that.

  13. Obligatory crap regexp joke by BabyDave · · Score: 5, Funny
    Regular expressions are tied so strongly to the *nix culture
    Shouldn't that be .*nix instead?
    1. Re:Obligatory crap regexp joke by Anonymous Coward · · Score: 0

      .*?nix perhaps? I guess that depends on the context. :)

    2. Re:Obligatory crap regexp joke by simetra · · Score: 1

      No
      Probably something more like:
      .\{1,5\}[nN]\{1\}[aeiouAEIOU]\{1\}[xX]

      --

      "Would it kill you to put down the toilet seat?" -- Maya Angelou
    3. Re:Obligatory crap regexp joke by GoRK · · Score: 1

      how about /([a-z]?[a-z](ni|i|nu)x|[a-z]*bsd)/i

    4. Re:Obligatory crap regexp joke by Anonymous Coward · · Score: 0

      Yes, shell globbing regexes would use *nix

    5. Re:Obligatory crap regexp joke by Anonymous Coward · · Score: 0

      personally, i've always used :

      [(*n?x)|(*BSD)] :)

    6. Re:Obligatory crap regexp joke by Anonymous Coward · · Score: 0

      .+nix

    7. Re:Obligatory crap regexp joke by Anonymous Coward · · Score: 0

      How are regular expressions tied to phoenix culture?

    8. Re:Obligatory crap regexp joke by RubberChainsaw · · Score: 1

      Blasphemy! You, sir, need to read brother Ovid's dissertation: Death to Dot Star! to find out why you shouldn't use .* and what to use in its place.

      :)

      --
      I welcome our new 99% overlords.
    9. Re:Obligatory crap regexp joke by Anonymous Coward · · Score: 0
      It's official, [(*n?x)|(*BSD)] is dying.

      sorry, couldn't resist.

    10. Re:Obligatory crap regexp joke by RevMike · · Score: 1

      So HP-UX and Minix are out?

    11. Re:Obligatory crap regexp joke by iabervon · · Score: 1

      I'll say. Nobody uses globbing for any serious work any more...

    12. Re:Obligatory crap regexp joke by Anonymous Coward · · Score: 0

      If I was writing a regexp for this, it would be something along the lines of ((gnu/)?linux|(open|net|free)bsd|aix|hpux|irix|osf/1|ultrix|tru64 unix|digital unix|solaris|sysv|macos x|darwin|nextstep|openstep|gnu/hurd|a/ux|unicos)
      and I'm probably still missing dozens of early 80's unices.

  14. What's new in this edition? by kbeer · · Score: 2

    I read the first edition and loved it. Can anyone who has read both editions say if it's worth buying the second edition?

    My only complaint about the book is that non-techies looked at the title when I was reading and said, "Aren't 'Hi there' and 'How are you?' regular expressions?"

    1. Re:What's new in this edition? by Anonymous Coward · · Score: 1, Funny

      Yes, they are. The first matches only the string 'Hi there', the second will match 'How are yo' or 'How are you'.
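
      That reading can be checked with a short sketch (Python's re module is assumed here purely for illustration): the ? makes the preceding u optional, so the pattern never matches the question mark itself unless it is escaped.

```python
import re

pattern = re.compile(r'How are you?')

# '?' applies to the single preceding character, the 'u'.
without_u = pattern.fullmatch('How are yo')
with_u = pattern.fullmatch('How are you')
with_mark = pattern.fullmatch('How are you?')

# re.escape produces a pattern that matches the text literally.
literal = re.compile(re.escape('How are you?'))
literal_match = literal.fullmatch('How are you?')

print(bool(without_u), bool(with_u), bool(with_mark), bool(literal_match))
```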

  15. Perl, not "PERL" by carl67lp · · Score: 5, Informative

    It's always surprised me when I see intelligent people write "PERL" when they refer to Larry Wall's programming language.

    From the Perl FAQ, General Questions About Perl:

    What's the difference between "perl" and "Perl"?
    One bit. Oh, you weren't talking ASCII? :-) Larry now uses ``Perl'' to signify the language proper and ``perl'' the implementation of it, i.e. the current interpreter. Hence Tom's quip that ``Nothing but perl can parse Perl.'' You may or may not choose to follow this usage. For example, parallelism means ``awk and perl'' and ``Python and Perl'' look ok, while ``awk and Perl'' and ``Python and perl'' do not. But never write ``PERL'', because perl isn't really an acronym, apocryphal folklore and post-facto expansions notwithstanding.

    You can read the entire FAQ if you like.

    1. Re:Perl, not "PERL" by josevnz · · Score: 1

      Don't forget your "Lord of The Rings" pillow for today's pajama party, you little geek :D. (Yes, PERL in uppercase instead of lowercase makes a LOT OF DIFFERENCE to the real world :)).

      --
      Jose Vicente Nunez Zuleta RHCE, SJCD, SJCP
    2. Re:Perl, not "PERL" by br0ck · · Score: 5, Informative

      From an interesting interview with Larry Wall - 1999..

      Marjorie: Well, that certainly answered the question fully. I must admit I didn't expect you to go back as far as the beginning of the Universe. :-) How'd you come up with that name?

      Larry: I wanted a short name with positive connotations. (I would never name a language ``Scheme'' or ``Python'', for instance.) I actually looked at every three- and four-letter word in the dictionary and rejected them all. I briefly toyed with the idea of naming it after my wife, Gloria, but that promised to be confusing on the domestic front. Eventually I came up with the name ``pearl'', with the gloss Practical Extraction and Report Language. The ``a'' was still in the name when I made that one up. But I heard rumors of some obscure graphics language named ``pearl'', so I shortened it to ``perl''. (The ``a'' had already disappeared by the time I gave Perl its alternate gloss, Pathologically Eclectic Rubbish Lister.)

      Another interesting tidbit is that the name ``perl'' wasn't capitalized at first. UNIX was still very much a lower-case-only OS at the time. In fact, I think you could call it an anti-upper-case OS. It's a bit like the folks who start posting on the Net and affect not to capitalize anything. Eventually, most of them come back to the point where they realize occasional capitalization is useful for efficient communication. In Perl's case, we realized about the time of Perl 4 that it was useful to distinguish between ``perl'' the program and ``Perl'' the language. If you find a first edition of the Camel Book, you'll see that the title was Programming perl, with a small ``p''. Nowadays, the title is Programming Perl.

    3. Re:Perl, not "PERL" by Anonymous Coward · · Score: 0
      Bravo to you.

      Knowing the proper capitalization of the language is clearly a fine substitute for actual language mastery.

    4. Re:Perl, not "PERL" by Anonymous Coward · · Score: 0

      ... unless you use a regex.

  16. Re:pff by swordgeek · · Score: 1

    Which isn't actually a regex at all.

    --

    "People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
  17. I concur by Speare · · Score: 5, Insightful
    I completely concur with the poster's prejudices and pleasant surprise at the scope of the book. Having learned and used regex since 1986, and having worked on the internals of a couple lightweight C regex engines, I figured I knew all I needed to know. Having seen how many people just get hung up on the basic concept and syntax of regex, I assumed this was going to be a rehash.

    This is no "Learn Regex in 21 Days" or "Regex for Dummies" book with lots of tips on page 400 about how the | is useful for finding Jones OR Smith. If you haven't gotten that down yet, this book's not for you.

    As the reviewer says, this is a very worthwhile cover-to-cover read which will turn your empirical experiences with regex into a more structured understanding of the science and engineering of advanced regex. As a reference on my shelf, it sits comfortably next to Knuth's AoCP and Foley & van Dam.

    --
    [ .sig file not found ]
    1. Re:I concur by Horny+Smurf · · Score: 1

      I have the first edition. It doesn't get deep into the internals of NFA/DFA construction and theory like the dragon book does, but it does show how to apply regular expressions, and mistakes you might make. Part cookbook, part guidebook.

  18. netLibrary by dboyles · · Score: 4, Informative

    I first started reading this book via netLibrary through my school's library. Just the first two chapters are enough to explain regular expressions to the point where one can use them effectively in programs. The remaining chapters expand on this information and discuss language specifics. I bought a paper copy to have on my shelf, and I constantly find myself referencing it.

    To those at universities, see if your school offers netLibrary-based books. It's easy to read and it's free.

    --
    -- "Complacency is a far more dangerous attitude than outrage." -Naomi Littlebear
    1. Re:netLibrary by mmol_6453 · · Score: 0, Offtopic

      I just checked, and no O'Reilly books are available through my college's netLibrary account. Not surprising, unfortunately.

      --
      What's this Submit thingy do?
  19. about the missing info by ChristTrekker · · Score: 1

    Agreed. Especially since the reviewer mentioned that Tcl/emacs/awk/pcre regex's were not covered in any detail in this edition. A small appendix at the end summarizing the syntax in those languages would have been helpful to many, I'm sure. It would be nice if O'Reilly did publish such a thing as errata, so we could print it out and tuck it in the back cover or something.

  20. I just can't fathom this by Anonymous Coward · · Score: 4, Funny

    Now, I thought I was reading a simple article about a programming book review. And here I come across this thread of epic mirth. Somehow you have single-handedly crafted a finely-tuned piece fun-joy from what was a rather mundane topic. I just have to page my boss back to the office to see this! Gather round the water cooler old salts and let me spin a comedic yarn I saw this day on Slashdot. Using an asterix to finish a sentence we would have all seen as being finished in a different manner? Well sir, someone set you up the bomb. You have taken that bomb, added the asterix into the mix and exploded laugh-shrapnel into Slashdot proper. I couldn't even scroll down without getting struck in the eye with a piece of your fun-bomb. Mods, mod this man's excursion into the comedy arena as +5 StopItHurts. Here we sit, emotionally spent and basking in the aftermath of your comedic genius. Thank you kind sir, thank you.

    1. Re:I just can't fathom this by L.+VeGas · · Score: 1

      You're welcome.

      But you owe me.

    2. Re:I just can't fathom this by Tjebbe · · Score: 1

      Maybe I should read the book, but wouldn't this match only sentences ending with an arbitrary number of spaces?

      Has anyone patented all programs matching '.*' yet?

    3. Re:I just can't fathom this by Anonymous Coward · · Score: 0

      Using an asterix to finish a sentence

      We prefer "asterisk" or "klein star".

      thanks.

    4. Re:I just can't fathom this by Anonymous Coward · · Score: 0

      that's "kleene star".

      Back on the short bus with ya.

    5. Re:I just can't fathom this by Anonymous Coward · · Score: 0

      thanks for making me laugh

    6. Re:I just can't fathom this by Artemis+P.+Fonswick · · Score: 1

      That was quite possibly the funniest thing I've read all day. I don't care if I'm modded down, I just have to give props to the parent.

      You, sir, are a gentleman and a scholar.

      --


      Kudos to you, my good man.
    7. Re:I just can't fathom this by Requiem · · Score: 1

      Kleene star. Back to your algorithms class, whelp.

    8. Re:I just can't fathom this by Anonymous Coward · · Score: 0

      "klein star"?

      I think you mean "kleen bottle"

    9. Re:I just can't fathom this by Anonymous Coward · · Score: 0

      I think you mean "kleen bottle"

      That's MISTER Kleen to you, turnip!

  21. pcre by crow · · Score: 1

    The article mentions pcre (I believe that's the Posix C Regular Expression library).

    On most systems, use `man regcomp` to see how to use regcomp, regexec, regerror, and regfree.

    Essentially, you first compile the regular expression into a binary format with regcomp(), then use regexec() to match it against a string. It's all a little awkward until you get used to it.

    1. Re:PCRE by tstub · · Score: 1

      Yes, the library is definitely Perl Compatible Regular Expressions.

      You can get the C source, great documentation, and C++ wrappers from http://www.pcre.org

      This is the library that PHP and Apache 2.0 (and several other programs) use for regular expressions.

      Also, my book Regular Expression Pocket Reference is a companion to MRE and includes a chapter on PCRE. I own both editions of MRE and highly recommend the second edition. I think it's fair to say that Friedl taught me everything I know on this subject.

    2. Re:pcre by damiam · · Score: 1

      I believe pcre stands for Perl Compatible Regular Expressions.

      --
      It's hard to be religious when certain people are never incinerated by bolts of lightning.
  22. that's the first edition by SweetAndSourJesus · · Score: 4, Informative

    Which isn't a big deal, I guess.

    Mastering Regular Expressions is now in its second edition. Mr. Friedl has posted a nice writeup about what's different in the second edition.

    --
    the strongest word is still the word "free"
  23. regular expressions? by Anonymous Coward · · Score: 1, Funny

    I'd be happy if the editors could master spelling and grammar

  24. Soviet Russia Regex by TheFlyingGoat · · Score: 4, Funny

    s/\A(.*?)\s+(.*)\Z/In soviet Russia, $2 $1s you!/i;

    --
    You have enemies? Good. That means you've stood up for something, sometime in your life. --Winston Churchill
    1. Re:Soviet Russia Regex by erikharrison · · Score: 1

      Well, this is quite good. But, a little quip, you shouldn't need the /i regex modifier at the end, as .* is inherently case insensitive.
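
For anyone who wants to try the substitution outside Perl, here is a rough Python port. It is only a sketch: the `sovietize` name is made up for illustration, and Python's `\Z` is stricter than Perl's about a trailing newline. It also suggests why the /i modifier is harmless but redundant here: the pattern contains no literal letters for case folding to act on.

```python
import re

# Python port of the Perl substitution above; sovietize is a hypothetical name.
PATTERN = r'\A(.*?)\s+(.*)\Z'
TEMPLATE = r'In soviet Russia, \2 \1s you!'

def sovietize(text, flags=0):
    """Swap the first word with the rest of the line."""
    return re.sub(PATTERN, TEMPLATE, text, flags=flags)

plain = sovietize("watch television")
folded = sovietize("watch television", flags=re.IGNORECASE)
print(plain)
print(plain == folded)
```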

  25. They can be hard by DeadSea · · Score: 4, Informative

    I know from my own experiences that writing a regular expression to describe something is not always as easy as it would seem at first glance. I found it difficult to write a regular expression to define a c-style comment: /* comment */ Well, not impossible, just more difficult than I thought it would be. I posted my thought process about how I constructed a regular expression to pick out a c-style comment on my website. It's the kind of thing I like to ask interview candidates.

    1. Re:They can be hard by jandrese · · Score: 1

      Your perl example is far too complicated. Why not just say something like: m#(/\*.*?\*/)#s; to grab the comment?

      --

      I read the internet for the articles.
    2. Re:They can be hard by FroMan · · Score: 1

      /\*(.*\r?\n)*?.*?\*/

      This should be a simple solution.

      Give it a shot.

      --
      Norris/Palin 2012
      Fact: We deserve leaders who can kick your ass and field dress your carcass.
    3. Re:They can be hard by dargaud · · Score: 3, Insightful
      Not to nitpick too much, but I think your regexp finds the following when it's actually not a comment:

      printf("Comments in C are written like /* this */ although I prefer the // C++ style");

      That's why we use parsers to write compilers and not regexps. I came back from Perl after a few months using it, being very disillusioned by its read-onlyness.
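
To make the false positive concrete, here is a small Python check. It is a sketch that assumes the short non-greedy pattern suggested elsewhere in this thread, which is not necessarily the one on the parent's website.

```python
import re

# Short non-greedy comment pattern, as suggested elsewhere in the thread.
COMMENT = re.compile(r'/\*.*?\*/', re.DOTALL)

source = 'printf("Comments in C are written like /* this */ although I prefer the // C++ style");'

# The regex has no idea it is inside a string literal, so it still reports a hit.
false_hits = COMMENT.findall(source)
print(false_hits)
```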

      --
      Non-Linux Penguins ?
    4. Re:They can be hard by Otter · · Score: 3, Informative

      It's probably worth mentioning: KDE comes with a GUI regexp constructor. Googling for alternatives shows a similar Windows app.

    5. Re:They can be hard by DeadSea · · Score: 0
      /\*(.*\r?\n)*?.*?\*/

      This should be a simple solution.

      If you had read my site, you would know that that is not good. It matches "/* hello */ hello */" (the whole thing) as a single comment.
    6. Re:They can be hard by FroMan · · Score: 1

      Doh! Good point. (Not the original poster, but someone who supplied a different solution to his suggested solution from his web site.)

      --
      Norris/Palin 2012
      Fact: We deserve leaders who can kick your ass and field dress your carcass.
    7. Re:They can be hard by DeadSea · · Score: 2, Informative
      You make an excellent point. The regular expression I came up with would not do the right thing in that situation when finding comments in your text editor.

      Parsers are, however, based on regular expressions. I originally wrote this regular expression when I was writing a lexer (using JFlex) for Java. The examples that I saw used a state machine and I wanted to do it with a regex. When combined with regular expressions to find string literals (and all the regular expressions for other junk), it does the right thing.
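
A rough illustration of that combination, in Python rather than JFlex: one alternation in which string literals are listed next to comments, so whichever starts first in the text claims its characters. The group names and the `find_comments` helper are invented for this sketch, and character literals such as '"' are not handled.

```python
import re

# One alternation: a string literal that starts first swallows any "/*" inside it.
TOKEN = re.compile(r'''
    (?P<string>"(?:\\.|[^"\\])*")   # double-quoted literal with backslash escapes
  | (?P<comment>/\*.*?\*/)          # C-style block comment
''', re.VERBOSE | re.DOTALL)

def find_comments(source):
    """Return block comments, skipping anything inside string literals."""
    return [m.group('comment') for m in TOKEN.finditer(source)
            if m.lastgroup == 'comment']

code = 'printf("like /* this */"); /* a real comment */'
print(find_comments(code))
```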

      I should put your example on the page somewhere. :-)

    8. Re:They can be hard by FroMan · · Score: 3, Informative

      Nope, it wouldn't. Give it a try. I don't have access to a unix box here right now. But at least the little java app I put together works correctly.

      Assuming you wanted to capture "/* hello */" out of "/* hello */ hello */"

      You see what you are missing is the '?' modifier that will cause the "(.*\r?\n)*" to not be greedy. Same with the ".*".

      I think you are just missing some of the functionality of regexes. You might want to pick up this book. ;-)
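
For those without a unix box handy either, the claim is easy to check in Python, whose `re` module uses the same `?` suffix for non-greedy quantifiers. This is only a sketch of the comparison; the greedy variant is simply the same pattern with the `?` modifiers stripped.

```python
import re

text = "/* hello */ hello */"

# Pattern from the thread, with the non-greedy '?' modifiers in place.
lazy = re.search(r'/\*(.*\r?\n)*?.*?\*/', text).group(0)

# Same pattern with the '?' modifiers removed, so both quantifiers are greedy.
greedy = re.search(r'/\*(.*\r?\n)*.*\*/', text).group(0)

print(repr(lazy))
print(repr(greedy))
```

The non-greedy version stops at the first `*/`, while the greedy one runs on to the last.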

      --
      Norris/Palin 2012
      Fact: We deserve leaders who can kick your ass and field dress your carcass.
    9. Re:They can be hard by ideut · · Score: 0
      Hi,

      Looks like you went a bit over the top on that regex! What you were trying to achieve can be done in perl with m|/\*[\x00-\x7f]*?\*/|

      In newer perl, this is nicely abbreviated to m|/\*\p{IsASCII}*?\*/|

      In other words, you were very nearly there after step 1 on your webpage (apart from using non-greedy matching). But instead of using ".", which doesn't match newline, you should have either explicitly defined a character class or used the builtin \p{IsASCII} (the latter technique is for perl 5.8 and beyond only I think).
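
The same point can be tried in Python (a sketch; Python has no `\p{IsASCII}`, so only the explicit class is shown, next to `re.DOTALL`, which plays the role of Perl's /s modifier):

```python
import re

comment = "/* first line\n   second line */"

# Plain '.' stops at the newline, so the closing "*/" is never reached.
dot_only = re.search(r'/\*.*?\*/', comment)

# An explicit ASCII class, as in the Perl above, includes the newline.
ascii_class = re.search(r'/\*[\x00-\x7f]*?\*/', comment)

# With DOTALL, '.' matches newlines too.
dotall = re.search(r'/\*.*?\*/', comment, re.DOTALL)

print(dot_only, ascii_class is not None, dotall is not None)
```

Note that the ASCII class would stop matching at the first non-ASCII character inside a comment, which the DOTALL variant would not.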

      best wishes,
      Ideut.

    10. Re:They can be hard by Gailin · · Score: 1

      Offtopic, thank you for the Java ExcelCSVParser, downloaded it this morning and it worked like a charm. It saved me a lot of time :-)

      G

      --
      I wish there was a fscking blue pill
    11. Re:They can be hard by DeadSea · · Score: 1
      Ah yes, if you have that feature of regular expressions, that is true. However, non-greedy matching is a regex feature that I found is not reliably implemented everywhere that I need it.

      Hmmm, I just tried it in my text editor and it worked. Maybe it's more widely implemented than I thought. I could have sworn this didn't work last time I tried it there. Maybe it was added in a recent version. :-)

    12. Re:They can be hard by stefanb · · Score: 1
      Looks like the book is for you :-)

      /\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/
      Using a non-greedy quantifier, this could be as easy as /\*.*?\*/, if you can make your implementation have . match newlines.
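
A quick way to convince yourself that the longer pattern behaves without any non-greedy support is to run it over a few awkward inputs. The sample strings below are made up for this sketch.

```python
import re

# Pattern from the post above: no non-greedy quantifiers, newlines handled explicitly.
PORTABLE = re.compile(r'/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/')

# Each input maps to the comment we expect the pattern to pick out.
samples = {
    "/* hello */ hello */": "/* hello */",
    "/* a\n * b\n */ int x;": "/* a\n * b\n */",
    "/***/ trailing": "/***/",
}

results = {text: PORTABLE.search(text).group(0) for text in samples}
print(results == samples)
```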
    13. Re:They can be hard by DeadSea · · Score: 1

      You are very welcome. I used regular expressions to build the parser, so you aren't too far offtopic. ;-)

    14. Re:They can be hard by DeadSea · · Score: 1

      In some regular expression packages (not all) [\x00-\x7f] can be written as [^]. That is, all the characters that are not in the empty set of characters. Very nice shorthand. I also like that [A]|[^A] does the same thing.

    15. Re:They can be hard by FroMan · · Score: 1

      I can't say for sure, but it has been in at least grep (maybe you needed egrep?) and perl for as long as I've known. Also in Java since Pattern and Matcher classes have existed.

      There might be some obscure regex libs without support, but it is in POSIX regexes as far as I know.

      Anyways, try the javadoc of the Pattern class for a complete and concise doc for regexes.

      I recently was working with java 1.1 which didn't have the regex libs in it (afaik) so we were using some perl regex (java implementation) library one of the guys who got here before me chose. Well, I went through and reimplemented the entire library (one depending on the regex libs) with the java library. For the most part they were completely compatible, but there were slight differences. Mainly had to deal with .* and .+ when doing substitutions. That and perl seemed to be more forgiving (which led to errors). Such as "[a-z0-9*" would not give you an error with the perl regex lib, but the new one would throw a fit (which is good since that isn't what was meant). The other thing I found was that the native java regex lib can be up to 2-3x faster than the perl one implemented in java.

      --
      Norris/Palin 2012
      Fact: We deserve leaders who can kick your ass and field dress your carcass.
    16. Re:They can be hard by DeadSea · · Score: 1

      I was just looking over the documentation for JFlex and non-greedy regular expression matching isn't mentioned. I'll bet that means that it isn't implemented, but I haven't tested it. Since I write parsers using JFlex, it's a good thing I can come up with the non-greedy syntax when I need to.

    17. Re:They can be hard by Cuthalion · · Score: 1

      That's why we use parsers to write compilers and not regexps

      Actually compilation typically involves two stages - tokenization and parsing. Tokenization is where the compiler realizes that 66 is an int and printf is an identifier and "\011unix\010" is a string. This frees the parser to worry about the upper level structure, such as whether x = (4; means anything.

      Generally the tokenizer is implemented as a regular expression parser, and the 'higher level' parsing uses some context free language parsing technique (table or recursive descent or whatever)

      The reason is that the number of tokens is a lot less than the number of bytes in the file and regexps are a lot easier to parse than context free grammars.

      Comment stripping (and also literal parsing) is in the tokenizing stage, which is why c doesn't allow nested comments - it's impossible to ensure that arbitrary numbers of /*'s and */'s match with a regexp.
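
As a minimal sketch of the tokenizing stage described above, here is a regex-driven tokenizer in Python. The token classes and names are invented for this example and cover only a sliver of C. It happily tokenizes `x = (4;`, leaving the question of whether that means anything to a later parsing stage.

```python
import re

# Hypothetical token classes for a tiny C-like language; each is a plain regex.
TOKEN_SPEC = [
    ("COMMENT", r'/\*.*?\*/'),
    ("STRING",  r'"(?:\\.|[^"\\])*"'),
    ("INT",     r'\d+'),
    ("IDENT",   r'[A-Za-z_]\w*'),
    ("PUNCT",   r'[=();,]'),
    ("SKIP",    r'\s+'),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC), re.DOTALL)

def tokenize(source):
    """Yield (kind, text) pairs, dropping whitespace and comments."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup not in ("SKIP", "COMMENT"):
            yield m.lastgroup, m.group()
        pos = m.end()

tokens = list(tokenize('x = (4; /* nonsense, but it tokenizes */'))
print(tokens)
```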

      --
      Trees can't go dancing
      So do them a big favor
      Pretend dancing stinks!
    18. Re:They can be hard by Kashif+Shaikh · · Score: 1

      And your website proved a point of mine:

      It's difficult to know if your regex is really correct for the stuff you're parsing.

      I mean, it might work for the 9 cases of input you have. But for the 10th case, bam! your regex doesn't parse the 10th input properly. And reading regexs is worse than reading assembly IMO, when you want to fix a bug in some regex 6 months after you've written it.

      But if you know your regex is correct, you can reduce 100 lines down to a mere two lines of code. So it's beautiful on one end, but a deadly sin at the other end.

    19. Re:They can be hard by Mr.+Droopy+Drawers · · Score: 2, Informative

      Technically, all regexes are lexers, not parsers. Parsers must be able to be recursive.

      --

      To Copy from One is Plagiarism; To Copy from Many is Research.

    20. Re:They can be hard by Anonymous Coward · · Score: 1, Interesting

      I found it difficult to write a regular expression to define a c-style comment: /* comment */ Well, not impossible, just more difficult than I thought it would be. I posted my thought process about how I constructed a regular expression to pick out a c-style comment on my website. It's the kind of thing I like to ask interview candidates.

      Nice you ask your interview candidates. Hopefully one of them recognizes that your webpage's regexps fail in the presence of escape characters

      /* comment *\
      /
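
A sketch of why that input trips up a plain comment regex, and one possible workaround: a C compiler splices backslash-newline pairs before it looks for comments, so doing the same splice first puts the `*/` back together. The pattern here is the short non-greedy one from this thread, used as a stand-in for the webpage's regexps.

```python
import re

COMMENT = re.compile(r'/\*.*?\*/', re.DOTALL)

# The example above: the closing "*/" is split by a backslash-newline.
source = "/* comment *\\\n/ int x;"

# Matching the raw text never finds a contiguous "*/".
raw_match = COMMENT.search(source)

# Splicing continued lines first puts the "*/" back together.
spliced = source.replace("\\\n", "")
spliced_match = COMMENT.search(spliced)

print(raw_match, spliced_match.group(0))
```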

    21. Re:They can be hard by maniac1860 · · Score: 1

      Parsers are not based on regular expressions. Lexers often use regular expressions to specify (though the REs are usually translated into an fsm), but parsers are a level up in the grammar hierarchy. I guess though you could say a lexer is a weak parser, but I think that would be rather confusing.

    22. Re:They can be hard by dubious9 · · Score: 1
      Regular expressions dealing with escape characters are usually introduced in string literal example. Try matching

      "something /" /n //like this"

      I remember doing this for my compiler design class and remember having it hurt my head at the time. If I was a teacher, or asking interviewees questions about regular expressions, the string literal question is about as hard as I would ask. I do believe this example is in this book, as it was in the first printing IIRC.

      The thing with the string token is it's easier than comment because it's only one character. Besides, why would you need escape characters in comments?
      --
      Why, o why must the sky fall when I've learned to fly?
    23. Re:They can be hard by DeadSea · · Score: 1

      Parsers are based on lexers which are based on regular expressions. Is that clear then?

    24. Re:They can be hard by maniac1860 · · Score: 2, Informative

      I'm afraid you're wrong. Parsers are stack based. Try doing matching parentheses with a lexer (or RE).
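
To illustrate the difference, here is a minimal balanced-parentheses check in Python. It is a sketch: with a single bracket type the stack collapses to a depth counter, but that counter is unbounded, which is exactly what a finite-state matcher cannot keep.

```python
def balanced(text):
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0  # stands in for a stack, since there is only one bracket type
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' arrived before its '('
                return False
    return depth == 0

checks = [balanced("(a(b)c)"), balanced("(()"), balanced(")(")]
print(checks)
```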

    25. Re:They can be hard by DeadSea · · Score: 1
      Have you ever written a parser? I have written several. I start with a tokenizer (lexer) that is built using regular expressions. My favorite tool for generating lexers is JFlex. The job of the lexer is to break the stream into tokens. Regular expressions are very useful for this.

      Once the lexer is complete, a parser can be built. The tokens from the lexer are assembled into a parse tree. My favorite tool for generating parsers is CUP. The grammar for the parser is usually specified in BNF form.

      As you can see, the parser depends on the lexer and the lexer depends on regular expressions. If you really want to be picky about it, there are some parser generators (javacc) that let you stick regular expressions right on the leaves of the grammar and handle both the parsing and the lexing. So yes, you need more than regular expressions to do parsing, but parsers are still almost always based on regular expressions.

    26. Re:They can be hard by Luk+Fugl · · Score: 1

      Once the lexer is complete, a parser can be built. The tokens from the lexer are assembled into a parse tree. My favorite tool for generating parsers is CUP. The grammar for the parser is usually specified in BNF form.

      Exactly. The parser uses a BNF grammar; ie. a restricted Context Free Grammar [CFG] which is higher up the hierarchy than Regular Expressions (which are related to Automata).

      Your parser uses the lexer, which uses regular expressions, so you could say the parser depends on regular expressions, but the core work of the parser -- what makes it a parser and not a lexer -- is the use of a parse tree to evaluate a Context Free Grammar.

      There is no way you can correctly process a CFG using only regexes. Sorry, but no.

      (and yes, I have written a parser)

    27. Re:They can be hard by maniac1860 · · Score: 1

      To answer your question, I have of course made several parsers. Anyway, I think I see the nature of your confusion. In general, for the purpose of generating tokens (most tokens anyway, things like nested comments must be done separately), an fsm is all that is needed, and because an fsm (lexer) is more efficient than an fsm+stack (parser), a lexer is used for generating tokens. These tokens are then usually passed to a parser, which generates the parse tree (explicitly or implicitly). This does not mean that a parser is based on REs in any way though. In fact, a parser can be built without a lexer, though this would usually be inefficient.

    28. Re:They can be hard by oobar · · Score: 1

      I suggest a few changes to your perl "oneliner" on your page.

      First, you can use -0777 to enable slurp mode. Add -n to automatically read. Finally, for god's sake, use some character OTHER THAN / for your re-delimiter (aka leaning toothpick syndrome)

      So the final command could be simplified to:

      perl -0777ne 'print m!regexp!g;' file.c

      I can't actually include the regexp because of slashdot's lameness filter, but you don't have to escape forward slashes in it.

      And I don't know why you have the outermost set of parens anyway.

  26. Surely not? by Old+Man+Trouble · · Score: 0

    I'm rather sure the author was after Perl Compatible Regular Expressions.

  27. Regex Learning Tool by johndiii · · Score: 4, Informative

    Regex Coach is a great free tool for learning about regular expressions and constructing them interactively. Both Linux and Windows versions are available.

    --
    Floating face-down in a river of regret...and thoughts of you...
    1. Re:Regex Learning Tool by i_am_pi · · Score: 1

      Another learning tool is vim. If you have ":set hlsearch" and ":set incsearch", you can construct regexps much the same as in the Regex Coach and see them applied, and it's already installed on most linux machines. It also works on the console, instead of in X windows.

    2. Re:Regex Learning Tool by damonlynch · · Score: 1

      There is a somewhat similar tool available for python users: kodos. I've found it very useful myself. It requires PyQt (see also Mandrake, RedHat and Suse PyQt packages available here).

  28. +4 Informative? He doesn't even have to own... by Anonymous Coward · · Score: 0

    ... the book to write this up, and he got it out within minutes of the article going up. It's not informative whatsoever. Come on folks, let's spend the mod points a bit more carefully.

    1. Re:+4 Informative? He doesn't even have to own... by Anonymous Coward · · Score: 0

      suck my dick

    2. Re:+4 Informative? He doesn't even have to own... by carlos_benj · · Score: 2, Funny

      I suppose on /. that would be considered a regular expression....

      --

      As a matter of fact, I am a lawyer. But I play an actor on TV.

    3. Re:+4 Informative? He doesn't even have to own... by Anonymous Coward · · Score: 0

      Not even if you let me tape it.

  29. Online resource by dema · · Score: 4, Informative

    I'd be interested to check that book out as I use reg expressions a lot in PHP. But for those of you looking for a resource online check out RegExLib. I use it often when I'm having trouble putting an expression together and have found it extremely helpful.

    1. Re:Online resource by Anonymous Coward · · Score: 0

      Ditto!

      Considering PHP is the most widely used server-side web scripting language, I'm very surprised that the author did not include a chapter on regex use in PHP in this book.

      Kind of a bummer... really, this exclusion might actually be what keeps me from buying the book.

    2. Re:Online resource by tarquin_fim_bim · · Score: 1

      I'd guess that as you can use both POSIX and Perl type regex style in PHP, this would be unnecessary duplication.

  30. From Windows by Quill_28 · · Score: 3, Insightful

    Going from windows to unix one of the things I liked most about unix was the wide spread usage of regex in various applications. Quite powerful.

  31. All i have to say is: by jdew · · Score: 5, Funny

    That's a big regex
    stupid filter wouldn't let me paste the regex here XD

    1. Re:All i have to say is: by Suppafly · · Score: 1

      whoa.. the mail checking regex's i write generally look for an @ and the presence of a . followed by 3 letters.. I don't think I'd want to try and recreate that one..

    2. Re:All i have to say is: by Redwing · · Score: 1

      Do you want to ignore email from domains that use two-letter country TLDs?
      I.e. how does your validator work with .us, or .co.uk, etc.?

      or even .info, .museum and those other strange TLDs?
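
To see what such a check throws away, here is a guess at the naive pattern in Python. The exact regex is an assumption (the grandparent only described it loosely), and the addresses use reserved example domains.

```python
import re

# A guess at the naive check described above: an '@', then later a '.'
# followed by exactly three letters at the end. Not a real validator.
NAIVE = re.compile(r'^[^@\s]+@[^@\s]+\.[A-Za-z]{3}$')

addresses = ["user@example.com", "user@example.us", "user@example.co.uk",
             "user@example.info", "user@example.museum"]

accepted = [a for a in addresses if NAIVE.match(a)]
print(accepted)
```

Only the .com address survives; every other TLD in the list is rejected.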

      --
      Raisinettes are my raison d'etre
    3. Re:All i have to say is: by Bedrock · · Score: 2, Interesting

      Another fun one is the REX shallow XML parser algorithm that's been around for some time. Check out http://www.cs.sfu.ca/~cameron/REX.html and scroll to appendix A for a Perl implementation. I recently had to reverse-engineer this approach and write a stack-based parser to run in an environment where Perl's :?$foo construct was broken. Much fun...

    4. Re:All i have to say is: by elemental23 · · Score: 1

      Well I, for one, have never seen mail from a *.info address that wasn't spam, so I say throw them all in the bit bucket.

      --
      I like my women like my coffee... pale and bitter.
  32. Regexp's almost consistent across languages by GGardner · · Score: 1

    But what drives me nuts about using regexps is how they differ slightly from implementation to implementation. Even though the perl regexp's tend to be the de-facto standard, the perl people are frequently adding stuff to their regexps. Some regexp implementations require you to escape open-paren to get the special meaning, and not escaped to match an open paren. Others require just the opposite. Madness!

    1. Re:Regexp's almost consistent across languages by akeru · · Score: 1

      While it may not help in the confusion, what you're seeing with the escape-open-paren vs. not is the difference between Basic and Extended regular expressions in POSIX parlance. Or, to quote the GNU grep man page:
      In basic regular expressions the metacharacters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

      --

      Let's hope that there's intelligent life somewhere out in space 'Cause there's bugger-all down here on Earth.

    2. Re:Regexp's almost consistent across languages by IpalindromeI · · Score: 3, Insightful

      Even though the perl regexp's tend to be the de-facto standard, the perl people are frequently adding stuff to their regexps.

      Damn those Perl people and their innovations. Why can't they just be happy doing everything the familiar, crappy way? Why must they push the envelope to make things easier and better? I hate that.

      PS. I hope you haven't seen this yet. It'll really boil your blood.

      --
      Promoting critical thinking since 1994.
    3. Re:Regexp's almost consistent across languages by Anonymous Coward · · Score: 0

      Jesus, I'm glad I opted to learn Python.

    4. Re:Regexp's almost consistent across languages by Anthony+Boyd · · Score: 2, Insightful
      I hope you haven't seen this yet. It'll really boil your blood.

      Can I just say that I really like Larry Wall? I mean, reading that document, I realize that he is sooo good for Perl culture. You won't hear "that's how it has always been done" from him. His focus is on how to build a better system, not politics, not grandstanding. I would be very happy to see this kind of openness and disarmingly reasonable attitude influence certain other people in the Perl community.

      Of course, I could be extrapolating too much, and it could be he's a PITA, but I've read his comments/posts/articles a few times over the years, and he's been this way each time.

  33. The only problem with this book... by WoodstockJeff · · Score: 1

    ... is that I can never get it back from people who "borrow it for a few days".

    1. Re:The only problem with this book... by Anonymous Coward · · Score: 0
      was that supposed to be an insult?

      kind of like saying, "Oops, bet you don't have chlamydia"

  34. REGEX for Brazilians by maizena · · Score: 2, Informative

    Regex rules, but I wouldn't know anything if it wasn't for this book in portuguese: http://guia-er.sourceforge.net/. The printed version is always with me wherever I go.

  35. C++ Regular Expressions by TheOldBear · · Score: 5, Informative

    The Boost C++ libraries have a regular expression package. Take a look at http://www.boost.org/libs/regex/index.htm

    --
    Caution: Do not stare into laser with remaining eye.
  36. Shell out the cash? by Anonymous Coward · · Score: 0

    I downloaded this from edonkey a while ago... *yawn*

  37. My problem with regular expressions... by GreenJeepMan · · Score: 0, Flamebait

    Regular expressions are extremely powerful, but that cryptic line of text is out of control. I mean, you have to be an expert in regular expressions to do the most rudimentary things. I've been using them on and off for years, and every time I find myself studying the book for hours just to write a simple search and replace.

    I guess if I used them on a daily basis, I would be better off, but how many programmers find a need for that?

    I think they should offer a form of regular expressions in a longer form with easier to read syntax for the casual user.

    Just my 1/2 cent.

    1. Re:My problem with regular expressions... by Anonymous Coward · · Score: 0

      Well, when I started learning Perl I got to a point where not understanding regex was holding me back. I got the first edition of this book and it helped immensely. Once I grasped regex better I could return to the Perl books and make sense of the examples. As it turns out, the only real need I have had for Perl (at work) has been in parsing files using regex, so I think the book was extremely useful.

    2. Re:My problem with regular expressions... by Gabe+Garza · · Score: 5, Interesting
      Amen!

      I think a lot of the people who use RE's a lot would be well-served by brushing up on their recursive-descent parser writing skills. For only a little more time than it takes to write a regular expression, you can (if you know how) write a simple recursive-descent parser that:

      • Is more readable (and thus maintainable)
      • Is more efficient
      • Has the potential to have much better error handling (e.g., a descriptive message instead of just "RE doesn't match! Ack!")
      • Is much more scalable: recursive descent parsers can easily scale up to parsing an entire language (witness g++, which uses one to parse C++)
      • Is likely to be a great deal more correct, because it forces you to actually define a language, instead of just iteratively building up an RE
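
For readers who have not written one, a recursive-descent parser can be as small as the sketch below. The grammar (sums and differences with parentheses) and all names are made up for illustration; a regex still does the tokenizing, and the parser supplies the nesting and the descriptive error messages.

```python
import re

class ParseError(Exception):
    pass

def parse(text):
    """Evaluate sums like '1 + (2 - 3)' with a hand-written recursive-descent parser."""
    tokens = re.findall(r'\d+|[-+()]|\S', text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(what):
        nonlocal pos
        if peek() != what:
            raise ParseError(f"expected {what!r} at token {pos}, found {peek()!r}")
        pos += 1

    def term():  # term := NUMBER | '(' expr ')'
        nonlocal pos
        tok = peek()
        if tok is not None and tok.isdigit():
            pos += 1
            return int(tok)
        if tok == "(":
            pos += 1
            value = expr()
            expect(")")
            return value
        raise ParseError(f"expected a number or '(' at token {pos}, found {tok!r}")

    def expr():  # expr := term (('+' | '-') term)*
        nonlocal pos
        value = term()
        while peek() in ("+", "-"):
            op = tokens[pos]
            pos += 1
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if peek() is not None:
        raise ParseError(f"unexpected {peek()!r} at token {pos}")
    return result

print(parse("1 + (2 - (3 + 4))"))
```

Feeding it an unclosed `1 + (2` raises a ParseError that names the missing `)` and where it was expected.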
    3. Re:My problem with regular expressions... by cgibbard · · Score: 2, Interesting

      There are some really nice parser combinator libraries that are beginning to show up for constructing parsers very easily.

      For example, check out Parsec, a monadic parser combinator library for Haskell. Have a look at its documentation to get a feel for what I'm talking about.

      The basic idea is to construct parsers for complicated things by combining simpler parsers in various ways.

      Parsec also has mechanisms that allow one to provide information so that good looking error messages can be constructed for the user should the parse fail, and also allows for one to tune parsers for efficiency reasonably well, while maintaining a fair bit of flexibility (by default, parsers are LL1 for efficiency, but they can be extended so as to provide potentially infinite lookahead if necessary).

    4. Re:My problem with regular expressions... by dtfinch · · Score: 1

      I feel they both have their place. It depends on the language I'm using and what I'm parsing. I mostly use regular expressions nowadays because they require fewer keypresses than the alternatives. But yeah, for parsing most languages a recursive descent parser is definitely best. I had to write one in C for a basic interpreter back in one of my high school programming classes.

  38. Here's my challenge... by Jon+Abbott · · Score: 1

    Does anyone here know how to do multi-line regexes in perl? I've seen the notation on how to do it (Mastering Regular Expressions has one paragraph for it), but nothing seems to work...

    1. Re:Here's my challenge... by dbenhur · · Score: 1
      Does anyone here know how to do multi-line regexes in perl?

      Um, RTFM? There are plenty of examples of the extended multi-line form in the standard docs and it works just fine.

      Simple answer, use the "/x" modifier at the end of your expression to set extended mode. In this mode, whitespace is ignored (unless escaped) within the regex and you can insert comments.
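
The same idea exists outside Perl. As a sketch, Python spells the modifier `re.VERBOSE`; the date pattern and sample text here are invented for the example.

```python
import re

# Hypothetical pattern for dates like 2003-07-14, spread over several lines.
DATE = re.compile(r'''
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
''', re.VERBOSE)

m = DATE.search("Posted on 2003-07-14 somewhere.")
print(m.group("year"), m.group("month"), m.group("day"))
```

Note that this mode only changes how the pattern is written; it does not by itself make the pattern match across line breaks in the input.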

    2. Re:Here's my challenge... by Jon+Abbott · · Score: 1
      Um, RTFM?
      I have read the perlre pod file and Mastering Regular Expressions for sections pertaining to multi-line regular expressions -- and none of the approaches given work for the files I work with. Programming Perl doesn't touch the topic. While the logical answer would be to switch files, sadly that is not an option for me.
    3. Re:Here's my challenge... by dbenhur · · Score: 1

      Post a sample extended regex that isn't working for you -- it's hard to debug a problem without an example which demonstrates it. What version of perl are you using?

    4. Re:Here's my challenge... by phorm · · Score: 1

      I usually use s/old/new/gs or m/expr/gs in perl

      Codebits used to have a lot on this, but the page has since moved and seems to be having permissions errors at this moment.

      If you want, you could always email me (phormix at phormix dot com) and I can attempt to help you with your regexp woes - I've used a lot of multiline perl regexps for HTML processors, etc.

    5. Re:Here's my challenge... by shotglasses · · Score: 1
      If you have data where the multi-line matching isn't working, can you reformat your data in some way to get around the problem?

      I have spent much time parsing poorly written HTML pages, and find that if I read the whole file into a string, and then substitute all whitespace characters for a space, all of the multi-line problems (and many others) go away, because your data is now only one line...

      This works with HTML because the "format" of the data is embedded in the tags, not the physical formatting, but I have used a similar approach when parsing logfiles that attempt to be "user friendly" and wrap long lines -- now each line of the file may or may not be a complete record. To get one record per line, join them all together, and split them on the "timestamp" field, and now you have a bunch of single line records to work with. If there isn't a timestamp, is there another way to determine the beginning (or end) of a record?
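
A minimal sketch of the join-then-split approach, in Python for brevity. The log excerpt and its timestamp format are made up; a real file would need its own timestamp pattern.

```python
import re

# Made-up log excerpt: the second record has been wrapped onto a new line.
raw = ("2003-07-14 10:00:01 short record\n"
       "2003-07-14 10:00:02 a long record that the logger\n"
       "    wrapped onto a second line\n"
       "2003-07-14 10:00:03 another short record\n")

# Step 1: collapse every run of whitespace, newlines included, to one space.
flat = re.sub(r'\s+', ' ', raw).strip()

# Step 2: split just before each timestamp (the lookahead keeps the timestamp).
records = re.split(r' (?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} )', flat)

for record in records:
    print(record)
```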

      Obviously, you cannot always reformat the data file, and if you cannot change the actual files, make a copy and modify the copy.

      There might not be an easy way, but there should be a way -- you just have to keep working on it!

      Mark
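
      A rough sketch of the join-and-split approach described above, written in Python rather than Perl. The sample log text and the `YYYY-MM-DD HH:MM:SS` timestamp layout are assumptions made up for illustration; a real logfile would need its own pattern. It also collapses each run of whitespace to one space, a small variation on replacing every whitespace character.

```python
import re

# Hypothetical wrapped log: the second record continues on the next line.
raw = (
    "2003-01-07 10:15:02 disk check started\n"
    "2003-01-07 10:15:09 warning: volume /data is\n"
    "    nearly full\n"
    "2003-01-07 10:16:30 disk check finished\n"
)

# Step 1: collapse every run of whitespace (newlines included) to one space.
flat = re.sub(r"\s+", " ", raw).strip()

# Step 2: split just before each timestamp. The lookahead keeps the
# timestamp attached to the record that follows it.
TIMESTAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
records = re.split(r"\s(?=" + TIMESTAMP + r")", flat)

for record in records:
    print(record)
```

      With this input the wrapped warning comes back as a single record, so line-oriented patterns can be applied to each entry of `records`.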

    6. Re:Here's my challenge... by slumped · · Score: 1

      Are you reading the files in line by line? If so, you won't be able to do multiline matches unless you join the lines together into a multi-line variable.
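
      To illustrate the point, here is a small Python sketch (the two-line input is invented). `re.DOTALL` is Python's counterpart of the Perl `/s` modifier mentioned earlier in the thread: it lets `.` match a newline.

```python
import re

# Hypothetical input: the tag we want spans two lines.
text = "<a\n href='x'>link</a>\n"
pattern = re.compile(r"<a.*?>", re.DOTALL)  # DOTALL: '.' also matches '\n'

# Line by line: neither line contains the whole tag, so nothing matches.
per_line = [m.group(0) for line in text.splitlines() for m in pattern.finditer(line)]

# Whole text in one string: the match is free to cross the newline.
whole = [m.group(0) for m in pattern.finditer(text)]

print(per_line)
print(whole)
```

      The flag alone is not enough; the pattern only gets a chance to span lines once the lines are in the same string.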

  39. Regex rant by Tablizer · · Score: 5, Insightful

    The problem with regexes is that if you don't use them often, you forget a lot of the finer details. They are not self-documenting at all. I think something like the "generators" used in some of the compiler tools floating around is more intuitive. For example, you can define a "LISP-lite" language like this:

    statement -> (command params)
    statement -> (command)
    params -> params params
    params -> constant
    params -> variable
    params -> statement
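
    As a hedged sketch of how such a grammar can be turned into code, here is a small hand-written recursive-descent parser in Python. The grammar above does not say what a command, constant, or variable looks like, so the token shapes (identifiers and integers) are assumptions, and `params -> params params` is read as "one or more params in a row". The function names are made up for this example.

```python
import re

# Assumed token shapes (not defined by the grammar above): parentheses,
# integer constants, and identifiers for commands and variables.
TOKEN = re.compile(r"\s*(?:(?P<paren>[()])|(?P<constant>\d+)|(?P<name>[A-Za-z_]\w*))")

def tokenize(source):
    """Return a list of (kind, text) pairs; raise on unexpected characters."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens

def parse_statement(tokens, i):
    """statement -> ( command params ) | ( command ); returns (tree, next index)."""
    if i >= len(tokens) or tokens[i] != ("paren", "("):
        raise SyntaxError("expected '('")
    if i + 1 >= len(tokens) or tokens[i + 1][0] != "name":
        raise SyntaxError("expected a command name")
    command, i = tokens[i + 1][1], i + 2
    params = []
    # Keep reading params until the closing parenthesis.
    while i < len(tokens) and tokens[i] != ("paren", ")"):
        kind, text = tokens[i]
        if kind == "paren":            # an opening paren: nested statement
            node, i = parse_statement(tokens, i)
        elif kind == "constant":
            node, i = int(text), i + 1
        else:                          # a variable
            node, i = text, i + 1
        params.append(node)
    if i >= len(tokens):
        raise SyntaxError("missing ')'")
    return (command, params), i + 1

def parse(source):
    tokens = tokenize(source)
    tree, end = parse_statement(tokens, 0)
    if end != len(tokens):
        raise SyntaxError("trailing input after statement")
    return tree

print(parse("(add 1 (mul x 2))"))
```

    Note that the regex still does the low-level work of recognizing tokens; the grammar rules only take over for the nesting, which a classical regular expression cannot track.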

    1. Re:Regex rant by Frans+Faase · · Score: 1

      For this you might need an interpreting parser, such as IParse.

  40. Regex and Spam... by DM_NeoFLeX · · Score: 1

    All the spam I get these days contains little lines of random (or apparently random, at least) alphanumeric characters at the bottom, for the most part simply capital letters, although some are more complex. Some have random strings of alphanumeric characters longer than 50 characters... Has anyone else noticed these lines at the bottom of spam? Surely it would be fairly easy to filter spam at the MTA level using regex on these strings...

    --
    -------------------------------------------------- - God is the tangent point between zero and infin
    1. Re:Regex and Spam... by qui_tollis · · Score: 1

      My spam has the same, I assume it's there to make work more difficult for filters, though I'm not sure how. In any case the randomness of these strings means they can't be filtered by a regex.
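
      A regex cannot tell that a string is random, but it can describe the shape mentioned in the parent post: a last line made up of nothing but a long run of letters and digits. The Python sketch below is only a heuristic under that assumption; the function name and the threshold of 20 characters are invented, and it would also flag legitimate mail that happens to end in a long identifier.

```python
import re

def looks_like_random_tail(message, min_length=20):
    """True if the last non-empty line is one long alphanumeric run."""
    lines = [line.strip() for line in message.splitlines() if line.strip()]
    if not lines:
        return False
    # fullmatch: the whole line must consist of letters and digits only.
    return re.fullmatch(r"[A-Za-z0-9]{%d,}" % min_length, lines[-1]) is not None

spam = "Buy now!\n\nXKQJZPWMVBTRLSHGNDFCAE\n"
ham = "See you at lunch.\nBob\n"
print(looks_like_random_tail(spam), looks_like_random_tail(ham))
```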

  41. Why is it that people think regexps are hard? by SkewlD00d · · Score: 1, Funny

    All you have are zero-or-more "+", one-or-more "*", conditional "? or sometimes "[ ]", scan-sets "[a-zA-Z]", grouping "()" or "{}", non-CFG count range {2,3}, sentinel chars ^ $ etc., place-holders for replacement, don't-match "~" or "!", match any single char ".", and maybe a few more odds and ends. It's these bozos that think "regexp" sounds cool, but don't want to learn what they are. In general, these generalized extended regular expressions are easily implementable w/ efficient DFA and NFA->DFA conversion (i hate that algorithm!!!). If you need a 500 page book on regexps, you might want to have a look at a good compiler book (red dragon, etc.) first. Full non-CFG languages are so much more powerful than any regexp could ever dream of being, and more importantly they can have state.

    --
    The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
    1. Re:Why is it that people think regexps are hard? by Abcd1234 · · Score: 3, Insightful

      Someone just took a course on formal languages...

      If you need a 500 page book on regexps, you might want to have a look at a good compiler book (red dragon, etc.) first.

      And why would I want to learn about all the various automata (finite state machines, push-down automata, and Turing machines) not to mention all that language parsing crap (top-down versus bottom-up parsing, parse trees, etc, etc), when all I really want to learn is how to exploit a regular expression engine efficiently so I can solve real world problems?

      Full non-CFG languages are so much more powerful than any regexp could ever dream of being, and more importantly they can have state.

      Yeah, that's called a programming language. And yeah, I could implement any regular expression using a standard programming language, but why would I bother when a regular expression is far more concise and better suited to the job?

      Geez, give someone a hammer...

    2. Re:Why is it that people think regexps are hard? by Anonymous Coward · · Score: 1, Interesting
      It sounds like you knew exactly as much about regular expressions as the reviewer did prior to reading this book.

      The difference is that the reviewer was smart enough to suspect there was more to it, and he was right, and you are not.

      "regular expressions" is actually a misnomer given the power of modern regex tools. They have all those fancy things like context that you like so much, having mastered what I am guessing is a whole semester of compilers and theory of computation classes.

    3. Re:Why is it that people think regexps are hard? by muonzoo · · Score: 3, Funny
      SkewlD00d writes:

      Why is it that people think regexps are hard

      All you have are zero-or-more "+", one-or-more "*", conditional "? or sometimes ...

      ...these bozos that think "regexp" sounds cool...

      Just like the bozo who just finished a Formal Computation course, yet mixed up the meanings of "+" and "*" ? ;-)


      From man grep:

      A regular expression may be followed by one of several
      repetition operators:
      ? The preceding item is optional and matched at most
      once.
      * The preceding item will be matched zero or more
      times.

      I hear they're serving humble pie at the school cafeteria today. ;-)
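
      For anyone who wants to check the three repetition operators rather than trust memory, a quick Python sketch (the helper name `matches` is made up; `re.fullmatch` requires the whole string to match):

```python
import re

def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

for pattern in ("ab?", "ab*", "ab+"):
    results = {text: matches(pattern, text) for text in ("a", "ab", "abbb")}
    print(pattern, results)
```

      `b?` accepts zero or one `b`, `b*` accepts zero or more, and `b+` requires at least one, so only `ab+` rejects the bare `a`.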

    4. Re:Why is it that people think regexps are hard? by sk8king · · Score: 2, Insightful

      And you got the 'zero-or-more' and the 'one-or-more' mixed up [in Perl anyway]....that's why they're not as easy as you claim.

    5. Re:Why is it that people think regexps are hard? by SkewlD00d · · Score: 1

      That's what revision 0.1 is supposed to fix. Geez, gimme a break. You think I proofread anything I post? Nawh. I didn't need a formal lang course though, no... I took a compiler course, the superset of all that shit. "Implement a C compiler in hardware." We talked about it. Though C is not a CFG. ;) Humble pie? You mean, I have foot-in-mouth disease? I always have known that. It's funny getting modded down to 0 when I know what I'm talking about but transpose a few chars because I have lysdexia. My point was that the set of things you need to know for classical regexps is very small, and the normal *nix regexps aren't much bigger, and the GNU regexps have some nice features... all-in-all, they aren't that bad. Besides, the animal book from O'Reilly, I think, covered it pretty well, and wasn't 500 pgs.

      --
      The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
    6. Re:Why is it that people think regexps are hard? by Abcd1234 · · Score: 1

      I took a compiler course, the superset of all that shit.

      LOL! That compiler course is BASED on formal language theory, so I think you probably have the relationship a little confused there. In fact, a compiler course really only gives you a light dusting of the real theory behind formal languages (I would know, I've taken a course in both... I'm certainly no expert in either, but at least I have a little perspective). A class in formal languages not only discusses Chomsky's hierarchy and their associated automata, but also delves into fundamental theories about algorithms and computability, time and space complexity, and so on.

    7. Re:Why is it that people think regexps are hard? by BigBadBri · · Score: 2, Insightful
      In my case, they're hard because I only use them once in a blue moon, and it's nice to have a simple look-up and a few examples.

      But then, I'm not a compiler god, just a network guy who happens to have to use the fscking things once in a while.

      --
      oh brave new world, that has such people in it!
  42. Newbie review by Telastyn · · Score: 2, Informative

    I also have this book [actually right next to me]. I'd put off learning perl [and indirectly regexes] for some time, because... well, I was a windows admin by trade. Now that I do other [actual] work, time came to pickup on some other tools.

    Even having not dealt with regexes pretty much at all, the book was very easy to get into. The first few chapters go through the basic matching structures, along with requisite history. All of the points are made with understandable real life examples, with diagrams and [a small amount] of actual code. The later chapters go through individual languages, covering which features are there, what the nuances are, and a few of the gotchas. I must admit that I probably learned more useful things about perl from this book than from any other source. There is also a large section [which I did not read, and cannot comment on] which actually details the nuts and guts of regexes.

    All and all, it's easily the best instructional [as opposed to reference] text I've ever purchased.

  43. web interface by Anonymous Coward · · Score: 0

    I wish there was an online tool with a simple web interface that would let you craft regexes using menus.

  44. WHAT?!?! by Anonymous Coward · · Score: 0

    perl isn't really an acronym, apocryphal folklore and post-facto expansions notwithstanding.

    So Perl doesn't stand for "Practical Extraction and Reporting Language?" My world has been blown apart. The sky is spinning. So ... cold ...

    1. Re:WHAT?!?! by carl67lp · · Score: 1

      Nope. From what I've read, someone thought up that expansion and pegged Perl as an acronym.

      There's your history lesson for the day, folks.

    2. Re:WHAT?!?! by Anonymous Coward · · Score: 0

      Perl: Past and Present

      Perl was first released on December 18, 1987. It was created by an American named Larry Wall, who took the name 'Perl' from a popular programming language at the time, Pearl -- taking out the 'a' to show the difference. When the product was first released, Wall told the world that the name was actually an acronym, standing for Practical Extraction and Report Language (as the original man page describes).

    3. Re:WHAT?!?! by Anonymous Coward · · Score: 0

      That's kinda cheesy, isn't it? I mean if you release it to the world as an acronym, and then say later that you only came up with the acronym after you came up with the name, is the world really supposed to give you that Perl isn't an acronym after all? Know what they say about first impressions...

    4. Re:WHAT?!?! by Anonymous Coward · · Score: 0

      I never knew that
      I wonder what crafty acronyms someone could think up for MS... all I can come up with is Monkey Shit

  45. in a nutshell by Suppafly · · Score: 1

    My first suspicion, I admit, was that I was facing one of the countless "man page reprints" that you find these days.

    No, that would be O'Reilly's "In a Nutshell" series of books..

  46. That is some funny, FUNNY sh1t by Anonymous Coward · · Score: 0

    Kudos to you for the best post I've seen in this bore-a-torium all day.

  47. Re:Mwahahahah! by Anonymous Coward · · Score: 0

    yeah, those are my favorite:

    "It doesn't get any more serious than a rhinoceros about to charge your ass"

    rofl

    there's another with a pirate that's funny as hell too

  48. errata by Anonymous Coward · · Score: 4, Informative

    The reviewer forgot to mention the wonderful errata list of the book! Can be found here.

    1. Re:errata by jbrax · · Score: 1

      Long errata for a thick book.. These books that include a lot of code (and therefore a lot of errors) should be sold on CD (or other digital media) so that they could be patched afterwards with errata(s).

      And wouldn't it be nice to search information from regex-book with some regex? Haven't found a PDF-reader that can do that yet!

      JOna

  49. And he's Qualified to review this book???? by CSG_SurferDude · · Score: 3, Funny

    (to be honest, I had never heard of lookaround operators before!).

    Gezzzz, This guy hasn't even heard of lookaround operators before? What a clueless fool! He should be driven from /. after being tarred and feathered!

    Everyone knows that a lookaround operator is that guy that goes into the bank first to make sure that there aren't any armed guards or policemen/women getting their paychecks deposited.

    /me runs and hides now! ;-)

  50. Interpreting parser by Frans+Faase · · Score: 3, Informative

    If you want to have something more powerful than regexps, and still have it as an interpreter, you might have a look at an interpreting parser that I wrote: IParse.

  51. Re:pff by The+Clockwork+Troll · · Score: 1
    Shell * wildcards are basically syntactic sugar for the Kleene star over all valid filename characters.

    Sure smells like a regex alias to me.
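
    Python's standard library makes the correspondence visible: `fnmatch.translate` turns a shell-style wildcard into a regex. This is only a sketch of the idea; the exact regex text it prints differs between Python versions, and unlike a real shell, `fnmatch` lets `*` match `/` and leading dots.

```python
import fnmatch
import re

glob = "*.txt"
regex = fnmatch.translate(glob)   # exact text varies by Python version
print(regex)

compiled = re.compile(regex)
for name in ("notes.txt", "notes.txt.bak", ".txt"):
    print(name, bool(compiled.match(name)))
```

    The `*` ends up as a `.*` inside an anchored pattern, which is the Kleene star applied to "any character".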

    --

    There are no karma whores, only moderation johns
  52. definitely a good read by cheesyfru · · Score: 1

    I never really thought you could fill a book about regular expressions, but this one manages to accomplish this while at the same time being very interesting. This is absolutely required reading if you know "enough to get by" with regular expressions. Chances are, until you read this, you're making a ton of common mistakes and you don't even know about it.

  53. Re:Another karma whore post by Anonymous Coward · · Score: 1, Funny

    Yea, the post should at least bust on Microsoft, make some kind of esoteric unfunny comment about CowboyNeal, or praise CmdrTaco to get +4 Interesting. Some moderators are smoking crack today, and they're not sharing.

  54. Or without a book... by Iscariot_ · · Score: 2, Informative

    For those who don't want to buy a book, here's a nice page with pre-built regexps for doing all sorts of things: RegexLib.

  55. you didn't check for the C++ first by Anonymous Coward · · Score: 0
    There's another case that will get you:

    //*

    ...Not a comment...

    //*/

    Whereas this is a comment

    /*

    ...A comment...

    //*/

    This is a quick trick for commenting out a small section. Yes, I know #if 0 works well and this doesn't nest, but I like it for small code because the editor I use changes the color when it's a comment, etc. The point is that if you are already in a C++ comment then you can't enter a C comment. It's also fairly easy to write your own finite state machine to print out comments, but of course that doesn't plug into anything.
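
    A minimal sketch of such a finite state machine, in Python rather than C++. It has three states (code, line comment, block comment) and deliberately ignores string and character literals, which a real scanner would also have to track; the function name is invented.

```python
def find_comments(source):
    """Return the comments in source, in order, using a three-state scanner."""
    CODE, LINE, BLOCK = range(3)
    state, start, comments = CODE, 0, []
    i = 0
    while i < len(source):
        two = source[i:i + 2]
        if state == CODE:
            if two == "//":                 # enter a C++ line comment
                state, start, i = LINE, i, i + 2
                continue
            if two == "/*":                 # enter a C block comment
                state, start, i = BLOCK, i, i + 2
                continue
        elif state == LINE:
            if source[i] == "\n":           # a line comment ends at end of line
                comments.append(source[start:i])
                state = CODE
        elif state == BLOCK:
            if two == "*/":                 # only */ ends a block comment
                comments.append(source[start:i + 2])
                state, i = CODE, i + 2
                continue
        i += 1
    if state != CODE:                       # comment still open at end of input
        comments.append(source[start:])
    return comments

print(find_comments("//*\nint x;\n//*/\n"))
print(find_comments("/*\nint x;\n//*/\n"))
```

    In the first input the `/*` is never seen as an opener because the scanner is already inside a line comment, so `int x;` stays live code; in the second, the whole span up to `*/` is one block comment.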

  56. re-builder for Emacs by David+Ishee · · Score: 3, Informative

    The re-builder mode is great for debugging regexps in Emacs. This is the latest version as far as I can tell: re-builder 1.2

    --
    Your password has expired, please login to change it.
  57. You actually liked this book? by Forgery · · Score: 2, Informative

    I have a previous version of Friedl's book and found it needlessly confusing. The author's examples often leave much to be desired. I have no doubt that all of the information about regex is somewhere in the book, but it takes an extraordinary amount of work on the reader's part to extract it.

    1. Re:You actually liked this book? by melonman · · Score: 3, Interesting

      I loved the first edition, probably for the reasons you didn't. I'd read several short overviews of regexes, including Larry Wall's one in the Camel book, and, while they got me doing simple stuff, they left me with lots of unanswered questions, and the more I experimented the more my "why doesn't that work?" list grew. The Friedl book is totally thorough, and, I thought, aggressively pedagogical, if you want to learn about how a regex engine works rather than pick up stuff in a cookbook fashion.

      That said, I do wonder about the guy. The colophon was astounding: he wrote half the book using regexes on a computer on the other side of the world, using a 37.5 bit/hour connection by the sound of it, and then he proceeded to write his own typesetting system so he could produce a phonetically alphabetical index in English, Japanese and probably some other languages that I missed. I think he ought to get out more...

      --
      Virtually serving coffee
  58. The how, not the what, is lacking by StRex · · Score: 1
    I'm only part of the way through this book, so I can't claim to know all about it. I bought it because the author (and possibly other reviews I'd read?) stated that the book helps illustrate how to approach a problem using regular expressions.

    You can hand me a box of wrenches, but that won't tell me how to fix a car. ;-)

  59. tend to be the de-facto standard - dream on! by DrSkwid · · Score: 1

    maybe where you come from

    awk
    sed
    grep
    sam

    not a perlism in sight

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
    1. Re:tend to be the de-facto standard - dream on! by arkanes · · Score: 1

      Perl-style regexps tend to be used on things that post-date perl. Things that pre-date perl obviously do not use perl-style regexps.

    2. Re:tend to be the de-facto standard - dream on! by GGardner · · Score: 2, Informative
      Perl-style regexps tend to be used on things that post-date perl.

      True, but things get tricky quickly -- plain-old Unix awk predates perl. But GNU-awk (gawk) does not, so it has some perl-style regexp features, like \w, which are missing from Unix awk.

  60. There are no ".NET Framework" languages by ClubStew · · Score: 2, Informative

    ...or even one of the .NET framework languages

    There are no ".NET framework" languages. There are languages that target the Common Language Runtime, or the CLR. The .NET Framework is merely a class library like the JDK/JRE. If he doesn't even know that, why should I trust his book review?

    1. Re:There are no ".NET Framework" languages by Anonymous Coward · · Score: 0

      Lighten up. The review is about a book on regular expressions, not .NET. Your obsession with exactness and semantics makes you sound like the "Comic Book Guy" in the Simpsons.

    2. Re:There are no ".NET Framework" languages by trouser · · Score: 1

      I don't mean this in a bad way, but are you maybe gay and you haven't realised yet. How many comic books do you have? And how big is your collection of fantasy novels?

      --
      Now wash your hands.
    3. Re:There are no ".NET Framework" languages by Sanction · · Score: 1

      Perhaps because he is writing a review of a book on regular expressions, and not a book on .NET? With all the marketing newspeak and hype about .NET, there is not exactly a great consensus about proper usage of specific terms among the general computing public. This kind of pedantic whining only serves to diminish the image of Microsoft developers in general.

      --
      Well I'm the doctor and I say you're dead, so shut up and take it like a man!
    4. Re:There are no ".NET Framework" languages by Sanction · · Score: 1

      Oh, and it is also worth mentioning that it is close enough to languages that can use the .NET framework, since that is where the regular expression libraries discussed in the book are located.

      --
      Well I'm the doctor and I say you're dead, so shut up and take it like a man!
    5. Re:There are no ".NET Framework" languages by ClubStew · · Score: 1

      Diminish Microsoft developers? It's the *nix/OSS developers that are always whining about not making money! Guess what - I do. That's not to say a lot of the stuff I develop with what little free time I have doesn't get released to the public, but I never expect something to which I've provided source to make money. It's the constant whining of OSS developers that choose the GPL expecting money that diminishes OSS - why would any company want to use it after people complaining about lack of funds? Sure, some do, but companies like IBM have plenty of revenue from other sources.

      Like someone said in a previous article about the end of the LRP, "Live by the GPL, die by the GPL". That truly is diminishing.

      What I hate is these OSS people that bitch about Microsoft all the time and, like this article poster, know nothing about Microsoft other than "they are big and brutish and their code is buggy". Yeah, show me any company that has that many products that doesn't have even more bugs. When I used RedHat, there were updates all the time - just like people bitch about when Microsoft does it. If OSS developers are going to bitch about Microsoft, at least know thy enemy and quit spouting the same usual garbage that applies to most software companies.

    6. Re:There are no ".NET Framework" languages by Sanction · · Score: 1

      Very nice and all, but how is any of that little rant relevant to either the review, or the topic of the correct usage of the term ".NET Framework"? This hypersensitivity is very unbecoming, as is responding to a list of arguments that were never made. I made no statement either for or against either group, and actually develop closed source software on a Microsoft platform. Your above screed only emphasizes my point about some developers flying off the handle about issues that are not relevant.

      --
      Well I'm the doctor and I say you're dead, so shut up and take it like a man!
    7. Re:There are no ".NET Framework" languages by Anonymous Coward · · Score: 0

      And his obsession with sofas and value meals makes him look like him too!

  61. err... by pb · · Score: 1

    How about something like this: m|/\*.*?\*/|s
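
    For readers who don't speak Perl, here is the same pattern spelled in Python as a sketch. `re.DOTALL` plays the role of `/s`, and `.*?` is the non-greedy form; the sample source string is invented. Like the original, it knows nothing about string literals or `//` comments, so a `/*` inside either would still be picked up.

```python
import re

# Python spelling of m|/\*.*?\*/|s
C_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

source = "int a; /* first */ int b; /* second,\n spans lines */ int c;"
print(C_COMMENT.findall(source))

# For contrast: the greedy .* runs from the first /* to the last */.
GREEDY = re.compile(r"/\*.*\*/", re.DOTALL)
print(GREEDY.findall(source))
```

    The non-greedy version returns the two comments separately, while the greedy one swallows `int b;` along with them.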

    --
    pb Reply or e-mail; don't vaguely moderate.
  62. The best place for buying technical books is... by Draxinusom · · Score: 2, Informative

    www.bookpool.com

    Mastering Regular Expressions, 2nd Edition
    Our Price: $24.50

    Bookpool is consistently the cheapest place to buy technical books. And no, I am not affiliated with them in any way.

  63. A must read nerd book by neves · · Score: 1
    Mastering Regular Expressions is, IMHO, the best example of a nerd book. Very well written by a knowledgeable author, it distills the details of an interesting technical topic. A great read. The problem is that you will fall in love with regular expressions. You will try to use them everywhere. Complex regexes will replace what a handful of string find methods would solve more cleanly (and maybe more efficiently).

    Regexes are a cute little toy in a programmer's toolbox. They're nice to use, but you'll be cursed forever by whoever maintains your code. Better to use them just for weird string replacements inside your text editor, throwing them away after use. A great refactoring tool.

    1. Re:A must read nerd book by taradfong · · Score: 1

      While I agree you can get carried away, ala the "a man with a hammer sees a world of nails" syndrome, and that one should use simple string processing code in hard-baked production code, I don't think anyone will ever regret knowing how to use regexps better. They are marvelous for scripting, log parsing, and the less-baked-in type of thing.

      --
      Does it hurt to hear them lying? Was this the only world you had?
  64. RE in Perl by bolek_b · · Score: 1

    From a formal point of view, regular expressions in Perl (PCRE) are no longer true regular expressions. Since Perl 5.6 and the introduction of look-ahead/subexpression clauses, it is possible to match expressions of the classic context-free grammar of correctly parenthesized clauses.
    I therefore suppose that one day somebody will implement a CFG parser entirely using the PCRE engine (and most probably on a single line ^__^).

  65. please mod parent (-1, Fucktard) by Anonymous Coward · · Score: 0
    1. I sure hope you don't expect interview candidates to figure that pap out on the spot. Understanding the logic behind regular expressions is ridiculously easy...it's the varying syntax that can be a bitch. If you make them sweat that shit out while you're jerking off at your desk, you're not proving anything.
    2. That little pop-up on your site shows your ignorance. If your html works on Mozilla, it will most definitely work on IE 6. Trying to get people to download that kludgy POS proves to me that you, sir, are an asshat.
  66. Funny you should say that... by devphil · · Score: 2, Informative


    ...about switching programming environments. Right now there's some discussion about problems in regex engines which follow you around as you switch environments, due to problems in the engines.

    Current versions of glibc (apparently) made some inefficient design choices in their regex engine. When other tools such as sed switched to using glibc's version, their performance dropped quite a bit, leading to a couple of bug reports.

    The interesting thing is, one of the messages in the bug report mentions this book. It had been a few years since I covered DFAs and NFAs in college, so I got a copy yesterday. Came back home to find this review on /.

    --
    You cannot apply a technological solution to a sociological problem. (Edwards' Law)
  67. My Version... by BinaryCodedDecimal · · Score: 5, Funny

    Mastering Regular Expressions:

    Repeat after me:

    "I'm so hungry, I could eat a horse."

    "It's been raining cats and dogs."

    "I'll sleep with you when Hell freezes over."

    And my personal favourite:

    "Oh look, Hell just froze over!"

  68. Parsers and regex by Beltway+Prophet · · Score: 2, Interesting

    And recursion is lots of fun, but I use REs to recognize and extract tokens and boundaries, because it's so easy to write and change simple REs.

    There is a middle way between overly complex REs which mere mortals can neither read nor safely modify, and overly complex parsers that never take advantage of anything more functional than getc().
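
    One way that middle way can look, sketched in Python: several small, readable REs joined into a tokenizer, with the structure left to ordinary code. The token names and the sample input are assumptions for illustration only.

```python
import re

# Each token gets its own short pattern; order matters (ERROR is the fallback).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[-+*/=()]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Yield (kind, text) pairs, skipping whitespace."""
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(f"unexpected {match.group()!r} at offset {match.start()}")
        yield kind, match.group()

print(list(tokenize("total = price * 1.5 + 2")))
```

    Changing what counts as a number or an identifier means editing one short line of `TOKEN_SPEC`, which is the maintainability argument made above.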

  69. I've read the first edition and... by RevMike · · Score: 4, Interesting
    I have to agree that this is a book that should be on everyone's shelf.

    The very fact that both vi and emacs support regular expressions must mean they are a best-in-breed tool, because if there was a way for those two communities to disagree, they would have done it.

    I love the fact that I can use the same expressions with grep, sed, vim, Perl, and Java. That being said, the critics who warn that regexes can be overused are correct: regexes are difficult to debug and to maintain, so don't go overboard.

  70. contrived examples? by anonymous+loser · · Score: 4, Interesting
    (ever needed to match aligned groups of 5 digits in an unspaced stream of characters?)

    Yes, actually. Older FORTRAN codes (that have been slowly added to/modified over time) especially exhibit this kind of behavior thanks to formats that allow you to specify columns for output. The numbers actually run into each other on the line, and the only way to read the file is to know which column the data you want is in. I would never discount any regular expression example as contrived. Somewhere, someone has developed a program that uses that formatting in an input or output file, and someone else might need to be able to speak its language in an automated fashion.
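
    A small Python sketch of the alignment problem, using an invented stream of five-digit fields with no separators. A plain search for a field starting with `44` can latch onto digits that straddle two fields; anchoring at the start and skipping whole five-digit groups keeps the match aligned.

```python
import re

# Hypothetical unspaced stream: 03824 53144 94717 44012 90210
stream = "0382453144947174401290210"

# Cutting the stream back into its fixed-width columns.
fields = re.findall(r"\d{5}", stream)
print(fields)

# Naive search: finds "44" wherever it occurs, even across a field boundary.
naive = re.search(r"44\d{3}", stream).group(0)
print(naive)

# Aligned search: skip zero or more complete 5-digit groups, then match.
aligned = re.match(r"(?:\d{5})*?(44\d{3})", stream).group(1)
print(aligned)
```

    Here the naive search returns digits taken from the end of one field and the start of the next, while the aligned version returns the real fourth field.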

  71. what did one regular expression say to the other? by jdew · · Score: 3, Funny

    what did one regular expression say to the other?
    .*

  72. Re:People think your dumb... by Anonymous Coward · · Score: 0

    Geez, when I bought it at Borders the guy who rang up the order practically bowed and scraped.

  73. Post is from a troll template (see below) by Chad+E+Dirks · · Score: 2, Informative

    "Today I got roughly 4 first posts but then slashdot wouldn't let me post anymore. So thats enough trolling for one day." - rkz

    To be honest, that this exact same post template has been moderated highly again and again in recent book reviews is becoming more humorous than anything. Unfortunately, and this is addressed to certain moderators, I believe it would be correct to say the laughing is 'at you' and your misfortune rather than 'with you'.

    If you would like to confirm that you are being 'taken in', click on the link represented in the parent post by the user name to be taken to a list of this user's recent posts

    There, view the user's posts made in recent book review comment threads to see this exact template used in multiple "Book Review" comment threads.

    Here are several past instances of this template being used by this user:
    Mac OS X Unleashed (2nd Edition)
    Dynamic HTML: The Definitive Reference (2nd Edition)
    Linux Network Administrator's Guide, 2nd Edition

    In the future, please feel free to cite this post as a reference to inform the opinion of future Book Review comment moderators.

  74. Since regular expressions are Turing complete... by Anonymous Coward · · Score: 0

    And they are available in all platforms and in all languages (c and c++ thru libraries), I have a suggestion:
    I suggest that all programmers should program in pure regular expressions.
    That way we should be able to ensure the maximum portability and clarity of our code.
    No more will we have to waste our time learning the latest new thing forced on us by Microsoft or Sun -- our language can do it all!!!

    -- Joe Perl Hacker

  75. Sample Chapters by darkpurpleblob · · Score: 2, Informative

    Sample chapters from the book, covering Java and .NET, are available in PDF format from the book page on O'Reilly's site.

  76. Theory of Computation -- Theory of Exposure to TOC by muonzoo · · Score: 1

    Hey -- I'm not dissing you -- but it was funny that you made the mistake. :-)

    One thing to remember these days: given the broad range of exposure to computers, from Grandma through systems designers, compiler writers, and formal methods gurus, the latter sets make up less and less of the community.

    Although it is agreed that CFGs, regular expressions, FSMs and compiler technologies are important, I'd wager that well under 1% of the online community -- and even under 5% of the 'technical' community -- ever really had to wade through the proof that an NFA called N is equivalent to a DFA called D subject to the following blah blah blah. :-)

    If you're taking a compiler course -- I would have to assume that a pre-req was the Theory of Computation (or equivalent course). When I was that age, (here we go), Theory 1 was a 1/2 course (semester) and was the pre-req for the compiler course (along with some math courses). The compiler course was a full-course (1 yr long).

  77. Yes. by Anonymous Coward · · Score: 0
  78. why i don't like this book by andy666 · · Score: 1

    i was really excited when i first got this book, but i never have liked reading it. every few months i go back to it hoping i will change my mind, but i still hate it.

    my major problem with it is the notation. he uses some symbols, like an "L" and a reverse "L" that are very light (the font) and i find it very annoying.

    in general, i think that the book could be shorter. regular expressions are not very deep, and even though they are useful, a whole book is not necessary (at least for my purposes).

  79. Regular Expression by dynayellow · · Score: 1

    Finally! A book geeks can use!

    Chapter One: "I have FIVE black lotuses in my main deck" is not a good way to impress girls. Neither is using the phrase "Ogg Vorbis" every chance you get.

    1. Re:Regular Expression by Anonymous Coward · · Score: 0

      It is if the black Lotuses are Esprit V8's ...

  80. you're kidding right? by Anonymous Coward · · Score: 0

    grow the fuck up.

    regexps are the least of your worries.

  81. Re: C++ Regular Expression library by wbniv · · Score: 2, Interesting

    i've been looking at the boost c++ regex library http://www.boost.org/libs/regex/ and i'm going to give it a try. as i'm doing more c++ programming these days ( i've been lucky to have been doing perl for the last couple of years :) ), i've been looking for quality, cross-platform, license-compatible c++ classes; boost seems quite good (and it's peer-reviewed, too)

    i also just found this benchmark http://research.microsoft.com/projects/greta/regex_perf.html comparing boost vs. microsoft's greta http://research.microsoft.com/projects/greta/ which gives you "all the power of Perl 5 regular expressions in your C++ applications. These easy-to-use classes let you perform regular expression pattern matches on strings in C++." (from the website)

  82. Re:Ij us tc an 't fat homt his by Anonymous Coward · · Score: 0
  83. Hey, it's relevant! by bedessen · · Score: 1

    Here's where I take the opportunity to direct you to my homepage link above, where you will find a set of nasty regular expressions that you can plug into Privoxy, a filtering HTTP proxy. They will rewrite slashdot's HTML markup with CSS tags, which lets you create a stylesheet to modify the look and feel of slashdot quite radically. A fun distraction if you're bored...

  84. Re:what did one regular expression say to the othe by Anonymous Coward · · Score: 0

    What website do regular expressions love?
    /.

  85. Re:Theory of Computation -- Theory of Exposure to by mvw · · Score: 1
    Although it is agreed that CFGs, regular expressions, FSMs and compiler technologies are important, I'd wager that well under 1% of the online community -- and even under 5% of the 'technical' community -- ever really had to wade through the proof that an NFA called N is equivalent to a DFA called D subject to the following blah blah blah. :-)

    These regex books are nice to get a first grip on the topic, to see some common applications beyond del *.*.

    But personally I only got up to a certain level of understanding with them: these texts never managed to explain what NFAs are, why there are sometimes epsilon transitions and sometimes not, and so on.
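
    For what it's worth, the idea fits in a few lines. Below is a rough sketch, not taken from any of the books mentioned: a made-up toy NFA for a*b* with one epsilon transition, simulated by tracking the set of states it could be in. The names NFA, eps_closure and accepts are invented for this illustration.

```python
# Hypothetical toy NFA for the regex a*b*; None stands for an epsilon move.
NFA = {
    (0, "a"): {0},
    (0, None): {1},   # epsilon transition: go from state 0 to 1 without reading input
    (1, "b"): {1},
}
START, ACCEPT = 0, {1}


def eps_closure(states):
    """Return all states reachable from `states` using only epsilon moves."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, None), set()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)


def accepts(word):
    """Track the set of possible NFA states; each such set acts as one DFA state."""
    current = eps_closure({START})
    for ch in word:
        moved = set()
        for s in current:
            moved |= NFA.get((s, ch), set())
        current = eps_closure(moved)
    return bool(current & ACCEPT)
```

    Each distinct value of `current` plays the role of one state of the equivalent DFA, which is the gist of the NFA-to-DFA proof quoted above.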

    The big enlightenment came after a great course in automata theory and formal languages (in German), that is, some theoretical computer science.

    After this course it was considerably easier to read the dragon book and such applied regexp titles. So I would definitely recommend reading through a book like Hopcroft/Ullman to get the foundations right.

    This year I am working through a course about applied automata theory (in German), and it introduces great applications: the connection between finite automata and logic formulae, which is important for model checking, and automata that do not work on words but on trees or even more complicated structures like pictures and grids. The tree automata allowed me to get a better grip on the foundations of XML, DTDs, XPath and such.

    Last year there was a course about automata theory and reactive systems (in English), which used automata on infinite words to model parallel processes, games and such. Too bad I had no time to work through it.

    So I definitely vote for going with theoretical computer science. It makes you see certain design decisions much, much more clearly.

    Regards,
    Marc

  86. Re:Theory of Computation -- Theory of Exposure to by Chris_Jefferson · · Score: 1
    These regex books are nice to get a first grip on the topic, to see some common applications beyond del *.*.

    Although of course del * is all you need as a regular expression.

    --
    Combination - fun iPhone puzzling
  87. Re:Theory of Computation -- Theory of Exposure to by muonzoo · · Score: 1
    Although of course del * is all you need as a regular expression.
    Nope. '*' is file glob notation. If you were to use a regex, it would be '.*' .
    :-)
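
    A quick way to see the difference, sketched in Python with the standard fnmatch and re modules (the file names are made up for the example):

```python
import fnmatch
import re

names = ["notes.txt", "a.out", "README"]  # hypothetical file names

# Shell-style glob: "*" on its own matches any run of characters.
glob_hits = [n for n in names if fnmatch.fnmatch(n, "*")]

# Regex: "*" is a quantifier and needs something to repeat,
# so a bare "*" is rejected as a pattern.
try:
    re.compile("*")
    bare_star_ok = True
except re.error:
    bare_star_ok = False

# The regex counterpart of the glob "*" is ".*" (any character, zero or more times).
regex_hits = [n for n in names if re.fullmatch(".*", n)]

# fnmatch.translate() shows the regex that a glob pattern corresponds to.
translated = fnmatch.translate("*.txt")
txt_hits = [n for n in names if re.match(translated, n)]
```

    Here glob_hits and regex_hits both contain every name, while the bare "*" does not even compile as a regex.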
  88. Re:Theory of Computation -- Theory of Exposure to by mvw · · Score: 1
    In light of my posting


    del \Sigma^*

    :-)