Slashdot Mirror


Regular Expression Recipes

r3lody writes "If you spend time writing applications that have to do pattern matches and/or replacements, you know about some of the intricacies of regular expressions. For many people they can be an arcane hodgepodge of odd characters that somehow manage to do wonderful things, but they don't have enough time (or interest) to really understand how to code them. Nathan A. Good has written Regular Expression Recipes: A Problem-Solution Approach for those people. In its relatively slim 289 pages, he offers 100 regular expressions in a cookbook format, tailored to solve problems in one of six broad categories (Words and Text, URLs and Paths, CSV and Tab-Delimited Files, Formatting and Validating, HTML and XML, and Coding and Using Commands)." Read on for the rest of Lodato's review.

Regular Expression Recipes: A Problem-Solution Approach
author: Nathan A. Good
pages: 289
publisher: Apress
rating: 8/10
reviewer: Raymond Lodato (rlodato AT yahoo DOT com)
ISBN: 159059441X
summary: A cookbook of useful regular expressions for Perl, Python and more.

Regular expressions are not restricted to just the Perl or shell environments, so Nathan offers variations for Python, PHP, and VIM as well. In most cases the translation is relatively straightforward, but in a few cases a different environment may have (or lack) additional facilities, prompting a different expression to do the same task.

Before you even read chapter 1, Nathan provides a quick summary course on regular expressions, with detail given to each of the five environments you might use. He has written the syntax overview in a highly readable format, making it easy to understand the gobbledygook of the most bizarre concoctions you might encounter.

The first chapter (Words and Text) starts simply enough. He gives examples of how to find single words, multiple words, and repeated words, along with examples of how to replace various detected strings with others. In each case he gives an example of its use for each platform, followed by a bit-by-bit breakdown of how it works. Not every environment is given on every example, and in many cases the "How It Works" section refers to the first one, as most REs are identical between the platforms.
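To give a flavor of the repeated-word kind of recipe, here is a minimal Python sketch. The pattern and the function names are my own illustration, not taken from the book; it assumes a "repeated word" means the same word twice in a row, separated only by whitespace.

```python
import re

# Hypothetical pattern (not from the book): a word, some whitespace,
# then the same word again via the backreference \1.
REPEATED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def find_repeated_words(text):
    """Return each word that appears twice in a row."""
    return [m.group(1) for m in REPEATED_WORD.finditer(text)]

def collapse_repeated_words(text):
    """Replace 'word word' with a single 'word'."""
    return REPEATED_WORD.sub(r"\1", text)

sample = "This is is a test of the the pattern."
print(find_repeated_words(sample))
print(collapse_repeated_words(sample))
```

The leading \b matters here: without it, the "is" at the end of "This" could pair with the following "is" and produce a false hit.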

The next chapter (URLs and Paths) offers various methods of doing commonly needed parsing. The 15 recipes here cover pulling out file names, query strings, and directories, as well as reconstructing them in useful ways. Validating, converting, and extracting fields of CSV and tab-delimited files are handled in chapter 3, while chapter 4 is concerned with validating field formats, as well as reformatting text for the fields. Chapter 5 handles similar tasks for HTML and XML documents. The final chapter covers expressions that facilitate the management of program code, log files, and the output of selected commands.
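As a rough idea of what URL parsing with a regular expression looks like, here is a Python sketch. The pattern, the group names, and the split_url helper are hypothetical and deliberately simplified (no ports, user info, or fragments are handled); the book's own expressions may differ.

```python
import re

# Hypothetical, simplified pattern: scheme, host, optional path,
# optional query string. Not a full URL grammar.
URL_PARTS = re.compile(
    r"^(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*)://"
    r"(?P<host>[^/?#]+)"
    r"(?P<path>/[^?#]*)?"
    r"(?:\?(?P<query>[^#]*))?"
)

def split_url(url):
    """Return host, directory, file name, and query string, or None."""
    m = URL_PARTS.match(url)
    if m is None:
        return None
    path = m.group("path") or "/"
    # Everything up to the last slash is the directory; the rest is the file.
    directory, _, filename = path.rpartition("/")
    return {
        "host": m.group("host"),
        "directory": directory + "/",
        "filename": filename,
        "query": m.group("query") or "",
    }

print(split_url("http://example.com/docs/recipes/index.html?page=2&sort=asc"))
```

For production code in Python, the standard library's urllib.parse is usually a safer choice than a hand-written pattern.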

First, I must admit that there are a number of useful solutions provided, especially for someone who is concerned with application and web development. However, I did feel a little cheated by the fact that several recipes covered essentially the same task, with only minor variations. It almost seemed as though the author was trying to pad out the solution count to the magic number 100. A simple example: three solutions in chapter one cover (a) replacing smart quotes with straight quotes, (b) replacing copyright symbols with the (c) trigraph, and (c) replacing trademark symbols with the (tm) sequence. In each case, the expression was simply "s/\xhh/ rep /g;". Did we really need three separate recipes for that? I don't think so.
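To show how little separates those three solutions, here is a Python sketch that handles all of them with one table of substitutions. The table, the to_ascii name, and the choice of Unicode code points are my assumptions; the book's "\xhh" form targets single-byte encodings, and the exact byte values depend on the encoding in use.

```python
import re

# Assumed Unicode code points for each symbol; the book's \xhh escapes
# would be the single-byte equivalents in whatever encoding is in use.
REPLACEMENTS = [
    (r"[\u201c\u201d]", '"'),   # curly double quotes -> straight quote
    (r"[\u2018\u2019]", "'"),   # curly single quotes -> apostrophe
    (r"\u00a9", "(c)"),         # copyright sign
    (r"\u2122", "(tm)"),        # trademark sign
]

def to_ascii(text):
    """Apply every pattern/replacement pair in turn."""
    for pattern, replacement in REPLACEMENTS:
        text = re.sub(pattern, replacement, text)
    return text

print(to_ascii("\u201cRegex\u201d \u00a9 2005 Acme\u2122"))
```

Each row is the same substitution with a different code point and replacement string, which is the point of the complaint above.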

Another quibble revolves around some of the coding of the expressions. Nathan has made liberal use of the non-capturing groups (that is, (?: expr )) to ensure only the items that needed replacement were captured. While a worthy idea, in some cases the expression could have been simplified for understanding. Another issue is a slight error in searching for letters. In a number of expressions, Nathan uses [A-z] to capture all letters. Unfortunately, the special characters [, \, ], ^, _, and ` occur between upper-case Z and lower-case a, making it match too much. Either [[:alpha:]] or [A-Za-z] should have been used.
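Both points are easy to check. The Python sketch below is my own illustration, not code from the book: it lists the ASCII characters that [A-z] accepts beyond the letters, and shows that a non-capturing group leaves the group numbering untouched. Python's re module does not support POSIX classes such as [[:alpha:]], so [A-Za-z] is used for the corrected class.

```python
import re

too_wide = re.compile(r"[A-z]")     # the range used in the book
letters = re.compile(r"[A-Za-z]")   # the corrected class

# The ASCII characters that sit between 'Z' (0x5A) and 'a' (0x61).
between = "".join(chr(c) for c in range(ord("Z") + 1, ord("a")))

extra = [ch for ch in between if too_wide.fullmatch(ch)]
leaked = [ch for ch in between if letters.fullmatch(ch)]
print("matched by [A-z]:", extra)
print("matched by [A-Za-z]:", leaked)

# (?:...) groups the alternation without creating a capture,
# so the only numbered group is the part after "://".
m = re.match(r"(?:https?|ftp)://(\S+)", "http://example.com/a")
print(m.groups())
```

All six in-between characters pass [A-z] and none pass [A-Za-z].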

Despite these quibbles, Regular Expression Recipes does provide a useful compendium of solutions for common problems developers face. Presenting the information in a cookbook fashion, along with ensuring that those using something other than Perl don't have to sweat translating the expressions to their target language, makes this a handy book to have. I wouldn't hesitate to recommend it.

You can purchase Regular Expression Recipes: A Problem-Solution Approach from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

13 of 258 comments

  1. Curious by LiquidCoooled · · Score: 3, Funny

    I was performing a strange custom regular expression on the book review, and discovered that it outputted the following:

    "Regex coders are in league with the devil"

    Who woulda thunk it!

    --
    liqbase :: faster than paper
    1. Re:Curious by Saeed+al-Sahaf · · Score: 4, Funny
      interesting, so what secret regular expression construct matches what is nowhere in the original string?

      It's something called a joke. A joke is something said or done to evoke laughter or amusement, especially an amusing story with a punch line. Jokes employ something called humor. Humor is the quality that makes something laughable or amusing. Many Slashdotters are unable to perceive, enjoy, or express what is amusing, comical, incongruous, or absurd, often referred to as humor impaired.

      --
      "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
    2. Re:Curious by ErikZ · · Score: 2, Funny

      You're pretty touchy for a mechanical abomination, devoid of all life and only a mere shadow of the men you were built to replace.

      You should try tweaking your .conf.

      --
      Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
  2. Bran... by Anonymous Coward · · Score: 3, Funny

    ...is the best regular recipe.

  3. Re:Points by LiquidCoooled · · Score: 2, Funny

    2. the index has a lot of typos.

    No problem, the website issued a global regex and a pot of Tipp-Ex for all customers.

    --
    liqbase :: faster than paper
  4. REGEX by null+etc. · · Score: 5, Funny
    Another quibble revolves around some of the coding of the expressions. Nathan has made liberal use of the non-capturing groups (that is, (?: expr )) to ensure only the items that needed replacement were captured. While a worthy idea, in some cases the expression could have been simplified for understanding.

    I'm not sure I understand what your quibble is - do you dislike the fact that he uses non-capturing groups, or the fact that he disposes of them at certain points?

    Another issue is a slight error in searching for letters. In a number of expressions, Nathan uses [A-z] to capture all letters. Unfortunately, the special characters [, \, ], ^, _, and ` occur between upper-case Z and lower-case a, making it match too much. Either [[:alpha:]] or [A-Za-z] should have been used.

    This seems like a relatively novice mistake, and I'm surprised it would show up in a book on regular expressions.

    Despite these quibbles, Regular Expression Recipes does provide a useful compendium of solutions for common problems developers face. Presenting the information in a cookbook fashion, along with ensuring that those using something other than Perl don't have to sweat translating the expressions to their target language, makes this a handy book to have. I wouldn't hesitate to recommend it.

    It's nice that he covers five environments for regular expressions. I'm sure everyone has heard of Mastering Regular Expressions, published by O'Reilly. The Perl Cookbook also does a good job at solving common problems with Regular expressions.

    This is just my opinion, but I think what the world needs is a book on Regular Expression Design Patterns.

  5. Minor variations by pocari · · Score: 5, Funny
    However, I did feel a little cheated by the fact that several recipes covered essentially the same task, with only minor variations.

    I can relate. I have cookbooks for food that have all these recipes that are nothing but flour, butter, eggs, and sugar. Do we need all these recipes for pancakes, cupcakes, cookies, crepes, waffles, popovers, bread, quick bread, bread sticks? Won't people figure out eventually to put a little less sugar in waffles with savory ingredients?

    Japanese cookbooks are even worse. Soy sauce, sake, mirin...boooooooring!

  6. two problems by EphemeralPhart · · Score: 4, Funny

    Some people, when confronted with a problem, think ``I know, I'll use regular expressions.'' Now they have two problems.

    Jamie Zawinski

  7. Linda Richmond says... by Anonymous Coward · · Score: 5, Funny

    I'm feeling a bit verklempt!

    Talk amongst yourselves!

    Alright, I'll give you a tawpic:

    "Regular Expressions are neither regular nor expressions."

    Discuss.

  8. a Cookbook eh? by chiapetofborg · · Score: 3, Funny

    Anyone have any good recipes for [cookies]+ ?

  9. Re:Regexes are overused by Anonymous Coward · · Score: 4, Funny

    > *11-page* regex.

    I think that's a sure sign of insanity. Or autism at the least.

  10. Re:F*ck this book and all others like it: by winkydink · · Score: 1, Funny

    Yeah, mod me down as a troll, don't even READ my comment. [...]

    You dumb slashbot fucks have no idea what a regex is [...]

    Sycophants and asshats, monkeys who crawl around above my office trying to figure out which wire the rats chewed through. Know-nothing idiots [...]

    Fuck you and your iPods. All those white earbuds do is help me pick out the clueless wannabes. No true geek would own one.


    Let me guess... you didn't finish the Dale Carnegie course, did you?

    --

    "I'd rather be a lightning rod than a seismometer." -Ken Kesey

  11. Re:Points by poot_rootbeer · · Score: 2, Funny

    2. the index has a lot of typos.

    Yeah, but in a book about regexes, you have to study the index VERY CAREFULLY to determine whether there are any typos or not.