Slashdot Mirror


Regular Expression Pocket Reference

Michael J. Ross writes "When software developers need to manipulate text programmatically (such as finding all substrings within some text that match a particular pattern), the most concise and flexible solution is to use "regular expressions," which are strings of characters and symbols that can look anything but regular. Nonetheless, they can be invaluable for locating text that matches a pattern (the "expression"), and optionally replacing the matched text with new text. Regular expressions have proven so popular that they have been incorporated into most if not all major programming languages and editors, and even at least one Web server. But each one implements regular expressions in its own way, which is reason enough for programmers to appreciate the latest edition of Regular Expression Pocket Reference, by Tony Stubblebine." Read below for the rest of Michael's review.

Title: Regular Expression Pocket Reference, Second Edition
Author: Tony Stubblebine
Pages: 126
Publisher: O'Reilly Media
Rating: 9/10
Reviewer: Michael J. Ross
ISBN: 0596514271
Summary: A pithy guide to regular expressions in many languages.

The second edition of the book was published by O'Reilly Media on 18 July 2007, under the ISBNs 0596514271 and 978-0596514273. On the book's Web page, the publisher makes available the book's table of contents and index, as well as links for providing feedback and any errata. As of this writing, there are no unconfirmed errata (those submitted by readers but not yet checked by the author to see whether they are valid), and no confirmed ones, either. In fact, in my review of the first edition, published in 2004, I noted that there were no unconfirmed errata, despite the book being out for some time prior to that review. The most likely explanation is that the author, in addition to any technical reviewers, did a thorough job of checking all of the regular expressions in the book, along with the sample code that makes use of them. These efforts have paid off with the apparent absence of any errors in this new edition, something unseen in any other technical book with which I am familiar.

Before discussing this particular book, it may be of value to briefly discuss the essential concept of regular expressions, for the benefit of any readers who are not familiar with them. As noted earlier, a regular expression (frequently termed a "regex") is a string of characters intended for matching substrings in a block of text. A regex pattern can match literally, such as the pattern "book" matching both "book" and "bookshelf." A pattern can also use special characters and character combinations — often termed metasymbols and metasequences — such as \w to indicate a single word character (A-Z, a-z, 0-9, or '_'). Thus, the regex "b\w\wk" would match "book," but not "brook."
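
The literal and \w patterns described above can be tried directly. What follows is a minimal sketch in Python (one of the languages the book covers); the sample strings come from the paragraph above, and the variable names are illustrative only.

```python
import re

# Literal pattern: "book" matches inside both "book" and "bookshelf".
literal = re.compile(r"book")
literal_hits = [bool(literal.search(s)) for s in ("book", "bookshelf")]

# \w matches one word character, so b\w\wk needs exactly two of them
# between "b" and "k".
wildcard = re.compile(r"b\w\wk")
matches_book = bool(wildcard.search("book"))    # b-o-o-k fits the pattern
matches_brook = bool(wildcard.search("brook"))  # three characters sit between b and k

print(literal_hits, matches_book, matches_brook)
```

One caveat: in Python 3, \w applied to text strings also matches letters and digits outside ASCII; passing re.ASCII restricts it to the A-Z, a-z, 0-9 and '_' set described above.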

Here is a simple example to show the use of regexes in code, written in Perl: The statement "$text =~ m/book/;" would find the first instance of the string "book" inside the scalar variable $text, which presumably contains some text. To substitute all instances of the string with the word "publication," you could use the statement "$text =~ s/book/publication/g;" ('g' for global, meaning every match is replaced rather than only the first) or use "$text =~ s/bo{2}k/publication/g;". In this simplistic example, the second statement makes use of a quantifier, {2}, indicating exactly two occurrences of the preceding character.
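
For comparison, here is a rough Python counterpart to the three Perl statements above. It is a sketch only: the sample sentence and the variable names are made up for illustration, and re.search and re.sub are Python's nearest equivalents rather than exact translations of Perl's operators.

```python
import re

text = "A book on a bookshelf."  # made-up sample text

# Perl: $text =~ m/book/;  -> find the first occurrence
first = re.search(r"book", text)
first_pos = first.start() if first else None

# Perl: $text =~ s/book/publication/g;  -> re.sub replaces every match by default
replaced = re.sub(r"book", "publication", text)

# Perl: $text =~ s/bo{2}k/publication/g;  -> o{2} means exactly two "o" characters
replaced_quantified = re.sub(r"bo{2}k", "publication", text)

print(first_pos, replaced, replaced_quantified, sep="\n")
```

Note that both substitutions also rewrite the "book" inside "bookshelf", because nothing in the pattern ties it to whole words; a word-boundary assertion such as \b would be one way to restrict it.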

These examples employ only one metacharacter (\w) and one quantifier ({2}). The total number of metacharacters, metasymbols, quantifiers, character classes, and assertions (to say nothing of capturing, clustering, and alternation) that are available, in most regex-enabled languages, is tremendous. However, the same cannot be said for the readability of all but the simplest regular expressions — especially lengthy ones not improved by whitespace and comments. As a consequence, when using regexes in their code, many programmers find themselves repeatedly consulting reference materials that do not focus on regular expressions. These resources comprise convoluted Perl books, incomplete tutorials on the Internet, and confusing discussions in technical newsgroups. For too many years, there was no published book providing the details of regexes for the various languages that utilize them, in addition to a clear explanation of how to use regexes wisely.
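
To illustrate the point about whitespace and comments, the sketch below writes one pattern twice in Python: once compactly, and once in verbose mode. The date pattern itself is a hypothetical example, not one taken from the book, and it checks only the shape of the string.

```python
import re

# Hypothetical pattern for dates written as YYYY-MM-DD (illustration only;
# it checks the shape of the string, not whether the date exists).
compact = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")

# The same pattern in verbose mode: whitespace inside the pattern is ignored
# and "#" starts a comment, so each piece can be labelled.
verbose = re.compile(
    r"""
    ^
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    $
    """,
    re.VERBOSE,
)

sample = "2007-07-18"
compact_groups = compact.match(sample).groups()
verbose_groups = verbose.match(sample).groups()
print(compact_groups == verbose_groups, verbose_groups)
```

Both forms compile to the same matcher; the verbose one simply costs more lines in exchange for being easier to revisit later.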

Fortunately, O'Reilly Media offers two titles in hopes of meeting that need: Mastering Regular Expressions, by Jeffrey Friedl, and Regular Expression Pocket Reference, by Tony Stubblebine. In several respects, the books are related, particularly in that Stubblebine bases his slender monograph upon Friedl's larger and more extensive title, justifiably characterized by Stubblebine as "the definitive work on the subject." In addition, Stubblebine's book follows the structure of Friedl's book, and contains page references to the same. One major difference, however, is that Regular Expression Pocket Reference is, just as the title indicates, for reference purposes only, and not intended as a tutorial.

At first glance, it is clear that Stubblebine's book packs a great deal of information into its modest 126 pages. That may partly be a result of the terseness of most, if not all, of the regular expression syntax; a metasymbol of more than two characters would be considered long-winded! Yet the high information density is likely also due to the manner in which Stubblebine has distilled the operators and rules, as well as the meaning and usage thereof, down to the bare bones. But this does not imply that the book is bereft of examples. Most of the sections contain at least one, and sometimes several, code fragments that illustrate the regex elements under discussion.

The book begins with a brief introduction to regexes and pattern matching, followed by an even briefer cookbook section, with Perl-style regexes for a dozen commonly needed tasks, e.g., validating dates. The bulk of the book's material is divided into 11 sections, each one devoted to the usage of regexes within a particular language, application, or library: Perl 5.8, Java, .NET and C#, PHP, Python, Ruby, JavaScript, PCRE, the Apache Web server, the vi programmer's editor, and shell tools.

Each of these sections begins with a brief overview of how regexes fit into the overall language covered in that section. Following this is a subsection listing all of the supported metacharacters, with a summary of their meanings, in tabular format. In most cases, this is followed by a subsection showing the usage of those metacharacters — either in the form of operators or pattern-matching functions, depending upon how regular expressions are used within that language. Next is a subsection providing several examples, which is often the first material that most programmers turn to when trying to quickly figure out how to use one aspect of a language. Each section concludes with a short listing of other resources related to regexes for that particular language.

There are no glaring problems in this book, and I can only assume that all of the regular expressions themselves have been tested by the author and by previous readers. However, there is a minor weakness that should be pointed out, and could be corrected in the next edition. In most of the sections' examples, Stubblebine wisely formats the code so that every left brace ("{") is on the same line as the beginning of the statement that uses that brace, and each closing brace ("}") is lined up directly underneath the first character of the statement. This format saves space and makes it easier to match up the statement with its corresponding close brace. However, in the .NET / C# and PCRE library sections, the open braces consume their own lines, and also are indented inconsistently, as are the close braces, which makes the code less readable, as well as less consistent among the sections.

Some readers may fault the book's sparse index. Admittedly, an inadequate index in any sizable programming book can make it difficult if not impossible to find what one is looking for. As a result, one ends up flipping through the book's pages hoping to luckily spot the desired topic. This is the rather unpleasant method to which a reader must resort when a technical book has no index, or one that is inadequate, which is far too often the case. Stubblebine's index offers only several dozen entries for all the letters of the alphabet, and only two symbols. Some readers might demand that all of the metacharacters and metasequences be listed in the index, so they can be found even faster than otherwise. But given the large number of metacharacters and metasequences, as well as method names, module functions, and everything else relevant, creating an exhaustive index would almost double the size of the book, and be largely redundant with the language-specific sections. Within each language, there is typically a limited enough number of pages that scanning through them to find a particular topic would not be onerous. On the other hand, some of the index's inclusions and omissions are odd. For instance, two symbols are listed, and yet no others; why bother with those two? Also, a few key concepts are missing, such as grouping and capturing.

Yet aside from these minor blemishes, Regular Expression Pocket Reference is a concise, well-written, and information-rich resource that should be kept on hand by any busy software developer.

Michael J. Ross is a Web developer, writer, and freelance editor.

You can purchase Regular Expression Pocket Reference, Second Edition from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

144 comments

  1. I want one by corerunner · · Score: 0, Offtopic

    this looks good, want to send me a copy? ;)

    --
    "Don't hate the media, become the media." -Jello Biafra
  2. General introductions to regex? by CRCulver · · Score: 1

    I find that O'Reilly's books on regular expressions (beside the pocket reference there's Mastering Regular Expressions ) seem to assume a great deal of prior knowledge. Is there any introduction to regular expressions for total beginners, perhaps teaching through examples and including exercises?

    1. Re:General introductions to regex? by Swizec · · Score: 2, Informative

      You can always try php.net. I find that it's a fairly good introductory tutorial into regular expressions going through all the basics and such. It might be a tad specific, but the general science behind them is there and should allow you to quickly learn them in any language.

    2. Re:General introductions to regex? by CRCulver · · Score: 1

      I should mention that it needn't be a print book. Any websites that fit the description would be interesting as well.

    3. Re:General introductions to regex? by Anonymous Coward · · Score: 0
    4. Re:General introductions to regex? by wol · · Score: 5, Informative
      --
      If you think deeply enough, you will have no single direction for your outrage.
    5. Re:General introductions to regex? by Anonymous Coward · · Score: 1, Funny

      Is there any introduction to regular expressions for total beginners, perhaps teaching through examples and including exercises?
      This book IS for total beginners, literally.
    6. Re:General introductions to regex? by Anonymous Coward · · Score: 1, Informative

      Try the free sample chapter for the book Pro Perl Parsing from Apress. It provides a nice walk through of Regex usage and how Regexs work.

    7. Re:General introductions to regex? by morgan_greywolf · · Score: 1

      http://www.regular-expressions.info/ And this book doesn't seem any better than this site, which I've used as a reference for the last 3-4 years or so. Plus, there's an additional advantage to using regular-expressions.info over this book: You can't grep dead trees!

    8. Re:General introductions to regex? by athakur999 · · Score: 5, Informative

      A regex visualizer is pretty useful too for understanding how regex works. I used this one a few years ago and it does a good job:

      http://laurent.riesterer.free.fr/regexp/

      It will color code your regex pattern and the associated matches in the string to be searched so you know what is matching what.

      --
      "People that quote themselves in their signatures bother me" - athakur999
    9. Re:General introductions to regex? by MightyYar · · Score: 2, Funny

      You can't grep dead trees!

      Dang, you're right:
      [mini-me:/] luser% grep dead trees
      grep: trees: No such file or directory

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    10. Re:General introductions to regex? by Anonymous Coward · · Score: 2, Informative

      Urgh, no. I just had a look at the site, and any site with gems like this right on the front page should definitely be avoided:

      you could use the regular expression \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b Analyze this regular expression with RegexBuddy to search for an email address. Any email address, to be exact. A very similar regular expression (replace the first \b with ^ and the last one with $) can be used by a programmer to check if the user entered a properly formatted email address.

      Checking email addresses for well-formedness (not the same as validity, anyway) is possible with regexes, but the above example is definitely wrong, and anyone who wants to do so should better use a Perl module or something similar in their language of choice instead of trying to reinvent the wheel and - inevitably - getting it wrong.

      So the advice that site is giving there is flawed on several levels, and for me, that's enough to take anything and everything on there with a big grain of salt. I'd advise others to stay away and turn to more reliable resources.
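
As a rough illustration of the objection above, the quoted pattern can be anchored and tried against a few addresses. This is a sketch in Python; the sample addresses are made up, and the case-insensitive flag is an assumption, since the quoted pattern lists only upper-case letters.

```python
import re

# The pattern quoted above, with \b replaced by ^ and $ as the quoted text suggests.
# re.IGNORECASE is an assumption: the pattern itself lists only A-Z.
simple_email = re.compile(r"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$", re.IGNORECASE)

# Made-up sample addresses.
accepted_plain = bool(simple_email.match("user@example.com"))
# A well-formed address whose top-level domain is longer than four letters.
accepted_long_tld = bool(simple_email.match("curator@example.museum"))
# Consecutive dots in the domain are not well-formed, yet the pattern allows them.
accepted_double_dot = bool(simple_email.match("user@example..com"))

print(accepted_plain, accepted_long_tld, accepted_double_dot)
```

The pattern thus errs in both directions: it turns away some well-formed addresses and lets through some malformed ones, which is the sense in which well-formedness checking is harder than it looks.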

    11. Re:General introductions to regex? by jtev · · Score: 1

      Well, depending on how much mathematical background you have, I would say that a good place to start on regular expressions is to pick up a book on discrete mathematics. Once you've mastered the concepts contained in it, you might want to move on to something that is more detailed about automata theory. Unfortunately regular languages (the set of languages that can be expressed as regular expressions) require a bit of background to truly understand. That said, the description of them is rather simple, and if you only want to touch on them, and not get terribly in depth, just don't use everything that an RE or RL can do/be and you should be fine.

      --
      That which is done from love exists beyond good and evil
    12. Re:General introductions to regex? by jlowery · · Score: 2, Funny

      The most useful link I've seen on /. in a long, long time.

      --
      If you post it, they will read.
    13. Re:General introductions to regex? by Anonymous Coward · · Score: 0

      Don't you mean egrep?

    14. Re:General introductions to regex? by jandrese · · Score: 4, Insightful

      That and getting into that kind of depth is usually a good way to find the bugs in your regular expression library. It's also an easy way to write code that will drive maintainers crazy.

      Unless you're a hard core mathhead, that's probably not a good place to start with regexes IMHO. That's just going to scare people off from a highly useful tool. One generally does not need to rigorously prove that his regexes are going to work to use them. One does not have to use every feature of a language to make good use of it.

      --

      I read the internet for the articles.
    15. Re:General introductions to regex? by jtev · · Score: 1

      Yes, but most books on regular expressions expect you to know what a regular expression is. And the depth to which regular expressions are covered in the discrete math book I used freshman year was shallow enough to give someone a broad overview without swamping them, assuming they have the mathematical rigor to get that far. If they don't, then what they can glean from a book on REs without knowing even that level of depth will be adequate for 99% of what REs are used for.

      --
      That which is done from love exists beyond good and evil
    16. Re:General introductions to regex? by WarJolt · · Score: 1

      I suggest taking a class. I took an Intro to Unix course at a community college a while back and two weeks were spent learning regular expressions. vi isn't used by many people new to Unix, but has powerful regex search and replace features. Grep is a must if you are a power user. I've learned Perl, which bases a lot of features around regex.
      If you're interested, here are examples from my teacher's website.
      for Perl:
      http://voyager.deanza.edu/~perry/cis331.html

      for vi:
      http://voyager.deanza.edu/~perry/vi.html
      Perl might not be for you, but the examples should be relevant.

    17. Re:General introductions to regex? by jandrese · · Score: 1

      I've never understood why people find them so confusing in the first place. The concept is dirt simple: You tell the computer to look for X (usually the example here is a fixed string match) in your data. When it finds X, it tells you where it is. Magic!

      Then you go on and explain wildcards, character classes, and subexpressions and you've covered 95% of what a regular person will use in day to day life, all in the span of about 5 minutes. The hardest part about using regular expressions is usually setting up all of the support stuff that the language makes you go through before using them (even the PCRE library has a fair bit of stuff you have to do to make a single match). That's the big reason they didn't take off until Perl came around IMHO, because they were just too much work for the payoff in pretty much every language up until then. Perl integrated them nicely at the core of the language and suddenly everybody started using them.

      --

      I read the internet for the articles.
    18. Re:General introductions to regex? by larry+bagina · · Score: 1

      The dragon book aka Compilers: Principles, Techniques, and Tools.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    19. Re:General introductions to regex? by Abcd1234 · · Score: 1

      Actually, I find the easiest way to understand regexes is to understand their underlying representation: the finite state machine. Understanding this not only helps to illuminate how regexes work, it also highlights their limitations (eg, counting).

      'course, taking a course in formal language theory is even better (and should be a required course as part of a computing science degree, IMHO). :)

    20. Re:General introductions to regex? by cmacb · · Score: 1

      Does the book, or any other reference explain why we need such an obtuse mechanism for parsing strings in the first place? Most of the things I read about people doing with regular expressions could be done with much more intuitive string handling methods that have been around since at least the 70s. There may be things that can be done with regex that couldn't be done with (for example) the "parse" statement in Rexx, but it would be a very small percentage of the examples I've seen.

    21. Re:General introductions to regex? by jasko · · Score: 1

      Everything I know about regular expressions came from the Python documentation. http://docs.python.org/lib/re-syntax.html

    22. Re:General introductions to regex? by markjl · · Score: 1
      --
      My opinions are my own, but you may share them!
    23. Re:General introductions to regex? by Tony+Hoyle · · Score: 1

      I've never understood why people find them so confusing in the first place.

      Same here.. people here advocating all sorts of weird stuff like advanced maths theory*, when anyone could work out regular expressions by looking at them for a few minutes. Of course visualisers help for the really complex stuff (which nobody ever uses anyway).

      PCRE is actually quite nice - you only have to bother with the setup once. just make a class that you can chuck a regexp string at and reuse it. Depends on the data set I guess.

      * I hate maths.. Computer science has never really got over the way it used to be a part of the maths dept. at schools. They *really* hated me.. I was in remedial classes for maths (never saw the point of learning it until I was in my late teens, so I just wrote crap for all the answers and went back to whatever I was messing with at the time), so technically I wasn't allowed to enter the computer class as I was too stupid, but then I got 99% on the aptitude test and they were kinda stuck in a quandary as it was the highest score they'd ever had...

    24. Re:General introductions to regex? by lsolano · · Score: 0

      GREAT site, incredibly useful. Absolutely recommended.

    25. Re:General introductions to regex? by Anonymous Coward · · Score: 0

      So, you have a dead file?

      The authorities have been notified.

    26. Re:General introductions to regex? by X0563511 · · Score: 1

      Donate the $5 and get a PDF book - It's well worth it... on the webpage click PDF on the left-side navigation bar.

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    27. Re:General introductions to regex? by Anonymous Coward · · Score: 2, Insightful

      Does the book, or any other reference explain why we need such an obtuse mechanism for parsing strings in the first place?
      What's obtuse about them? They're a straightforward and direct way of describing text patterns, and perfectly intuitive if you have an analytical mind (and if you don't, you shouldn't be programming in the first place).

      Here's a REXX example from Wikipedia:

      myVar = "(202) 123-1234"
      parse var MyVar 2 AreaCode 5 7 SubNumber
      say "Area code is:" AreaCode
      say "Subscriber number is:" SubNumber
      This is your idea of "intuitive"? Don't make me laugh! It's difficult to understand (humans work by recognising patterns, not by counting characters) and it's fragile (what happens if someone puts an extra space in after the area code?).

      Here's the Perl equivalent:

      my $var = "(202) 123-1234";
      my ($areacode, $subnumber) = $var =~ m{
          \( (\d+) \) # Area code (parenthesized digits)
          \s* # Optional whitespace
          (\d+-\d+) # Subscriber number (two groups of digits separated by a hyphen)
      }x;
      print "Area code is: $areacode\nSubscriber number is: $subnumber\n";
      Much more readable (though not quite as readable as it was before Slashdot mangled it by squishing the whitespace before the comments) and it's not often you get to say that about Perl!
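
For readers more at home in Python, the Perl snippet above might be sketched as follows. It mirrors the same pattern using re.VERBOSE; the variable names and the extra-space test string are illustrative additions, not part of the original comment.

```python
import re

var = "(202) 123-1234"

phone = re.compile(
    r"""
    \( (\d+) \)   # area code (parenthesized digits)
    \s*           # optional whitespace
    (\d+-\d+)     # subscriber number (two groups of digits separated by a hyphen)
    """,
    re.VERBOSE,
)

match = phone.search(var)
areacode, subnumber = match.groups() if match else (None, None)
print(f"Area code is: {areacode}\nSubscriber number is: {subnumber}")

# An extra space after the area code, the fragile case raised above, still parses.
spaced = phone.search("(202)  123-1234").groups()
```

Because the pattern describes the shape of the number rather than fixed column positions, small variations in spacing do not change the result.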
    28. Re:General introductions to regex? by shellbeach · · Score: 1

      Mod parent up ... that's the best regexp I've ever seen!

      'course, it doesn't actually seem to work under Perl 5.8; but I sure as hell ain't trying to debug it ...

    29. Re:General introductions to regex? by Anonymous Coward · · Score: 0

      Unless you're a hard core mathhead, that's probably not a good place to start with regexes IMHO.

      Even if you are a "hard core mathhead" it might not do that much good, because what Perl and others call "regular expressions" are not actually regular expressions. You can match far more than just regular languages with Perl regexps these days.

      "'Regular expressions' [...] are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I'm not going to try to fight linguistic necessity here. I will, however, generally call them "regexes" (or "regexen", when I'm in an Anglo-Saxon mood)." --Larry Wall
    30. Re:General introductions to regex? by kalidasa · · Score: 1

      I've got bad news for you, Tony. Computer Science IS math, and if you're good at it, you'd probably be good at math if you applied yourself. Understand that regexes are just a kind of function that takes symbols rather than digits, and returns either true or false.

    31. Re:General introductions to regex? by Anonymous Coward · · Score: 0

      Well, according to the same site, there is a more standards-compliant version as well (but it takes up 426 characters). They explain why the simpler regex is "good enough" in most cases.

    32. Re:General introductions to regex? by blantonl · · Score: 1

      I've found that the gold standard O'Reilly Book Learning Perl - Chapter 7 - Regular Expressions, is a fantastic beginners reference for regular expressions, how to use them, and the power of their usage.

      --
      Lindsay Blanton
      RadioReference.com
    33. Re:General introductions to regex? by Dragonslicer · · Score: 1

      Does the book, or any other reference explain why we need such an obtuse mechanism for parsing strings in the first place? Most of the things I read about people doing with regular expressions could be done with much more intuitive string handling methods that have been around since at least the 70s. There may be things that can be done with regex that couldn't be done with (for example) the "parse" statement in Rexx, but it would be a very small percentage of the examples I've seen.

      If a person is using a regular expression when they really only need direct string parsing, that's the fault of the person, not regular expressions. The annoying details of finite state machines can be ignored if you're just using regular expressions in programming, but if you try to just use conditionals and substrings for all of your text parsing, eventually you'll have a case where you end up essentially writing your own finite state machine.
    34. Re:General introductions to regex? by CastrTroy · · Score: 1

      I read Mastering Regular Expressions, cover to cover. I find that it started off very easily and even having no Regex knowledge outside of using *.* on the command line, I was able to pick up Regex using just this book pretty well. Sure you can't just read the book, and master regular expressions, but what programming concept can be mastered from simply reading a book? It's a really good starter, and a really good reference. Everything else you'll figure out from experimentation, and just using it.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    35. Re:General introductions to regex? by krazytekn0 · · Score: 1

      I recommend this for all your "maths" needs. Highly intuitive teaching method!

      --
      Not all life is cyber. Extra Income
    36. Re:General introductions to regex? by Mad+Merlin · · Score: 1

      There may be things that can be done with regex that couldn't be done with (for example) the "parse" statement in Rexx, but it would be a very small percentage of the examples I've seen.

      I don't think you understand the difference between "possible" and "easy". Using regular expressions to parse text is (really!) easy. Writing a 100% CSS 3, XHTML 1.1 and Javascript 1.7 compliant web browser entirely in x86 assembly by hand (on paper) in 24 hours or less is "possible".

    37. Re:General introductions to regex? by Mad+Merlin · · Score: 1

      I actually own that book... it talks about fake regular expressions though (ie, not regex).

    38. Re:General introductions to regex? by Mad+Merlin · · Score: 1

      I read Mastering Regular Expressions, cover to cover. I find that it started off very easily and even having no Regex knowledge outside of using *.* on the command line...

      Actually, that's globbing that the shell does for you, not regex.
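
The difference between the two notations can be sketched in Python, using the standard fnmatch module for shell-style globs and re for regexes. The file names are made up, and the last check describes Python's re module only; other regex engines may treat a leading "*" differently.

```python
import fnmatch
import re

names = ["notes.txt", "README", "archive.tar.gz"]  # made-up file names

# Shell-style glob: "*" means "any run of characters" and "." is literal.
glob_hits = [n for n in names if fnmatch.fnmatch(n, "*.*")]

# A regex with the same meaning: ".*" is "any run of characters", "\." a literal dot.
regex_hits = [n for n in names if re.fullmatch(r".*\..*", n)]

# Read as a regex, the glob text "*.*" is not even valid in Python, because
# the leading "*" quantifier has nothing to repeat.
try:
    re.compile("*.*")
    glob_is_valid_regex = True
except re.error:
    glob_is_valid_regex = False

print(glob_hits, regex_hits, glob_is_valid_regex)
```

The same characters carry different meanings in each notation, which is why habits from one do not transfer directly to the other.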

    39. Re:General introductions to regex? by CastrTroy · · Score: 1

      Yes, I realize it's not exactly the same as regular expressions, but it's kind of the same thing. Look for files that have such and such in the name. Mastering Regular Expressions even brings this up as an example, because just about everybody who would be reading the book has probably used this concept at some point in their lives.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    40. Re:General introductions to regex? by hackstraw · · Score: 1

      You can't grep dead trees!.

      Yes you can. You just need quotes, and a lot of patience.

      [me@my computer] grep "dead trees"

      If you wait long enough, everything you ever wanted will appear.

    41. Re:General introductions to regex? by morgan_greywolf · · Score: 1

      ^D

  3. catch 22 by blakbeard0 · · Score: 1

    You need help from the book in order to find the best way to search for its ebook on the internet

    1. Re:catch 22 by russotto · · Score: 1

      /(?:\w+)regex(?:\w+)/i (The lameness filter ain't going to allow this, is it?)

  4. "Regular Expressions for Onion Routing" by markov_chain · · Score: 1

    Now that would be an interesting pair of authors ;)

    --
    Tsunami -- You can't bring a good wave down!
  5. is it better than by superwiz · · Score: 1

    a google search for "regex [your fav language goes here]"?

    --
    Any guest worker system is indistinguishable from indentured servitude.
    1. Re:is it better than by MightyYar · · Score: 1

      That's kind of what I was thinking.

      Pair up Google with something like Kodos and you are all set. I still struggle with them sometimes, but nothing like before I had the debugger.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
  6. Stand back, everyone by Armakuni · · Score: 5, Funny

    ...I have a pocket reference to regular expressions.

    --
    That's not Picasso, that's Kandinsky!
    1. Re:Stand back, everyone by Anonymous Coward · · Score: 0

      I guess that means you can now become a superhero: http://xkcd.com/208/!

      (funny side note: captcha is 'jumper')

    2. Re:Stand back, everyone by Anonymous Coward · · Score: 1, Funny

      You got the reference! Well done you!

    3. Re:Stand back, everyone by superwiz · · Score: 4, Funny

      and I want you to watch as I fuck your sister, you mealy-mouthed faggot. Guess neither of us are going to get what we want, are we? You make me fucking sick, you all do. Fuck off before I slap you in the mouth. Ask your doctor to decrease the dose.
      --
      Any guest worker system is indistinguishable from indentured servitude.
    4. Re:Stand back, everyone by ibmjones · · Score: 1

      For those who may not get the joke:

      http://xkcd.com/208/

    5. Re:Stand back, everyone by kylehase · · Score: 1

      Next time use a smaller font

      --
      You want fun, go home and buy a monkey!
    6. Re:Stand back, everyone by laejoh · · Score: 0

      m{\AIs that a reference to regular expressions in your pocket or are you just happy to see me\?\z}xms;

      And btw, you should stand back too, because I have the shirt!

    7. Re:Stand back, everyone by ignavus · · Score: 1

      While I have a pocket reference to some highly irregular expressions.

      --
      I am anarch of all I survey.
    8. Re:Stand back, everyone by locster · · Score: 1

      Or increase it. One or the other.

  7. FREE online version by Anonymous Coward · · Score: 0

    www.regexlib.com

  8. useful regular expression by stokessd · · Score: 4, Funny

    Here's the regular expression that I found most useful in childhood:

    "Hello, I'm a smart geeky person, please to not beat me up and take my lunch money. I can help you with your math homework"

    Sheldon

    1. Re:useful regular expression by Bogtha · · Score: 2, Funny

      I can help you with your math homework

      Now you have math problems.

      --
      Bogtha Bogtha Bogtha
    2. Re:useful regular expression by Vyx · · Score: 1

      Don't lookbehind!

      --
      Zerg = 1a2a3a4a5sh6sh7sh8sh9sh0sf
    3. Re:useful regular expression by Anonymous Coward · · Score: 0

      ...followed by the standard expression "oh shit".

    4. Re:useful regular expression by mav[LAG] · · Score: 3, Informative

      Pure genius and probably the first time I've laughed out loud at something to do with regexes. Hats off to you sir.

      For those of you who don't know the reference:

      Some people, when confronted with a problem, think "I know, I'll use regular expressions."
      Now they have two problems.
      --Jamie Zawinski, in comp.lang.emacs

      --
      --- Hot Shot City is particularly good.
    5. Re:useful regular expression by junglee_iitk · · Score: 1

      Ah! I didn't know that. Pure genius...

  9. Correction for summary by Jerry+Coffin · · Score: 4, Funny

    However, there is a minor weakness that should be pointed out, and could be corrected in the next edition. In most of the sections' examples, Stubblebine wisely formats the code so that every left brace ("{") is on the same line as the beginning of the statement that uses that brace, and each closing brace ("}") is lined up directly underneath the first character of the statement. This format saves space and makes it easier to match up the statement with its corresponding close brace. However, in the .NET / C# and PCRE library sections, the open braces consume their own lines, and also are indented inconsistently, as are the close braces, which makes the code less readable, as well as less consistent among the sections.


    A minor correction:
    However, there is a minor weakness that should be pointed out, and could be corrected in the next edition. Specifically, the book includes a section on .NET/C# and PCRE. By the time the next edition is needed, Microsoft will undoubtedly have moved on to new languages running in a new environment, as well as "enhanced" regular expressions "to provide better security and a syntax that is more approachable by beginners."
    --
    The universe is a figment of its own imagination.
    1. Re:Correction for summary by sconeu · · Score: 1

      PCRE isn't an MS technology.

      --
      General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
    2. Re:Correction for summary by Shados · · Score: 1

      Jokes aside, while it virtually implements the entire standard (and in some cases, more so than basically all other implementations), .NET's regexes actually DO have a few "extensions" to them, like special syntax to handle dynamic amounts of matching pairs more easily. The syntax is hell though, so not more approachable to beginners :)

  10. ObJWZ by Minwee · · Score: 4, Funny

    Because you just can't discuss regular expressions without bringing up this quote:

    Some people, when confronted with a problem, think "I know, I'll use regular expressions."
    Now they have two problems.

    -- Jamie Zawinski, 1997, in alt.religion.emacs

    1. Re:ObJWZ by poot_rootbeer · · Score: 1
      May I not-so-humbly submit my own revision to Zawinski's quote? Mine reads:

      Some people, when confronted with a problem, think "I know, I'll use regular expressions."
      Now they have ^[2-9]\d*$ problems.
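A quick check of the pattern in the revised quote, as a sketch only: `^[2-9]\d*$` requires the first digit to be 2 through 9, so counts such as 10 or 100 slip through. The `widened` pattern below is an assumed alternative, not something posted in the thread.

```python
import re

original = re.compile(r"^[2-9]\d*$")
# Assumed fix (not from the thread): a single digit 2-9, or any multi-digit
# number that has no leading zero.
widened = re.compile(r"^(?:[2-9]|[1-9]\d+)$")

for count in ("2", "9", "10", "15", "20", "100"):
    print(count, bool(original.match(count)), bool(widened.match(count)))
```

The loop prints, for each count, whether the original and the widened pattern accept it.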
    2. Re:ObJWZ by jandrese · · Score: 1

      To be fair, regular expressions in emacs lisp in 1997 were not exactly something for the faint of heart. Heck, the POSIX C regular expression library is a nightmare syntactically (all of the support code you need for a single expression is unbelievable). Hard as it is to believe, Perl actually made the syntax cleaner.

      Despite being less self-documenting, I have never met a person who prefers to type [[:alnum:]] over \w.

      --

      I read the internet for the articles.
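One detail worth keeping in mind when trading `[[:alnum:]]` for `\w`: the two are not quite the same set, because `\w` also accepts the underscore. A small sketch in Python follows. Python's `re` module does not accept POSIX bracket classes, so the explicit range `[A-Za-z0-9]` stands in for `[[:alnum:]]` here, which is an assumption that only holds for ASCII text.

```python
import re

text = "foo_bar baz9"

# \w with re.ASCII means [A-Za-z0-9_], so the underscore joins the word.
word_runs = re.findall(r"\w+", text, re.ASCII)

# Stand-in for [[:alnum:]] (ASCII only): letters and digits, no underscore.
alnum_runs = re.findall(r"[A-Za-z0-9]+", text)

print(word_runs)
print(alnum_runs)
```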
  11. already built in by Fujisawa+Sensei · · Score: 2, Informative

    There's already a built in regular expression tutorial:

    man perlretut
    --
    If someone is passing you on the right, you are an asshole for driving in the wrong lane.
    1. Re:already built in by Anonymous Coward · · Score: 0

      I think you mean perldoc perlretut.

    2. Re:already built in by CRCulver · · Score: 1

      Wow, I had no idea this has been sitting on my box for years. Thank you very much for bringing it to our attention.

    3. Re:already built in by rrohbeck · · Score: 1

      There's already a built in regular expression tutorial:

      man perlretut

      And man perlrequick for regex noobs.
    4. Re:already built in by tonystubblebine · · Score: 1

      The book is best for people who use regular expressions in more than one language/application. When I was a Perl programmer, I rarely looked at the Perl section, but did look at the javascript, grep, and sed sections all the time. Having a consistent format for each section should theoretically make it easier for you to apply your regex knowledge in whatever tool you're using (it does for me at least). I wrote the second edition primarily to add Apache and Ruby sections for myself, but also updated everything else with feedback and changes since the first edition.

    5. Re:already built in by Anonymous Coward · · Score: 0

      'man' is not recognized as an internal or external command,

      operable program or batch file.

    6. Re:already built in by value_added · · Score: 1

      There's already a built in regular expression tutorial: man perlretut

      Not to be pedantic, but that's neither "built in" nor a "regular expression" tutorial. That's one of the many manpages for Perl that gets installed when you install Perl, and describes, in a friendly format, using Perl and Perl regular expressions.

      Which is different than using Perl-compatible regular expressions as described in pcre(3).

      Which is different than using Posix regular expressions as described in re_format(7) or grep(1).

      So, by all means, use perlretut (or any of the numerous, well-written and informative pages on the subject as outlined in perl(1)) to learn as little or as much as you want. You'll probably discover it's no more difficult to learn this regex business than it is to learn how to conjugate verbs. And you'll be better off for doing so. Just be wary that there are other implementations that, by comparison, don't quite measure up to the gold-standard that Perl has become.

    7. Re:already built in by Anonymous Coward · · Score: 0

      'man' is not recognized as an internal or external command,
      Most likely, you are sitting in front of a computer designed for females, not men. Find a computer in your workplace without the Mac logo, and that error should go away.
    8. Re:already built in by Anonymous Coward · · Score: 0

      'man' is not recognized as an internal or external command,

      operable program or batch file.

      Then get a computer with a real OS.
  12. Why bother with a book? by hcdejong · · Score: 1

    I use grep regularly enough to know generally how to build an expression, but not often enough to know each application's quirks and implementation details off the top of my head (I use grep in 3-4 different editors), so I end up having to look something up regularly. I always use the application's Help file rather than the grep manual I've got lying around somewhere.
    Opening the Help file for the app and using its search function is a lot quicker than having to leaf through a book (worse when the book has a bad index). The only time this is annoying is when I've got a lack of screen real estate, but that's usually when I'm on the road and won't have access to any books anyway.

  13. Problems by Peaker · · Score: 2, Interesting

    I'll start with an Obligatory quote.

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. --Jamie Zawinski, in comp.lang.emacs

    I'll close with a somewhat depressing fact: Regular expression and string processing can be done quickly and efficiently (and was done that way back decades ago, with grep and awk), but is actually done in a horribly inefficient way in all modern/popular programming language regexp engines.

    1. Re:Problems by Abcd1234 · · Score: 4, Interesting

      First off, Mr. Zawinski is recorded as being rather prejudiced against Perl, so I'd take any comments he's made regarding regex's with a massive grain of salt. In fact, I'd probably just ignore him altogether. Besides, his comments are focused almost entirely on the *mis*uses of regexes, not their appropriate application.

      As for your second complaint... uhh, who cares? Premature optimization is the devil. So if regex's allow you to cleanly implement a simple solution to a problem (and regexes *are* very well suited to certain tasks, even if they do tend to be misused, particularly in languages such as Perl where they're very tightly integrated), it would be foolish to move to another technique based solely on performance concerns without first profiling the code.

      'course, the real irony, on the performance front, is that Mr. Zawinski himself said "The heavy use of regexps in Emacs is due almost entirely to performance issues: because of implementation details, Emacs code that uses regexps will almost always run faster than code that uses more traditional control structures." So maybe they aren't so evil or slow after all?

    2. Re:Problems by rrohbeck · · Score: 1

      I keep wondering about this myself all the time given that we wrote a regex engine with NFA-to-DFA conversion in 3rd semester CS way back when. That was kind of enlightening after weeks of DFA, NFA, formal languages, grammar and Chomsky hierarchy tedium.

    3. Re:Problems by Anonymous Coward · · Score: 0

      but is actually done in a horribly inefficient way in all modern/popular programming language regexp engines.
      That's because all modern and popular regexp engines match extended regular expressions, not actual regular expressions. Kleene's theorem is very nice, but it doesn't work for extended regular expressions.

      Also, the benchmark in the article is complete bull. The regexp /a{0,29}a{29}/ matches a 29-character string in about 5 microseconds on my machine. The only possible way the author could have benchmarked that expression at 60 seconds is to run perl on a "little-used PDP-7 in a corner".
    4. Re:Problems by bit01 · · Score: 1

      Premature optimization is the devil.

      Wrong. Premature peephole optimization is the devil.

      At the design stage choosing a good algorithm that scales is entirely appropriate. This is particularly true when you don't know how much data you'll be working with. Like any scripting language.

      Performance criteria are always part of a design and cruddy programmers who hide their incompetence with the above mantra should be fired. See dailywtf for examples.

      ---

      Don't be a programmer-bureaucrat; someone who substitutes marketing buzzwords and software bloat for verifiable improvements.

    5. Re:Problems by RedWizzard · · Score: 1

      I'll close with a somewhat depressing fact: Regular expression and string processing can be done quickly and efficiently (and was done that way back decades ago, with grep and awk), but is actually done in a horribly inefficient way in all modern/popular programming language regexp engines.

      I think you'll find that the regex algorithms used in the likes of Perl were chosen for a very good reason - not just because the implementers were lazy or stupid. The author of the article never addresses the fundamental differences in semantics between Posix regular expressions (such as grep and awk implement) and Perl regular expressions. In the Posix case you must find the longest match, a requirement that the Thompson NFA approach handles easily. In the Perl case you must find the first match (i.e. you must try left branches of '|' before right branches, and treat '?' '*' and '+' as greedy). This requirement is problematic for the Thompson NFA algorithm.

      Comments on the Cox paper from the Haskell regex implementer. Another response, from the Perl side.
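The difference between "first match" and "longest match" described above can be seen with a two-branch alternation. This is only a sketch: Python's `re` is a backtracking engine with Perl-style leftmost-first alternation, and the "longest" result is emulated naively by trying each branch on its own, not by a real POSIX engine.

```python
import re

text = "ab"
alternatives = ["a", "ab"]

# Perl-style semantics: branches of '|' are tried left to right and the
# first branch that succeeds wins, even if a later one would match more.
first = re.match("|".join(alternatives), text).group(0)

# Emulated POSIX-style leftmost-longest: try every branch at the same
# starting position and keep the longest successful match.
matches = [re.match(alt, text) for alt in alternatives]
longest = max((m.group(0) for m in matches if m), key=len)

print(first)
print(longest)
```

Here the first-match rule stops at the left branch while the longest-match rule picks the right one, so the same pattern and input give two different answers.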

    6. Re:Problems by Anonymous Coward · · Score: 0

      Yeah, but when you're talking about 5 milliseconds vs 2 for something that runs once in a blue moon, who gives a fuck? Performance is always under consideration, but making it work is approximately infinitely more important. Anyone on my teams who goes off on unwarranted performance rants is fired.

    7. Re:Problems by bjourne · · Score: 1

      I'll close with a somewhat depressing fact: Regular expression and string processing can be done quickly and efficiently (and was done that way back decades ago, with grep and awk), but is actually done in a horribly inefficient way [swtch.com] in all modern/popular programming language regexp engines.

      That's not true. To get the exponential runtime from your regexps in a pcre-style engine, you have to write some wicked bad regular expressions. In Real Life(tm) backtracking engines are just as good as NFAs. Plus, backreferences are hard to implement using NFAs, so you must resort to backtracking for them anyway. Which is why the authors of Perl's, Python's and PHP's regular expression libraries have chosen to use recursive backtracking -- it is much simpler and you get the same performance for non-pathological cases.
    8. Re:Problems by bit01 · · Score: 1

      Yeah, but when you're talking about 5 milliseconds vs 2 for something that runs once in a blue moon, who gives a fuck?

      The problem is that when every programmer does this, and every program is just lots of little operations combined into big ones, then the program as a whole takes seconds to respond instead of milliseconds and likely fails requirements.

      Performance is always under consideration, but making it work is approximately infinitely more important.

      It is not either-or as you imply. Frequently, just a few minutes careful thought at design time can save massive amounts of development time, run time and user time. All at the same time. And time is money.

      Anyone on my teams who goes off on unwarranted performance rants is fired.

      Yes, it's possible to spend too much time optimizing code, like premature peephole optimization, but it's also possible to spend too little time optimizing and large projects frequently fail because of performance problems. Fixing it at that point can be very expensive compared to spending a few minutes getting the design right.

      Time is money and programs always have performance criteria, even if they're only implied, and assuming that the tooth fairy will make sure that a program will pass performance requirements is a great way to fail.

      ---

      Large, slow code is slower to debug. It costs development time. Those who claim there's a development/code performance tradeoff are blowing hot air.

    9. Re:Problems by Peaker · · Score: 1

      it is much simpler and you get the same performance for non-pathological cases.

      It's not that much simpler, as the NFA approach is quite simple. And they indeed speak of the backtracking required in some cases in the article. For backtracking regexps, use this approach, sure. But many (perhaps a majority) of regexps ARE regular and don't need to backtrack.

      Claiming that "real world regexps" are not pathological cases may be true - but there is a middle-ground. We have hit, in my workplace, cases of regular expressions scaling much worse than O(N) on the text - and they were completely regular!
      So real-world regular (truly regular) expressions could benefit a lot from the NFA approach.
      It was done correctly in the 60's, but nobody seems to get it right now.
    10. Re:Problems by Peaker · · Score: 1

      The article speaks of extended regular expressions - and those indeed require backtracking. The idea is there are plenty (perhaps a majority) of regular expressions that do NOT use the extended unregular features, and the engine could use a simple FSM for those.

      You got the regexp wrong, take another look at the one in the article, you're missing a ? there.
      Also, it may work very quickly with the constant 29, and take years with the constant 35 or 40, such are the wonders of exponential complexities.

    11. Re:Problems by Abcd1234 · · Score: 1

      but it's also possible to spend too little time optimizing and large projects frequently fail because of performance problems.

      Bah, I have never once seen a project fail because of software performance issues. A software project is *far* more likely to fail because of poor requirements, poor design, poor management, poor test, or more likely than not, all of the above.

      And for those cases where a project *did* fail because of large-scale performance issues, I'd bet dollars to donuts it's because of high-level architectural issues. And only an idiot would mistake designing a correct architecture with "premature optimization".

    12. Re:Problems by ahabswhale · · Score: 1

      Performance criteria are always part of a design and cruddy programmers who hide their incompetence with the above mantra should be fired.

      Bravo! Well said, sir. I'm sick and tired of people throwing out that mantra for every excuse they need to say performance doesn't matter till later. It's a one-size-fits-all stream of bullshit tossed about by the simple-minded who don't understand the context of the original statement, and who don't know when it really applies.
      --
      Are agnostics skeptical of unicorns too?
    13. Re:Problems by TheLink · · Score: 1

      Well it's the job of the language people to do that NFA stuff.

      As long as they do it in a backward compatible way I don't care.

    14. Re:Problems by RedWizzard · · Score: 1

      Well it's the job of the language people to do that NFA stuff.

      As long as they do it in a backward compatible way I don't care.

      They can't. Otherwise they would have. This is the point that the OP seems to have missed - language implementers aren't just a pack of idiots as the OP seems to believe. Non-backtracking NFAs can handle a certain subset of the requirements very efficiently, but can't handle the rest of the requirements at all. Back references are one thing they struggle with. Another is the requirement that many languages (such as Perl) impose to return the first match, not just any match or the longest match.
    15. Re:Problems by Peaker · · Score: 1

      You can fall back to backtracking when the obscure backtracking features are used - and use the regular engine when they are not.

      The majority of regexps ARE regular and there is no reason for them to pay the price of rare and obscure features that they do not use.

      Apparently the regexp implementors are a bunch of idiots after all :-)

    16. Re:Problems by RedWizzard · · Score: 1

      You can fall back to backtracking when the obscure backtracking features are used - and use the regular engine when they are not.

      There's nothing obscure about requiring the regex to return the first match. That is simply the semantics that most of these languages have chosen.

      Apparently the regexp implementors are a bunch of idiots after all :-)

      You seem to prefer to believe that every modern regex implementer is an idiot rather than recognize the fact that the Thompson NFA approach is not suited to the regex semantics most languages now employ. I think that's an extremely arrogant attitude. But hey, if you're so sure you're right then why not produce an implementation that proves it? Perl has pluggable regex engines now...
    17. Re:Problems by Peaker · · Score: 1

      NFA supports returning the first-match, and other trivialities that the Perl implementors thought it doesn't. It may take actually understanding the algorithm involved, however.

      I would take up your challenge, if I wasn't deeply involved with many others already. However, others I have mentioned these problems to have said that they intend to use this as a project.

    18. Re:Problems by TheLink · · Score: 1

      Sounds great to me. I hope you are right :). Then my stuff will just run faster.

      Aside: I have noticed that many versions of grep have become extremely slow after they introduced the i18n stuff, so much so that even perl is faster in many common cases.

      Switching to the C locale restores performance.

  14. Bad index by ShawnCplus · · Score: 1

    Well to apply a common saying to something in need. You can't grep dead wood.

    --
    Excuse me while I gather the virgin sacrifice and assemble the pentagram required to solve your problem
  15. Regular Expressions & Literate Programming by hachete · · Score: 1

    Once I found the functions startswith and endswith, my need for regular expressions dropped away, fast. Occasionally, I'd have to pony up for a more complex pattern match, which was still a pain even though I'd "cracked" regex. I wonder if the rest of regex could be done away with in a similar fashion?

    --
    Patriotism is a virtue of the vicious
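For the prefix and suffix checks mentioned above, a sketch of the two styles side by side in Python. The file name is made up for illustration; `str.startswith` and `str.endswith` are standard string methods, and the regex versions are the assumed equivalents.

```python
import re

name = "report_2007.txt"  # hypothetical example value

# Plain string methods: no pattern syntax, no escaping to remember.
by_method = (name.startswith("report_"), name.endswith(".txt"))

# Regex equivalents: re.match anchors at the start, \Z anchors at the very
# end of the string, and the dot has to be escaped to be taken literally.
by_regex = (
    re.match(r"report_", name) is not None,
    re.search(r"\.txt\Z", name) is not None,
)

print(by_method, by_regex)
```

Both approaches agree on this input; the string methods simply avoid the escaping and anchoring details.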
    1. Re:Regular Expressions & Literate Programming by Abcd1234 · · Score: 1

      If that's all you were using regexes for, you were probably misusing them in the first place. Try doing any kind of complex text file parsing and you'll understand why regexes have their place.

  16. And for the Mac: RegExhibit by repetty · · Score: 4, Informative

    Another post links to a site for a regex visualizer utility for Windows and Linux.

    Here's one for the Mac:

    http://homepage.mac.com/roger_jolly/software/index.html#regexhibit

  17. I have to get one of these by HangingChad · · Score: 2, Insightful

    I'd rather stick knitting needles in my eyes than debug a regular expression.

    The only cure for that is getting a good reference and having a go at some tutorials until you get good enough to slay the beast. Then you'll be everyone's buddy at the office, because a lot of people feel the same way.

    Or you could just stick knitting needles in your eyes and slash your face with a razor and then everyone will leave you alone.

    --
    That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage
    1. Re:I have to get one of these by rrohbeck · · Score: 1

      I'd rather stick knitting needles in my eyes than debug a regular expression.

      The only cure for that is getting a good reference and having a go at some tutorials until you get good enough to slay the beast. Then you'll be everyone's buddy at the office, because a lot of people feel the same way.

      It's not all that hard if you format them properly with /x first.

      If you're using Perl anyway, dunno if others have that feature too. I don't need anything but Perl :)
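On the question of whether other languages have something like /x: Python's `re` module has a comparable flag, `re.VERBOSE` (also spelled `re.X`), which ignores unescaped whitespace in the pattern and treats `#` as the start of a comment. A minimal sketch, with a made-up date pattern chosen only for illustration:

```python
import re

# The pattern and group names are illustrative assumptions.
date = re.compile(
    r"""
    (?P<year>\d{4})  -   # four-digit year, then a literal hyphen
    (?P<month>\d{2}) -   # two-digit month, then a literal hyphen
    (?P<day>\d{2})       # two-digit day
    """,
    re.VERBOSE,
)

m = date.fullmatch("2007-07-18")
print(m.group("year"), m.group("month"), m.group("day"))
```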
    2. Re:I have to get one of these by Bogtha · · Score: 1

      I'd rather stick knitting needles in my eyes than debug a regular expression.

      Why do you say that?

      --
      Bogtha Bogtha Bogtha
    3. Re:I have to get one of these by tonystubblebine · · Score: 2, Interesting

      I wrote the pocket reference and also this best practices article for writing regular expressions. It's the habits I developed to avoid poking my own eyes out: http://www.onlamp.com/pub/a/onlamp/2003/08/21/regexp.html

  18. A unanswered question by vrmlguy · · Score: 1

    How do you pronounce "regex"? I see four possibilities:
    1) "regh-ex" (hard 'g', like 'ghost')
    2) "rej-ex" (soft 'g', like 'gerbil')
    3) "re-gex" (hard 'g')
    4) "re-jex" (soft 'g')

    I use the first one, since those are the two initial syllables of 'regular' and 'expression', but I can see arguments for the others.

    --
    Nothing for 6-digit uids?
    1. Re:A unanswered question by gangien · · Score: 1

      since it's a combination of regular expressions i pronounce it "reg x"

      but i also think that char should be pronounced like (char)acter and not (char)red

    2. Re:A unanswered question by Xtravar · · Score: 1

      It's so weird how you never think of this stuff until somebody else says it differently.

      #2 - I don't think of it as the combination of two words, which is probably why I ignore the hard G from 'regular' and replace it with the soft G from 'register'. I do the same with Linux (lin ucks), char (chair), etc.

      Although I admit I can't pronounce the word "debacle" correctly for the life of me.

      --
      Buckle your ROFL belt, we're in for some LOLs.
    3. Re:A unanswered question by youthoftoday · · Score: 1

      I pronounce 'regexp' 'regular expressions'. Written contractions don't always have to leak out into speech.

      --
      -1 not first post
  19. Regexes aren't complicated by Anonymous Coward · · Score: 0

    (?<mod>\?!?)? # Match the type of the expression
                (?<v1>\$[A-Za-z_0-9]+) # Match the variable or the complex condition
                (?(mod)
                    (
                    {0} # Match first opening delimiter
                    (?<inner>
                      (?>
                          {0} (?<LEVEL>) # On opening delimiter push level
                        |
                          {1} (?<-LEVEL>) # On closing delimiter pop level
                        |
                          (?! {0} | {1} ) . # Match any char unless the opening
                      )+ # or closing delimiters are in the lookahead string
                      (?(LEVEL)(?!)) # If level exists then fail
                    )
                    {1} # Match last closing delimiter
                    ){{1,2}} # Match one or two subexpressions
                  |
    :(?<v2>\$[A-Za-z_0-9]+) # Match the simple condition
                )?
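Read through its comments, the LEVEL push and pop in the pattern above behaves like a depth counter: push on an opening delimiter, pop on a closing one, fail if anything is left over. A rough sketch of that same bookkeeping in plain Python, assuming single-character delimiters; the function name `is_balanced` is hypothetical and this does not reproduce the rest of the pattern.

```python
def is_balanced(text, open_delim="{", close_delim="}"):
    """Return True if every opening delimiter has a matching closing one."""
    depth = 0
    for ch in text:
        if ch == open_delim:
            depth += 1          # "push level"
        elif ch == close_delim:
            depth -= 1          # "pop level"
            if depth < 0:       # a close with nothing open
                return False
    return depth == 0           # leftover levels mean failure


print(is_balanced("{a{b}c}"))
print(is_balanced("{a{b}c"))
```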
  20. Don't fear - Just download txt2regex by Shux · · Score: 4, Informative

    Regular expressions are easier than you think and once you get comfortable with them you will be wishing you had done so sooner. In my opinion the difficult part of learning them is just getting used to the strange mess of dots, pluses, brackets, backslashes, etc. and what they mean in different contexts. Unfortunately it is hard to walk away from an article or howto on regexes and actually remember the meaning of all the symbols. Regular expressions are deliberately terse and that makes them hard to read and understand by humans.

    Therefore I think the best way to learn regular expressions is by example. I highly recommend this small interactive program which will walk you through building regular expressions for a few different languages. When you think you need a regex for a program, just fire it up and answer the questions.

    http://txt2regex.sourceforge.net/

    After a while you won't need txt2regex for simple stuff because you will have hopefully just absorbed the syntax. Once you have mastered the basic regexes which txt2regex can generate you will be able to dive into more advanced topics like capturing groups.

  21. Regular expressions are out by Anonymous Coward · · Score: 0

    Parsing Expression Grammars are the future.

    They are generally faster and more powerful. It's easy to do regular expression-like stuff as well as really complicated things like a programming language parser.

    Lua's LPeg extension is really excellent.

  22. Missing applications by tirerim · · Score: 1

    What about emacs? Grep? Sed? This book sounds like a good idea, but it's not so useful without a wider selection of applications. Frankly, though, I just want a short guide as to which things need to be escaped to get which meanings, and what character classes are available.

    1. Re:Missing applications by tonystubblebine · · Score: 1

      It does cover grep and sed (and awk) but not emacs. Sorry about not covering the emacs. I'll give that some consideration if we do a third edition.

  23. Perl, regexps by Peaker · · Score: 2, Interesting

    If you read the link I posted, you will see that they are indeed evil and slow - and not for any good reason. The implementation of good regular expression engines is not difficult and has been known in CS theory for many decades.

    "Premature optimization" is a nice slogan - but the regexp performance problems are real, and I have encountered them before (I was extremely surprised to see that the regexp matching is scaling far worse than O(N) as it was clear to me that matching that regexp should be at worst O(N)).

    The reason it is depressing is because they got it right in the 60's, and are getting it wrong now. Stalling progress is sad. Deteriorating is depressing.

    As for elisp regexps being faster than other elisp methods - it's not very indicative, as the regexp engine is implemented in C. If you compare, however, the pathological regexps (see my original link) in elisp, compared to a naive elisp char-by-char iteration of strings, you'll see that the elisp code performs better.

    About your link, it doesn't seem that he is prejudiced against Perl, it seems that he hates Perl and that implies no prejudice. Many of us dislike or even hate Perl because we find it less suitable for all tasks than other tools that we use, and because we find it an extremely ugly hack that strongly encourages write-once read-never code.

    1. Re:Perl, regexps by Abcd1234 · · Score: 2, Interesting

      but the regexp performance problems are real, and I have encountered them before

      That's all well and good, but unless you're parsing extremely large volumes of text, the issues are probably unimportant. Which is, of course, why profiling is so important. Throwing out a perfectly valid solution simply because it is, in theory (or even in practice) slow, is ridiculous if you have other performance problems elsewhere, or if the code is running at a speed that is sufficient for the problem at hand.

      Put another way, if regexes solve the problem in a simple and easy manner, use them. And if, in running the code, you discover it's too slow to meet your requirements, and profiling indicates the regex is a problem, then switch to something else. But dismissing regexes out of hand is silly.

      because we find it an extremely ugly hack that strongly encourages write-once read-never code.

      Good for you. I'm not sure why you pointed this out, as I don't really care, but that's lovely. Regardless, Zawinski clearly dislikes Perl, and those quotes make it clear that this dislike has translated to regexes as well, despite their being clearly superior to other solutions for certain problem domains. It looks like you may have done the same...

    2. Re:Perl, regexps by cp.tar · · Score: 1

      but the regexp performance problems are real, and I have encountered them before

      That's all well and good, but unless you're parsing extremely large volumes of text, the issues are probably unimportant. Which is, of course, why profiling is so important. Throwing out a perfectly valid solution simply because it is, in theory (or even in practice) slow, is ridiculous if you have other performance problems elsewhere, or if the code is running at a speed that is sufficient for the problem at hand.

      Then again, I'm a linguistics student. And we do quite a bit of work with corpora.
      Until now, most of the work has been done in Perl (and some in Intex, Unitex or Nooj); recently some started doing things in C++.
      Having read the article above, I think I'll start learning awk. Because we do have major performance issues.

      And let me just say: damn. Studying is easy.
      If I hope to get a job in that department, I'll actually have to get something done ;)

      --
      Ignore this signature. By order.
    3. Re:Perl, regexps by CoughDropAddict · · Score: 1

      That's all well and good, but unless you're parsing extremely large volumes of text, the issues are probably unimportant.

      If you trigger Perl's worst-case regex performance, it can take over a minute to match a 30 character string. That's what the graph at the top of the referenced article illustrates.

      Try it for yourself:

      $ time perl -e '$x= "a" x 30; $x =~ /(a?){30}a{30}/'

      real 3m20.283s
      user 3m19.583s
      sys 0m0.086s

      Will you run into this worst-case performance? Probably not, as long as you write good regexes. Would it behoove you to understand that yes, Perl's regexes can have serious performance issues even with small amounts of text? Yes.
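
      The same experiment can be sketched in Java, whose java.util.regex matcher is also a backtracking engine. This is only an illustration: the class and method names are made up, the pattern is spelled out as n copies of a? followed by n copies of a (it matches the same strings as (a?){n}a{n}), and n is kept small so the loop finishes quickly. Actual timings depend on the JVM and its version.

```java
import java.util.regex.Pattern;

class BacktrackDemo {
    // Spells out (a?){n}a{n} as n copies of "a?" followed by n copies of "a".
    static String worstCasePattern(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append("a?");
        for (int i = 0; i < n; i++) sb.append('a');
        return sb.toString();
    }

    // A string of n 'a' characters, the same kind of input as the Perl one-liner.
    static String input(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append('a');
        return sb.toString();
    }

    // Runs one full match and returns the elapsed time in milliseconds.
    static double timeMatchMillis(int n) {
        Pattern p = Pattern.compile(worstCasePattern(n));
        String s = input(n);
        long start = System.nanoTime();
        boolean matched = p.matcher(s).matches();
        long elapsed = System.nanoTime() - start;
        if (!matched) throw new IllegalStateException("expected a match for n=" + n);
        return elapsed / 1e6;
    }

    public static void main(String[] args) {
        // Kept small on purpose; each step up in n may cost noticeably more.
        for (int n = 12; n <= 22; n += 2) {
            System.out.printf("n=%d  %.2f ms%n", n, timeMatchMillis(n));
        }
    }
}
```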
    4. Re:Perl, regexps by Anonymous Coward · · Score: 0

      Having read the article above, I think I'll start learning awk. Because we do have major performance issues.

      Before you start rewriting big chunks of code, remember to always do three things first: profile, profile, and profile again. Frankly, there is little chance that you're going to get a noticeable performance boost by switching from a backtracking regex engine to an FA regex engine.
    5. Re:Perl, regexps by RedWizzard · · Score: 1

      If you read the link I posted, you will see that they are indeed evil and slow - and not for any good reason.

      Actually there are very good reasons. Just because that paper doesn't address them doesn't mean that they don't exist.
  24. uioIO&N890io io io io io io iuyy yu YU YU YDF by Anonymous Coward · · Score: 0

    lining lining lining the TBghjo 7u00 6fFGWDT^tvvttvtvtv^F^F^F^ : Ooo ;s;oflew kk lKL i ufedk hijnj n LJ L JJd hsytCRTCRCRqwe WHA iIKJDJ yrfge WHAT? iuYHIOO&Ylh j;|}|}}||}|}}|]\]\]\] fyhukYHLRFHJE> HFe fkllk>KKKL KL K f fiuewuiwuriw boog lining

  25. What is the concept of a regular expression? by Ed+Avis · · Score: 2, Informative

    it may be of value to briefly discuss the essential concept of regular expressions,
    Before you say this, make sure you know what that concept is.

    A regular expression can be thought of as a program which generates a set of strings - or recognizes a set of strings, which is the same thing. Regular expressions correspond to finite state automata, so just as an FSA cannot recognize the set of all palindromes, neither can a regular expression. Also, languages like Perl have extended the capabilities of their regular expression string matchers to include things like backreferences, which cannot be done in a true regular expression, so we tend to use the word 'regexp' nowadays.

    Or perhaps I'm just playing the grumpy computer scientist here.
    --
    -- Ed Avis ed@membled.com
    1. Re:What is the concept of a regular expression? by Shados · · Score: 1

      I'm confused about what a "true" regular expression is, vs a "non-true" one... I mean, back references are part of the ECMA standard... I'm sure there's something I'm missing here, but I'd like to know what...

    2. Re:What is the concept of a regular expression? by evilWurst · · Score: 2, Informative

      Rewording Ed for you: you can think of a "true" regular expression as just a shorthand for describing a state machine. Feed a state machine a string and it can only either accept or reject. Backreferences are an addition to the modern programming implementation of regular expressions, but aren't part of the language theory sense of regular expressions. You can do things with backreferences that *cannot* be done with a deterministic finite state automaton. Interestingly, that wiki link has a quote from Larry Wall also saying that Perl regexes aren't real regular expressions :)
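
      A small sketch of that difference, written in Java (the class and method names are made up for illustration). The pattern (.+)\1 accepts only strings that consist of some chunk written twice in a row. That set of "doubled" strings is a standard example of a non-regular language, so no finite state machine recognizes it, yet a backreference handles it directly.

```java
import java.util.regex.Pattern;

class BackrefDemo {
    // (.+) captures a non-empty chunk; \1 then requires the very same text again.
    static final Pattern DOUBLED = Pattern.compile("(.+)\\1");

    // True when the whole string is some chunk repeated exactly twice.
    static boolean isDoubled(String s) {
        return DOUBLED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDoubled("abcabc")); // true:  "abc" + "abc"
        System.out.println(isDoubled("abcabd")); // false: second half differs
        System.out.println(isDoubled("aa"));     // true:  "a" + "a"
    }
}
```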

    3. Re:What is the concept of a regular expression? by Anonymous Coward · · Score: 0

      Ack, I goofed. The Larry Wall quote is in wiki's Regular Expression entry, not the Finite State Machine entry. Sorry.

    4. Re:What is the concept of a regular expression? by Mad+Merlin · · Score: 1

      I think the GP is referring to the kind of regular expressions you'd cover in a finite automata course (which I tend to refer to as "fake regular expressions", since I learned regex first...), not anything you'd actually ever implement in a library or programming language.

  26. RegExp Online by Anonymous Coward · · Score: 0
  27. Ultimate RegExp compact reference by gcsolaroli · · Score: 1
  28. What would you do, instead? by Anonymous Coward · · Score: 0

    You don't like regexes?  So what am I supposed to do?  I mean, I *have* to write tons of crappy code like this that's a lot more scatter-brained than a simple regex...  It solves ONE problem (and no others), I have to read through a lot of code to figure out why I'm doing this, and it's not likely to be any faster.

    Look at all the crap (in pseudo-Java, not using regexes at all) I'd have to write to allow people to specify something simple, like taking a pattern in the form of CCCVV and matching a word against it.  And compare that with Perl.

    # Perl
    sub matchCV {
      # Returns 1 if true, 0 if false.
      my ($pattern, $s) = @_;

      # Compare how this case is handled with the pseudo-Java.  It's easier to do it better.
      die "Invalid pattern ${pattern}!\n" if ($pattern !~ m/^[CV]+$/i);

      # Turn pattern into a real regex.
      $pattern =~ s/V/\[AEIOU\]/gi;                  # V -> all vowels
      $pattern =~ s/C/\[BCDFGHJKLMNPQRSTVWXYZ\]/gi;  # C -> all consonants (no ^, which would negate the class)

      return 1 if ($s =~ /^$pattern$/i);              # /i so lowercase words match too
      return 0;
    }

    /* Pseudo-Java: static methods, to be placed inside a class */
    static boolean isVowel(char c) {

      String vowels = "AEIOU";

      /* indexOf returns -1 when the character is absent; 0 is a valid hit ('A'). */
      if (vowels.indexOf(Character.toUpperCase(c)) >= 0) {
        return true;
      } else {
        return false;
      }
    }

    static boolean matchCV(String pattern, String s) {
      /* Pattern is composed of the characters C and V depending on which we want to match. */
      boolean retVal = true;

      if (pattern.length() != s.length()) { return false; }

      for (int i = 0; (i < s.length()) && retVal; ++i) {
        /* Switch on the pattern character, then test the matching character of s. */
        switch (pattern.charAt(i)) {
          case 'C':
          case 'c':
            if (isVowel(s.charAt(i))) { retVal = false; }
            break;
          case 'V':
          case 'v':
            if (!isVowel(s.charAt(i))) { retVal = false; }
            break;
          default:
            throw new IllegalArgumentException("Invalid pattern character " + pattern.charAt(i) + " at position " + i + ".");
        }
      }

      return retVal;
    }
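
    For comparison with the hand-written loop above, here is a rough sketch of the Perl approach carried over to Java with java.util.regex. The class name is an assumption made for illustration, and the consonant list (which treats Y as a consonant) is copied from the Perl version. The pattern is translated in a single pass, so the C and V inside the consonant class are never substituted a second time.

```java
import java.util.regex.Pattern;

class MatchCV {
    // Translates a C/V template such as "CCCVV" into a regex and tests s against it.
    static boolean matchCV(String pattern, String s) {
        // Validate the whole template up front, as the Perl version does.
        if (!pattern.matches("(?i)[CV]+")) {
            throw new IllegalArgumentException("Invalid pattern " + pattern + "!");
        }

        // Build one character class per template letter.
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toUpperCase().toCharArray()) {
            regex.append(c == 'V' ? "[AEIOU]" : "[BCDFGHJKLMNPQRSTVWXYZ]");
        }

        // matches() anchors at both ends, like ^...$ in the Perl version.
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE)
                      .matcher(s)
                      .matches();
    }

    public static void main(String[] args) {
        System.out.println(matchCV("CCCVV", "three")); // true
        System.out.println(matchCV("CVC", "eat"));     // false
    }
}
```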

  29. Bad example by rossz · · Score: 1

    $text = "The bookkeeper was very careful to keep proper books as he did not wish to be booked for fraud.";
    $text =~ s/book/publication/g;

    Yeah, that will work. Not.
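
    To make the problem concrete, here is a sketch in Java (the class and method names are made up for illustration). The naive replacement rewrites "book" wherever it occurs, including inside "bookkeeper" and "booked". One possible fix, which is an assumption about what was intended, is to anchor the match with word boundaries (\b) and allow an optional plural "s".

```java
class BookReplace {
    static final String TEXT =
        "The bookkeeper was very careful to keep proper books "
        + "as he did not wish to be booked for fraud.";

    // Equivalent of s/book/publication/g: replaces every occurrence, even inside words.
    static String naive(String text) {
        return text.replaceAll("book", "publication");
    }

    // Only replaces the whole words "book" and "books"; $1 carries the optional "s" over.
    static String wholeWord(String text) {
        return text.replaceAll("\\bbook(s?)\\b", "publication$1");
    }

    public static void main(String[] args) {
        System.out.println(naive(TEXT));
        System.out.println(wholeWord(TEXT));
    }
}
```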

    --
    -- Will program for bandwidth
  30. I myself found this one particularly useful.. by SchizoDuckie · · Score: 1

    http://www.stklos.org/Doc/html/stklos-ref-5.html#Regular-Expressions STKLOS.org regex reference. I don't even know what the site is originally about, but the regex ref is the best!

    --
    Quack damn you!
  31. Just a little more modern, KDE Reg Exp Editor by gnutoo · · Score: 1

    KRegExpEditor gives you a nice GUI.

  32. Really? by gnutoo · · Score: 1

    Mac does not do Tcl/Tk? If it does, the GP post should work on OSX as well.

    Neither of these programs looks as nice as KDE's Editor, and that too should work on Mac and Windows soon enough.

    1. Re:Really? by phantomfive · · Score: 1

      OSX can run Tcl/Tk, but it must be installed separately; it doesn't come out of the box. The same goes for Windows. I might add (though mod me flamebait if you must, true is true) that the OSX Tk is visually far superior to the Windows version.

      --
      Qxe4
  33. Instead of buying the book.. by ironwill96 · · Score: 1

    Buy this program: http://www.regexbuddy.com/

    It is the best $40 I ever spent when doing a project involving tons of Regular Expressions. It has detailed tutorials on how Regular Expressions work, a reference guide, debugging mode, real-time feedback on what your expression is doing, error checking, and a built-in forum where you can post your problems and people including the developer himself will chime in and help you figure it out!

    I'm not associated with JGSoft in any way, but RegEx Buddy really is an awesome product. Also, you can change which language you are targeting and it understands the limitations of each one (and verifies your code will work), and he explains the differences between different languages' implementations on that website (for free).

    --
    "To strive, to seek, to find, and not to yield." - Tennyson
  34. Very useful reference by jnelson4765 · · Score: 1

    The first edition copy I have is pretty dog-eared from constantly being stashed in my laptop bag. I don't use regexes every day, but I understand them, and just need a handy reference. Definitely great for a "how do you specify X" kind of problem.

    --
    Why can't I mod "-1 Idiot"?
  35. Then what are they? by Peaker · · Score: 1

    Oh, if so, what are those reasons really?

    1. Re:Then what are they? by RedWizzard · · Score: 1

      See my reply to your other post.