Regular Expression Recipes


Posted by timothy on Tuesday March 22, 2005 @07:45AM from the prune-talkin' dept.

r3lody writes "If you spend time writing applications that have to do pattern matching and/or replacement, you know about some of the intricacies of regular expressions. For many people they are an arcane hodgepodge of odd characters that somehow manage to do wonderful things, but those people don't have enough time (or interest) to really understand how to write them. Nathan A. Good has written Regular Expression Recipes: A Problem-Solution Approach for them. In its relatively slim 289 pages, he offers 100 regular expressions in a cookbook format, tailored to solve problems in six broad categories (Words and Text, URLs and Paths, CSV and Tab-Delimited Files, Formatting and Validating, HTML and XML, and Coding and Using Commands)." Read on for the rest of Lodato's review.

title: Regular Expression Recipes: A Problem-Solution Approach
author: Nathan A. Good
pages: 289
publisher: Apress
rating: 8/10
reviewer: Raymond Lodato (rlodato AT yahoo DOT com)
ISBN: 159059441X
summary: A cookbook of useful regular expressions for Perl, Python and more.

Regular expressions are not restricted to just the Perl or shell environments, so Nathan offers variations for Python, PHP, and VIM as well. In most cases the translation is relatively straightforward, but in a few cases a different environment has (or lacks) additional facilities, prompting a different expression for the same task.

Before you even read chapter 1, Nathan provides a quick summary course on regular expressions, with details given for each of the five environments you might use. He has written the syntax overview in a highly readable format, making it easy to understand the gobbledygook of the most bizarre concoctions you might encounter.

The first chapter (Words and Text) starts simply enough. He gives examples of how to find single words, multiple words, and repeated words, along with examples of how to replace various detected strings with others. In each case he gives an example of its use for each platform, followed by a bit-by-bit breakdown of how it works. Not every environment is shown for every recipe, and in many cases the "How It Works" section refers back to the first one, as most REs are identical across the platforms.
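For readers who want to see what a "repeated words" recipe looks like, here is a minimal Python sketch. It is not the book's code; it assumes a "word" is a run of `\w` characters, and the names `REPEATED`, `find_repeated_words` and `collapse_repeated_words` are hypothetical.

```python
import re

# \b(\w+) captures a whole word; (?:\s+\1\b)+ requires that same word again,
# one or more times, separated only by whitespace. IGNORECASE lets "The the"
# count as a repeat.
REPEATED = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def find_repeated_words(text: str) -> list:
    """Return the first spelling of each immediately repeated word."""
    return [m.group(1) for m in REPEATED.finditer(text)]


def collapse_repeated_words(text: str) -> str:
    """Replace a run such as "the the the" with a single "the"."""
    return REPEATED.sub(r"\1", text)


sample = "This is is a test test of the the regex."
print(find_repeated_words(sample))      # ['is', 'test', 'the']
print(collapse_repeated_words(sample))  # This is a test of the regex.
```

The leading `\b` matters: without it, the "is" at the end of "This" could pair with a following "is".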

The next chapter (URLs and Paths) offers various methods of doing commonly needed parsing. Pulling out file names, query strings, and directories, as well as reconstructing them in useful ways, is covered in the 15 recipes given here. Validating, converting, and extracting fields of CSV and tab-delimited files are handled in chapter 3, while chapter 4 is concerned with validating field formats, as well as reformatting text for the fields. Chapter 5 handles similar tasks for HTML and XML documents. The final chapter covers expressions that facilitate the management of program code, log files, and the output of selected commands.
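As an illustration of the kind of URL parsing described above, here is a small regex-only sketch in Python that pulls out the directory, file name, query pairs and fragment. It is an assumption-laden example, not the book's recipe: it does no percent-decoding, and the names `split_url`, `URL_PARTS`, `DIR_AND_FILE` and `QUERY_PAIR` are hypothetical.

```python
import re

# Everything before the first "?" or "#", then an optional query, then an
# optional fragment. DOTALL makes the pattern match any input string.
URL_PARTS = re.compile(
    r"(?P<path>[^?#]*)(?:\?(?P<query>[^#]*))?(?:#(?P<fragment>.*))?", re.DOTALL
)
# Greedy (.*/)? runs to the LAST "/", leaving the file name after it.
DIR_AND_FILE = re.compile(r"(?P<directory>.*/)?(?P<filename>[^/]*)", re.DOTALL)
# One "name=value" pair inside a query string.
QUERY_PAIR = re.compile(r"([^&=]+)=([^&]*)")


def split_url(url: str) -> dict:
    parts = URL_PARTS.fullmatch(url)
    pieces = DIR_AND_FILE.fullmatch(parts.group("path"))
    query = parts.group("query")
    return {
        "directory": pieces.group("directory") or "",
        "filename": pieces.group("filename"),
        "query": dict(QUERY_PAIR.findall(query)) if query else {},
        "fragment": parts.group("fragment"),
    }


print(split_url("http://example.com/docs/guide/index.html?lang=en&page=2#top"))
```

For a bare host such as `http://example.com`, this sketch reports the host as the file name, which is one reason real code would more likely use `urllib.parse.urlsplit` and `urllib.parse.parse_qs`.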

First, I must admit that there are a number of useful solutions provided, especially for someone who is concerned with application and web development. However, I did feel a little cheated by the fact that several recipes covered essentially the same task, with only minor variations. It almost seemed as though the author was trying to pad out the solution count to the magic number 100. A simple example: three solutions in chapter one cover replacing smart quotes with straight quotes, replacing copyright symbols with the "(c)" trigraph, and replacing trademark symbols with the "(tm)" sequence. In each case, the expression was simply "s/\xhh/ rep /g;". Did we really need three separate recipes for that? I don't think so.
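The reviewer's point can be made concrete: since the three recipes differ only in the character being replaced, one table-driven substitution covers all of them. A minimal Python sketch, assuming the text is already decoded to Unicode (the review's `s/\xhh/ rep /g` form works on raw byte values instead); `REPLACEMENTS`, `SYMBOLS` and `straighten` are hypothetical names.

```python
import re

# Assumption: decoded Unicode text, so each symbol is a single code point.
REPLACEMENTS = {
    "\u201c": '"',     # left double quotation mark
    "\u201d": '"',     # right double quotation mark
    "\u2018": "'",     # left single quotation mark
    "\u2019": "'",     # right single quotation mark
    "\u00a9": "(c)",   # copyright sign
    "\u2122": "(tm)",  # trade mark sign
}
# One character class built from the table's keys; none are regex-special.
SYMBOLS = re.compile("[" + "".join(REPLACEMENTS) + "]")


def straighten(text: str) -> str:
    """Replace every symbol listed in REPLACEMENTS in a single pass."""
    return SYMBOLS.sub(lambda m: REPLACEMENTS[m.group(0)], text)


print(straighten("\u201cAcme\u2122\u201d \u00a9 2005"))  # "Acme(tm)" (c) 2005
```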

Another quibble revolves around how some of the expressions are written. Nathan makes liberal use of non-capturing groups (that is, (?: expr )) to ensure that only the items that need replacement are captured. While that is a worthy idea, in some cases the expression could have been simplified for the sake of readability. Another issue is a slight error in matching letters. In a number of expressions, Nathan uses [A-z] to match all letters. Unfortunately, the characters [, \, ], ^, _, and ` occur between upper-case Z and lower-case a in ASCII, so the range matches too much. Either [[:alpha:]] or [A-Za-z] should have been used.
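A quick Python check of both points in the paragraph above. Python's `re` does not support POSIX classes such as `[[:alpha:]]`, so `[A-Za-z]` is the ASCII fix there. The sample words and the names `LOOSE` and `STRICT` are made up for the demonstration.

```python
import re

# The characters that sit between "Z" (0x5A) and "a" (0x61) in ASCII.
between = "".join(chr(c) for c in range(ord("Z") + 1, ord("a")))
print(between)  # [\]^_`

LOOSE = re.compile(r"[A-z]+")      # the range criticized in the review
STRICT = re.compile(r"[A-Za-z]+")  # ASCII letters only

# Each line prints: word, matched by [A-z]+?, matched by [A-Za-z]+?
for word in ["Hello", "snake_case", "x^y", "[tag]"]:
    print(word, bool(LOOSE.fullmatch(word)), bool(STRICT.fullmatch(word)))

# A non-capturing group (?:...) groups the alternatives without creating a
# numbered capture, so only the surname lands in group 1.
m = re.fullmatch(r"(?:Mr|Ms|Dr)\. (\w+)", "Ms. Smith")
print(m.groups())  # ('Smith',)
```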

Despite these quibbles, Regular Expression Recipes does provide a useful compendium of solutions for common problems developers face. Presenting the information in a cookbook fashion, along with ensuring that those using something other than Perl don't have to sweat translating the expressions to their target language, makes this a handy book to have. I wouldn't hesitate to recommend it.

You can purchase Regular Expression Recipes: A Problem-Solution Approach from bn.com.

258 comments

Curious by LiquidCoooled · 2005-03-22 07:48 · Score: 3, Funny

I was performing a strange custom regular expression on the book review, and discovered that it outputted the following:

"Regex coders are in league with the devil"

Who woulda thunk it!

--
liqbase :: faster than paper
1. Re:Curious by LordoftheWoods · 2005-03-22 08:33 · Score: 1
  
  interesting, so what secret regular expression construct matches what is nowhere in the original string?
2. Re:Curious by ThomasFlip · 2005-03-22 08:40 · Score: 1
  
  /R[^e]*e[^g]*g[^e]*e[^x]*x[^ ]* [^c]*c[^o]*o/ etc....
  Would probably do it...
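The construction above generalizes: for every character c of the hidden phrase, emit `[^c]*c`. A minimal Python sketch that builds such a pattern automatically; `subsequence_pattern` is a hypothetical helper name and the sample text is taken from the review.

```python
import re


def subsequence_pattern(phrase: str) -> str:
    """Build a regex that matches if `phrase` occurs as a subsequence.

    Each character c contributes [^c]*c: skip anything that is not c,
    then take the next c. Because [^c]* can only stop right before a c,
    each character is matched at its earliest possible position.
    """
    parts = []
    for ch in phrase:
        esc = re.escape(ch)
        parts.append("[^" + esc + "]*" + esc)
    return "".join(parts)


review = "Regular expressions are not restricted to just the Perl or shell environments"
print(subsequence_pattern("Rex"))                           # [^R]*R[^e]*e[^x]*x
print(bool(re.search(subsequence_pattern("Rex"), review)))  # True
```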
  
  --
  If the dollar is an "I owe you nothing", then the Euro is a "Who owes you nothing." - Doug Casey
3. Re:Curious by Saeed+al-Sahaf · 2005-03-22 08:53 · Score: 4, Funny
  
  interesting, so what secret regular expression construct matches what is nowhere in the original string?
  It's something called a joke. A joke is something said or done to evoke laughter or amusement, especially an amusing story with a punch line. Jokes employ something called humor. Humor is the quality that makes something laughable or amusing. Many Slashdotters are unable to perceive, enjoy, or express what is amusing, comical, incongruous, or absurd, often referred to as humor impaired.
  
  --
  "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
4. Re:Curious by Anonymous Coward · 2005-03-22 09:03 · Score: 0
  
  Running a strange custom regular expression on your comment comes up with "humorless twit"
5. Re:Curious by LordoftheWoods · 2005-03-22 09:09 · Score: 1
  
  Likewise.
  
  You, who mock me for apparent lack of humor, would not consider that my post might also have been in jest?
  
  This is the 'net, take it easy! No need to be cynical.
6. Re:Curious by Anonymous Coward · 2005-03-22 09:30 · Score: 0
  
  Slight nitpick: Most Slashdotters seem to have no trouble expressing the absurd.
7. Re:Curious by ErikZ · 2005-03-22 09:34 · Score: 1
  
  Sorry, your "Just Kidding!" line doesn't cut it.
  
  Just admit you're a robot and your life should go much smoother from now on.
  
  And not an amusing robot like Bender, or HAL.
  
  --
  Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
8. Re:Curious by Anonymous Coward · 2005-03-22 10:02 · Score: 0
  
  No, because jokes are funny. See?
9. Re:Curious by Anonymous Coward · 2005-03-22 10:57 · Score: 0
  
  On Slashdot, nine times out of ten a joke is something posted when the poster has no knowledge of the subject matter at hand but feels compelled to write something anyways in the hopes of getting recognition from his peers.
10. Re:Curious by FLEB · 2005-03-22 11:34 · Score: 1
  
  Your point?
  
  --
  Information wants to be free.
  Entertainment wants to be paid.
  You just want to be cheap.
11. Re:Curious by Anonymous Coward · 2005-03-22 13:00 · Score: 0
  
  Well, it's because the original post was funny, and yours wasn't. Simple!
12. Re:Curious by LordoftheWoods · 2005-03-22 17:08 · Score: 1
  
  Doesn't cut what? Your standards, which I am supposed to abide by? You are being extremely unreasonable. From my point of view, misinterpreting a small joke and blowing things way out of proportion by flaming me is what "doesn't cut it."

  It might not have been funny, but that's no reason to flame unnecessarily.
13. Re:Curious by Anonymous Coward · 2005-03-22 17:20 · Score: 0
  
  This has got to be some new yorker/sl2021 humor because I just don't get it. I see the set up--the peers, probably in a bar, the dumb guy, the "9 out of 10". It's all good stuff but where is the punchline?
14. Re:Curious by LordoftheWoods · 2005-03-22 17:30 · Score: 1
  
  Thanks for being one of the few who did not assume I was somehow trying to be inflammatory in my comment, and for not yourself propagating things by posting an inflammatory response.
  
  I will have to make sure not to post (bad) jokes in the future, as it seems slashdot can't take them calmly.
15. Re:Curious by LordoftheWoods · 2005-03-22 17:34 · Score: 1
  
  lol?
  
  sealab? o.O
16. Re:Curious by ErikZ · 2005-03-22 20:33 · Score: 2, Funny
  
  You're pretty touchy for a mechanical abomination, devoid of all life and only a mere shadow of the men you were built to replace.
  
  You should try tweaking your .conf.
  
  --
  Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
17. Re:Curious by Anonymous Coward · 2005-03-22 22:41 · Score: 0
  
  "what secret regular expression construct matches what is nowhere in the original string?" /what is nowhere in the original string/
cant get used to them by alienfluid · 2005-03-22 07:48 · Score: 1, Informative

regular expressions are nice and all but i still cant get used to them .. a good manual should be kept handy at all times. Visit Lafayette Linux Users Group at http://lug.lafayette.edu. Suggestions are welcome.
1. Re:cant get used to them by Anonymous Coward · 2005-03-22 08:09 · Score: 0
  
  Learn the basics and then practice, practice and practice again.
  
  Regexps sure look scary to start with, but if you practice enough they'll grow on you.
  
  Then once you've got the basics covered, re-read your preferred manual/tutorial, pick up a couple of new techniques and practice them.
  
  Unless you're using them for really performance intensive stuff, there's nothing wrong with using a sub-optimal regexp, provided you understand what's happening. You can always improve your technique later.
  
  I've used regexps pretty much daily for over 10 years, and I still pick up a better way of doing something once in a while.
2. Re:cant get used to them by Anonymous Coward · 2005-03-22 08:21 · Score: 2, Informative
  
  regular expressions are nice and all but i still cant get used to them .. a good manual should be kept handy at all times. [ ... ]
  Suggestions are welcome.
  
  I have a suggestion. Write a few regular expressions to get your brain refreshed on them, then go read this excellent article on how regular expressions work. At the very least, it will clear some confusing things up. Most likely you'll find that having a better understanding of the underlying concepts will make it easier for you to work with regular expressions day to day.
  
  Also, it helps if you are familiar with finite state machines. I learned about them in a couple classes while getting my CS degree, but they're not that hard and most people should be able to grasp them without any kind of formal CS training.
3. Re:cant get used to them by Waffle+Iron · 2005-03-22 08:22 · Score: 2, Informative
  
  regular expressions are nice and all but i still cant get used to them
  They may be kind of hard to get used to, but not as hard as writing, debugging and maintaining a dozen or more lines of custom string parsing code for each case where you would use one.
4. Re:cant get used to them by halber_mensch · 2005-03-22 08:22 · Score: 4, Informative
  
  A good starting point is to understand finite automata and regular languages first. See http://en.wikipedia.org/wiki/Automata_theory/ for a good first reference on automata. If you can grok automata, regular expressions will click with you.
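To make the automata connection concrete, here is an example chosen for illustration (not from the thread): binary strings whose value is divisible by 3 form a regular language. A three-state DFA recognizes it, and the regex `(0|1(01*0)*1)*` can be read directly off the DFA's transitions. The sketch checks that both agree with plain arithmetic.

```python
import re
from itertools import product

# State = value of the bits read so far, modulo 3.
# Reading bit b moves from state r to (2*r + b) % 3.
TRANSITIONS = {
    (0, "0"): 0, (0, "1"): 1,
    (1, "0"): 2, (1, "1"): 0,
    (2, "0"): 1, (2, "1"): 2,
}


def dfa_accepts(bits: str) -> bool:
    state = 0
    for b in bits:
        state = TRANSITIONS[(state, b)]
    return state == 0  # accept when the remainder is 0


# Ways to leave state 0 and come back: "0" stays put; "1" goes to state 1,
# which may loop through state 2 via 0 1* 0, and a final "1" returns to 0.
MULTIPLE_OF_3 = re.compile(r"(?:0|1(?:01*0)*1)*")

for n in range(1, 9):
    for bits in map("".join, product("01", repeat=n)):
        expected = int(bits, 2) % 3 == 0
        assert dfa_accepts(bits) == expected
        assert (MULTIPLE_OF_3.fullmatch(bits) is not None) == expected
print("DFA and regex agree on all binary strings of length 1 to 8")
```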
  
  --
  perl -e "eval pack(q{H*},join q{},qw{70 72696e74207061636b28717b482a7d2c717b343 637323635363534323533343430617d293b})"
5. Re:cant get used to them by B'Trey · 2005-03-22 09:01 · Score: 3, Informative
  
  If you really want to understand regexes, get Jeffrey E. F. Friedl's "Mastering Regular Expressions" from O'Reilly. It's much deeper than the casual reader will ever need, but if you get through it you will certainly know how regexes work from both a user perspective and from a regex engine perspective.
  
  --
  "The legitimate powers of government extend only to such acts as are injurious to others." Thomas Jefferson.
6. Re:cant get used to them by cayenne8 · 2005-03-22 09:54 · Score: 1
  
  If they could come up with some good ways to get CR/LF's out of MS excel files...in a csv, I'd be ecstatic!!
  I get excel files dumped to me for inserts into Oracle databases...some are HELL to clean up. Especially the comma delimited ones..with freeform text fields...that allow the user to put hard returns in them....
  That and one more bitch. When did MS make it so damned hard to change the delimiter in excel?? I remember a few editions ago, when you saved as a CSV, it gave you a wizard type thing to choose your delimiter.
  To do it now...in win2k, the only way I know how is to open the file, then go into control panel->regional options->numbers and change the list separator to whatever I want it to be. Apply these changes. Go back to excel, save as csv...then, go back to the regional options and change it back to a comma.
  What the hell was MS thinking here?
  
  --
  Light travels faster than sound. This is why some people appear bright until you hear them speak.........
7. Re:cant get used to them by Anonymous Coward · 2005-03-22 10:01 · Score: 0
  
  Do I have to *write* the Wikipedia article first? Or is there another page I shoud look at? :-D
8. Re:cant get used to them by umgah · 2005-03-22 10:27 · Score: 1
  
  No need to write the page. Use this url. Wikipedia doesn't seem to like the trailing / character.
9. Re:cant get used to them by Anonymous Coward · 2005-03-22 10:51 · Score: 0
  
  When did MS make it so damned hard to change the delimiter in excel??
  
  As long as Excel can read back files it's written, it doesn't matter to Microsoft. They don't want you exporting data from their programs; however, Excel can read many different types of files, including fixed width columnar data. Excel usually guesses correctly about where to put the columns as well.
  
  If they could come up with some good ways to get CR/LF's out of MS excel files...in a csv, I'd be ecstatic!!
  
  You might try looking at one of the Perl OLE modules. You probably can't just replace all \r\n with \n. The column delimiter of their format is probably \x01\r\n\x01 or something :P.
  
  Of course, that's talking about Microsoft XLS files. If the file is a true CSV file, then this is a cakewalk. From a command line, do:
  
  perl -e'local $/;$_=<>;s/\r\n/\n/sg;print;' < input.csv > output.csv
  Naturally, Ruby does this more slowly than Perl, and it isn't even possible in Python. That said, you should use sed or awk for this task, and a real shell like ksh, zsh, or tcsh.
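For comparison, here is the same slurp-and-substitute approach as the Perl one-liner, sketched in Python. `crlf_to_lf` and `convert_file` are hypothetical helper names. Like the one-liner, this rewrites every CR LF pair, including any inside quoted fields; it does not remove the line breaks themselves.

```python
import re


def crlf_to_lf(data: bytes) -> bytes:
    """Turn every CR LF pair into a bare LF; lone CRs are left alone."""
    return re.sub(rb"\r\n", b"\n", data)


def convert_file(src_path: str, dst_path: str) -> None:
    # Binary mode, so Python performs no newline translation of its own.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(crlf_to_lf(data))
```

Usage would be `convert_file("input.csv", "output.csv")`. Since no pattern logic is involved, `data.replace(b"\r\n", b"\n")` would do the same job without a regex.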
  
  The CSV format requires that fields containing embedded newlines be enclosed in double quotes. There's probably a Perl module already in existence for doing these sorts of things, though.
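In Python, the standard `csv` module already implements the quoting rule described above. A minimal sketch, assuming the goal is to keep one record per line by replacing line breaks inside fields with a space. `flatten_embedded_newlines` and `LINE_BREAK` are hypothetical names, and the sample data is made up.

```python
import csv
import io
import re

# Any line-break flavor that can appear inside a quoted CSV field.
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def flatten_embedded_newlines(csv_text: str, replacement: str = " ") -> str:
    """Re-emit CSV with line breaks inside fields replaced by `replacement`.

    csv.reader does the quote-aware splitting (a newline inside "..." stays
    part of the field), so the regex only ever sees individual field values.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([LINE_BREAK.sub(replacement, field) for field in row])
    return out.getvalue()


raw = 'id,note\r\n1,"line one\r\nline two"\r\n2,plain\r\n'
print(flatten_embedded_newlines(raw))
```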
  
  open the file, then go into control panel->regional options->numbers and change the list separator to whatever I want it to be. Apply these changes. Go back to excel, save as csv...then, go back to the regional options and change it back to a comma.
  
  You can likely automate this task with OLE and Perl if you can't find a better way.
  
  What the hell was MS thinking here?
  
  They were thinking that it doesn't matter if you can't read a file of theirs with a product they didn't make. Excel can open a lot of different file types. It should therefore be able to save to all of those as well, except that Microsoft doesn't want you to have that feature. We can't fault them for this, because it's a business decision that benefits them. We can, however, hate them for it, because we know that the power of a general-purpose processor and software is infinite in scope.
  
  I think it'd be cute if hackers started patching Microsoft software to add features like this.
10. Re:cant get used to them by WiFiBro · 2005-03-22 11:27 · Score: 1
  
  It would probably be just as easy to write a macro (in VBA) that dumps the selected range into a csv file itself; you would only need something to handle quotes and newlines properly.
  It would not be the fastest macro, especially with larger selections.
11. Re:cant get used to them by Anonymous Coward · 2005-03-22 12:45 · Score: 0
  
  Most languages providing regex capabilities (like the Perl regex subset) are not context-free. They are much more powerful than context-free languages and push-down automata. I don't see how studying these would really help you.
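One concrete way backreferences take regex engines past the regular languages: the "copy" language { ww }, strings made of some chunk written twice, is a textbook example of a language that is not context-free, yet a single backreference matches it. A minimal Python sketch; `SQUARE` and `is_square` are hypothetical names.

```python
import re

# (.+) captures a non-empty chunk; \1 demands the identical chunk again.
# DOTALL lets the chunk contain newlines too.
SQUARE = re.compile(r"(.+)\1", re.DOTALL)


def is_square(s: str) -> bool:
    """True if s is some non-empty string w written twice (s == w + w)."""
    return SQUARE.fullmatch(s) is not None


for s in ["abcabc", "abcab", "aa", "abab", "abba"]:
    print(s, is_square(s))
```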
12. Re:cant get used to them by Anonymous Coward · 2005-03-22 20:14 · Score: 0
  
  Well, understanding finite automata gives you the basics of regular expressions. By studying the theory, you should also be able to get an understanding of which regex constructs require backtracking and potentially exponential runtime.
Points by 2.7182 · 2005-03-22 07:49 · Score: 4, Informative

I really liked this book, but

1. the binding broke
2. the index has a lot of typos.
1. Re:Points by LiquidCoooled · 2005-03-22 07:50 · Score: 2, Funny
  
  2. the index has a lot of typos.
  
  No problem, the website issued a global regex and a pot of Tipp-Ex for all customers.
  
  --
  liqbase :: faster than paper
2. Re:Points by poot_rootbeer · 2005-03-22 09:49 · Score: 2, Funny
  
  2. the index has a lot of typos.
  
  Yeah, but in a book about regexes, you have to study the index VERY CAREFULLY to determine whether there are any typos or not.
3. Re:Points by rwbaskette · 2005-03-22 10:34 · Score: 1
  
  That's funny, I spell checked it with a regex enabled aspell....
Bran... by Anonymous Coward · 2005-03-22 07:49 · Score: 3, Funny

...is the best regular recipe.
1. Re:Bran... by Anonymous Coward · 2005-03-22 17:39 · Score: 0
  
  Ye gods, if that's a regular recipe, what's a regular expression?
Another one? by cmstremi · 2005-03-22 07:50 · Score: 2, Insightful

Isn't there already enough coverage of regexes? With all the existing books and the nearly endless supply of free information and sites online (including many using the 'recipe' format), who will want this book?
1. Re:Another one? by scrotch · 2005-03-22 08:33 · Score: 1
  
  I don't know if this book would satisfy it, but personally, I'm tired of finding regex references that don't provide (or don't claim to provide) complete, working expressions. It seems like a pretty common occurrence to want to check that an entered email address could actually be an email address, but every regex tutorial/reference I have wimps out. They all say that their example is 'just for learning' or 'needs to be checked' or some such.
  
  A cookbook approach to Regexs seems great to me. Look up the one you want if you're in a hurry, stop and study it if you want to really understand it.
  
  If you know of a similar online reference, I'd love to know. It seems like there should be one out there.
2. Re:Another one? by northcat · 2005-03-22 08:33 · Score: 1
  
  If you don't want this, don't buy it.
3. Re:Another one? by carnivore302 · 2005-03-22 08:50 · Score: 3, Informative
  
  I don't think there is a need for another book on regexps, since there is already the excellent Mastering Regular Expressions by Jeffrey Friedl. What else but the best can you expect from an O'Reilly book?
  
  --
  Please login to access my lawn
4. Re:Another one? by Rylz · 2005-03-22 09:41 · Score: 1
  
  I agree. I learned RegExps from five pages of tables (and a little bit of explanation) in a Perl book and a lot of experimenting. Once I got the basics from that, whenever I needed more, I enlisted the help of Google. I really don't understand why a whole book devoted to this concept would help more than what is IMHO the best method for learning, googling and experimenting.
  
  --
  Sometimes you've gotta roll the hard six.
5. Re:Another one? by Molt · 2005-04-04 02:36 · Score: 1
  
  Friedl's "Mastering Regular Expressions" has just that, a completely RFC-822 compliant validating expression. It's about 6k long, which explains why you're unlikely to see many other people who're able to write such behemoth expressions.
  
  --
  404 Not Found: No such file or resource as '.sig'
Regular expressions in a cookbook? by DeadSea · 2005-03-22 07:51 · Score: 5, Informative

Sounds like good eating. ;-)
Regular expressions are great, but once you know them and you think you can conquer the world, I find they occasionally let you down. The text editor I was using had a rudimentary regular expression search that did not support non-greedy matching. I found writing a regular expression that finds C style /* comments */ to be quite tricky with only greedy matching. I wrote it up as an article where I build the expression piece by piece, showing common things you might try that won't work.
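For the C comment problem, a widely used pattern needs only greedy quantifiers and character classes. A Python sketch follows; if an editor lacks `(?:...)`, a plain capturing group works the same way here. Like any regex approach, it does not know about comment markers inside string literals. `C_COMMENT` and `NAIVE` are hypothetical names and the sample source is made up.

```python
import re

# /\*                 opening delimiter
# [^*]*\*+            non-stars, then a run of one or more stars
# (?:[^/*][^*]*\*+)*  a char that neither closes the comment ("/") nor
#                     extends the star run, then more non-stars and stars
# /                   closing slash directly after a star run
C_COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")
NAIVE = re.compile(r"/\*.*\*/", re.DOTALL)  # greedy .* runs to the LAST */

source = "int x; /* a */ int y; /* b ** c */\n/* multi\n   line */ /* **/"
print(C_COMMENT.findall(source))  # four separate comments
print(NAIVE.findall(source))      # one match spanning all of them
```

Because the negated classes also match newlines, multi-line comments are handled without any special flag.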
If you want more of a challenge, try writing a regular expression that finds any <script></script> tags along with anything in between using only greedy matching. You will find that the length of your regular expression goes up exponentially with the length of your ending condition.
--
Calculator for Converting Currency
1. Re:Regular expressions in a cookbook? by interiot · 2005-03-22 07:58 · Score: 4, Interesting
  
  Yup, regular expressions are not capable of a full-range of computing... they're pretty close (they're the lowest of four in the Chomsky hierarchy), but still have a few limitations that can't be resolved without wrapping some extra code around them.
  It still boggles my mind that people knew this in 1956 though.
2. Re:Regular expressions in a cookbook? by merlyn · 2005-03-22 08:08 · Score: 5, Informative
  
  Yup, regular expressions are not capable of a full-range of computing
  That's the "classic" regular expressions, not the modern regular expressions accepted by PCRE, and Perl itself. In fact, Perl regular expressions are full Turing machines, with PCRE being a few steps behind that. So PCRE isn't really PCRE... it's P-likeCRE. {grin}
  --
  
  Randal L. Schwartz, Just another Perl hacker for Stonehenge
3. Re:Regular expressions in a cookbook? by pcraven · 2005-03-22 08:10 · Score: 2, Interesting
  
  This is a cool article on catastrophic backtracking. I remember the first time that got me. It would occasionally cause severe issues on a production server we had. I swung and missed with my reg ex on that one.
4. Re:Regular expressions in a cookbook? by interiot · 2005-03-22 08:17 · Score: 3, Insightful
  
  You mean all the sections of the perl regexp manual that say "WARNING: This extended regular expression feature is considered highly experimental, and may be changed or deleted without notice" and then go on to say things that make my head truly ache?
  I personally treat this like I do Perl5 threads... as something to be afraid of, and hopeful that things will be much improved in Perl 6.
5. Re:Regular expressions in a cookbook? by wirelessbuzzers · 2005-03-22 08:18 · Score: 1
  
  If you want more of a challenge, try writing a regular expression that find any <script></script> tags along with anything in between using only greedy matching. You will find that the length of your regular expression goes up exponentially with the length of your ending condition.
  
  Actually, they grow quadratically:
  s{<script[^>]*> ( |[^<] |<[^/] |</[^s] |</s[^c] |</sc[^r] |</scr[^i] |</scri[^p] |</scrip[^t] |</script[^>] )* </script>}{}gix;
  
  --
  I hereby place the above post in the public domain.
6. Re:Regular expressions in a cookbook? by Anonymous Coward · 2005-03-22 08:32 · Score: 0
  
  Don't quote communists.
7. Re:Regular expressions in a cookbook? by DeadSea · 2005-03-22 08:33 · Score: 2, Informative
  
  Your expression fails for this case:
  <script><scri</script>
  It will match <scri< with your |</scri[^p] rule and then go on to match beyond the end of your regular expression.
  But I acknowledge that it may be quadratic rather than exponential even with a correct regular expression.
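The underlying problem with the one-line attempt is that a partial terminator such as `</scri` can swallow the `<` that begins the real `</script>`. Excluding `<` from every negated class, and nesting the shared prefixes instead of repeating them, gives a greedy-only pattern. Here is a Python sketch under the stated assumption that `<` appears in the terminator only as its first character. `block_tail` and `greedy_only_block` are hypothetical helpers, and `BROKEN` reproduces the pattern from the comment above, minus its empty first alternative. In this nested form, the generated pattern grows only linearly with the terminator length.

```python
import re

BROKEN = re.compile(
    r"""<script[^>]*>
        (?: [^<] | <[^/] | </[^s] | </s[^c] | </sc[^r] | </scr[^i]
          | </scri[^p] | </scrip[^t] | </script[^>] )*
        </script>""",
    re.VERBOSE | re.IGNORECASE,
)


def block_tail(rest: str) -> str:
    """Regex for a run of non-'<' characters that does not start with `rest`.

    `rest` is the terminator minus its leading '<'. Each level either
    diverges (a character that is neither the expected one nor '<', then
    more non-'<' text) or takes the expected character and goes one level
    deeper. Every level is optional, so the run may stop right before the
    next '<'. Only greedy quantifiers are used.
    """
    last = re.escape(rest[-1])
    tail = "(?:[^" + last + "<][^<]*)?"
    for ch in reversed(rest[:-1]):
        esc = re.escape(ch)
        tail = "(?:[^" + esc + "<][^<]*|" + esc + tail + ")?"
    return tail


def greedy_only_block(open_tag: str, terminator: str):
    # Assumption: '<' occurs in the terminator only as its first character.
    assert terminator.startswith("<") and "<" not in terminator[1:]
    body = "[^<]*(?:<" + block_tail(terminator[1:]) + ")*"
    return re.compile(open_tag + body + re.escape(terminator), re.IGNORECASE)


FIXED = greedy_only_block(r"<script[^>]*>", "</script>")

tricky = "<script>a</scri</script>"
print(BROKEN.search(tricky))          # None: "</scri<" eats the real end tag
print(FIXED.search(tricky).group(0))  # <script>a</scri</script>

two_ends = "<script>a</scri</script>b</script>"
print(BROKEN.search(two_ends).group(0))  # runs on to the second </script>
print(FIXED.search(two_ends).group(0))   # stops at the first </script>
```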
  --
  Exchange Rate Calculator
8. Re:Regular expressions in a cookbook? by prockcore · 2005-03-22 08:42 · Score: 2, Interesting
  
  I've been doing regex for a long time (over 10 years), and the best rule I can give newbies to follow is "match less, not more"
  
  Write your regexes so that they generalize as little as possible.
  
  For example, to match an xml tag use /<[^>]+>/ instead of /<.*?>/
  
  If you're using ".*?" in a regex, you might want to look at rewriting it.. it's almost never needed and almost always causes problems.
9. Re:Regular expressions in a cookbook? by prockcore · 2005-03-22 08:47 · Score: 2, Interesting
  
  (damn, I should really preview sometimes)
  
  The examples I gave are: /<[^>]+>/ instead of /<.*?>/
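A small Python illustration of this rule (the inputs are made up for the demonstration). A lazy `.*?` is only a preference. When the rest of the pattern fails, the engine keeps extending it, even past a `>`, whereas `[^>]+` can never leave the tag.

```python
import re

text = "<a>y<b>x"

# Lazy .*? still crosses ">" when the rest of the pattern fails to match.
print(re.search(r"<.*?>x", text).group(0))    # <a>y<b>x
# A negated class cannot leave the tag, so the match stays on "<b>".
print(re.search(r"<[^>]+>x", text).group(0))  # <b>x

# Without DOTALL, "." does not match a newline, so a wrapped tag is missed.
wrapped = '<a\nhref="/">'
print(re.search(r"<.*?>", wrapped))  # None
print(re.search(r"<[^>]+>", wrapped).group(0) == wrapped)  # True
```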
10. Re:Regular expressions in a cookbook? by syukton · 2005-03-22 09:11 · Score: 1
  
  I've been doing regular expressions for half as long and I completely agree. (your suggestion is actually the pattern I use when looking for tagging...)
  
  --
  Reinvent the wheel only at either a lower cost, greater effectiveness, or your own personal enrichment and satisfaction.
11. Re:Regular expressions in a cookbook? by Anonymous Coward · 2005-03-22 09:37 · Score: 0
  
  Hey, this is Slashdot, we're all basically commies here!
12. Re:Regular expressions in a cookbook? by Anonymous Coward · 2005-03-22 09:46 · Score: 0
  
  They should have called it To Serve Regular Expressions
13. Re:Regular expressions in a cookbook? by Darby · 2005-03-22 10:16 · Score: 1, Troll
  
  Do you even know anything about perl?
  
  I'm sorry.
  Somebody has something in their sig about somebody saying that to Tom Christiansen and I thought I'd try to be funny.
  
  P.S. Thanks for the help on perlmonks.
14. Re:Regular expressions in a cookbook? by Ed+Avis · 2005-03-22 10:48 · Score: 1
  
  'When all you have is a hammer, everything looks like a nail.'
  
  That's what I thought when reading Jeffrey Friedl's book on regexps and it looks like this one is the same.
  
  Besides, why a book? Speaking from the perspective of a Perl programmer it makes much more sense to create libraries of real code. I guess if you're working in config files, or sed, or other tools that aren't full programming languages, then typing in things from a book could be useful.
  
  --
  -- Ed Avis ed@membled.com
15. Re:Regular expressions in a cookbook? by Anonymous Coward · 2005-03-22 12:21 · Score: 0
  
  Can you explain why .*? is bad? I used to use your recommended method, [^>], but then moved to .*? because I thought it was the more correct method.
16. Re:Regular expressions in a cookbook? by Anonymous Coward · 2005-03-22 13:25 · Score: 0
  
  I'm not!
17. Re:Regular expressions in a cookbook? by merlyn · 2005-03-30 02:25 · Score: 1
  
  The Perl6 "rule" system is amazing. You can spell out a grammar in a nice, easy-to-comment BNF-like style, and then you end up with a nice AST of your input string. Then subclass it, and get a grammar variant. All built on top of "newer" regular expressions that begin to make more sense, and again encourage whitespace and commenting.
  In fact, Perl6 itself will be parsed using this grammar mechanism, with the active grammar being tweakable by the code as it is compiled. Self-modifying grammars instead of pre-processors or source filters! But "with great power, comes great responsibility".
  --
  
  Randal L. Schwartz, Just another Perl hacker for Stonehenge
Email RegEx by tquinlan · 2005-03-22 07:51 · Score: 1

I'm still looking for a good email regex, one that checks all forms of email addresses, including all the TLDs, and all the other various complicated forms email addresses can take.

--
DBA? Software Engineer? My company is hiring! Click
1. Re:Email RegEx by mqRakkis · 2005-03-22 07:57 · Score: 0
  
  Maybe not the perfect one, but pretty good anyway (the code around is PHP):
  
  function isValidEmailString($email) { return preg_match('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/i', $email) === 1; }
2. Re:Email RegEx by tehshen · 2005-03-22 08:02 · Score: 1
  
  As you specified all forms of e-mail addresses...
  
  (I would post one here, but the lameness filter hates it, so I'll just link to it).
  
  Covers RFC 8288, as well as IP addresses.
  
  --
  Guy asked me for a quarter for a cup of coffee. So I bit him.
3. Re:Email RegEx by Sir_Real · 2005-03-22 08:10 · Score: 2, Interesting
  
  I'm still looking for a good email regex
  
  Well, you asked for it.
  
  Actually, I asked for it last week, in #linux on freenode. Scary huh?
4. Re:Email RegEx by Anonymous Coward · 2005-03-22 08:10 · Score: 0
  
  Unless I'm reading that wrong, it only works with .?? and .??? TLDs (like .com, .cx, etc.) but not .info or anything longer than 3 characters.
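This reading is correct, as a quick Python port shows. `fullmatch` stands in for `^...$` because Python's `$` also matches just before a trailing newline. `RELAXED` is a hypothetical variant that accepts any top-level domain of two or more letters. Both remain far from RFC 822.

```python
import re

# Port of the PHP pattern above (case-insensitive, like the original).
PHP_STYLE = re.compile(
    r"[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})",
    re.IGNORECASE,
)
# Same shape, but the final label may be any run of two or more letters.
RELAXED = re.compile(
    r"[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,})",
    re.IGNORECASE,
)

for address in ["user@example.com", "user@example.info", "first.last@mail.example.museum"]:
    print(address, bool(PHP_STYLE.fullmatch(address)), bool(RELAXED.fullmatch(address)))
```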
5. Re:Email RegEx by Anonymous Coward · 2005-03-22 08:17 · Score: 0
  
  Bah, write a verify function in PHP or something that opens a socket to the mail server for the email addy specified, and ask it if the address is valid. You can use SMTP (the VRFY command, or RCPT TO) to check an address this way.
6. Re:Email RegEx by ssbljk · 2005-03-22 08:48 · Score: 1
  
  good RegEx library can be found at http://www.regexlib.com/
  
  --
  /ss
7. Re:Email RegEx by rduke15 · 2005-03-22 09:24 · Score: 1
  
  I found this little online Email address syntax checker which is useful in comparing the results of various classical Perl modules. It is very slow (maybe on purpose to avoid abuse?).
8. Re:Email RegEx by toga98 · 2005-03-22 09:28 · Score: 1
  
  What's RFC 8288? I think you meant RFC 822. Most regex email validators restrict based on TLD and other odd stuff that has to be maintained over time. It's better to simply verify that the string conforms to RFC 822. If you need to *really* know if the address is valid, then do an MX lookup plus SMTP validation.
9. Re:Email RegEx by skids · 2005-03-22 09:30 · Score: 1
  
  Hrm, that's a very inefficient (in terms of usability) regex someone sold you. If you are working in Perl, look into using the qr// operator to build it up from subexpressions. You can easily reduce that to 1/10th the size/complexity.
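Python has no `qr//`, but a pattern is just a string until it is compiled, so the same build-from-subexpressions idea works with plain concatenation. A minimal sketch; all the names below are hypothetical. The resulting pattern has the same shape as the PHP pattern quoted earlier in the thread, except that any top-level domain of two or more letters is allowed. It is still nowhere near RFC 822.

```python
import re

# Named building blocks, combined by concatenation. Each repeated piece is
# wrapped in (?:...) so quantifiers apply to the whole piece.
ATOM = r"[_a-z0-9-]+"                        # one dot-free run in the local part
LOCAL_PART = ATOM + r"(?:\." + ATOM + r")*"  # atoms joined by single dots
LABEL = r"[a-z0-9-]+"                        # one domain label
TLD = r"[a-z]{2,}"                           # top-level domain, letters only
DOMAIN = LABEL + r"(?:\." + LABEL + r")*\." + TLD

EMAIL = re.compile(LOCAL_PART + "@" + DOMAIN, re.IGNORECASE)

print(EMAIL.pattern)
for address in ["first.last@mail.example.info", "no-at-sign.example.com", "a@localhost"]:
    print(address, EMAIL.fullmatch(address) is not None)
```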
  
  --
  Someone had to do it.
10. Re:Email RegEx by Sir_Real · 2005-03-22 09:38 · Score: 1
  
  Yes, I see that. To contextualize this page further, I specifically asked for a regular expression that was usable outside of Perl, hence some of the verbosity. To me, it's more of a demonstration of the concept that regex isn't a panacea, and that email address verification is non-trivial (or at least more difficult than I was initially led to believe).
11. Re:Email RegEx by InvalidError · 2005-03-22 16:00 · Score: 1
  
  IDKJ about regex but this does appear to be somewhat unfair to people who host their eMail on a .info domain.
12. Re:Email RegEx by Anonymous Coward · 2005-03-23 01:36 · Score: 0
  
  Fails when IP parts are expressed as non-decimal numbers.
A language in their own right. by Sheetrock · 2005-03-22 07:52 · Score: 0

Regular expressions are probably the first Turing-complete language to be encapsulated in another Turing-complete language (C).
Unless of course you count machine language interactions with higher-level languages they implement, but I'm not. :)

--

Try not. Do or do not, there is no try.
-- Dr. Spock, stardate 2822-3.
1. Re:A language in their own right. by APDent · 2005-03-22 08:05 · Score: 1, Informative
  
  Regular expressions are not Turing complete.
2. Re:A language in their own right. by smoany · 2005-03-22 08:06 · Score: 2, Informative
  
  Um, last time I checked, Reg. Exp's are not turing complete. Take the expression O^n 1^n, which can be made by Turing machines. If you can make that for me using a Regular Expression, you deserve a Turing Award. Regular expressions are DFA/NFA complete, not turing complete... not even close!
3. Re:A language in their own right. by khrtt · 2005-03-22 08:07 · Score: 3, Informative
  
  Regular expressions are probably the first Turing-complete language to be encapsulated in another Turing-complete language (C).
  
  Don't you just love to sound like a StarTrek character, with all that fancy terminology?
  
  Go look up your complexity book - if you have one - regexes are not even close to Turing-complete.
4. Re:A language in their own right. by soliptic · 2005-03-22 09:57 · Score: 1
  
  Take the expression O^n 1^n, which can be made by Turing machines. If you can make that for me using a Regular Expression, you deserve a Turing Award.
  That's the most interesting thing I've read on slashdot today.... although/because I don't understand. What's O^n 1^n ?
  //wanders off to google/wikipedia hopefully
5. Re:A language in their own right. by YetAnotherLogin · 2005-03-22 10:23 · Score: 1
  
  Probably OOOO....O11111....111, with n of each symbol.
6. Re:A language in their own right. by LnxAddct · 2005-03-22 11:37 · Score: 1
  
  The original regular expressions were very very close to turing complete and today with enhanced regular expressions, they *are* turing complete.
  Regards,
  Steve
7. Re:A language in their own right. by Anonymous Coward · 2005-03-22 12:22 · Score: 0
  
  Ooh, take that, GP!
  
  MODED!
8. Re:A language in their own right. by khrtt · 2005-03-22 12:27 · Score: 1
  
  Hmm..
  Just what enhancements to regex are there that make them Turing-complete? If you don't mind, kindly write an expression that recognizes "a^nb^nc^n", where ^n means n repetitions of the previous letter.
9. Re:A language in their own right. by eraserewind · 2005-03-22 12:38 · Score: 1
  
  The ones in perl are apparently (I don't know it as a fact, I just read it in this thread somewhere).
10. Re:A language in their own right. by khrtt · 2005-03-22 12:48 · Score: 1
  
  Apparently, a wikipedia page says that perl RE are Turing complete. I can't find a proof. I can't figure out a way to do a^nb^nc^n in perl RE either, which is an indication that it could well be a mistake in wikipedia.
11. Re:A language in their own right. by Anonymous Coward · 2005-03-22 13:07 · Score: 0
  
  m/^(a*)(??{'b' x length $1})(??{'c' x length $1})$/ in perl
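Python's re module has no (??{ ... }) construct. A comparable sketch (is_anbncn is a hypothetical name) lets the regex find the three runs and does the counting in ordinary code. That is essentially what the embedded Perl code does in the pattern above.

```python
import re

# Capture the three runs; the regex by itself cannot check they are equal.
RUNS = re.compile(r"(a*)(b*)(c*)")

def is_anbncn(s: str) -> bool:
    """Accept a^n b^n c^n (n >= 0) by letting Python do the counting."""
    m = RUNS.fullmatch(s)
    return m is not None and len(m.group(1)) == len(m.group(2)) == len(m.group(3))

print(is_anbncn("aabbcc"))  # True
print(is_anbncn("aabbc"))   # False
```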
12. Re:A language in their own right. by Anonymous Coward · 2005-03-22 13:59 · Score: 0
  
  Hahahahahahahahaha!!! The original ones?!? Don't say!
  This is like my granpa saying "Back in the 30s, the original abacus was very very very close to Turing complete!" or "Granma was very very very close to being pregnant".
  
  No dude. The original ones are very very very close to quantum complete!
13. Re:A language in their own right. by UncleFluffy · 2005-03-22 17:15 · Score: 1
  
  Well, if it's Turing complete, then we're not talking about regular expressions any more, by definition ;-) More seriously, how would you write a palindrome detector?
  
  --
  What would Lemmy do?
14. Re:A language in their own right. by Dr.Ruud · 2005-03-22 20:39 · Score: 1
  
  a{n}b{n}c{n}
15. Re:A language in their own right. by Anonymous Coward · 2005-03-22 23:05 · Score: 0
  
  m/^(.*).?(??{reverse $1})$/
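To see how that one-liner finds a palindrome, here is a hypothetical Python emulation of the backtracking search it triggers. The sketch assumes reverse $1 is matched as plain text. In Perl, the (??{ ... }) result is itself compiled as a pattern, so input containing metacharacters such as . or * can behave differently.

```python
def perl_style_palindrome(s: str) -> bool:
    """Emulate the search for m/^(.*).?(??{reverse $1})$/:
    greedy (.*) gives back one character at a time, .? first tries to
    consume one character and then none, and the rest of the string
    must equal the reverse of what (.*) captured."""
    for k in range(len(s), -1, -1):       # greedy (.*) backing off
        first = s[:k]
        for middle in (1, 0):             # .? tries one char, then zero
            if k + middle > len(s):
                continue
            if s[k + middle:] == first[::-1]:
                return True
    return False

print(perl_style_palindrome("racecar"))  # True
print(perl_style_palindrome("abc"))      # False
```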
16. Re:A language in their own right. by smoany · 2005-03-23 03:04 · Score: 1
  
  Yes, I'm sorry, I didn't make it clear enough. It isn't an expression, it's a set of strings, {0^n 1^n | n is an integer, n >= 0}: the empty string, 01, 0011, 000111, 00001111, etc. This set of strings, or any set of strings, is called a language. One can build a Turing machine that accepts this language of all strings 0^n 1^n, and no other strings, but one cannot build a DFA/NFA that does so. Regular expressions can informally be seen as describing the languages a DFA/NFA can accept.
  
  If you wish to know more about computability theory, please read Sipser, Introduction to the Theory of Computation: http://www-math.mit.edu/~sipser/book.html
  
  Relevant topics:
  Deterministic/Nondeterministic Finite Automata (DFA/NFA)
  Pushdown Automata (PDA)
  Turing Machines (TM)
  Chomsky hierarchy of languages/grammars
  
  Hope this fills it out slightly.
  -Dan
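For contrast, here is a sketch of a recognizer for {0^n 1^n} in Python (is_zeros_then_ones is a made-up name). It needs only one unbounded counter. That is the kind of memory a pushdown automaton has and a finite automaton lacks, so a full Turing machine is more than this particular job requires.

```python
def is_zeros_then_ones(s: str) -> bool:
    """Recognize {0^n 1^n | n >= 0} with a single unbounded counter.
    A DFA/NFA has only finitely many states, so it cannot remember an
    arbitrary n; this counter is exactly the extra memory it lacks."""
    count = 0
    i = 0
    # Count the leading zeros.
    while i < len(s) and s[i] == "0":
        count += 1
        i += 1
    # Each following '1' cancels one zero.
    while i < len(s) and s[i] == "1":
        count -= 1
        i += 1
    # Accept only if every character was consumed and the counts balanced.
    return i == len(s) and count == 0

print(is_zeros_then_ones("000111"))  # True
print(is_zeros_then_ones("0101"))    # False
```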
17. Re:A language in their own right. by khrtt · 2005-03-23 07:03 · Score: 1
  
  That doesn't work (tried it). What's n? According to perl syntax it's supposed to be an expression...
REGEX by null+etc. · 2005-03-22 07:54 · Score: 5, Funny

Another quibble revolves around some of the coding of the expressions. Nathan has made liberal use of non-capturing groups (that is, (?: expr )) to ensure only the items that needed replacement were captured. While a worthy idea, in some cases the expression could have been simplified for understanding.
I'm not sure I understand what your quibble is - do you dislike the fact that he uses non-capturing groups, or the fact that he disposes of them at certain points?
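For readers unsure what the quibble is about, here is a small Python illustration (the sample text and patterns are invented for this sketch). In Perl, Python and PHP, (?: ... ) groups an alternation without creating a numbered capture, so the captures that remain are only the ones you intend to use in a replacement.

```python
import re

text = "Date: 2005-03-22"

# Capturing groups: every (...) gets a number, so the year is group 2.
capturing = re.search(r"(Date|Posted): (\d{4})-(\d{2})-(\d{2})", text)
print(capturing.groups())      # ('Date', '2005', '03', '22')

# A non-capturing (?:...) group still groups the alternation but takes no
# group number, so only the date parts are captured.
non_capturing = re.search(r"(?:Date|Posted): (\d{4})-(\d{2})-(\d{2})", text)
print(non_capturing.groups())  # ('2005', '03', '22')
```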
Another issue is a slight error in searching for letters. In a number of expressions, Nathan uses [A-z] to capture all letters. Unfortunately, the special characters [, \, ], ^, _, and ` occur between upper-case Z and lower-case a, making it match too much. Either [[:alpha:]] or [A-Za-z] should have been used.
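The reviewer's point is easy to check. Here is a minimal Python sketch that assumes plain ASCII input. Python's re module has no POSIX [[:alpha:]] class, so the sketch compares only the two ranges.

```python
import re

# The six characters that sit between 'Z' (0x5A) and 'a' (0x61) in ASCII.
between = "[\\]^_`"

# The buggy range [A-z] accepts all six of them...
wide = re.findall(r"[A-z]", between)
print(wide)    # ['[', '\\', ']', '^', '_', '`']

# ...while [A-Za-z] accepts none of them.
narrow = re.findall(r"[A-Za-z]", between)
print(narrow)  # []
```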
This seems like a relatively novice mistake, and I'm surprised it would show up in a book on regular expressions.
Despite these quibbles, Regular Expression Recipes does provide a useful compendium of solutions for common problems developers face. Presenting the information in a cookbook fashion, along with ensuring that those using something other than Perl don't have to sweat translating the expressions to their target language, makes this a handy book to have. I wouldn't hesitate to recommend it.
It's nice that he covers five environments for regular expressions. I'm sure everyone has heard of Mastering Regular Expressions, published by O'Reilly. The Perl Cookbook also does a good job at solving common problems with Regular expressions.
This is just my opinion, but I think what the world needs is a book on Regular Expression Design Patterns.
1. Re:REGEX by Anonymous Coward · 2005-03-22 08:19 · Score: 0
  
  This is just my opinion, but I think what the world needs is a book on Regular Expression Design Patterns.
  
  Jesus fuck, you left out all the key buzz words. Try: "Unleashing the Bible of Extreme Agile Regular Expression Design Anti-Patterns for Dummies in 24 Femtoseconds"?
2. Re:REGEX by Anonymous Coward · 2005-03-22 08:51 · Score: 0
  
  No, the parent post sounds right. For example, a regex design pattern for "matching nested closing delimiters to opening delimiters" could apply both to data formats, such as HTML, and to programming languages, whose opening and closing delimiters need to be parsed in order to rule out things such as escape sequences, which is a very difficult challenge in regex without parsing.
3. Re:REGEX by Anonymous Coward · 2005-03-22 09:54 · Score: 0
  
  What the fuck is wrong with you man? The first AC was OBVIOUSLY a joke. And your follow-up is ... what? Another joke? Weak at best.
  
  You, sir, have been trolled.
4. Re:REGEX by gurnemanz · 2005-03-22 10:21 · Score: 1
  
  This is just my opinion, but I think what the world needs is a book on Regular Expression Design Patterns. How bout this, Tim -- Nanopatterns: Divorcing Devil from Detail with Smart Regex featuring Dust Pan Flea Flicker Army Ants Swarm Factory
5. Re:REGEX by Anonymous Coward · 2005-03-23 18:17 · Score: 0
  
  haven't YOU been trolled?
6. Re:regex by jcuervo · 2005-03-26 05:40 · Score: 1
  
  slash left-paren question-mark colon backslash dot pipe caret right-paren left-paren left-bracket caret backslash dot right-bracket plus-sign backslash dot left-bracket caret backslash dot right-bracket plus-sign right-paren dollar-sign slash.
  
  In your face, lameness filter!
  
  Btw, maybe I like to do things the hard way (or maybe I'm just lazy), but I'd use split() on "." for that, and just shift the array until it gets to size 2.
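Here are both approaches in Python, as a sketch. The post this replies to isn't shown here, so the sketch assumes the goal is to pull the last two labels out of a hostname. The function names are hypothetical.

```python
import re

# The pattern spelled out above: a dot or the start of the string, then
# capture the last two dot-separated labels at the end.
LAST_TWO = re.compile(r"(?:\.|^)([^\.]+\.[^\.]+)$")

def last_two_labels_regex(host: str):
    m = LAST_TWO.search(host)
    return m.group(1) if m else None

def last_two_labels_split(host: str) -> str:
    # The split() approach: keep only the final two pieces.
    return ".".join(host.split(".")[-2:])

print(last_two_labels_regex("www.slashdot.org"))  # slashdot.org
print(last_two_labels_split("www.slashdot.org"))  # slashdot.org
```

Neither version knows about registries such as co.uk. Both return co.uk for bbc.co.uk.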
  
  --
  Assume I was drunk when I posted this.
Unacceptable mistakes by gniv · 2005-03-22 07:55 · Score: 5, Interesting

In a number of expressions, Nathan uses [A-z] to capture all letters.

How can this be a good book when it makes such mistakes? If this book is for beginners (as it seems) the editing process should have been much better.
1. Re:Unacceptable mistakes by tehshen · 2005-03-22 08:10 · Score: 2, Insightful
  
  [A-z] accepts all characters from A to z, including [ \ ] ^ _ and `. You want [A-Za-z] or \w (latter for 'not punctuation').
  
  --
  Guy asked me for a quarter for a cup of coffee. So I bit him.
2. Re:Unacceptable mistakes by hankwang · 2005-03-22 08:12 · Score: 1
  
  Why is [A-z] wrong, and what's the correct way to do it?
  $ echo '^' | grep '[A-z]'     # wrong
  ^
  $ echo '^' | grep '[A-Za-z]'  # correct
  $ _
  In the ascii code table, the uppercase letters A-Z are followed by a number of special symbols, then followed by the lowercase letters a-z. The pattern [A-z] matches all characters that are between A and z in the ascii table, including those symbols, which is usually not what you want.
  
  --
  Avantslash: low-bandwidth mobile slashdot.
3. Re:Unacceptable mistakes by BinLadenMyHero · 2005-03-22 08:13 · Score: 1
  
  There are other chars between 'Z' and 'a'.
  The correct way is '[A-Za-z]'.
4. Re:Unacceptable mistakes by roman_mir · 2005-03-22 08:13 · Score: 1
  
  [a-zA-Z] - this is the correct way to do it.
  
  BTW. regular expressions present a complete Turing machine. [A-z] is wrong due to implementation of the expressions engine. They are most likely implemented in a way, that uses character 'A' as x41. Since 'Z' is x5A and 'a' is x61 there is a gap in there that would include a bunch of other characters.
  
  --
  You can't handle the truth.
5. Re:Unacceptable mistakes by hattmoward · 2005-03-22 08:14 · Score: 2, Informative
  
  \w is [A-Za-z0-9_]. The reviewer mentions use of the POSIX character class [[:alpha:]], which is more in line with what you want, and will (is supposed to) match alpha characters in non-ASCII character sets.
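How \w behaves depends on the engine. In Python 3, for example, \w on a str pattern is Unicode-aware and still includes digits and underscore. The sketch below shows that, along with a common letters-only idiom (Python's re has no [[:alpha:]]). The sample string is invented.

```python
import re

# Sample: A-ring, e-acute, underscore, a digit, and a plain ASCII letter.
sample = "\u00c5\u00e9_9z"

# Unicode-aware \w: accented letters, but also '_' and '9'.
print(re.findall(r"\w", sample))            # ['Å', 'é', '_', '9', 'z']

# Letters only: word characters that are not digits and not underscore.
letters = re.findall(r"[^\W\d_]", sample)
print(letters)                              # ['Å', 'é', 'z']

# With re.ASCII, \w shrinks to [A-Za-z0-9_].
ascii_word = re.findall(r"\w", sample, re.ASCII)
print(ascii_word)                           # ['_', '9', 'z']
```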
6. Re:Unacceptable mistakes by khrtt · 2005-03-22 08:14 · Score: 1
  
  Why is [A-z] wrong
  
  Because there are some characters between the letters Z and a in ASCII.
  
  what's the correct way to do it?
  
  [A-Za-z] - for us-ascii, or
  [[:alpha:]] - for other charsets, if your system supports it.
7. Re:Unacceptable mistakes by tehshen · 2005-03-22 08:18 · Score: 1
  
  I didn't know about [[:alpha:]], thanks. \w varies between each implementation, apparently - this screenshot shows it matching foreign characters with accents and stuff.
  
  Though I would use [A-Za-z0-9_] just to be on the safe side.
  
  --
  Guy asked me for a quarter for a cup of coffee. So I bit him.
8. Re:Unacceptable mistakes by Kiryat+Malachi · 2005-03-22 08:23 · Score: 1
  
  *Technically*, [A-z] does capture all letters. It does not, however, capture *only* letters.
  
  (Just to be pedantic.)
  
  --
  
  ---
  Mod me down, you fucking twits. Go ahead. I dare you.
  (I read with sigs off.)
9. Re:Unacceptable mistakes by lgw · 2005-03-22 08:29 · Score: 1
  
  As someone who's had the misfortune to work with EBCDIC, I'd point out that [[:alpha:]] is the only cross-platform answer, otherwise you can get special characters even in [A-Z], and you probably want non-ASCII alphabetics in any case.
  
  --
  Socialism: a lie told by totalitarians and believed by fools.
10. Re:Unacceptable mistakes by Speare · 2005-03-22 08:36 · Score: 2, Informative
  
  No, [A-z] does not capture all letters. For example, "Å" and "é" are not usually included in the class [A-z], but they are often part of the class \w.
  
  --
  [ .sig file not found ]
11. Re:Unacceptable mistakes by Kiryat+Malachi · 2005-03-22 08:47 · Score: 1
  
  I don't consider those letters, you damn foreign devil.
  
  (I kid, I kid.)
  
  --
  
  ---
  Mod me down, you fucking twits. Go ahead. I dare you.
  (I read with sigs off.)
12. Re:Unacceptable mistakes by LordoftheWoods · 2005-03-22 08:54 · Score: 2, Informative
  
  the uppercase letters A-Z are followed by a number of special symbols,
  
  Indeed. If anyone is interested in why ASCII sticks a few characters in there, it's because it allows you to flip a bit to switch between cases.
13. Re:Unacceptable mistakes by slim · 2005-03-22 09:13 · Score: 3, Interesting
  
  BTW. regular expressions present a complete Turing machine.
  
  Actually no: regular expressions are a great example of a language which is not Turing complete, but is useful nonetheless.
  
  The classic limitation of regexes is that you can't use them to parse arbitrarily nested brackets -- because there is no concept of a stack. A Turing machine would be able to do this.
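Here is a minimal sketch of the stack-based check in Python. brackets_balanced is a hypothetical name, and the sketch deliberately ignores quotes and escape sequences. Strictly speaking, a stack (that is, a pushdown automaton) is already enough for nested brackets, so a full Turing machine isn't required.

```python
def brackets_balanced(text: str) -> bool:
    """Check arbitrarily nested (), [] and {} with an explicit stack,
    the piece of memory a plain regular expression does not have."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)           # remember each opener
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False           # closer with no matching opener
    return not stack                   # leftovers mean unclosed openers

print(brackets_balanced("f(a[1], {b: (c)})"))  # True
print(brackets_balanced("(]"))                 # False
```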
  
  (Researching this post [yes! researching!] I found a couple of mailing list posts from various people suggesting that Perl regexes are Turing complete. If this is true [which I have not established], it's because Perl extends the concept of REs in various ways)
14. Re:Unacceptable mistakes by hackstraw · 2005-03-22 09:15 · Score: 1
  
  In a number of expressions, Nathan uses [A-z] to capture all letters.
  
  How can this be a good book when it makes such mistakes? If this book is for beginners (as it seems) the editing process should have been much better.
  
  I guess the author was a novice and used Word to type the book. Word is notorious for automiscorrecting technical documents.
15. Re:Unacceptable mistakes by Anonymous Coward · 2005-03-22 09:40 · Score: 0
  
  Uh? It gives
  $ echo '^' | grep '[A-z]'
  $ echo '^' | grep '[A-Za-z]'
  $ _
  You forgot to start with
  setenv LC_ALL C
16. Re:Unacceptable mistakes by roman_mir · 2005-03-22 09:41 · Score: 1
  
  oops, you are right, I should have said Perl 5.8 regexp are Turing complete, that's why I said it was interesting.
  
  --
  You can't handle the truth.
17. Re:Unacceptable mistakes by hankwang · 2005-03-22 10:16 · Score: 1
  
  You forgot to start with setenv LC_ALL C
  I can't reproduce this with various combinations of LC_CTYPE and LC_COLLATE (my default is ctype=en_US, collate=C). Anyway, the fact that the result depends on system-wide settings makes the error even worse.
  
  --
  Avantslash: low-bandwidth mobile slashdot.
18. Re:Unacceptable mistakes by dubious9 · 2005-03-22 10:25 · Score: 1
  
  Here's an explanation for those of you who are wondering why you would put punctuation between the upper and lower case letters.
  
  Regular expressions operate on the ascii table. Thus the letters are encoded as numbers (duh). Anyway, they have it so that the upper and lower case letters are exactly 0x20 or 32 away. This is to be able to flip one bit to upper or lower the case letters so that it's a very efficient operation.
  
  A = 0x41 = 0100 0001
  a = 0x61 = 0110 0001
  
  In pseudocode: A xor 0x20 -> a (for those that know what xor is).
  
  I realize that explaining this in an article about regexes may not find any readers who didn't already know it. But hey, I thought it was cool when I learned it.
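The bit trick is easy to try in Python. This small sketch (flip_case is a made-up helper) flips only ASCII letters, because XOR-ing punctuation with 0x20 just produces other punctuation.

```python
def flip_case(ch: str) -> str:
    """Toggle the case of an ASCII letter by flipping bit 5 (0x20).
    Non-letters are returned unchanged."""
    if ch.isascii() and ch.isalpha():
        return chr(ord(ch) ^ 0x20)
    return ch

print(hex(ord("A")), hex(ord("a")))    # 0x41 0x61
print(flip_case("A"), flip_case("a"))  # a A
print(flip_case("["))                  # [  (0x5B ^ 0x20 would give '{')
```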
  
  --
  Why, o why must the sky fall when I've learned to fly?
19. Re:Unacceptable mistakes by dubious9 · 2005-03-22 10:29 · Score: 1
  
  Doh. I knew I should have read a few more posts. I should have known somebody else would've explained it earlier (albeit much less verbosely) than I.
  
  --
  Why, o why must the sky fall when I've learned to fly?
20. Re:Unacceptable mistakes by brpr · 2005-03-22 11:37 · Score: 1
  
  (Researching this post [yes! researching!] I found a couple of mailing list posts from various people suggesting that Perl regexes are Turing complete. If this is true [which I have not established], it's because Perl extends the concept of REs in various ways)
  If Perl "regular expressions" were Turing complete, it would not be possible to parse them (parsing would reduce to the halting problem). They may well be more powerful than real regular expressions, but they are probably less powerful than CFGs.
  
  --
  Freedom is not increased by mere diminuation of government. Anarchy is freedom for the strong and slavery for the weak.
21. Re:Unacceptable mistakes by LnxAddct · 2005-03-22 11:48 · Score: 1
  
  What?! Just because something is turing complete doesn't mean you can't parse it. Perl's enhanced regular expressions *are* turing complete, as is perl and perl is parsed. The only thing you can't determine from a turing complete program is if it will ever halt, thus the halting problem. The halting problem also only applies to special cases (granted it's an infinite number), but in many programs of turing complete languages you *can* determine if it will ever stop. The halting problem simply states that for any turing complete language, it is possible to produce a program that can not be determined to halt or not, it doesn't say that every turing complete program can't be decided.
  Regards,
  Steve
22. Re:Unacceptable mistakes by brpr · 2005-03-22 12:19 · Score: 1
  
  Sorry, I didn't make myself clear. Of course the syntax of the Perl regexp could be parsed, but the language described by the Perl regexp could not be parsed, in the general case.
  Either Perl regexps are not Turing complete, or Larry Wall has solved the halting problem.
  And on the more practical side, Perl regexps are matched with strings efficiently. This would not be possible if they were (much) more powerful than CFGs, which are definitely not Turing complete.
  
  --
  Freedom is not increased by mere diminuation of government. Anarchy is freedom for the strong and slavery for the weak.
23. Re:Unacceptable mistakes by brpr · 2005-03-22 12:23 · Score: 1
  
  As a challenge, can a Perl regexp match a string of the following form?
  A^n B^n C^n (i.e. X number of As followed by X number of Bs followed by X number of Cs, for any single value of X).
  If not, Perl regexps can't be more powerful than CFGs, and are consequently not Turing complete.
  
  --
  Freedom is not increased by mere diminuation of government. Anarchy is freedom for the strong and slavery for the weak.
24. Re:Unacceptable mistakes by Anonymous Coward · 2005-03-22 13:03 · Score: 0
  
  m/^(A*)(??{'B' x length $1})(??{'C' x length $1})$/
25. Re:Unacceptable mistakes by brpr · 2005-03-22 13:10 · Score: 1
  
  OK, so there are tricks which allow counting. Try this one. A sequence of As and Bs, such that the first half of the sequence is the reverse of the second half. Again, if Perl regexps can't do this, they're not even as powerful as CFGs.
  (Note that I'm just trying to show that Perl regexps are less powerful than CFGs here. Whether or not this is the case, it is clearly not the case that they're Turing complete. I thought it might be more convincing to show that they're nowhere near being Turing complete).
  
  --
  Freedom is not increased by mere diminuation of government. Anarchy is freedom for the strong and slavery for the weak.
26. Re:Unacceptable mistakes by Daniel · 2005-03-22 14:07 · Score: 1
  
  Either Perl regexps are not Turing complete, or Larry Wall has solved the halting problem. ...or the Perl regexp matcher sometimes goes into infinite loops. (NB: I don't know which of these three is actually the case, except that it's not the second one ;-) )
  
  Daniel
  
  --
  Hurry up and jump on the individualist bandwagon!
27. Re:Unacceptable mistakes by Anonymous Coward · 2005-03-22 20:36 · Score: 0
  
  If we allow a halting-deciding Turing-machine to give the output "yes", "no", "don't know" (but we require that it never gives the answer "yes" if the machine doesn't halt or "no" if it does in fact halt), trivially, for every Turing machine there exists a halting-deciding Turing-machine that decides (ie returns "yes" or "no") that particular machine. Given a finite number of halting-deciding Turing-machines it is trivial to merge them into another that can decide all the machines that the merged machines decides. However, this machine is still subject to Turing's theorem on the halting problem, so this shows that there exists no finite collection of halting-deciding Turing-machines that decides *all* Turing-machines.
28. Re:Unacceptable mistakes by Anonymous Coward · 2005-03-22 23:01 · Score: 0
  
  That's easy:
  m/^((a*b*)*)(??{reverse $1})$/
  The point is that since Perl regexps can include arbitrary Perl code, they are indeed Turing complete.
29. Re:Unacceptable mistakes by Anonymous Coward · 2005-03-22 23:09 · Score: 0
  
  or perhaps
  m/^([ab]*)[ab]?(??{reverse $1})$/
  depending on the definition.
Minor variations by pocari · 2005-03-22 07:55 · Score: 5, Funny

However, I did feel a little cheated by the fact that several chapters covered essentially the same task, with only minor variations.
I can relate. I have cookbooks for food that have all these recipes that are nothing but flour, butter, eggs, and sugar. Do we need all these recipes for pancakes, cupcakes, cookies, crepes, waffles, popovers, bread, quick bread, bread sticks? Won't people figure out eventually to put a little less sugar in waffles with savory ingredients?
Japanese cookbooks are even worse. Soy sauce, sake, mirin...boooooooring!
1. Re:Minor variations by null+etc. · 2005-03-22 07:58 · Score: 1
  
  I can relate. I have cookbooks for food that have all these recipes that are nothing but flour, butter, eggs, and sugar. Do we need all these recipes for pancakes, cupcakes, cookies, crepes, waffles, popovers, bread, quick bread, bread sticks?
  If you think there's only a minor variation between cookies and bread, let me adopt you. You'll be the easiest kid ever to take care of.
  Yum, peanut butter and jelly cookies'mich.
2. Re:Minor variations by brayniac · 2005-03-22 08:08 · Score: 1
  
  this is a poor analogy. you're saying that using the same ingredients for different tasks is the same as using the same basic ingredients for very similar tasks. i don't need my metaphorical cookbook to have 5 recipes for rye bread.
3. Re:Minor variations by winkydink · 2005-03-22 09:00 · Score: 1
  
  1) commercial yeast technique
  2) sourdough starter technique
  3) poolish technique
  4) pumpernickel
  5) ok, I can only think of 4 offhand :)
  
  --
  "I'd rather be a lightning rod than a seismometer." -Ken Kesey
4. Re:Minor variations by Gulthek · 2005-03-22 09:12 · Score: 1
  
  Mmm...sourdough starter.
5. Re:Minor variations by WiFiBro · 2005-03-22 11:45 · Score: 1
  
  in ireland they have soda bread :P
  would that work on rye?
I personally... by BlueCodeWarrior · 2005-03-22 07:55 · Score: 5, Informative

...use 'Mastering Regular Expressions'. It's a good book on the topic as well.
1. Re:I personally... by Michael_Burton · 2005-03-22 10:12 · Score: 1
  
  My own testimonial: I struggled with regular expressions for a long time before I read Friedl's book. I could handle the basics, but I never felt entirely confident that my expressions would work as intended. Since reading Mastering Regular Expressions, I've used regular expressions effectively and with confidence. I still need to look things up from time to time, but what once seemed esoteric now seems familiar. I know where the dark corners are, and I know where to find the light to guide me in those corners.
  
  Reading Mastering Regular Expressions was one of the most satisfactory computer learning investments I've made. Highly recommended.
  
  --
  When all you have is an axe, everything looks like a grindstone.
2. Re:I personally... by Bryson · 2005-03-22 10:30 · Score: 3, Informative
  
  > use 'Mastering Regular Expressions'. It's a good book on the topic as well.
  
  I'm one of the few people who doesn't like Friedl's /Mastering
  Regular Expressions/. (I have the first edition.)
  
  First, he says that extended regexp engines, such as Perl's, use
  nondeterministic finite automata (NFA). Not true; NFA's can
  accept exactly the same languages as DFA's (deterministic finite
  automata). The extended regexps use search-and-backtrack
  engines.
  
  Friedl gives some examples of (extended) regexps that have
  catastrophic worst-case behavior, but doesn't present a
  systematic method for recognizing or avoiding them. The naive
  use of extended regexps, mostly by people who think they have
  mastered them, is setting us up for denial-of-service attacks
  based on the worst-case complexity of regular expressions.
  
  Formal regular expressions are exactly the languages DFA's and
  NFA's can accept. A DFA can parse any string in time
  proportional to the length of the string. Compiling the DFA may
  be exponential time, and space, but at least we find out at
  compile time, not when some attacker figures out a case we
  missed.
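To make the linear-time claim concrete, here is a hand-built DFA in Python for the textbook pattern (a|b)*abb. The table was written by hand for this sketch rather than compiled from the regex. Each input character costs one lookup, and there is nothing to backtrack into.

```python
# Transition table for a DFA recognizing (a|b)*abb.
# The state records how much of the suffix "abb" has been seen so far.
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
ACCEPT = {3}

def dfa_matches(s: str) -> bool:
    """One table lookup per character: O(len(s)) time, O(1) memory."""
    state = 0
    for ch in s:
        key = (state, ch)
        if key not in DELTA:
            return False  # character outside the alphabet {a, b}
        state = DELTA[key]
    return state in ACCEPT

print(dfa_matches("aababb"))  # True
print(dfa_matches("abba"))    # False
```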
3. Re:I personally... by DrEasy · 2005-03-22 11:11 · Score: 1
  
  Where are my mod points when I need them? Thanks for a very informative post!
  
  What's the worst time performance of matching a regex with a DFA though? Is it polynomial at least?
  
  --
  "In our tactical decisions, we are operating contrary to our strategic interest."
4. Re:I personally... by jadavis · 2005-03-22 11:24 · Score: 1
  
  A DFA can parse any string in time
  proportional to the length of the string.
  
  Not only that, they require only a finite amount of memory to parse an infinite string.
  
  --
  Social scientists are inspired by theories; scientists are humbled by facts.
5. Re:I personally... by nycbicyclist · 2005-03-22 12:35 · Score: 1
  
  Is there a reference you would recommend for a beginner?
add this book to your list by yagu · 2005-03-22 07:55 · Score: 3, Informative

While I can't vouch for the quality of the reviewed book, if you want something definitive on regular expressions, Mastering Regular Expressions, Second Edition by Jeffrey E. F. Friedl is an absolute must for your professional library. Jeffrey breaks down and then builds back up what regular expressions are and how they work, and offers an entire matrix breakout of the slightly different implementations among the most common utilities (grep, sed, awk, perl...). Not to shill for amazon, but if you select the reviewed book, the "buy this book too, and you get this great price" deal actually includes the Mastering Regular Expressions, Second Edition. Get 'em both, you won't be sorry.
1. Re:add this book to your list by Anonymous Coward · 2005-03-22 08:11 · Score: 0
  
  How much does one get for a referral such as yours?
2. Re:add this book to your list by yagu · 2005-03-22 08:21 · Score: 1
  
  I wish... (hope we're far enough to be out of the modding radar....). I actually have had a recent very bad experience with amazon.... so this took a bit of a swallow to recommend this way, but the "Mastering..." is SUCH a great book... I think any professional should have at LEAST "Mastering..." as part of their library (like I said in original post, can't vouch for that book... the general reviews I've seen lead me to think it isn't nearly as good).
two problems by EphemeralPhart · 2005-03-22 07:56 · Score: 4, Funny

Some people, when confronted with a problem, think ``I know, I'll use regular expressions.'' Now they have two problems.

Jamie Zawinski
1. Re:two problems by GerritHoll · 2005-03-22 08:29 · Score: 1
  
  The original post can be found here
2. Re:two problems by Anonymous Coward · 2005-03-22 08:35 · Score: 0
  
  And some people have their head up their ass, like on how to turn a profit - like with a nightclub.
3. Re:two problems by Anonymous Coward · 2005-03-22 09:45 · Score: 0
  
  Wait a minute, are you saying that one of the egotistical programming "rock stars" is an ego driven prick who doesn't really know it all?
  Say it ain't so.
Include "$35 cover price" or $24 after discount by Anonymous Coward · 2005-03-22 07:58 · Score: 0

Why can't a book review for an available book include the COVER PRICE? /. editors should reject these reviews if they omit the cover price.
1. Re:Include "$35 cover price" or $24 after discount by Anonymous Coward · 2005-03-22 08:26 · Score: 0
  
  Oh noes, I'm too stupid to look it up on amazon or click the "make money for /. at B&N" link!
Alternatively, check out Textpipe by Anonymous Coward · 2005-03-22 07:58 · Score: 0

from http://datamystic.com/

it has easy patterns:

http://datamystic.com/easypatterns.html

I used easy patterns in a project and the language is like an extra layer on top of regex making it simpler. Maybe the proprietary nature of easy patterns isn't great but there are some free tools that do conversions into Perl patterns.
I need something easier by Anonymous Coward · 2005-03-22 08:03 · Score: 0

Every now and then, (like once or twice a year) I can benefit from using regular expressions. It isn't worth my while to spend a lot of time learning the really arcane stuff that I need to know to use them. It's usually easier to find another way around the problem.

On the other hand, if someone produced a tool that can take any idiot (me for instance) through a step by step process that doesn't require a lot of prior knowledge and gets the job done; then I'd get really excited. For sure, I won't be reading this book; the effort will never repay itself.
Linda Richmond says... by Anonymous Coward · 2005-03-22 08:05 · Score: 5, Funny

I'm feeling a bit verklempt!

Talk amongst yourselves!

Alright, I'll give you a tawpic:

"Regular Expressions are neither regular nor expressions."

Discuss.
1. Re:Linda Richmond says... by Anonymous Coward · 2005-03-22 08:31 · Score: 0
  
  Hilarious!
  
  BTW, it's "Linda Richman".
a Cookbook eh? by chiapetofborg · 2005-03-22 08:08 · Score: 3, Funny

Anyone have any good recipes for [cookies]+ ?
Quoth Zawinski by Stavr0 · 2005-03-22 08:08 · Score: 0, Redundant

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski alt.religion.emacs 1997/08/12
Regexes are overused by ryantate · 2005-03-22 08:14 · Score: 5, Informative

Anyone who drops in regularly on a Perl discussion forum (like perlmonks.org) knows that programmers tend to over-use regular expressions.

Regexes are actually a pretty poor way to extract information from comma-delimited or tab-delimited files, for example. By the time you're done dealing with escaped commas, escaped tabs, quoting characters (which many CSV and TDT exporters use in addition to commas and tabs), escaped quote characters, escaped newlines, and escaped escape chars, you end up with a super-complicated regex.
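Here is the quoting problem in a nutshell, sketched with Python's standard csv module as a stand-in for the Perl Text::CSV approach this post recommends. The sample line is invented. The sketch assumes the csv module's default Excel dialect, in which a literal quote inside a field is written as two quotes.

```python
import csv
import io

line = 'Smith,"Doe, Jane","He said ""hi"""\n'

# A naive split on commas breaks the quoted field that contains a comma.
naive = line.rstrip("\n").split(",")
print(naive)   # ['Smith', '"Doe', ' Jane"', '"He said ""hi"""']

# The csv module understands quoting and doubled quotes.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)  # ['Smith', 'Doe, Jane', 'He said "hi"']
```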

HTML is even more complicated. You have HTML comments and nested tags on top of everything else.
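Comments alone are enough to defeat the simple approach. This sketch uses Python's html.parser as a stand-in for Perl's HTML::Parser; LinkCollector and the sample page are hypothetical. The naive pattern picks up a commented-out link and misses a single-quoted one, while the parser does the opposite.

```python
import re
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags. Text inside comments is
    reported to handle_comment, so it never reaches handle_starttag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

page = '<!-- <a href="old.html">stale</a> --><p><a href=\'new.html\'>fresh</a></p>'

naive = re.findall(r'href="([^"]*)"', page)
print(naive)             # ['old.html']

collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)   # ['new.html']
```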

To validate a simple email address, Jeffrey Friedl in his Mastering Regular Expressions book for O'Reilly writes an *11-page* regex.

Most of the time the correct answer is not "here is a regex recipe" but rather "here is a simple library to do the job properly with a parser", like Text::CSV or HTML::Parser in perl.
1. Re:Regexes are overused by stratjakt · 2005-03-22 08:20 · Score: 2, Informative
  
  Of course, the compiled regex will likely be faster than any parsing library you write. So it all depends what you're doing.
  
  For some sort of system that processes umpteen billion transactions per second, they can be a godsend. For parsing a .conf file once every six months when the machine is rebooted, it's a waste of time.
  
  It's all about knowing how and when to use the tool. A pneumatic nailgun can save a carpenter hours on a jobsite, but it's a waste of time to set it all up if you only need to knock in one nailhead that's popped through the drywall.
  
  --
  I don't need no instructions to know how to rock!!!!
2. Re:Regexes are overused by JoshRosenbaum · 2005-03-22 08:25 · Score: 1
  
  I was going to make this exact point myself as soon as I saw the words CSV/HTML/URLs. Most of these are things you should be doing with a proper module that parses the data.
  
  They could of course be useful for simple jobs, but on the whole, you'd be a lot smarter to future proof your work, and do it the right way the first time.
  
  If I had mod points, I'd give them to you.
  
  -- Josh
3. Re:Regexes are overused by Black+Perl · 2005-03-22 08:29 · Score: 3, Insightful
  
  Yes, exactly. Any good book on Regexes should have a chapter on when NOT to use them.
  
  I see many people trying to use regexes to do parsing, when they should be using a specialized parser.
  
  --
  bp
4. Re:Regexes are overused by ryantate · 2005-03-22 08:30 · Score: 1
  
  Very true. But I doubt someone who knows how to benchmark code and is handling thousands or more transactions per second is grabbing a regex recipe out of a book.
5. Re:Regexes are overused by smittyoneeach · 2005-03-22 08:31 · Score: 2, Informative
  
  Consider the boost libraries http://boost.org/.
  
  You get tokenizer, regex, and a parser library (Spirit), sorted by increasing caliber.
  
  It's all about the right tool for the job.
  
  --
  Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
6. Re:Regexes are overused by Anonymous Coward · 2005-03-22 08:37 · Score: 4, Funny
  
  > *11-page* regex.
  
  I think that's a sure sign of insanity. Or autism at the least.
7. Re:Regexes are overused by MikeBabcock · 2005-03-22 08:48 · Score: 1
  
  I agree -- many parsing jobs are much simpler to do with basic character-at-a-time C code, especially validation.
  
  If you're searching for occurrences of something or other in a long document, grep (with regexes) is obviously going to be an easy way, but if you want to extract the hostname from a URI, just code it.
  
  --
  - Michael T. Babcock (Yes, I blog)
8. Re:Regexes are overused by Anonymous Coward · 2005-03-22 08:53 · Score: 0
  
  Best. Sig. Ever.
9. Re:Regexes are overused by Anonymous Coward · 2005-03-22 10:00 · Score: 0
  
  To me the regex is the "tool" equivalent of the void pointer: sometimes treating your data as an unstructured byte stream is exactly what you need, and sucks to the XML and ASN.1 bigots who believe that structured data is delivered by the data stork. A regex is ridiculous overkill for actual structured data, but how else are you going to get information from emails, web pages, and other random text documents into a form that can be turned into structured data?
10. Re:Regexes are overused by Anonymous Coward · 2005-03-22 10:17 · Score: 0
  
  Of course, the compiled regex will likely be faster than any parsing library you write.
  
  Faster than lex+yacc? Those are pretty damn fast, you know.
11. Re:Regexes are overused by Alan+Shutko · 2005-03-22 11:15 · Score: 2, Insightful
  
  To validate a simple email address, Jeffrey Friedl in his Mastering Regular Expressions book for O'Reilly writes an *11-page* regex.
  
  That's not quite fair. That regex validates any RFC822 address, and the syntax allowed isn't simple. Validating things that are currently used is fairly easy, but there's a lot of historical baggage in RFC822 addressing.
12. Re:Regexes are overused by Anonymous Coward · 2005-03-22 12:51 · Score: 0
  
  To validate a simple email address, Jeffrey Friedl in his Mastering Regular Expressions book for O'Reilly writes an *11-page* regex.
  
  To be fair, validating an email address 100% correctly was done a while back in about 5000 lines of C code; it's not a trivial task.
  
  I agree, though, that regexp is often not the best choice; it depends on your goal. It's great if you want a quick parser that doesn't have to be 100% accurate. For instance, I wrote up 8 different regexps to capture the methods that had changed in a C file diff. Certainly not foolproof, but it worked for almost all diffs and the regexps were written in about 20 minutes.
13. Re:Regexes are overused by 2short · 2005-03-22 15:17 · Score: 1
  
  "the compiled regex will likely be faster than any parsing library you write"
  
  If I write custom code to do a specific parsing job in the fastest way possible, it will be slower than using generic code to do it in one particular way? I must not be understanding what you're saying.
14. Re:Regexes are overused by 2short · 2005-03-22 15:30 · Score: 3, Insightful
  
  "an *11-page* regex."
  
  That's insane. My feelings on Regexes were set early in my career. I discovered them, and like many started using them everywhere. Then in a code review, my boss pointed to one particularly complex one and said "See, there's why you shouldn't try to do such complex things with regular expressions, this one has a bug" "Where?" says me. "Let's leave that as an exercise for the student. Come ask me if you can't figure it out in an hour or so." Well, I certainly wasn't going to admit defeat, even though it took me several hours to find the rather subtle problem. So I went back and demanded to know how he had spotted it so fast. And he said "I didn't. It was a regex 3 lines long. It had to have a bug."
15. Re:Regexes are overused by kris_lang · 2005-03-23 04:00 · Score: 1
  
  Oh I so sincerely agree. The hardest thing to do as a consultant is to tell the client to NOT do something, or NOT use a particular toy, library, package, etc... especially if they've brought you in as a consultant to bolster their own plans rather than actually investigate and give honest consultation.
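MikeBabcock's suggestion in reply 7 above, to extract the hostname from a URI by hand rather than with a regex, might look like this in Python. The helper name `host_from_uri` is hypothetical, and it only handles simple `scheme://user@host:port/path` forms (no IPv6 literals); the standard library's `urllib.parse.urlsplit` serves as a cross-check.

```python
from urllib.parse import urlsplit

def host_from_uri(uri: str) -> str:
    """Hand-coded hostname extraction for simple scheme://[user@]host[:port]/... URIs."""
    # Skip past "scheme://" if present.
    sep = uri.find("://")
    start = 0 if sep == -1 else sep + 3
    # Walk forward one character at a time until a delimiter ends the authority part.
    end = start
    while end < len(uri) and uri[end] not in "/?#":
        end += 1
    authority = uri[start:end]
    # Drop any "user:password@" prefix, then any ":port" suffix.
    host = authority.rsplit("@", 1)[-1].split(":", 1)[0]
    return host.lower()

uri = "http://user:pw@Slashdot.org:8080/article.pl?sid=1"
print(host_from_uri(uri))      # slashdot.org
print(urlsplit(uri).hostname)  # slashdot.org
```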
F*ck this book and all others like it: by stratjakt · 2005-03-22 08:16 · Score: 1, Informative

All you need is regexlib.com and a copy of Regulator (I believe that's the free-as-in-beer one), which will break a regex out into English steps like "capture (", "capture 3 or more 0's", and so on. .NET has a regex facility that's slicker than greased pigeon shit, so I've been making heavy use of it lately.
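Python offers something in the same spirit as Regulator's plain-English breakdown: with `re.VERBOSE`, whitespace in the pattern is ignored and `#` starts a comment, so the explanation can sit next to each piece (.NET's `RegexOptions.IgnorePatternWhitespace` works similarly). The date pattern is only an illustration and does not check month or day ranges.

```python
import re

# With re.VERBOSE, whitespace is ignored and "#" begins a comment.
DATE = re.compile(r"""
    (?P<year>\d{4})    # four-digit year
    -                  # literal hyphen
    (?P<month>\d{2})   # two-digit month (range not checked)
    -                  # literal hyphen
    (?P<day>\d{2})     # two-digit day (range not checked)
""", re.VERBOSE)

m = DATE.search("Posted 2005-03-22 08:16")
print(m.group("year"), m.group("month"), m.group("day"))  # 2005 03 22
```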

--
I don't need no instructions to know how to rock!!!!
1. Re:F*ck this book and all others like it: by yahyamf · 2005-03-22 08:24 · Score: 1, Interesting
  
  .Net regular expressions can parse from right to left as well. Very useful sometimes.
2. Re:F*ck this book and all others like it: by Anonymous Coward · 2005-03-22 08:31 · Score: 0, Flamebait
  
  Yeah, mod me down as a troll, don't even READ my comment. Or maybe I was modded down for praising something in .NET.
  
  Whatever.
  
  You dumb slashbot fucks have no idea what a regex is or where it's used and wouldn't know one if it was right in front of you. You probably know it's sort of a Linux thing, so that's good.
  
  Sycophants and asshats, monkeys who crawl around above my office trying to figure out which wire the rats chewed through. Know-nothing idiots who ask me to unplug my cablemodem and plug it back in when I call to report that a big rig just rolled by and yanked the whole goddamned wiring bundle out of the side of my house.
  
  BUY THIS BOOK FOR 39.99!!! Because regexes are magic and you need to give /. bn referrer money, there's absolutely no way to figure it out for free, nor is there an online library that already has dozens of examples for whatever you might need.
  
  Fuck you and your iPods. All those white earbuds do is help me pick out the clueless wannabes. No true geek would own one.
3. Re:F*ck this book and all others like it: by winkydink · 2005-03-22 09:04 · Score: 1
  
  slicker than greased pigeon shit
  I somehow think that a lot of /.'ers will find the analogy between .NET and pigeon shit quite apropos. :)
  
  --
  "I'd rather be a lightning rod than a seismometer." -Ken Kesey
4. Re:F*ck this book and all others like it: by winkydink · 2005-03-22 09:08 · Score: 1, Funny
  
  Yeah, mod me down as a troll, don't even READ my comment. [...]
  
  You dumb slashbot fucks have no idea what a regex is [...]
  
  Sycophants and asshats, monkeys who crawl around above my office trying to figure out which wire the rats chewed through. Know-nothing idiots [...]
  
  Fuck you and your iPods. All those white earbuds do is help me pick out the clueless wannabes. No true geek would own one.
  
  Let me guess... you didn't finish the Dale Carnegie course, did you?
  
  --
  "I'd rather be a lightning rod than a seismometer." -Ken Kesey
5. Re:F*ck this book and all others like it: by east+coast · 2005-03-22 09:14 · Score: 1
  
  you didn't finish the Dale Carnegie course, did you?
  
  I got a laugh out of his/her comments, if that counts for anything.
  
  And I found the iPod comment very insightful...
  
  --
  Dedicated Cthulhu Cultist since 4523 BC.
6. Re:F*ck this book and all others like it: by Anonymous Coward · 2005-03-22 09:15 · Score: 0
  
  You're right, most have absolutely no clue, and think their job doing ISP tech support via phone, or data entry in a punchcard mill qualifies them as a computer geek.
7. Re:F*ck this book and all others like it: by Anonymous Coward · 2005-03-22 14:40 · Score: 0
  
  It doesn't surprise me that you are using greased pigeon shit heavily. It shows in your post.
jeez by roman_mir · 2005-03-22 08:17 · Score: 0, Redundant

way to ask a question that would certainly cause at least posts to be moderated as 'Redundant'!

--
You can't handle the truth.
1. Re:jeez by roman_mir · 2005-03-22 08:21 · Score: 0, Redundant
  
  way to ask a question that would certainly cause at least 10 posts to be moderated as 'Redundant'!
  
  Man, I wish there was a way to edit a submitted comment.
  
  --
  You can't handle the truth.
2. Re:jeez by Surt · 2005-03-22 08:43 · Score: 1
  
  That would totally change the nature of slashdot. Think about what would happen to arguments if you could go back and make little corrections to your logic/premises. You'd be able to make your responders look like fools.
  
  --
  "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
3. Re:jeez by roman_mir · 2005-03-22 08:51 · Score: 1
  
  Well, I think it should still be possible to edit your comment within, say, an hour of posting IF no one has replied to you yet.
  
  --
  You can't handle the truth.
Regex Coach helps building Regexp by uss_valiant · 2005-03-22 08:18 · Score: 5, Informative

Regex Coach

This program assists you in building regular expressions. I've never used it (real men code regexp at once and it works), but some friends recommend it.
1. Re:Regex Coach helps building Regexp by GodLived · 2005-03-22 08:34 · Score: 1
  
  Don't you mean, Regex Crutch?
  
  IMHO, if you're using UNIX, you gotta learn some basic regexes backward and forward. They are used in vi, grep, stream editing, and many other text utilities. If you don't innately know some basic constructs, you will be forever asking your cubicle mate - or worse, like my cube mate, using VI and clacking out "j-j-cw-foo-enter", "j-j-cw-foo-enter", "j-j-cw-foo-enter", "j-j-cw-foo-enter", ... A tool like Regex Coach would just make you dependent on the tool.
2. Re:Regex Coach helps building Regexp by DigitalDeviation · 2005-03-22 08:56 · Score: 2, Informative
  
  Regex Coach is nice for those long regexes in which you may have missed an escape somewhere. I write most regexes myself, but I'm no guru at it. Regex Coach is a nice verification that the regex works (particularly for extracting something from a large string).
3. Re:Regex Coach helps building Regexp by Gwar9999 · 2005-03-22 10:34 · Score: 1
  
  You should also consider Kodos at http://kodos.sourceforge.net
Common RegExp Mistake by samspot · 2005-03-22 08:20 · Score: 1

It's \. not /. =P
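The joke has a real point: an unescaped `.` matches any character, so a pattern meant for a hostname quietly accepts more than intended. A small Python illustration:

```python
import re

loose = re.compile(r"slashdot.org")    # "." matches any single character
strict = re.compile(r"slashdot\.org")  # "\." matches only a literal period

for host in ["slashdot.org", "slashdotXorg"]:
    print(host, bool(loose.fullmatch(host)), bool(strict.fullmatch(host)))
# slashdot.org True True
# slashdotXorg True False
```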
1. Re:Common RegExp Mistake by michaeldot · 2005-03-22 12:11 · Score: 1
  
  It's \. not /. =P
  
  Sloshdot not Slashdot.
phew! by Anonymous Coward · 2005-03-22 08:20 · Score: 0

I can see a lot of mod points wasted here to mark all these comments (but the first) redundant.
The problem is, you load the page, read, and by the time you reply there are already others that replied the same thing.
1. Re:phew! by Anonymous Coward · 2005-03-22 13:52 · Score: 0
  
  Ahh, but moderation is not simply a method of punishing poor posters. It also helps make Slashdot more interesting to read.
Now they have two problems by GerritHoll · 2005-03-22 08:27 · Score: 0, Redundant

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. (source)
ambiguous use of "they/them" by pomakis · 2005-03-22 08:28 · Score: 1

The vi-style regular-expression substitution technique might help: :-)

"If you spend time working writing applications that have to do pattern matches and/or replacements, you know about some of the intricacies of $regular expressions$. For $many people$ \1 can be an arcane hodgepodge of odd characters that somehow manage to do wonderful things, but \2 don't have enough time (or interest) to really understand how to code \1."
Different flavors? by dpbsmith · 2005-03-22 08:30 · Score: 3, Informative

In an average month, I use regular expressions as implemented in Microsoft Visual C++ 6.0, BBEdit Lite, TextWrangler, Apple MPW, and REALBasic. Every single one of them has _significant_ differences in syntax and semantics.

My understanding is that even the UNIX world sports several different flavors of regular expression in grep, egrep, fgrep, etc.

The biggest barrier to _my_ use of regular expressions is that every time I switch from one regular expression context to another, it takes me a good half hour to refresh my memory of what does and doesn't work in each environment.

--
"How to Do Nothing," kids activities, back in print!
1. Re:Different flavors? by wk633 · 2005-03-22 08:46 · Score: 1
  
  My understanding is that even the UNIX world sports several different flavors of regular expression in grep, egrep, fgrep, etc.
  
  Er, well, not exactly. grep, extended (egrep) and fixed (fgrep) allow for different feature/speed tradeoffs, but they are consistent in their use of regular expressions. Where you will find differences is between the regex syntax of vi, perl, sed, grep, etc.
  
  After ten+ years, I still consult a reference for all the escape codes and such. Used to be a book, now it's google.
2. Re:Different flavors? by DigitalDeviation · 2005-03-22 09:00 · Score: 1
  
  After ten+ years, I still consult a reference for all the escape codes and such. Used to be a book, now it's google.
  
  Right there with you. At least on the Google part... People often try to amaze one another with how much they "know by heart". I'd rather RTFM and see how it's done correctly...
3. Re:Different flavors? by hankwang · 2005-03-22 10:24 · Score: 1
  
  Er, well, not exactly. grep, extended (egrep) and fixed (fgrep) allow for different feature/speed tradeoffs, but they are consistent in their use of regular expressions. Where you will find differences is between the regex syntax of vi, perl, sed, grep, etc.
  
  What I don't like about the grep/egrep-style REs is that they are internally inconsistent. Some characters are taken to be literal if you place a backslash in front of them, while others become active with a backslash. Perl is much simpler: a backslash always turns a non-letter into a literal, whether it was literal or not. I'm always fighting with the conventions in grep and emacs. Argh. Maybe I should find a perl-based grep and a perl-re emacs module or something.
  
  --
  Avantslash: low-bandwidth mobile slashdot.
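Python's `re` module follows the Perl convention hankwang describes: a backslash before punctuation always makes it literal, and `re.escape` applies that rule to arbitrary text. By contrast, in a grep basic regular expression the group in the first line would be written `\(ab\)`. A short sketch:

```python
import re

# Perl-style: bare parentheses group, backslashed ones are literal characters.
grouped = re.fullmatch(r"(ab)+", "abab")
literal = re.fullmatch(r"\(ab\)", "(ab)")

# re.escape backslashes every regex metacharacter, so the text matches only itself.
pattern = re.escape("f(x) = 3.5 * y?")
exact = re.fullmatch(pattern, "f(x) = 3.5 * y?")
near_miss = re.fullmatch(pattern, "f(x) = 3x5 * y")

print(bool(grouped), bool(literal), bool(exact), bool(near_miss))  # True True True False
```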
HTML, XML, CSV, but why? by AGTiny · 2005-03-22 08:33 · Score: 3, Interesting

Of course everyone should know how to build a regex, but why take time discussing how to parse common formats such as HTML, XML, CSV, and so on? Every language likely has a good standard module/library/package that does it all for you, hopefully in the most efficient way, and gives you an easy API. I write Perl, and have used XML::*, HTML::*, DBD::CSV, Text::CSV, the list goes on. No need to write a single regex there. Another good set of modules is Regexp::Common, giving you correct regexes for parsing semi-hard things like IP addresses, MAC addresses, phone numbers, etc.
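Regexp::Common is Perl-specific, but the trade-off shows up in any language. In this Python sketch the naive pattern is an illustrative assumption (it is not what Regexp::Common uses): it accepts impossible octets, while the standard `ipaddress` module parses and range-checks them.

```python
import ipaddress
import re

# Naive pattern: four dot-separated runs of 1-3 digits, with no range check.
NAIVE_IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

def naive_ok(s: str) -> bool:
    return NAIVE_IPV4.fullmatch(s) is not None

def parser_ok(s: str) -> bool:
    # ipaddress rejects octets outside 0-255 (AddressValueError subclasses ValueError).
    try:
        ipaddress.IPv4Address(s)
    except ValueError:
        return False
    return True

for addr in ["192.168.0.1", "999.1.1.1"]:
    print(addr, naive_ok(addr), parser_ok(addr))
# 192.168.0.1 True True
# 999.1.1.1 True False
```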
1. Re:HTML, XML, CSV, but why? by HumanTorch · 2005-03-22 18:35 · Score: 1
  
  It's amazing how 'grey' the rules seem, yet in practice they work very well. IMO they are one of the closest ways we have to representing how a human being goes about picking out patterns. I wonder how fuzzy regexps would work.
Regexen that parse from right to left by jonadab · 2005-03-22 08:38 · Score: 1

> .Net regular expressions can parse from right to left as well.
> Very useful sometimes

Yeah, especially for parsing Hebrew text. HTH.HAND.

--
Cut that out, or I will ship you to Norilsk in a box.
Darn by SleepyHappyDoc · 2005-03-22 08:44 · Score: 1

I was hoping for an innovatively written cookbook for geeks (shell scripts to describe how to make a white sauce, that kinda thing). That would have made a fantastic gag gift.

--
Stasis is death. Embrace change.
1. Re:Darn by Anonymous Coward · 2005-03-22 12:35 · Score: 0
  
  (shell scripts to describe how to make a white sauce, that kinda thing)
  You mean like...
  #!/bin/sh
  /usr/local/bin/gimp ~/pr0n/pr0npic01.png
Great 95 more regex's to rm by my_haz · 2005-03-22 08:46 · Score: 0

I don't think I'm alone in saying (having spent plenty of time on freenode #sed) that of the many regexes I have had to formulate, only about 5% are really reusable. Most of the time it's "get some info in file X into file Y" or "make odd file X pretty". So I could buy this book, but then I would have 95 more examples of regexes to toss out.
Free Alternative by MudButt · 2005-03-22 08:48 · Score: 4, Informative

This is free... And interactive...
http://www.regexlib.com/
Try them out by DavidNWelton · 2005-03-22 08:52 · Score: 5, Insightful

Sometimes, with complex regexps, it's handy to be able to build them incrementally. I know it's just one of many, but I wrote a little tool that's handy for this. It's called regexpviewer, and it's available here:

http://www.dedasys.com/freesoftware/applications.html

Perhaps other people can recommend other tools they've found useful for learning/building regular expressions.
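Independent of any particular tool, the incremental approach can also be done in code: build the pattern from small named pieces and test each piece before combining them. A Python sketch (the HH:MM timestamp format is just an example):

```python
import re

# Small pieces, each easy to read and test on its own.
HOUR = r"(?:[01]\d|2[0-3])"   # 00-23
MINUTE = r"[0-5]\d"           # 00-59
TIME = HOUR + ":" + MINUTE    # HH:MM

# Check each piece in isolation before combining.
assert re.fullmatch(HOUR, "23") and not re.fullmatch(HOUR, "24")
assert re.fullmatch(MINUTE, "59") and not re.fullmatch(MINUTE, "60")

stamps = re.findall(r"\b" + TIME + r"\b", "posted 08:52, edited 25:00, replied 10:32")
print(stamps)  # ['08:52', '10:32']
```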

--
http://www.welton.it/davidw/
1. Re:Try them out by Gwar9999 · 2005-03-22 10:32 · Score: 1
  
  Check out Kodos at http://kodos.sourceforge.net
2. Re:Try them out by c_ollier · 2005-03-22 10:48 · Score: 3, Interesting
  
  The Regulator is a nice Open Source tool, but Windows only. It integrates expressions from RegExLib.com, and has syntax highlighting & brace matching.
3. Re:Try them out by Anonymous Coward · 2005-03-22 12:05 · Score: 0
  
  There's always kregexpeditor. I have used it occasionally for some of the larger expressions I've made. It is useful, as you say, for building them incrementally.
  -SNS
4. Re:Try them out by nycbicyclist · 2005-03-22 12:25 · Score: 1
  
  Although I haven't actually used it yet, I've been meaning to check out the program redet, which, unlike most regex tools, isn't tied to a particular language, but can mimic the regex flavor of a variety of programs (e.g., emacs and grep).
  
  http://www.cis.upenn.edu/~wjposer/redet.html
5. Re:Try them out by Anonymous Coward · 2005-03-22 14:30 · Score: 0
  
  This is the one I use:
  
  http://laurent.riesterer.free.fr/regexp/
About 279 pages too long by natoochtoniket · 2005-03-22 08:57 · Score: 4, Insightful

I have a huge, 1000+ page Betty Crocker cookbook which I hardly ever use. It gives detailed recipes for particular dishes, but nothing that helps me to just throw a dinner together. And nothing that helps me to create anything new.
My very favorite recipe book is a tiny little thing of about 40 pages. For each kind of meat and each kind of vegetable, it lists what spices and sauces go well with it, how long and how hot to cook it, and how to tell when it is done. There is a little section on how to make about a dozen different sauces. That's it.
A programming language has syntax and semantics. For regular expressions, Kleene gave both fully in his original paper on the subject. The added conveniences that some utilities provide are all listed in their respective man pages. The entire subject, if it were collected together, should be about 10 pages. With some explanation of language theory, grammars, and such, the whole might be worth a chapter. Get out an undergraduate compiler-theory book (such as Aho/Sethi/Ullman). They have less than a chapter on regular expressions, and they cover the topic fairly well.
But, I suppose, there is a difference between a cookbook that is made for cooks to use as a reference, and a cookbook that is made for non-cooks to follow by rote. Learn how to cook. You will be surprised how seldom you actually refer to the 1000+ page cookbooks.
1. Re:About 279 pages too long by KateCrufi · 2005-03-22 09:30 · Score: 1
  
  The "Joy of Cooking" cookbook (1970's version is my preference, as it's more do-it-yourself and less you-must-have-these-precise-ingredients) is something of a compromise between the two cookbook styles you describe, and is a format I'd enjoy seeing replicated in other "cookbooks".
  
  While it has a basic recipe for a good 90% of common dishes/desserts (at least Western ones), it also has a list of ingredients, common ways of cooking these ingredients, sample menus, mentions of common pitfalls, and a wonderful index (that lists most things under multiple categories - you can find shortbread cookies under both shortbread and cookies, for instance [I think]). Many of the recipes come with both specific variations and guidelines for creating your own variations.
  
  I'd particularly be interested in seeing a book discussing languages in this format, with discussion of the strengths and weaknesses, common applications, and such. Oh, well.
2. Re:About 279 pages too long by ErikZ · 2005-03-22 09:39 · Score: 1
  
  Actually, I'd be very interesting in what your 40 page cookbook is called.
  
  --
  Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
3. Re:About 279 pages too long by ErikZ · 2005-03-22 09:45 · Score: 1
  
  (sigh) Interested.
  
  --
  Democrats or Republicans. They are both taking us to the same place and they are not afraid of us anymore.
4. Re:About 279 pages too long by mikeage · 2005-03-22 19:28 · Score: 1
  
  What's this 40 page cookbook called?
  
  --
  -- Is "Sig" copyrighted by www.sig.com?
ignorance is bliss by RelliK · 2005-03-22 09:04 · Score: 2, Informative

If you want more of a challenge, try writing a regular expression that finds any <script> tags along with anything in between, using only greedy matching.
duh! Repeat after me: HTML is not a regular language. There is no regular expression that can match it. The problem arises when people try to use regular expressions without understanding what they are. But, as the saying goes, when the only tool you have is a hammer, everything looks like a nail...
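One way to see the point: matching nested tags needs a counter or stack, which a true regular expression does not have. A common compromise, sketched here in Python, uses a regex only to find the tags and a stack to check the nesting. It is a toy: it ignores comments, attribute values containing `>`, and void or self-closing elements such as `<br>`.

```python
import re

# The regex only tokenizes; group 1 is "/" for a closing tag, group 2 is the tag name.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def tags_balanced(html: str) -> bool:
    stack = []
    for m in TAG.finditer(html):
        closing, name = m.group(1), m.group(2).lower()
        if not closing:
            stack.append(name)            # opening tag: remember it
        elif not stack or stack.pop() != name:
            return False                  # closing tag with no matching opener
    return not stack                      # anything left open is unbalanced

print(tags_balanced("<div><div>x</div></div>"))  # True
print(tags_balanced("<div><b>x</div></b>"))      # False
```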

--
___
If you think big enough, you'll never have to do it.
1. Re:ignorance is bliss by DeadSea · 2005-03-22 10:15 · Score: 2, Informative
  
  > duh! Repeat after me: HTML is not a regular language. There is no regular expression that can match it.
  Script tags cannot be nested, which means that portion of HTML can be matched by a regular expression.
  --
  Currency conversion calculator
In one ear, out the other by sahonen · 2005-03-22 09:10 · Score: 1

Whenever I need to use some regex, I google for a regex reference and try to figure out how to do what I want to do. Then the next time I need to use regex, I have to do it again. I literally cannot hold regex in my head for more than a day or so.

--
Make me a friend and I'll mod you up
1. Re:In one ear, out the other by Anonymous Coward · 2005-03-22 15:55 · Score: 0
  
  I second that!!!
  and third...
Typical... by Saeed+al-Sahaf · 2005-03-22 09:11 · Score: 1

This seems to be typical for tech books: way overpriced (although this one seems more reasonable), incredibly crappy binding, and less than aggressive proofreading.

--
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
1. Re:Typical... by nametaken · 2005-03-22 10:08 · Score: 1
  
  You may be right, but it's my experience that tech books are surprisingly reasonable in price. My average textbook in any other field costs >$100, and my tech books are usually in the $40 range. Of course, this could just be that there's less demand for "Financial Accounting" than my PHP or Java books. Dunno, just my experience.
2. Re:Typical... by Saeed+al-Sahaf · 2005-03-22 12:04 · Score: 1
  
  My average textbook in any other field costs >$100, and my tech books are usually in the $40 range.
  Yes, and compared to a pound of platinum, this book is VERY affordable. But seriously...
  Basically tech books are overpriced, and textbooks are WAY overpriced.
  
  --
  "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
3. Re:Typical... by nametaken · 2005-03-22 12:34 · Score: 1
  
  tech books are overpriced, and textbooks are WAY overpriced.
  
  Ok, that I can agree with.
4. Re:Typical... by Monkelectric · 2005-03-22 17:20 · Score: 3, Insightful
  
  I had a brief skirmish with the tech book publishing industry (and believe me, that's the right word). The real problem is that they pay authors BY THE PAGE, so the incentive is to write flowery, lengthy language that conveys as little information in as much space as possible. This in turn justifies high book prices and higher author royalties.
  
  --
  Religion is a gateway psychosis. -- Dave Foley
check out regex coach if you want to learn by Anonymous Coward · 2005-03-22 09:12 · Score: 2, Interesting

I found this tool while doing my undergrad. Playing with it showed me how to understand and successfully write regexes. Five minutes of playing with it and you'll be enlightened.

http://www.weitz.de/regex-coach/
avoids 'read this book, it's only $50' syndrome by Anonymous Coward · 2005-03-22 09:19 · Score: 0

Let me correct your sentence

>I'm too stupid to

should be

"I'm too stupid to write a proper book review."
Re:Regexes How2 by softcoder · 2005-03-22 09:23 · Score: 5, Informative

In addition to a good book, or even INSTEAD of a good book, download and use THE REGEX COACH
http://www.weitz.de/regex-coach/

It is a very very nice interactive pgm that lets you debug REGEXES on the fly visually, by feeding them sample text.
BTW, thanks Stephen by boomgopher · 2005-03-22 09:28 · Score: 1

Your Syntax Highlighting library for Java rocks, thanks a million.

--
Your hybrid is not saving the environment. Its purpose is to make you feel good about buying something.
1. Re:BTW, thanks Stephen by DeadSea · 2005-03-22 10:17 · Score: 1
  
  You are quite welcome. Are you using it for anything interesting that I can take a look at?
2. Re:BTW, thanks Stephen by boomgopher · 2005-03-22 13:17 · Score: 1
  
  You are quite welcome. Are you using it for anything interesting that I can take a look at?
  
  Yeah, I'll send a link in a month or so when it's ready, I'll email via your site.
  
  - boomgopher
  
  --
  Your hybrid is not saving the environment. Its purpose is to make you feel good about buying something.
Regular Expression Coach by s1234d · 2005-03-22 09:31 · Score: 0, Redundant

This free tool is great for helping you to write regular expressions: http://www.weitz.de/regex-coach/
Separating the men from the boys... by mnemotronic · 2005-03-22 09:37 · Score: 1

That's a phrase a co-worker once tossed out to differentiate regex wranglers from lowly code cowboys. The implication being that real programmers use REs. At that time in my life I knew a dozen or so programming languages, but had avoided learning REs. That little quip prompted me to start learning, first in AWK, then via Perl. Today, I'm proud to say that I can fumble my way around a regex pretty good. I'm still a little fuzzy when it comes to concepts like "negative look-ahead" and "positive look-behind", which sounds like what I'd be doing at the beach. And please don't ask me to do an improv. on the finer points of DFA vs. NFA, or php vs. python vs. vi.
I would like to share my regex religion with the other programmers where I work, but can't get our training department psyched-up enough to find someone to teach a class.
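For anyone in the same boat, the two look-around forms are less exotic than they sound. In Python (whose syntax for these matches Perl's), a look-ahead or look-behind checks what is next to the match without including it in the match:

```python
import re

# Negative look-ahead: "Java" only when NOT immediately followed by "Script".
java = re.findall(r"Java(?!Script)", "Java and JavaScript")

# Positive look-behind: digits only when immediately preceded by "$";
# the "$" itself is not part of the match.
amounts = re.findall(r"(?<=\$)\d+", "cost $40, qty 3, tip $5")

print(java)     # ['Java']
print(amounts)  # ['40', '5']
```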

--
The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
Yummy! by Anonymous Coward · 2005-03-22 09:56 · Score: 0

REs? YUMMY!

Life is like REs, you never know what you get... Great Fun!

Yes, I'm drunk. Uh, nevermind...
Use more than regular expressions by klui · 2005-03-22 10:10 · Score: 1

Rather than relying only on regular expressions, it would be beneficial to use regexps along with sed/(g)awk/perl. If the regexp incantation you use is obscure to you, how will the next guy who supports your stuff feel? Break up your uber regexp into a simpler grep/sed/awk combination.

Along the same lines, I almost always anchor my patterns with ^ or $.
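A small Python illustration of why anchoring matters, including one easy-to-miss detail: in both Perl and Python, `$` also matches just before a trailing newline, so Python's `fullmatch` (or `\Z`) is stricter.

```python
import re

text = "42abc"

# Unanchored: search() happily finds digits somewhere inside the string.
unanchored = re.search(r"\d+", text) is not None   # True
# Anchored: the whole string must consist of digits.
anchored = re.search(r"^\d+$", text) is not None   # False

# Gotcha: "$" also matches right before a final newline...
with_newline = re.search(r"^\d+$", "42\n") is not None   # True
# ...whereas fullmatch() requires the pattern to consume the entire string.
strict = re.fullmatch(r"\d+", "42\n") is not None        # False
```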
Mastering Regular Expressions by photon317 · 2005-03-22 10:11 · Score: 1

No discussion of regex books is complete without mentioning the ebst one out there: O'Reilly's "Mastering Regular Expressions".

--
11*43+456^2
1. Re:Mastering Regular Expressions by webhat · 2005-03-23 00:06 · Score: 1
  
  s/ebst/best/ ;)
  
  --
  'I am become Shiva, destroyer of worlds'
Kodos - regex debugger by Gwar9999 · 2005-03-22 10:36 · Score: 1

If you need a good regex debugger, you should consider kodos. http://kodos.sourceforge.net
Two words... Snobol and Icon by Anonymous Coward · 2005-03-22 10:39 · Score: 0

Snobol is the granddaddy of pattern matching languages. Yes -- it's an old language, and it has an unorthodox syntax -- but regular expressions are also old and have an unorthodox syntax.

There are several different implementations out there, the easiest to deal with is csnobol. http://www.snobol4.org/ has a bunch of information on the language.

The other language, Icon, was developed from research done on Snobol. Icon provides a more modern syntax and flow control. While not as powerful in pure pattern matching as Snobol, the whole language can be used when string scanning. http://www.cs.arizona.edu/icon has a bunch of information on the language. There is an object-oriented version of Icon being developed, Unicon. http://www.unicon.org/ has information.
Whatever, newbie. by Anonymous Coward · 2005-03-22 10:46 · Score: 0

That's the "classic" regular expressions, not the modern regular expressions accepted by PCRE, and Perl itself. In fact, Perl regular expressions are full Turing machines, with PCRE being a few steps behind that. So PCRE isn't really PCRE... it's P-likeCRE.
What a bunch of crap. You don't know squat about PERL do you?
Here's some advice, buddy: before you go spouting off about PERL, why don't you go read that book with the llama on the cover. Then you can come back here and tell us all you know about PERL.
1. Re:Whatever, newbie. by jargoone · 2005-03-22 13:49 · Score: 1
  
  Funny you should mention that. I was arguing with a guy one day at work about something Perl related. I searched google groups and found one of Randal's posts proving me right. He said, "Who the hell is that guy?". I pointed to my bookshelf. He said, "Oh." The discussion subsequently ended.
Generating regex by Muttonhead · 2005-03-22 11:11 · Score: 1

This is a handy little tool for generating regex:

http://txt2regex.sourceforge.net/
It's not that hard by cliveholloway · 2005-03-22 11:24 · Score: 1

Assuming, of course, that your x?html is valid - at least in Perl, anyway:

m|<script[^>]*>[^<]+</script>|is

cLive ;-)

--
-- Trinity in high heels carrying a whip: The donimatrix - there is no spoonerism
oh crap by cliveholloway · 2005-03-22 11:31 · Score: 1

I see what you were saying now :)

OK, mine works if you never use < in your JS.

use if (3>$i) rather than if ($i<3)

:)
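In Python terms, cLive's pattern (translated from Perl, with `/i` becoming `re.IGNORECASE`; `/s` has no effect because the pattern contains no `.`) does fail on a script body containing `<`. One greedy-only alternative, offered as a sketch rather than a robust HTML solution, lets the body contain `<` as long as it does not begin `</script`, using a negative look-ahead:

```python
import re

# cLive's pattern: the script body may not contain "<" at all.
CLIVE = re.compile(r"<script[^>]*>[^<]+</script>", re.IGNORECASE)

# Greedy-only variant: any character except "<", or a "<" that does not start "</script".
GREEDY = re.compile(r"<script[^>]*>(?:[^<]|<(?!/script))*</script>", re.IGNORECASE)

html = "<p>hi</p><script>if (i<3) go();</script>"

print(CLIVE.search(html))           # None
print(GREEDY.search(html).group())  # <script>if (i<3) go();</script>
```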

cLive ;-)

I can. by itomato · 2005-03-22 11:58 · Score: 1

If I were sitting down to hack away at something that cannot be done (sanely, correctly, speedily, etc.) without regex, I would rather have a "Cookbook" with more than enough information than add to my work and stress load by having to scrape up what I needed (or worse, maybe *not*) on the Interweb.

AmigaBASIC was the same way. I knew enough basic BASIC to pull off some things, but without a (thick) handy manual, it's just not as much fun.
Also interested in the name of the book by JurgenThor · 2005-03-22 13:38 · Score: 0

What is it?

--
GENERAL PUBLIC SIGNATURE (GPS) Any replies (derivatives) of this post must also use the GPS
:help pattern by digitect · 2005-03-22 14:23 · Score: 3, Informative

Of course, if you use the one true text editor, all you need to know about regular expressions is:

:help pattern

:)

--
There is no need to use a SlashDot sig for SEO...
1. Re::help pattern by Anonymous Coward · 2005-03-26 05:45 · Score: 0
  
  $my_day !~ /(?::help)/;
  $my_day =~ /(?:uphill|llihpu)/, /snow/, /naked/, /chased by wolves/;
[FD]UCK (Off|You) by Anonymous Coward · 2005-03-22 14:23 · Score: 0

Have you noticed how regex creeps into nerd talk, like a slightly nerdier version of phone text talk? All these nerds hanging around like the fon[zs]e at the [dj]uke box with their pocket protectors, sticking square brackets around their (letters|words) because they're more used to speaking regex to a screen than using the word "or" with another human being.
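This is for anyone decoding the notation in that rant. `[zs]` is a character class that matches exactly one of the listed characters. `(letters|words)` is an alternation, the regex spelling of "or". Here is a quick check with Python's `re.fullmatch`; the `matches` helper is a hypothetical name.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# A character class like [zs] matches exactly one of the listed characters.
print(matches(r"fon[zs]e", "fonze"), matches(r"fon[zs]e", "fonse"))  # True True
print(matches(r"fon[zs]e", "fonzse"))                                # False
# Alternation (a|b) is the regex spelling of "or".
print(matches(r"(letters|words)", "words"))                          # True
print(matches(r"[dj]uke box", "puke box"))                           # False
```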
Re:Regexes How2 by belmolis · 2005-03-22 16:18 · Score: 1

There are quite a few regular expression tools available, with different capabilities and purposes. For the novice who doesn't want to learn more or doesn't have time, the best is probably txt2regex, which walks you through the construction of the regexp and generates output for 20 different programs and languages. It is one of the few tools that I know of that isn't specialized for a particular language or program. My own tool, Redet, provides an interface to 29 regular expression implementations. It is aimed at people who know something about regular expressions or are willing to spend some time learning but helps out by providing palettes showing the notation for each program and a history system, so that you can first construct the pieces of a complex regexp, then assemble them. It also has features aimed at providing a search environment that may be useful for people who need no help constructing their regular expressions.

regex-coach uses Perl-style regular expressions. Its particular virtue is that it can single-step through the match and show the parse tree, so it is useful if you want to understand the matching process in detail. Similarly useful for understanding how regular expressions are implemented is re_graph, which, given a regular expression, draws the corresponding finite state automaton.

A couple of nice tools aimed at Python users are Kiki and Kodos.

These and some other tools and libraries are listed on this page.
regex by Anonymous Coward · 2005-03-22 18:38 · Score: 0

On the topic of regexes and off the topic of book reviews...
This should have an easy solution, but... does anyone see a regex that will always grab just the domain portion from the following:

anonymous.coward.org

coward.org

anonymous.slashdot.offtopic.posting.coward.org

$domain = $1; #should be coward.org for all above
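One possible answer, sketched in Python: anchor at the end of the string and take the last two dot-separated labels. In Perl, the same pattern would be `/([^.]+\.[^.]+)$/`, which leaves the result in `$1`. The pattern assumes every host ends in a single-label suffix like `.org`. Names under suffixes such as `.co.uk` would need a public suffix list instead. The helper `registered_domain` is a hypothetical name.

```python
import re
from typing import Optional

# The last two dot-separated labels, anchored at the end of the string.
# Assumption: every host ends in a single-label suffix like ".org"; names
# under multi-label suffixes such as ".co.uk" need a public suffix list.
DOMAIN_RE = re.compile(r"([^.]+\.[^.]+)$")


def registered_domain(host: str) -> Optional[str]:
    """Return the trailing 'name.tld' part of host, or None if there is no dot."""
    m = DOMAIN_RE.search(host)
    return m.group(1) if m else None


for host in ["anonymous.coward.org", "coward.org",
             "anonymous.slashdot.offtopic.posting.coward.org"]:
    print(registered_domain(host))  # coward.org each time
```

`[^.]+` cannot cross a dot, so the only place the pattern can reach `$` is from the start of the second-to-last label.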
previewing Emacs regexps by Anonymous Coward · 2005-03-22 20:18 · Score: 0

For Emacs users: M-x re-builder. It lets you test your Emacs regexps interactively.
Negative Lookahead by Anonymous Coward · 2005-03-23 01:47 · Score: 0

I'm still scratching my head trying to figure out how to exclude specific words from a regular expression that also matches other words....
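One common approach is offered here as a sketch, not as the poster's intended solution. A negative lookahead `(?!...)` at the start of each word rejects the word when a banned word begins there and ends at a word boundary. The example uses Python's `re`, whose lookahead syntax matches Perl's. The helper `words_except` and the sample words are hypothetical.

```python
import re


def words_except(text, banned):
    """Return the words in text, skipping any word listed in banned (case-sensitive)."""
    # \b(?!(?:the|cat)\b)\w+ : at the start of a word, refuse to match if a
    # banned word begins here AND ends at a word boundary, so "catfish" is kept.
    alternatives = "|".join(map(re.escape, banned))
    pattern = re.compile(rf"\b(?!(?:{alternatives})\b)\w+")
    return pattern.findall(text)


print(words_except("the cat ate the catfish", ["the", "cat"]))  # ['ate', 'catfish']
```

The leading `\b` matters. Without it, the engine would simply start one character later and return "at" from inside "cat".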
go look up some computability theory by smoany · 2005-03-23 03:11 · Score: 1

1) They aren't close to Turing complete; CFGs are much closer, but still don't get there. 2) Enhanced reg exps? This is an implementation of a program that seems to function like a regular expression parser. "Enhanced reg exps" are not enhanced regular expressions; they are a way of writing code similar to a regular expression that must be handled WITH A STACK FRAME OR TWO. Note that this isn't, in any sense of the word, a regular expression. This is analogous to trying to explain to a user the difference between advanced user features and the underlying programming constructs. Enhanced reg exps are simply a nice user interface and a misuse of the term. Questions? Ask. -Dan
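Here is a concrete illustration of the point that "enhanced" patterns are no longer regular expressions in the formal sense. It is a sketch in Python's `re`, which supports backreferences much as Perl and PCRE do. The pattern `(.+)\1` accepts exactly the strings of the form ww, where some non-empty w is repeated. That language is provably not regular, so no finite automaton can recognize it. The engine copes by backtracking over the captured text. The helper `is_square` is a hypothetical name.

```python
import re

# (.+) captures some non-empty string w; \1 then demands an exact second copy.
# The set {ww} is not a regular language, so no finite automaton accepts it;
# the backreference is what takes the pattern beyond regular expressions.
SQUARE = re.compile(r"(.+)\1")


def is_square(s: str) -> bool:
    """True if s is some non-empty string w repeated exactly twice."""
    return SQUARE.fullmatch(s) is not None


for s in ["abcabc", "abab", "aaaa", "abcab", "aaa"]:
    print(s, is_square(s))
```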