Mastering Regular Expressions

Posted by ryuzaki0 on Wednesday September 13, 2006 @07:29AM from the programmers-handbook dept.

Simon P. Chappell writes "Classics are funny things, especially in the world of books. There are books that people say "should" be classics (I'll refrain from mentioning names to protect the pretentious) and then there are books that people are too busy actually using to get around to listing as classics. Mastering Regular Expressions, now in it's third edition, is in the second group. It's one of those books that you see on desks in computer departments the world over. This is a real "doers" book." Read the rest of Simon's review.

Mastering Regular Expressions
author: Jeffrey E.F. Friedl
pages: 515 (31 page index)
publisher: O'Reilly
rating: 11 out of 10
reviewer: Simon P. Chappell
ISBN: 0596528124
summary: A classic of modern computer literature.

This is a book for programmers; managers, project managers and architects need not apply. If you talk about code instead of writing it and have teams of programmers report to you, then consider buying this book and giving it to them. If you're a technical lead or lead programmer, then shame on you if an earlier edition of this book isn't already on your shelves! The majority of examples are written using Perl, but if you can read basic Perl (Pidgin Perl, perhaps?) then you'll be fine with the examples. Programmers in PHP, Java, .NET and Ruby also have dedicated sections of the book, so it's very inclusive and almost platform agnostic.

The book has ten chapters divided into two parts. Chapters one through six are what Mr. Friedl calls the "story" of regular expressions. Chapters seven through ten are an examination of the specific regular expression capabilities of Perl, Java, .NET and PHP.

Chapter one is an introduction to regular expressions. At only 33 pages, you might think that it would be shallow, but rather, it is knowledge dense. The examples in the first chapter use egrep extensively. This makes a lot of sense as it's an advanced tool, easy to use and freely available for most modern operating systems.

Chapter two builds on this introduction with extended introductory examples. These are written in Perl (again, simple and easy to follow), but there is no doubt that the regular expressions are the stars of the show around here. The examples are small Perl programs, but their benefit is that Mr. Friedl talks the reader through the process of creating each of them. This is more useful than just presenting example programs, because with just pure examples, you are out of luck if your specific problem is not covered. With this approach, you're coached towards thinking in regular expressions and are more equipped to address your personal regular expression needs.

Chapter three provides an overview of regular expression features and flavors. It starts with a historical view of the development of regular expressions, including a few asides about the influence that the earlier versions of the book have had on that development. After that, the chapter uses a search and replace example to demonstrate some of the differences between flavors of regular expression capabilities provided by different programming languages. Strings, Unicode and metacharacters round out this overview.

Strap yourself in for chapter four; it's time to talk about the computer science that makes all of that matching work. If you didn't know the difference between an NFA and a DFA regular expression engine before you start this chapter, you most certainly will by the end of it. At first sight, it might seem that this is a chapter for the pure propeller heads amongst us. While there is much theory here, it's all presented in the light of how your regular expression engine is trying to do what you asked. By understanding the approaches to regular expression processing, we can learn to help ourselves. We help ourselves when we write regular expressions that run faster and use less memory. We write better regular expressions when we understand the consequences of what we write. For example, the oft-written ".*" (dot star) seems like a great way to ignore a bunch of stuff in the middle of an expression, but such simplistic use is just waiting to bite you. This chapter explains why, how to deal with the situations where you'd be tempted to use simplistic expressions, and how just a little extra thought can bring you the behavior you want.
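
To make the dot-star trap concrete, here is a minimal sketch. It is written in Python rather than the book's Perl, and the sample string and variable names are invented for this illustration; it is not taken from the book.

```python
import re

# Hypothetical sample text containing two quoted phrases.
text = 'say "hello" and then "goodbye"'

# Greedy: .* runs to the end of the string, then backs off only as far as
# the last quote, so both phrases end up inside a single match.
greedy = re.findall(r'"(.*)"', text)

# Lazy: .*? stops at the first closing quote it can reach.
lazy = re.findall(r'"(.*?)"', text)

# Negated class: [^"]* can never cross a quote, which states the intent directly.
negated = re.findall(r'"([^"]*)"', text)

print(greedy)
print(lazy)
print(negated)
```

The greedy form returns one long match spanning both phrases, while the other two return the phrases separately. The negated class is often the sturdier choice because it does not depend on how the engine backtracks.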

Chapter five is a practical counterpoint to the previous theory chapter. Here, Mr. Friedl discusses practical regular expression techniques. There are a number of short examples, before he works through medium-sized HTML processing examples and finishes up with a look at processing Comma Separated Value (CSV) data.
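
For a flavor of the CSV problem, the following is a rough Python sketch of one common regex approach. It is not the book's code; the pattern, the function name split_csv_line and the sample line are assumptions made up for this illustration, and it assumes well-formed lines with no embedded newlines.

```python
import re

# One CSV field: either a double-quoted field (where "" stands for a literal
# quote) or a run of characters containing no comma or quote.
FIELD = re.compile(r'''
    (?:^|,)                   # start of the line, or the comma before a field
    (?:
        "((?:[^"]|"")*)"      # group 1: quoted field
      |
        ([^",]*)              # group 2: unquoted field
    )
''', re.VERBOSE)


def split_csv_line(line):
    """Split one well-formed CSV line into a list of field strings."""
    fields = []
    for match in FIELD.finditer(line):
        quoted, plain = match.group(1), match.group(2)
        if quoted is not None:
            # Collapse doubled quotes back into single quotes.
            fields.append(quoted.replace('""', '"'))
        else:
            fields.append(plain)
    return fields


print(split_csv_line('a,"b,c",,"say ""hi"""'))
```

For real data, Python's standard csv module is the safer tool; the point here is only to show how much care even a "simple" format demands of a regular expression.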

Chapter six covers efficiency. Your regular expression can be as correct as you like, but if it takes what seems like an eternity to run, then it's of little use. This chapter mostly addresses NFA-based engines, because their running time varies the most with how the regular expression is written.
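
As a small illustration of how much the way an expression is written can matter on a backtracking engine, here is a hedged Python sketch (Python's re module is a backtracking matcher). The two patterns, the helper name timed_search and the subject string are chosen for this example and are not from the book; actual timings depend on the machine and the Python version.

```python
import re
import time


def timed_search(pattern, subject):
    """Return (matched, seconds) for a single re.search call."""
    start = time.perf_counter()
    matched = re.search(pattern, subject) is not None
    return matched, time.perf_counter() - start


# 18 letters and no "b": neither pattern can match, so each one fails only
# after the engine has tried every alternative open to it. The nested
# quantifier in (a+)+b gives it exponentially many ways to split the a's.
subject = "a" * 18

nested_matched, nested_secs = timed_search(r"(a+)+b", subject)
simple_matched, simple_secs = timed_search(r"a+b", subject)

print(f"(a+)+b  matched={nested_matched}  {nested_secs:.4f}s")
print(f"a+b     matched={simple_matched}  {simple_secs:.4f}s")
```

Both patterns accept exactly the same strings, yet the nested form typically takes far longer to report failure, and the gap widens quickly as the subject grows.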

Chapters seven through ten cover the specifics of using regular expressions in Perl, Java, .NET and PHP. They're well written and cover everything you need to apply the content of the first six chapters to your programming language of choice.

Everything about this book is great. This is the kind of book that O'Reilly built its reputation with. A master of the subject matter, writing in a clear, easily understood manner, leaving the reader educated and able to operate comfortably with the subject matter. I may not be a regular expression guru, but I feel that I have a much better grasp of the fundamentals that I would need if I did want to be such a guru.

Mr. Friedl is to be commended for his clear explanations of what is, in all reality, much more complex computer science than many of us are used to dealing with. The fact that his explanations are highly readable and enjoyable is a significant bonus.

There is a website for the book, regex.info and a blog at regex.info/blog, where Mr. Friedl has some wonderful photographs of Japanese gardens with their autumn colors. (Nothing to do with regular expressions, but they appealed to my inner photographer.)

Lastly, while the book is not intended to be an encyclopedia of regular expressions, all of the examples are very relevant to programmers' needs and this book can easily serve that reference role.

At the risk of sounding like some kind of O'Reilly shill or a relative of Mr. Friedl, I must report that I don't think that I found a single thing I didn't like about this book.

This is a classic of the first order. Nail it to your desk unless you want to be constantly retrieving it from your co-workers. If I might be permitted a Spinal Tap reference, this one goes to eleven. If you ever use regular expressions, are thinking of using regular expressions or are in the same room as a regular expression, then you need this book.

You can purchase Mastering Regular Expressions from bn.com.

Maybe it's just me... by grub · 2006-09-13 07:31 · Score: 2, Funny

...but it seems funny that someone signing himself as Simon P. Chappell would worry about "protect[ing] the pretentious".

--
Trolling is a art,
1. Re:Maybe it's just me... by Sideshow+Coward · 2006-09-13 07:45 · Score: 5, Funny
  
  protect[ing] the pretentious
  Don't you mean "protect(ing)? the pretentious"?
2. Re:Maybe it's just me... by leonardluen · 2006-09-13 08:11 · Score: 4, Funny
  
  i see you have not yet mastered reqular expressions
3. Re:Maybe it's just me... by Joe+Snipe · 2006-09-13 08:22 · Score: 1
  
  the above comment's rating of informative is hilarious!
  
  --
  Sometimes, life itself is sarcasm...
4. Re:Maybe it's just me... by slashbob22 · 2006-09-13 08:36 · Score: 1
  
  I can't wait to Metamoderate the comment:
  
  Fair [ ][ ][ ] Unfair --- [ ] Contextually Hilarious.
  
  --
  Proof by very large bribes. QED.
5. Re:Maybe it's just me... by Paul+Rose · 2006-09-13 08:45 · Score: 3, Funny
  
  I use my middle initial all the time for that reason
  What? Anonymous J. Coward
6. Re:Maybe it's just me... by Anonymous+P.+Coward · 2006-09-13 09:12 · Score: 2, Funny
  
  No, "P" for Plenty-of-other-people-named-Anonymous-Coward
7. Re:Maybe it's just me... by wavq · 2006-09-13 09:30 · Score: 2, Insightful
  
  $_ =~ s/q/g/;
8. Re:Maybe it's just me... by jrockway · 2006-09-13 10:42 · Score: 1
  
  s/// works on $_ by default, ya know.
  
  --
  My other car is first.
9. Re:Maybe it's just me... by Anthracks · 2006-09-13 11:12 · Score: 1
  
  Doh! Got me ; )
  
  --
  Rock over London, Rock on Chicago. Wheaties: Breakfast of Champions.
10. Re:Maybe it's just me... by David+A.+Ventimiglia · 2006-09-14 00:51 · Score: 1
  
  What's so damn funny about it?
11. Re:Maybe it's just me... by Inks · 2006-09-14 09:16 · Score: 1
  
  Phew; thanks for that! When I read "protect[ing]", I stared at it for a minute thinking, "I don't get it."
  
  --
  "This is a model of a model of iron, modelled in iron."
Agreed! by MLopat · 2006-09-13 07:31 · Score: 1

The first two editions were also great books. An indispensable resource for sure and mandatory reading for my devs.
1. Re:Agreed! by Anonymous Coward · 2006-09-13 07:45 · Score: 2, Interesting
  
  Guess what - in Silicon Valley, a bunch of sometimes arrogant, more often brilliant, unrepentant commercialists made a system for the Macintosh called MPW. I used their proprietary system for years. I never wanted to deal with four uses of * and / and . and all the others. With a few Greek characters, the expressions for Position before A, and Selection between A and B and a bunch of others worked really, really well.
  
  Now that NeXT acquired Apple, the web is indispensable, and the BSD that drives the Mac is settled, I now use regular expressions as in this book. They are not bad, but not the only way, either. Simply put, they won. There will never be another set of regular expressions for 1000 years now, but don't forget, there is More than One Way to Do Things.
2. Re:Agreed! by $RANDOMLUSER · 2006-09-13 08:17 · Score: 1
  
  Here ya go, sonny. Read up on Alonzo Church's grad student. "*" isn't "splat" in this context, it's the Kleene star.
  
  --
  No folly is more costly than the folly of intolerant idealism. - Winston Churchill
3. Re:Agreed! by orasio · 2006-09-13 08:20 · Score: 3, Informative
  
  Regular expressions have academic books behind them, and computer science books are written about them.
  Maybe what you talk about is nice, but REs (with extensions) are kind of ultimate solutions to the problem they try to solve (describing an automaton in a string of characters).
  
  The only thing that is needed to use another complete system is a theorem that proves there is a two way conversion between the system you like and REs, and then it would be fairly easy to implement everywhere.
4. Re:Agreed! by nickos · 2006-09-13 11:13 · Score: 1
  
  I've heard interesting things about MPW (Macintosh Programmer's Workbench), but never used it myself.
  
  Anyone got any ideas where I can find a copy and how I can play with it?
Third Edition? Already? by Chuck+Milam · 2006-09-13 07:35 · Score: 1

I wish the review would have addressed which edition of the book was being discussed. I assume the 3rd because that's where the bn.com link points. I'm sitting here looking at my 1st and 2nd editions. I wonder if I should spring for the third. Bummer--I just bought the 2nd a few months back.
Knock Knock by neonprimetime · 2006-09-13 07:36 · Score: 5, Funny

What did one regex say to the other?

.+
1. Re:Knock Knock by A+beautiful+mind · 2006-09-13 07:42 · Score: 4, Funny
  
  You're not a geek if you don't find .* of it funny.
  
  --
  It takes a man to suffer ignorance and smile
  Be yourself no matter what they say
2. Re:Knock Knock by neonprimetime · 2006-09-13 08:00 · Score: 1
  
  Obviously the modders of the parent and grandparent have not yet read this book and thus have not yet mastered regular expressions!
3. Re:Knock Knock by kalirion · 2006-09-13 08:14 · Score: 1
  
  I found .*? of it funny myself.
4. Re:Knock Knock by Mattintosh · 2006-09-13 08:25 · Score: 1
  
  Newlines are your friends, Mr. Regex. Please don't ignore them.
5. Re:Knock Knock by merc · 2006-09-13 15:06 · Score: 4, Funny
  
  greedy bastards! .*?
  
  --
  It's true no man is an island, but if you take a bunch of dead guys and tie 'em together, they make a good raft.
I bought/read this a couple months ago. by Jester998 · 2006-09-13 07:37 · Score: 5, Interesting

I bought this (along with a few other O'Reilly titles) a couple months back, and I highly recommend Mastering Regular Expressions. Even though it's a dry technical topic, the presentation is awesome.

I read through the whole thing as if it were a novel, and picked up more than a few new things about regexes.

Very handy book, both to read through to really learn how regexes work, and as a day-to-day reference. The score of 11/10 given by the reviewer is bang on.
1. Re:I bought/read this a couple months ago. by Breakfast+Pants · 2006-09-13 09:51 · Score: 1
  
  This book also goes by another name: Cliff's Notes: Interviewing With Steve Yegge Style Interviewers. It's like all the guy talks about.
  
  --
  WHO ATE MY BREAKFAST PANTS?
My Expressions by Anonymous Coward · 2006-09-13 07:38 · Score: 3, Funny

are always sad
We need to protect all expressions by 140Mandak262Jamuna · 2006-09-13 07:43 · Score: 4, Funny

The author and the reviewer are blatantly biased in favour of the regular expressions, ignoring the plight of the millions of downtrodden irregular expressions who are not able to get a platform to voice their grievances. All because they are viewed as somehow deviant or deficient. It is time for the irregular expressions to come out of the closet and assume their role as legitimate members of the syntax.

--
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
1. Re:We need to protect all expressions by Kesch · 2006-09-13 08:08 · Score: 1
  
  If you are experiencing irregularity in your expressions I would suggest using some laxatives.
  
  --
  If this signature is witty enough, maybe somebody will like me.
2. Re:We need to protect all expressions by tool462 · 2006-09-13 08:21 · Score: 1
  
  That just causes me to core dump :(
3. Re:We need to protect all expressions by Mattintosh · 2006-09-13 08:27 · Score: 5, Funny
  
  I would suggest using some laxatives
  
  Would that be RegExLax?
4. Re:We need to protect all expressions by Pandishar · 2006-09-13 09:27 · Score: 1
  
  I think the correct term is expression challenged, not irregular expression. Don't want to offend anyone.
5. Re:We need to protect all expressions by TheOtherChimeraTwin · 2006-09-13 09:45 · Score: 1
  
  "Some people, when confronted with a problem, think I know, I'll use regular expressions. Now they have two problems."
  -- attributed to Jamie Zawinski
6. Re:We need to protect all expressions by pjt33 · 2006-09-13 10:07 · Score: 1
  
  Well, it's got chapters on Java and Perl "regexes" which are provably irregular. I don't know about .NET and PHP ones, but I suspect they will be too.
This book is awesome, and Amazon has it cheaper by heffel · 2006-09-13 07:43 · Score: 1

I own an older version of this book and it really rocks.

As usual, Amazon has it cheaper than BN ($29.69 vs $35.99).

--
Expert Java EE Consulting
Re:Third Edition? Already? by MrBoombasticfantasti · 2006-09-13 07:45 · Score: 2, Funny

From the review:

Mastering Regular Expressions, now in it's third edition, [...]

I'm keeping my second edition though. My book fetish is already expensive enough without buying every edition and reprint... ;-)

--
!ERR: Signature not found.
Personally... by rainman_bc · 2006-09-13 07:46 · Score: 5, Informative

I just like to go to http://www.regular-expressions.info/ myself - I seem to find all the stuff I forget from time to time there...

--
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
1. Re:Personally... by owlstead · 2006-09-13 09:09 · Score: 2, Insightful
  
  Note that I immediately found errors in the Java section of this site. E.g., according to the site, the default Java regexp support does not include searching for case insensitive strings, which it does. Beware.
2. Re:Personally... by JohnnyBigodes · 2006-09-13 10:26 · Score: 1
  
  I seem to find all the stuff I forget from time to time there...
  
  That has got to be a memory leak. Oh well, at least you have garbage collection.
3. Re:Personally... by jobst · 2006-09-13 15:02 · Score: 1
  
  Let's see where the AUTHOR of that website learned from ....
  
  http://www.regular-expressions.info/books.html
  
  shows that he learned from Jeff Friedl too:
  
  http://www.regular-expressions.info/hipowls.html
  
  Ahh, again THAT book.
  
  jobst
  
  --
  to code or not to code, that is the question.
Re:For Programmers? NOT! by gurps_npc · 2006-09-13 07:48 · Score: 4, Insightful

There are, and continue to be, a LOT of people who start off dabbling, get a job and become full-time programmers without EVER getting a formal education in computer science. You can easily get a 60k job doing that. While someone with a good formal education can get paid a lot more, these people are still programmers.

--
excitingthingstodo.blogspot.com
Maybe it's just me but isn't 515 pages too much? by Anonymous Coward · 2006-09-13 07:49 · Score: 2, Insightful

Does anyone remember "Moby Dick" (hint: "Call me Ishmael ...")? It weighed more than Roseanne Barr/Arnold/Thomas because publishers charged more money for heavier books and thus encouraged the writers to write big books.
Now, we have a regular-expression primer that has 515 pages. Is the publisher earning more money by producing a bigger book?
The only information that most people need is contained on a small web page. Armed with the information on that web page, the beginner can learn best by doing: writing various regular expressions in short Perl programs and determining whether they do what you want them to do.
an anecdote caused by this good book by Sebastopol · 2006-09-13 07:50 · Score: 3, Interesting

When I read the 2nd edition of this book I was floored by how much richness I was missing in the regex language (well, in Perl regex, that is).

Like a kid at Christmas, I immediately went nuts on my next project with \G and the lookaround operator(s).

Sadly, when a big bundle of code I wrote was delivered to a team in a city on another very large eastern continent, no one could understand what I had written, so they deleted my nifty \G loops and replaced it all with a crappy first-year-college-grad-non-indented parsing state machine using gotos. The complaint was not that I went nuts with regex, but that I was using NONSTANDARD perl version which supported them (instead of their ancient version!), and that it was my duty to deliver a tool using standard versions. I was most angry at the fact that they just replaced the code with a buggy state machine, and then asked me to debug another problem caused by their mess because it was my tool originally. Ugh!

Anyway, my point is: (perl) regex are a far richer tool than meets the eye, but beware The Boneheads: the people who refuse to learn something new that could make their life easier and cling to the old way. Gawd forbid someone learn something new on the job.

Sigh. I was hoping at least ONE programmer over there would have shared my enthusiasm for \G. /endrant

--
https://www.accountkiller.com/removal-requested
1. Re:an anecdote caused by this good book by Sebastopol · 2006-09-13 08:07 · Score: 3, Informative
  
  The regex extensions have been mainstream since perl5.8.
  
  The other co. is using perl5.004, which doesn't even support >2GB files.
  
  Trolls are the worst when they make uninformed assertions, you must work for the company that got my code.
  
  --
  https://www.accountkiller.com/removal-requested
2. Re:an anecdote caused by this good book by yerM)M · 2006-09-13 09:49 · Score: 1
  
  Indeed, as (probably Jamie Zawinski) said:
  Some people, when confronted with a problem, think I know, I'll use regular expressions. Now they have two problems.
  Fight the power (of regular expressions).
Slightly offtopic, Regex related. by BrookHarty · 2006-09-13 07:50 · Score: 1

This is slightly offtopic, but it's regex related. Where are the regex training programs for windows/linux? Or even regex tools to parse data and help you design your expressions?

Seems like a typical thing that's always overlooked. I saw RegexBuddy for PC, but it is missing awk/sed/bash regex.

While reading a book helps, a tool for the inexperienced would help train and get the job done.
1. Re:Slightly offtopic, Regex related. by doti · 2006-09-13 08:05 · Score: 1
  
  Especially if this tool could work with the various regexp formats around (sed, vim, perl, etc).
  Like, you type a regexp in one format, and it also shows the equivalent regexp in the other formats.
  
  --
  factor 966971: 966971
2. Re:Slightly offtopic, Regex related. by Otter · 2006-09-13 08:10 · Score: 1
  
  Where are the regex training programs for windows/linux? Or even regex tools to parse data and help you design your expressions?
  Check out KRegExpEditor in KDE...
  
  --
  What I'm listening to now on Pandora...
3. Re:Slightly offtopic, Regex related. by prostoalex · 2006-09-13 08:27 · Score: 3, Informative
  
  The Regex Coach - The Regex Coach is a graphical application for Windows and Linux/x86 (also usable on FreeBSD) which can be used to experiment with (Perl-compatible) regular expressions interactively.
  
  The Regulator - The Regulator is an advanced, free regular expressions testing and learning tool written by Roy Osherove. It allows you to build and verify a regular expression against any text input, file or web, and displays matching, splitting or replacement results within an easy to understand, hierarchical tree.
4. Re:Slightly offtopic, Regex related. by Chrax · 2006-09-13 12:39 · Score: 1
  
  "Where are the regex training programs for [Linux]?" /usr/bin/perl -w
5. Re:Slightly offtopic, Regex related. by Chrax · 2006-09-13 12:42 · Score: 1
  
  Woops. That's what I get for not previewing.
  
  s/" /"\n\n/
Re:Third Edition? Already? by Chuck+Milam · 2006-09-13 07:50 · Score: 1

Ah yes. There is is. Must have been the incorrect use of "it's" that confused me.
New for 3rd Edition by greysky · 2006-09-13 07:51 · Score: 1

Does anyone know what is new in the 3rd edition? This is missing from the review.
1. Re:New for 3rd Edition by Skiron · 2006-09-13 08:03 · Score: 2, Funny
  
  Yes, they missed a . on page 102, paragraph 14.
2. Re:New for 3rd Edition by c0rr1n · 2006-09-13 08:27 · Score: 4, Informative
  
  Mastering Regular Expressions, Third Edition, now includes a full chapter devoted to PHP and its powerful and expressive suite of regular expression functions, in addition to enhanced PHP coverage in the central "core" chapters. Furthermore, this edition has been updated throughout to reflect advances in other languages, including expanded in-depth coverage of Sun's java.util.regex package, which has emerged as the standard Java regex implementation. The languages covered in Mastering Regular Expressions include Perl, Python, Ruby, Java, VB.NET and C# (and any language using the .NET Framework), PHP, and MySQL.
3. Re:New for 3rd Edition by niceone · 2006-09-13 08:58 · Score: 1
  
  Yeah, that's the blurb - but the reviewer should have addressed this, I mean if you already read the previous editions and know all about perl regexps then PHP is pretty easy... so do the new bits of the book add anything? (I'm not saying they don't, I'd really like to know)
  
  --
  ccalam - acoustic versions of new songs.
4. Re:New for 3rd Edition by Shanep · 2006-09-13 23:22 · Score: 1
  
  Yes, they missed a . on page 102, paragraph 14.
  
  There are only 5 paragraphs on page 102.
  
  --
  War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
I'll not refrain by cptgrudge · 2006-09-13 07:51 · Score: 2, Funny

There are books that people say "should" be classics (I'll refrain from mentioning names to protect the pretentious)

I'm not going to refrain.

The Three Musketeers, Alexandre Dumas
Pride and Prejudice, Jane Austen
David Copperfield, Charles Dickens

Look at me, I'm being pretentious!

--
Qualitas edurus commercium, nullus penitus net rimor, nullus deus beneficium
1. Re:I'll not refrain by Gulthek · 2006-09-13 07:59 · Score: 1
  
  You missed it, those books you list are already classics.
  
  The post author was referring to books that only pretentious people know about and think *should* be classics. Stuff like "Attack of the Bacon Robots" or something.
2. Re:I'll not refrain by Mattintosh · 2006-09-13 08:32 · Score: 1
  
  Attack of the Bacon Robots
  
  I have signed copy #353 of 1500 of that book. It should be a classic, which would make my signed copy worth even more.
3. Re:I'll not refrain by legLess · 2006-09-13 08:56 · Score: 1
  
  The Baroque Cycle? Are you out of your mind? I like a lot of Stephenson (and even waited in line to have him sign Quicksilver), but Baroque Cycle is what Dickens would have written had he been paid by the word instead of by the pound.
  
  --
  This isn't as much "normalization" as it is "don't take so many drugs when you're designing tables."
4. Re:I'll not refrain by ScentCone · 2006-09-13 09:06 · Score: 1
  
  The Baroque Cycle? Are you out of your mind? I like a lot of Stephenson (and even waited in line to have him sign Quicksilver), but Baroque Cycle is what Dickens would have written had he been paid by the word instead of by the pound.
  
  No problem! We have different tastes. I completely relished it, and understood, at the tail end of a lengthy bit o' prose, why he took his time getting where he was going. I liked it, a lot. In fact, much more than Snowcrash. For that matter, the lengthier exposition in Cryptonomicon didn't have nearly the pleasant feel and movement to it that I think he polished in the BC. To each our own, of course.
  
  --
  Don't disappoint your bird dog. Go to the range.
5. Re:I'll not refrain by rve · 2006-09-13 21:20 · Score: 1
  
  That's a bit too much honour for Charles Dickens
6. Re:I'll not refrain by cptgrudge · 2006-09-14 06:05 · Score: 1
  
  Why no love for this Dickens book? The style? I mean, it was originally a serialized novel.
  
  Responding with a polite version of "David Copperfield Sucks!" doesn't shed any light on why you think so. What was it when you read it that made you think it shouldn't be a classic?
  
  --
  Qualitas edurus commercium, nullus penitus net rimor, nullus deus beneficium
Re:Third Edition? Already? by Chuck+Milam · 2006-09-13 07:52 · Score: 1

/me sees the "is is" in my last comment. Argh.
Apparently I need to stop posting today, since I can't seem to get anything right.
Re:Third Edition? Already? by MrBoombasticfantasti · 2006-09-13 07:57 · Score: 1

Isn't it a given that there is always at least one typo in a post commenting on a typo? ;-)

Well, I'm off to bed, thanks for the heads up on the time...

--
!ERR: Signature not found.
A book on regular expressions? by xxxJonBoyxxx · 2006-09-13 08:04 · Score: 1

A book on regular expressions? What, is the Internet broken?
Tech manual a classic ???? by IamWhoIam · 2006-09-13 08:05 · Score: 1, Offtopic

Damn I'm old. I remember when classics were such books as Ivanhoe, The Virginian, the Iliad and the like. Never ever a technical manual no matter how well written it may be. Not to say one couldn't be. That is if you threw in a bit of sword/gun play and a love interest or so.

--
IF you can't be famous be infamous. But for GODS sake be something
1. Re:Tech manual a classic ???? by niceone · 2006-09-13 08:50 · Score: 1
  
  I remember when classics were such books as Ivanhoe, The Virginian, the Iliad and the like. Never ever a technical manual no matter how well written it may be. Not to say one couldn't be. That is if you threw in a bit of sword/gun play and a love interest or so.
  
  I'm praying Larry Wall isn't reading this, those perl books were goofy enough already.
  
  --
  ccalam - acoustic versions of new songs.
2. Re:Tech manual a classic ???? by Abcd1234 · 2006-09-13 09:24 · Score: 1
  
  Well, my dear elitist, apparently you aren't aware of the fact that "classic" may, believe it or not, be used in a domain-specific fashion.
3. Re:Tech manual a classic ???? by IamWhoIam · 2006-09-13 10:05 · Score: 1
  
  Perhaps I am a bit elitist about what I consider a classic. But you got to admit any "technical classic" would read better if a bit of spice was added. I.e. Egbert caressed the keyboard as if it were his beloved Caroline's creamy full breasts. (Insert chapter two examples here) The Perl script flowing from his fingers as his mind raced, dreaming of Caroline's flashing perfectly formed thighs and the evening's pleasures awaiting him. Ecstasy washes over Egbert like a tropical waterfall as he finishes the example successfully, and Caroline in his mind. Sitting back in his chair Egbert breathes a sigh of relief. His eyes suddenly fly open, and Egbert leaps from his chair gripping his steely sword, rushing off to do battle with his raging inner demons. Now don't that read a whole lot better than just blah blah blah.
  
  --
  IF you can't be famous be infamous. But for GODS sake be something
4. Re:Tech manual a classic ???? by Abcd1234 · 2006-09-13 10:21 · Score: 1
  
  See, now if I had read your whole post from beginning to end, my humour detector might have fired off... in my defense, this has been a *long* day (woo woo, going on 11 hours...).
.* is bad? by MagicM · 2006-09-13 08:08 · Score: 1

So, why is ".*" bad, other than that you sometimes want Perl's non-greedy ".*?" instead?

Now I'm curious (but still too cheap to buy the book).
1. Re:.* is bad? by PRMan · 2006-09-13 08:15 · Score: 1
  
  Because most people write .* when they mean .*?, that's why. (Myself included sometimes. Did it today in fact.)
  
  --
  Peter predicted that you would "deliberately forget" creation 2000 years ago...
2. Re:.* is bad? by tehshen · 2006-09-13 08:22 · Score: 1
  
  It's because some people didn't know about the greedy thing.
  
  The greedy thing goes thus: If you have a string like %{"Attack of the Bacon Robots" is better than "Pride and Prejudice"}, and you want to extract whatever's inside the quotes, the obvious thing for regex younglings to do is to use one like /"(.*)"/. Starts with a quote, stuff in the middle, ends with a quote.
  
  This is expected to catch "Attack of the Bacon Robots"; but because * is greedy, it eats up the entire string, all the way from Attack to Prejudice. Dot star is bad because the greedy thing bites people.
  
  The proper solution is, I think, /"([^"]*)"/.
  
  --
  Guy asked me for a quarter for a cup of coffee. So I bit him.
Two problems by MarkByers · 2006-09-13 08:09 · Score: 2, Funny

Some people, when confronted with a problem, think I know, I'll use regular expressions. Now they have two problems.

--
I'll probably be modded down for this...
1. Re:Two problems by toomz · 2006-09-13 08:19 · Score: 1
  
  $message =~ s/problem/opportunity/g
  
  Oops.
  
  $message =~ s/opportunitys/opportunities/g
  
  Excellent.
  
  --
  If a chair is thrown in a forest, and there are no witnesses, did Ballmer still do it?
Regular Expressions are of the Devil by trigeek · 2006-09-13 08:13 · Score: 3, Funny

To quote: "Sometimes a hacker has a problem, and he thinks to himself 'I know, I'll solve it with a regular expression!'. Now he has two problems." -- Jamie Zawinski

--
Sometimes I doubt your committment to SparkleMotion!
question for the floor by ArbitraryConstant · 2006-09-13 08:15 · Score: 1

I'm already quite proficient at regexen (people at work come to me for help etc). How much do I stand to gain from this book?

--
I rarely criticize things I don't care about.
1. Re:question for the floor by gerbercj · 2006-09-13 08:51 · Score: 2, Informative
  
  This book is not really meant to teach you how to write regular expressions. This book teaches you to understand how your regular expressions will be parsed so that you can understand the impact of your approach and start creating expressions that are much more efficient, or that handle special cases more elegantly. It's the book that, in my case, took my skills to the next level. I still refer to it a few times a year, and am glad that it's a part of my library.
  
  --
  The weird part is that I can feel productive even when I'm doomed.
2. Re:question for the floor by LuckyStarr · 2006-09-13 08:55 · Score: 1
  
  The theoretical part would perhaps further your insight into regexen. Hard to tell how good you really are. This book really was an eye-opener to me.
  
  --
  Meme of the day: I browse "Disable Sigs: Checked". So should you.
3. Re:question for the floor by teslar · 2006-09-13 09:41 · Score: 1
  
  Honestly?
  If you have to ask, probably enough to warrant buying the book.
4. Re:question for the floor by aflag · 2006-09-13 11:43 · Score: 1
  
  It depends on how much the people at your job know about regex.
5. Re:question for the floor by ArbitraryConstant · 2006-09-13 16:30 · Score: 1
  
  I can generally write a regex to do what I want, and at one point I wrote a library to merge regexen into a single pattern in order to deal with Python's large function call overhead (this was easier than getting everyone to maintain a shared regex), but I consider myself relatively weak at other stuff like performance optimization of the patterns themselves.
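
A rough idea of what merging several regexen into one pattern can look like, as a sketch only: this is not the library described above, and the pattern names and rules are made up for illustration. It wraps each pattern in a named group and joins them with alternation, so one compiled pattern (and one call) does the work.

```python
import re

# Hypothetical token patterns; names and rules are invented for this sketch.
patterns = {
    "number": r"\d+",
    "word": r"[A-Za-z]+",
    "space": r"\s+",
}

# One alternation of named groups, compiled once.
merged = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in patterns.items()))

def classify(text):
    """Return (pattern name, matched text) pairs using the single merged pattern."""
    return [(m.lastgroup, m.group()) for m in merged.finditer(text)]

print(classify("regex 101"))
```

This only works as-is when the individual patterns have no capture groups or backreferences of their own; a real merger would have to renumber or rename those.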
  
  --
  I rarely criticize things I don't care about.
I once wrote a complete... by pyrrho · 2006-09-13 08:19 · Score: 2, Funny

.... wordprocessor and email program with a regular expression!

PS: not really but wouldn't that be feckin' awesome! it was emacs... if I really had done it I mean.

--
-pyrrho
Re:For Programmers? NOT! by Amouth · 2006-09-13 08:19 · Score: 2, Interesting

that is me.. i wrote code for a while but was mainly a sys admin - got a job that now i write code all day.. decided to go to school for it .. on one of the assignments the prof docked me for using a regex for finding links in pages instead of a fsm - because he didn't teach them in class.

that pissed me off..

--
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
Snob by wsanders · 2006-09-13 08:21 · Score: 2, Insightful

I dunno about the latest edition, but a large percentage of people I interview have computer science degrees, are total dumbasses, and don't know a regular expression from their own ass.

The more advanced the CS degree, sometimes, the more significant the dumbassery. If you don't know what a regular expression is, at least admit to using a cookbook.

--
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
1. Re:Snob by bnavarro · 2006-09-13 11:38 · Score: 1
  
  I have to take umbrage with this.
  
  My computer science degree taught me *nothing* about regular expressions. In fact, I would expect that any quality computer science degree wouldn't teach you about RE. Here's why: A good computer science degree teaches you one language, then it teaches you the concepts behind programming -- algorithm analysis, discrete math, data structures, fundamentals of programming on modern operating systems (threads, semaphores, etc), and once you learn all of the fundamentals, you are expected to be able to learn any programming language virtually at will.
  
  This book is not a "cookbook". I am an accomplished programmer with a real computer science degree, and I used the first edition of this book to teach myself regular expressions. For most people, even experienced programmers, learning RE is hard when all you have are "reference" manuals. This is coming from someone who, when my boss asked me once "What book did you read to learn Linux?", replied, "UNIX MAN Pages."
  
  I highly recommend this book for even the most accomplished programmer, if you don't yet know RE. If your University's curricula includes teaching you RE, I would suggest you find a better University, since they are likely focusing on the "high level" stuff, and not the basic fundamentals you will need to adapt to a rapidly changing environment.
2. Re:Snob by Karma+Farmer · 2006-09-13 11:56 · Score: 1
  
  Regular expressions are just a language to describe finite state machines. You should be able to learn them virtually at will.
  
  If you didn't learn finite state machines at university, I suggest you find a better university.
3. Re:Snob by Jerf · 2006-09-13 15:23 · Score: 2, Informative
  
  My computer science degree taught me *nothing* about regular expressions. In fact, I would expect that any quality computer science degree wouldn't teach you about RE. Here's why: A good computer science degree teaches you one language, then it teaches you the concepts behind programming -- algorithm analysis, discrete math, data structures, fundamentals of programming on modern operating systems (threads, semaphores, etc), and once you learn all of the fundamentals, you are expected to be able to learn any programming language virtually at will.
  Very wrong, albeit with qualifications.
  
  Any competent Computer Science course should contain a discussion of the Chomsky Language Hierarchy. If you hold a computer science degree and that page is gibberish to you, you have been robbed. (Or at least, it should be familiar gibberish, for those who didn't like that course.)
  
  Regular Expressions come up in that course because the languages they are capable of describing are provably isomorphic to the languages that can be recognized by a Finite State Automaton, another word you ought to know if you have a computer science degree.
  
  The "qualifications" mentioned before is that the Regular Expressions in this case are a very limited, precisely-specified language that forms only the barest shell of the Regular Expressions that the book in question discusses. (The mathematical definition is practically useless, because very simple things like "i{0,50}" translate into a horrific mathematical RE, but most RE features can so be translated and many homework problems consist of doing just that, just to prove it can be done.) Nevertheless, it is important to understand these limitations so you don't press Regular Expressions to do something they can't really do, like entirely parse Context-Free or Context-Sensitive languages, like most programming languages (at least mostly) are.
  
  (It is true that some extensions, especially those found in perl, push the regular expression into Context-free or Context-sensitive territory when used correctly, but generally speaking, you're really asking for one disaster of a Regular Expression. You're better off using a parser. Perl 6 has some interesting innovations on this front, essentially building on their regular expression support to upgrade the language to built-in parser support, presumably at least the context-free level, perhaps more. I don't know.)
  
  It is true that a Programmer degree may not cover regular expressions, but you absolutely should have at least seen the mathematical basis for a Regular Expression in your Computer Science course. At Michigan State University where I got my education, it is, IIRC, in the sophomore course on the theory track.
  
  The language hierarchy is one of the absolute fundamentals of computer science.
4. Re:Snob by jared9900 · 2006-09-13 15:30 · Score: 1
  
  "I would expect that any quality computer science degree wouldn't teach you about RE"
  
  Do Georgia Tech, MIT, Carnegie-Mellon and Stanford (just to name a few small programs) have low quality CS programs?
  
  "A good computer science degree teaches you one language"
  
  I certainly hope not. A good computer science degree should teach you several languages, perhaps not as formally as the first one in CS 1 and 2. Suppose that language is Java (as it is in many programs these days), when you later take a graphics course should it still be using Java? No, you should be using C/C++, at least if you want to be able to use that knowledge when you leave college. Operating systems and systems programming? Should compilers be taught in that same language? Well, maybe, but SML, C with Lex/Yacc (or Flex/Bison), make equally effective tools. Exposure to additional languages in a CS program is a wonderful thing. It provides the students with an environment to learn how to learn. If they are taught only one language through the introductory courses, how can students be expected to know how to teach themselves new languages later?
  
  "then it teaches you the concepts behind programming"
  
  These are the concepts behind programming. They're usually studied in a CS theory or automata class. RE, context free grammars and Turing machines are all studied in these courses (typically all are studied in CS theory, some programs diverge a bit, or include more such as algorithm analysis, NP completeness and the like). Understanding these things enables you to be a better programmer, and hopefully even a better employee as you'll be more capable of producing or extending tools as needed.
5. Re:Snob by Karma+Farmer · 2006-09-14 02:01 · Score: 1
  
  Learn to read, dumbass. I didn't say finite state machines were trivial.
  
  I said that every decent university covers finite state machines, and that regular expressions are a language for describing finite state machines. The poster I replied to said that learning new languages was trivial. The logical conclusion, then, is that learning regular expressions is trivial.
  
  Of course, every finite state machine is trivial. If you don't know that, you need a few more years of school under your belt.
6. Re:Snob by ClosedSource · 2006-09-14 02:38 · Score: 1
  
  Sure, finite state machines with millions of states are trivial.
7. Re:Snob by Jerf · 2006-09-14 02:40 · Score: 1
  
  What's a Programmer degree? Sounds made-up to me.
  It's certainly not something they put on the degree itself, but it sounds like what the GGP got, as well as the sibling post to yours.
  
  There's nothing necessarily wrong with that, other than the name. Computer science is about decidability and algorithmic efficiency (without regard to any particular machine) and complexity and some other things like that, all of which are pretty much about math more than real machines. A Computer Scientist could in theory have an entire career without ever uttering the word "API". And an excellent, award-winning Computer Scientist can be one of the worst coders you'll ever lay eyes on.
  
  We really ought to have a "software engineering" degree that truly, honestly focuses just on programming. Such a degree probably still ought to talk about regexs and the theory behind them, but it wouldn't get into it as much as my program did. I think the concern is that nobody would be interested in Computer Science after that.
8. Re:Snob by mrchaotica · 2006-09-14 10:37 · Score: 1
  
  A good computer science degree teaches you one language, then it teaches you the concepts behind programming -- algorithm analysis, discrete math, data structures, fundamentals of programming on modern operating systems (threads, semaphores, etc), and once you learn all of the fundamentals, you are expected to be able to learn any programming language virtually at will.
  
  Bah, that's nothing! My school's computer science program expects you to learn any programming language virtually at will after the first semester, and tests you on it by using a different language in every class! I learned Scheme and a little bit of Python my first semester, then Java, then C, then Smalltalk, and the two classes I'm in now use C++ (or C, but I'm choosing to use C++) and Matlab, respectively.
  
  --
  "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
A poorly stated case by fm6 · 2006-09-13 08:26 · Score: 4, Interesting

So regular expressions are evil because they're too hard to maintain? If that's your argument, you need to come up with an alternative that isn't time consuming to code and doesn't require advanced skills that are difficult to master. Good programmers don't hand code fancy solutions any more often than they have to. They rely on well-documented, well-tested language features and APIs. Which describes Perl regular expressions to a T, whatever their shortcomings.
Anyway, Perl regular expressions don't have to be "line noise". That's just the way sloppy people are used to coding. Perl actually allows you to create a clearly formatted regular expression in which the structure is pretty obvious, with a little commenting. It does this by providing high-level metacharacters, and by allowing you to use blanks for formatting instead of representing blanks.
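
To make that concrete, here is a small sketch of a commented, spaced-out pattern. It uses Python's re.VERBOSE flag, which plays the same role as Perl's /x modifier; the date pattern and the sample string are assumptions chosen purely for illustration.

```python
import re

# Hypothetical example: an ISO-style date such as 2006-09-13.
# In verbose mode, whitespace in the pattern is ignored and "#" starts a comment.
DATE = re.compile(r"""
    (?P<year>  \d{4} )   # four-digit year
    -
    (?P<month> \d{2} )   # two-digit month
    -
    (?P<day>   \d{2} )   # two-digit day
""", re.VERBOSE)

m = DATE.search("Posted on 2006-09-13 at 08:26")
print(m.group("year"), m.group("month"), m.group("day"))
```

The same pattern squeezed onto one line is `(\d{4})-(\d{2})-(\d{2})`; the verbose form costs a few lines and tells the next reader what each piece is for.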
stay away from regexes by hashmap · 2006-09-13 08:27 · Score: 1

When some people are confronted with a programming problem, they say to themselves, "I know, I'll use regular expressions" ... Now they have two problems.
I cringe every time I hear people using regular expressions to parse HTML or CSV or whatever ... why don't you just use an already made parser to do that job? You'll get it done sooner, it will work faster and it will be more maintainable and extensible than some crazy-ass-convoluted-line-noise that "seems" to do the job for now ...
1. Re:stay away from regexes by Grey_14 · 2006-09-13 09:09 · Score: 1
  
  well, I would think most of those parsers use regexes, so I guess it really depends on what you are doing...
2. Re:stay away from regexes by StimpyCat · 2006-09-13 10:48 · Score: 2, Insightful
  
  Regular expressions are a fantastic thing to use in development and are perfect for data validation. I needed to parse huge csv files and filter out all the bogus values in certain columns. Without regex this would have been a nightmare. I used a csv parser to break all the tokens up and then regex to validate. Everything has its place, you must learn to embrace change not deter others from using it because you won't learn it.
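
A minimal sketch of that split of labour, assuming Python's csv and re modules. The data, the column name and the validation rule are all hypothetical: the parser handles quoting and delimiters, and the regex only judges individual field values.

```python
import csv
import io
import re

# Hypothetical data and rule: "qty" must be a plain non-negative integer.
raw = "id,qty\n1,10\n2,ten\n3,-4\n4,007\n"
QTY = re.compile(r"\d+")

good, bogus = [], []
for row in csv.DictReader(io.StringIO(raw)):
    # fullmatch: the whole field must fit the pattern, not just a prefix.
    (good if QTY.fullmatch(row["qty"]) else bogus).append(row["id"])

print(good)
print(bogus)
```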
Re:For Programmers? NOT! by gurps_npc · 2006-09-13 08:33 · Score: 1

Or maybe English is not my first language?

--
excitingthingstodo.blogspot.com
For binaries? by Cybert4 · 2006-09-13 08:36 · Score: 1

Anybody try doing regexes on binaries? They invariably muck up linefeed characters, no matter how I try to tell it not to.
1. Re:For binaries? by Anonymous Coward · 2006-09-13 08:45 · Score: 1, Funny
  
  Anybody try doing regexes on binaries? They invariably muck up linefeed characters, no matter how I try to tell it not to.
  
  Are you some sort of idiot?
Moo by Chacham · 2006-09-13 08:38 · Score: 1

By understanding the approaches to regular expression processing, we can learn to help ourselves.

Which is why i would recommend Assembly Language Step By Step, by Jeff Duntemann for any programmer. It's easy to learn, and is merely a preparation for Assembly, but would be great for all programmers, if only to know the difference between CS and DS, near calls and far calls, and the like.

The only thing i don't understand about regular expressions, is why they have to be so cryptic. Wouldn't it be easier to debug if the patterns were a little more clear?

--
Have you read my journal today?
1. Re:Moo by Chapter80 · 2006-09-13 09:50 · Score: 1
  
  I always thought regular expressions were garbage code in perl. Non-maintainable.
  Multi-line regular expressions in Python are actually easy to use and easy to maintain. Then again, this shouldn't be a surprise - it's Python after all. (I WISH we were encouraged to use Python on the job!)
  As a side note, I bought this book, and thought it was great. But got laughed at by my co-workers, who couldn't believe I'd waste time reading a book about a topic that is covered in one page in Wikipedia.
  I just responded: "Read the book, and you'll get it. You'll understand why my code works and yours doesn't, why my code is fast and yours is slow."
2. Re:Moo by Nicolay77 · 2006-09-13 10:19 · Score: 1
  
  You can try this:
  http://weitz.de/cl-ppcre/#parse-tree-synonym
  
  It's in lisp, but lisp is good to learn too.
  
  --
  We are Turing O-Machines. The Oracle is out there.
3. Re:Moo by furry_marmot · 2006-09-13 13:40 · Score: 1
  
  Multi-line regular expressions are built into Perl. I hope you enjoyed bashing Perl, but you should know that your minimal knowledge of it exposes you for the poseur you are.
4. Re:Moo by Chacham · 2006-09-13 14:24 · Score: 1
  
  Thanx for the link.
  
  --
  Have you read my journal today?
Re:Maybe it's just me but isn't 515 pages too much by pizza_milkshake · 2006-09-13 08:40 · Score: 4, Funny

The last 300 pages are actually a single regular expression.
Re:Maybe it's just me but isn't 515 pages too much by morcego · 2006-09-13 08:54 · Score: 5, Informative

You are obviously a newbie regarding regular expressions, based on your post.

First, 515 is not too much when talking about regular expressions. There is much to be discussed, not to mention tips&tricks to give away.

Also, you are dead wrong about the "small web page". First, it only talks about Perl Regular Expressions. There are other kinds, including the classic (basic?), extended, posix and (from your reference) perl regular expressions. Mastering the different kinds is enough to fill 300 pages of the book.

Where are you going to use REs ? sed ? VI ? perl ? php ? C ? SQL ? You need to know what flavor of REs you need for that particular environment.

Regular expressions are a very tricky topic, and understanding them is not something easily accomplished. Come to think about it, 515 might not even be enough.

--
morcego
Re:For Programmers? NOT! by drauh · 2006-09-13 08:59 · Score: 1

what does the flying spaghetti monster have to do with html screen-scraping?

--
This is a tautology.
No by Cybert4 · 2006-09-13 09:03 · Score: 1

Sometimes I'd like to use the power of regexps while still thinking of the file as a binary.
1. Re:No by Shanep · 2006-09-13 23:46 · Score: 1
  
  Sometimes I'd like to use the power of regexps while still thinking of the file as a binary.
  
  How do you deal with non-text portions of the binary matching your regexps? Do you really want to filter text which is interspersed with what looks like noise, but without telling the filter what is text and what is noise?
  
  Regexps can be a gamble at times as it is, for those who like to use them for everything. You've taken it to the next level!
  
  --
  War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
Re:Third Edition? Already? by Catamaran · 2006-09-13 09:12 · Score: 2, Funny

Thanks for clarifying. I thought your post was a subtle play on RE's, Bill Clinton, and Max Headroom!

--
Test 1 2 3 4
Great book and good review by tcopeland · 2006-09-13 09:15 · Score: 1

We've got all three editions of this book in our office and they keep getting better. As the review says, this book will teach you the difference between a DFA and an NFA engine if you want to learn that, or just how to do some simple capturing if that's all you need. Friedl's writing is very approachable and the book's notation for showing what part of a string a regex will select is very helpful.

And this stuff comes up over and over - if you ever need to tweak a JavaCC grammar knowing how to specify a DFA vs an NFA can make a nice performance difference. Great stuff!

--
The Army reading list
Re:hmm by iluvcapra · 2006-09-13 09:15 · Score: 1

Probably not as much of a troll as you intended. But Kerry might appreciate the read more, now that Gore is all about public speaking and evangelism.

But, en garde, Rumsfeld clearly read this book: "Stuff Happens" etc.

--
Don't blame me, I voted for Baltar.
Re:hmm by Opie812 · 2006-09-13 09:16 · Score: 1

This book would be a great gift idea for Al Gore.

Funny, I was thinking GWB needed some brushing up on his regular expressions. Here are two examples:

"I think -- tide turning -- see, as I remember -- I was raised in the desert, but tides kind of -- it's easy to see a tide turn -- did I say those words?" --George W. Bush, asked if the tide was turning in Iraq, Washington, D.C., June 14, 2006

"There's an old saying in Tennessee -- I know it's in Texas, probably in Tennessee -- that says, fool me once, shame on -- shame on you. Fool me -- you can't get fooled again." --Nashville, Tenn., Sept. 17, 2002

--
I'm not a nerd. Nerds are smart.
Regular expressions for geeks by MobyDisk · 2006-09-13 09:19 · Score: 3, Funny

I am glad to see this on Slashdot since regular expressions is an area that geeks could really use help in.

For example, instead of saying the common geek expression "Greetings Program!" try a more regular expression such as "Hello Sir" or the more casual "Wassup?" IRL, Tron references are not considered cool. Another common faux pas is using the expression "Hey n00b, what's your function?" instead of something more regular like "Hey dog, what's your problem?" If someone tries to threaten you, think about their technical skills before saying "Close your port before I pwn j00!" Life is not an FPS. "Shut up before I kick your ass" works very well.
Topical plug: Regex Powertoy by gojomo · 2006-09-13 09:24 · Score: 3, Interesting

Give a try to my web-based tool, Regex Powertoy. Its interface is all DHTML/CSS/Javascript, but requires a hidden Java (1.5) applet for the advanced and steppable regex engine.

Given that Java core, there are options for adding/removing usual Java literal escaping, which in Java code means lotsa backslashes. Not all Perl advanced features are supported.

I hadn't considered a pick for awk/sed/bash syntax limits/conversion but will consider it. Any handy reference to how their syntax differs from Perl/Java? (The thing that usu. bites me with sed is escaping of parentheses.)
Mark Twain on classics by swiftstream · 2006-09-13 09:40 · Score: 1

"A classic is something that everybody wants to have read and nobody wants to read."

One of my favorite Mark Twain quotes...

--
Be a PATRIOT--because the only thing we have to fear is the lack thereof.
Re:What is there to master? by EvanED · 2006-09-13 09:50 · Score: 1

Regular expressions are pretty much the first thing you learn in computer science.

Where did you go to school? At my alma mater, the only reason I learned about them AT ALL was because I took a small elective class (~20 people) that's offered once every other semester.

Granted, there are A LOT of things that it could have done A LOT better, and the curriculum is currently weak, but I don't think that this is an uncommon situation either.
Stating the obvious by teslar · 2006-09-13 09:52 · Score: 2, Funny

It's a good review and the book's great and all that, but I still had to cringe when I read this:
there is no doubt that the regular expressions are the stars of the show around here.
You don't say... in a book called 'Mastering Regular Expressions', that must have come as a real surprise...
Re:What is there to master? by EvanED · 2006-09-13 09:53 · Score: 1

Oh, and in the class where we learned about them, they were a lot simpler than what actual languages have. We learned about them from a theoretical standpoint, not a practical one. So you use [0-9]+? We learned to write (0 | 1 | 2 | ... | 9)(0 | 1 | 2 | ... | 9)*. [A-Z]{5}? We're writing out (A | B | ... | Z) five times.

It's just as expressive, and makes the translation to and from FAs easier, but it's not practical.
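
A quick check of that claim, as a sketch in Python: it builds the textbook-style patterns from nothing but alternation, grouping and star, and compares them with the shorthand on a few hand-picked sample strings. Agreement on samples is only a sanity check, not a proof of equivalence.

```python
import re
import string

digits = "|".join(string.digits)             # 0|1|2|...|9
letters = "|".join(string.ascii_uppercase)   # A|B|...|Z

pairs = [
    (f"({digits})({digits})*", r"[0-9]+"),          # one digit, then any more
    ("".join([f"({letters})"] * 5), r"[A-Z]{5}"),   # the group written out five times
]

samples = ["7", "2006", "", "12a", "HELLO", "HELL", "HELLOS"]

def agree(longhand, shorthand):
    """True if both patterns accept or reject every sample identically."""
    return all(
        bool(re.fullmatch(longhand, s)) == bool(re.fullmatch(shorthand, s))
        for s in samples
    )

for longhand, shorthand in pairs:
    print(shorthand, len(longhand), agree(longhand, shorthand))
```

The printed lengths make the practicality point: the five-letter pattern balloons to a few hundred characters once it is written out longhand.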
people look at you funny by schlick · 2006-09-13 09:54 · Score: 3, Funny

Years ago I was calling around to bookstores looking for this book. A few bookstore employees asked me if it had a lot of pictures. They thought it was a book for people who have trouble communicating. Like knowing when to say 'hi' vs. 'hello' or something. sheesh. Now granted many people who read this book may be socially challenged, but this book won't help that.

--
"It's because they're stupid, that's why. That's why everybody does everything." -Homer Simpson
Nope. But then, who needs it? by Ayanami+Rei · 2006-09-13 10:04 · Score: 1

If you wanted to learn or develop some regexes, you sat down with regex(7) open in one terminal and an interactive perl in another window to test them out.

It never occurred to me that I would need or want a tool to generate them. It's not like they're that hard to comprehend. (Although they can be a pain to document... thankfully perl allows you to add whitespace and comments to a regular expression so it can make sense to a third party)

--
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
Re:Maybe it's just me but isn't 515 pages too much by hotdiggitydawg · 2006-09-13 10:20 · Score: 1

I bet they parse as valid Perl too.
Re:hmm by User+956 · 2006-09-13 10:40 · Score: 1

arr... tis a fine line between "funny" and "troll", matey. a little lighthearted humor about our former robotic vice-president is all in good fun.

--
The theory of relativity doesn't work right in Arkansas.
I'll say it again by pvera · 2006-09-13 10:43 · Score: 3, Interesting

I bought this book years ago and still can't STFU about it, sorry.

At my previous job (web-based custom market research) we did hundreds of web surveys which had on the average some 400 data points per survey. These had distinct variable names, etc. and were built 100% by hand when I was hired in the company some time in 2002. My first survey project was a disaster, it took me about 20 hours from the final approved survey document to the dynamic version. The process was riddled with manual steps that created an infinite amount of room for errors.

Enter regular expressions.

While fiddling with BBEdit Pro I finally decided to take a shot at regular expressions. After an hour or so of experimenting I started writing a few filters that allowed me to cut down the turnaround from 20 hours per survey to a little over 10 hours. When I got to the point in which I wasn't able to figure things out from the BBEdit documentation and the web, I convinced the boss to buy me Mastering Regular Expressions.

Within the first 50 pages, I had picked up on additional regular expressions concepts that allowed me to eventually cut down the turnaround per survey to less than 8 hours. That's not 8 hours programming, that's 8 hours from the moment the approved survey is handed over to programming to the moment it passes QA checks and is considered ready to go live.

This was a $50 or so book, and it saved us thousands of dollars over the four years I worked at that company. Of course, my reward for saving the company all that money was to lay me off, and I "forgot" to leave instructions on how to use the text filters, so I imagine my replacement is right now writing surveys by hand.

Some of the things that proved to be killer uses for regular expressions within that context:

1. The approved survey would have specific variables that the analysts would need to keep for importing into SPSS later down the process. A text filter picks up those variables and generates a unique list of every variable needed for the survey. The variables are named with specific patterns, so you know which ones are strings, integers, etc.

2. Now that we have a list of variables, it means we can quickly generate the CREATE TABLE statement for the survey data. What used to be done by copying and pasting 400 times is (was?) now done by highlighting the text and running a macro. The output is the SQL command you need.

3. Since you already have the list of variables, you can generate the 400 statements needed to read each form variable into its proper variable in the asp code.

4. The same way you can generate the hidden form fields that you need.

5. The same way you can generate the INSERT statement to send your data to the database.

Little things like that. Eliminating all that copying and pasting really cut down on the QA overhead per project.
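
For the curious, steps 1, 2 and 5 can be sketched roughly like this. This is Python rather than BBEdit text filters, and the survey snippet, the [[...]] markers, the i_/s_ naming convention and the table name are all invented for illustration; the real naming patterns are not described here.

```python
import re

# Hypothetical survey text and naming convention (not the real one):
# variables look like [[i_age]] or [[s_city]]; "i_" = integer, "s_" = string.
survey = """
Q1. How old are you? [[i_age]]
Q2. Which city do you live in? [[s_city]]
Q3. Confirm your age: [[i_age]]
"""

# Step 1: unique list of variables, in first-seen order.
names = []
for name in re.findall(r"\[\[([is]_\w+)\]\]", survey):
    if name not in names:
        names.append(name)

# Step 2: CREATE TABLE, with the column type read off the name prefix.
sql_type = {"i": "INTEGER", "s": "VARCHAR(255)"}
columns = ",\n".join(f"    {n} {sql_type[n[0]]}" for n in names)
create = f"CREATE TABLE survey_data (\n{columns}\n);"

# Step 5: a parameterised INSERT covering the same columns.
insert = (f"INSERT INTO survey_data ({', '.join(names)}) "
          f"VALUES ({', '.join('?' for _ in names)});")

print(create)
print(insert)
```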

--
Pedro
----
The Insomniac Coder
Re:For Programmers? NOT! by phazer · 2006-09-13 11:20 · Score: 2, Informative

Well, I hate to say it, but I agree with the Prof. There are really two worlds in computer science: academia and work.

Pretty much _all_ assignments that will be given in CS courses can be solved quite easily by using a library that implements a solution. In the working life, that would be the proper solution, but not so in school.
Of course you can just call a class in your standard library that implements regular expressions and solve a problem that way. But that's not why you're in college. You ALREADY know how to call a library that someone else wrote. Calling libraries is trivial, you can pick that up with a few pages reading and some practice. The Professor isn't there to teach you how to call libraries though. What you're supposed to take away from the class is the understanding of how the class does the work.

Finite state machines are the underlying theory of regular grammars (See: Chomsky hierarchy of languages.) So if the class covers how FSM's work, and what their usefulness is, then you should try to actually apply that knowledge to the problem. The assignment isn't so much one of "find the answer" (nobody cares about the answer) but one of "apply the theory" and learn something new.
One day you'll come across a problem that is very similar to regular expressions, but not quite like it, and you may remember this assignment and write a FSM to solve it, and you'll be glad for it.

It's like you're learning about sorting algorithms, and then you come along and use Collection.sort() instead of writing your own quicksort (and understanding the algorithm while you do so.)
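
As a rough illustration of what "write a FSM" could mean for the link-finding assignment, here is a hand-rolled sketch in Python. It is deliberately simplified and is an assumption about the task, not the actual assignment: it only recognises lowercase href="..." with double quotes, which a real HTML scanner could not get away with.

```python
def find_links(html):
    """Tiny state machine: collects the value of every href="..." it sees."""
    keyword = 'href="'
    state = 0            # number of keyword characters matched so far
    in_value = False     # extra state: inside the quoted value
    links, current = [], []
    for ch in html:
        if in_value:
            if ch == '"':                      # closing quote: emit the link
                links.append("".join(current))
                current, in_value, state = [], False, 0
            else:
                current.append(ch)
        elif ch == keyword[state]:
            state += 1
            if state == len(keyword):          # saw all of href="
                in_value = True
        else:
            # All keyword characters are distinct, so the only useful
            # fallback is "this character starts a new keyword".
            state = 1 if ch == keyword[0] else 0
    return links

page = '<a href="http://slashdot.org">/.</a> <a class="x" href="http://regex.info/">b</a>'
print(find_links(page))
```

On this input it finds the same two URLs that the regex `href="([^"]*)"` would, which is rather the point: the regex is shorthand for a machine like this one.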
Re:For Programmers? NOT! by Amouth · 2006-09-13 11:52 · Score: 1

I agree with you.. But if you are going to be grading someone directly on it then you need to state to use a FSM..

the project did not say you had to do it a certain way.. only that you had to do it and that it worked.

if it had said "Use a FSM" then the prof would have been right but it didn't so it pissed me off

--
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
safari for me by zIRtrON · 2006-09-13 12:00 · Score: 1

at $9.99 a month [$13.50 AUS], I can't lose, always getting stuff to read
1. Re:safari for me by epee1221 · 2006-09-14 11:59 · Score: 1
  
  $10 per month? The web site says 15-20....
  
  --
  "The use-mention distinction" is not "enforced here."
Don't believe the "computer science" here. by Bryson · 2006-09-13 12:00 · Score: 1

"Nondeterministic finite automata" is well defined in comp-sci
and Friedl has it wrong. The set of languages accepted by NFA's
is exactly the same as the set accepted by DFA's.

Perl's engine and its brethren use search-and-backtrack. They
accept a lot of things that are not regular expressions. Such
engines don't have much theory behind them, and it's hard to
reason generally about what they do and how fast they do it.
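
One concrete case of a backtracking engine accepting something that is not regular, sketched in Python (whose re module is also a backtracking engine, so it stands in for Perl here): the language "some a's, a b, then the same number of a's" cannot be recognised by any DFA or NFA, yet a backreference handles it.

```python
import re

# a^n b a^n is not a regular language: a finite automaton cannot count
# an unbounded n. The backreference \1 demands the same run of a's again.
pattern = re.compile(r"(a*)b\1")

results = {s: bool(pattern.fullmatch(s)) for s in ["aba", "aaabaaa", "aabaaa", "b"]}
print(results)
```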
1. Re:Don't believe the "computer science" here. by the+donner+party · 2006-09-13 16:33 · Score: 1
  
  Perl's engine and its brethren use search-and-backtrack. They
  accept a lot of things that are not regular expressions. Such
  engines don't have much theory behind them, and it's hard to
  reason generally about what they do and how fast they do it.
  
  Of course, that is a failing of computer scientists, not of Perl. As scientists, they should aim for an analysis of such a practical tool, even if (and because of) it turns out to be hard.
2. Re:Don't believe the "computer science" here. by Steve+Friedl · 2006-09-15 04:42 · Score: 1
  
  Oh yah?
  
  http://regex.info/blog/2006-09-15/248
  
  --
  Steve Friedl / Unix Wizard / Microsoft MVP / www.unixwiz.net
Re:What is there to master? by EvanED · 2006-09-13 12:27 · Score: 1

Regular expressions are a good introduction to formal computer science because they're practically important and immediately useful, so they're not too boring, and at the same time they're the lowest level of formal languages and present a motivation for the mathematical abstraction of computing and for complexity observations

I agree. I'd like to see them more; I'm just saying that, at least at my school, we didn't.

How do you learn about compilers and formal language hierarchies without understanding regular expressions?

The class where we learned about regular expressions WAS the (only undergraduate) class where you learn about the language hierarchy. And few take it.

At my school you see BNFs in our PL class (which is required), but you don't go into anything more than showing that they are useful for defining part of the syntax of a language, and the complications that arise because of that, like ambiguity in parse trees, associativity, precedence, etc. I don't think we talked at all about the larger picture of how CFGs describe a class of languages, and that there is a more restrictive set called regular languages, or anything like that. The only thing I think we did on that line was to say that CFGs can't define all of the syntax of a language, and there might have been an anecdote about how something like type checking has to take place outside of the system because those aren't describable by a CFG.

And certainly you've had practical courses about lex (and yacc)...

Ha... again, there's really only one course I think that covers this (compilers), and it's unpopular (even more than the automata and languages class) and only offered every other semester. You don't see lex or yacc in the normal flow of curriculum. And even in that class, the use of those tools was very poorly explained. I essentially had to go to 3rd party sources for instructions of how they should be used. (Then again, the compilers class was one of the worst courses the dept. offers. Here at my grad institution, the undergrad compilers course actually codes up a compiler through the course of the semester. (Fancy that.) At my undergrad school, we did a calculator in lex/yacc and two straight-line register allocators.)

I think that this discussion might reflect poorly on my school, and that's only partly what I intend. There's a lot they didn't do wrong too, and I'm definitely a much better computer scientist and programmer having gone through it. But, there's a fair amount they do wrong too, so this discussion has pulled out some of those complaints. (I also don't see how you can teach two semesters on C++ programming and cover neither virtual functions nor exceptions, but that's another discussion.) I'd be interested in comparing what I went through with other schools that seem like they should be in the same class.
Re:Maybe it's just me but isn't 515 pages too much by chromatic · 2006-09-13 12:44 · Score: 5, Insightful

You've mastered an entire field of computation by reading a short introduction to one implementation? I think I've fixed your code in about a dozen different companies!

--
how to invest, a novice's guide
Re:Third Edition? Already? by onlyjoking · 2006-09-13 13:19 · Score: 2, Informative

No need for the 3rd edition unless you desperately need the extra 40-odd pages on PHP regexes. That's the only difference between the 3rd and 2nd editions as far as I can tell.
speaking of classics by adrianmonk · 2006-09-13 13:20 · Score: 1

One of my favorite articles on the web about regular expressions is How Regexes Work by Mark-Jason Dominus. It's a great article if you're at the point where you already have some experience using regular expressions, but you want to gain some insight into how they do what they do. I found that after I read this article it was easier for me to come up with cleaner regexps more quickly.

I haven't read the book being discussed. It probably covers the same stuff, but I found M-J D's article easy to read, short, and very informative.
The old adage by Millenniumman · 2006-09-13 13:26 · Score: 1

"Some people, when confronted with a problem, think I know, I'll use regular expressions. Now they have two problems."

--
Stupidity is like nuclear power, it can be used for good or evil. And you don't want to get any on you.
Re:Maybe it's just me but isn't 515 pages too much by kisielk · 2006-09-13 13:28 · Score: 1

Hint: The book is called Mastering Regular Expressions. It's not meant to be an introductory primer (though it does include one) but a detailed look at uses of regular expressions, variations between systems, and advanced techniques as well. Sysadmins and programmers often have to deal with many Regexp systems and a book like this really helps. I know I use it quite often.
This book changed my life by onlyjoking · 2006-09-13 13:36 · Score: 1

Honest. I'd learned HTML and Dreamweaver 4 had a search and replace facility for using these weird hieroglyphics for specifying patterns. "Dreamweaver 4 Bible" referred to them as regular expressions, citing the 1st edition of Jeffrey Friedl's book and I found a copy in the local (Islington/London) library. I was fascinated by the book and read it day after day. Since Perl seemed to be THE regex language I soon developed a fascination with Perl through Larry Wall's "Programming Perl". My "web design" career soon became a "web development" career when I learned how to build end-to-end web applications with Perl and MySQL. So, thanks to you, Jeffrey Friedl, I now have a much wider skill set.
Ok I give up by p3d0 · 2006-09-13 14:35 · Score: 1

Can someone explain the joke for the humourously challenged?

--
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
Re:Maybe it's just me but isn't 515 pages too much by jobst · 2006-09-13 14:48 · Score: 1

it's all about "perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'". ;-) jobst
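
For anyone who would rather not run the one-liner, its arithmetic can be followed in a few lines of Python. This is only a sketch of what the `pack` call appears to compute, assuming the fourth argument is meant to be octal 115 and that `unpack(c,H)` yields the character code of `H`; the variable names are made up.

```python
import math

codes = [
    41 * 2,              # 82
    math.isqrt(7056),    # 84, since 84 * 84 == 7056
    ord("H") - 2,        # character code of "H" is 72, minus 2 gives 70
    int("115", 8),       # octal 115 is decimal 77
    10,                  # line feed
]

# pack(c5, ...) turns five small integers into five characters.
message = bytes(codes).decode("ascii")
print(message, end="")
```

The five values are ordinary ASCII codes, so the output is a four-letter word followed by a newline.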

--
to code or not to code, that is the question.
Regex Coach by Anonymous Coward · 2006-09-13 18:24 · Score: 1, Informative

I can't believe no one has mentioned The Regex Coach at http://weitz.de/regex-coach/. While I totally agree with the 11/10 rating for MRE (I have the first and second editions), The Regex Coach is an invaluable prototyping and debugging tool for PCRE (Perl Compatible Regular Expressions). It runs on Windows and Linux and is free (but not open source IIRC).

Both the book and the tool are 100% essential to anyone writing any regex more complicated than /^foo\s(\w+)\sbar$/.
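
For reference, this is roughly what that baseline pattern does, sketched with Python's `re` module rather than Perl or PCRE; `middle_word` is a hypothetical helper name, and the pattern is copied from the comment above.

```python
import re

# ^foo, one whitespace character, a captured run of word characters,
# one whitespace character, then bar at the end of the string.
PATTERN = re.compile(r"^foo\s(\w+)\sbar$")


def middle_word(text):
    """Return the captured word, or None when the text does not match."""
    match = PATTERN.match(text)
    return match.group(1) if match else None


print(middle_word("foo baz bar"))
print(middle_word("foo  baz bar"))  # two spaces: \s consumes only one
```

Note that each `\s` matches exactly one whitespace character, so doubled spaces make the match fail; that kind of surprise is where a prototyping tool earns its keep.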
If you're interested in this book.... by beaviz · 2006-09-13 19:34 · Score: 1

Maybe you're also interested in another title by the very same author: "Mastering Line Noise" ...
Dupe by Ed+Avis · 2006-09-13 21:33 · Score: 1

The book was previously reviewed on Slashdot. Better just to note what's changed in the new edition.

--
-- Ed Avis ed@membled.com
Re:Maybe it's just me but isn't 515 pages too much by tehcyder · 2006-09-13 22:28 · Score: 1

Does anyone remember "Moby Dick" (hint: "Call me Ishmael ...")? It weighed more than Roseanne Barr/Arnold/Thomas because publishers charged more money for heavier books and thus encouraged the writers to write big books.
That is one of the most asinine opening comments I have seen recently on slashdot, so how come the post is modded as insightful?

--
To have a right to do a thing is not at all the same as to be right in doing it
Re:Maybe it's just me but isn't 515 pages too much by twbecker · 2006-09-14 00:52 · Score: 1

Of course, there's no regular expressions in that code. . .

--
"The problem with internet quotations is that many are not genuine" -Abraham Lincoln
Re:Third Edition? Already? by twbecker · 2006-09-14 00:56 · Score: 1

I was curious about what's new in the 3rd ed. as well since I just bought the 2nd about 2 months ago. From http://regex.info/:

What's New
New in the Third Edition are a new chapter on PHP (and upgraded PHP coverage throughout the core chapters), and a completely rewritten Java chapter to reflect changes from Java 1.4.0 to Java 1.5/1.6. Otherwise, there are only minor updates and typo fixes. (For example, if your interest is Perl or .NET, there's little new in the Third Edition that's not in the Second Edition.)

--
"The problem with internet quotations is that many are not genuine" -Abraham Lincoln
Re:Third Edition? Already? by jman.org · 2006-09-14 00:58 · Score: 1

Sounds like he added PHP & updated Java.

I've found the 2nd edition to be a great learning resource; since I do more PHP than Perl, have sprung for the 3rd. Used the readme.doc link off his site.

Would have to concur with what many others have said: This is one of the best written books I have seen, the author has a real gift for explaining things. I usually just dig into code & "figger" it out, but that can be veeeeery daunting when it comes to regex!
Virginia Tech computer science didn't teach them by ClioCJS · 2006-09-14 01:45 · Score: 1

Virginia Tech computer science didn't teach them either. Oh, we learned C, LISP, Prolog ... but mostly stuck to C. I've never used any of it since graduating, but I've used regexes a-plenty.

--
-Clio
Karma: Bad (mostly from not giving a fuck)
Blog: http://clintjcl.wordpress.com
Mastering Apostrophes by xandroid · 2006-09-14 02:39 · Score: 1

Maybe Simon should read "Mastering Apostrophes" next.

--
$ echo "ceci n'est pas une pipe" | sed -Ee 's/(eci n|pas )//g'
There are reasons. by Cybert4 · 2006-09-14 04:41 · Score: 1

Consider trying to nuke all CR characters. There's no straightforward way to do this--while a binary regexp would be quite easy.
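
As one data point, some regex libraries do operate on raw bytes. A minimal sketch with Python's `re` module and a bytes pattern, under the assumption that "nuke all CR characters" means deleting every byte with value 13; `strip_cr` is a made-up name, and whether the result is still a usable binary is a separate question.

```python
import re

# A bytes pattern matches against bytes objects, with no text decoding
# and no newline translation along the way.
CR = re.compile(rb"\r")


def strip_cr(data: bytes) -> bytes:
    """Delete every byte with value 13 and leave all other bytes untouched."""
    return CR.sub(b"", data)


sample = b"a\r\nb\rc\x00\r"
print(strip_cr(sample))
```

For a single fixed byte, `data.replace(b"\r", b"")` does the same job without a regex; the pattern form only pays off once the match depends on context.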
1. Re:There are reasons. by Shanep · 2006-09-14 20:08 · Score: 1
  
  Consider trying to nuke all CR characters. There's no straightforward way to do this--while a binary regexp would be quite easy.
  
  You say it would be easy, but before you said...
  
  Anybody try doing regexes on binaries? They invariably muck up linefeed characters, no matter how I try to tell it not to.
  
  These seem to contradict. Are you saying that if a binary regexp util existed, that it would then be easy? Also, do you want to remove CR's or LF's? I am not sure what you are trying to do. You must remove CR's or LF's from anywhere in a binary, or only in the text portions of a binary? You realise that in a portion of data which is considered to be a string, that an LF is represented by a byte with a decimal value of 10, while a CR is represented by a byte with a decimal value of 13? And you realise that byte values of 10 and 13 are going to be found scattered all around a binary, including in lots of areas which do not represent a string and are thus portions of executable code or otherwise data which that code will use? You will also be stripping them out, damaging the binary. In fact, even if you do just strip out CR's or LF's from only the text portion of a binary, you are still likely to damage that binary for any number of various reasons. If it is executable, you may have shifted code and thus broken jumps. If it is just a data file, you may have broken the format. And if it is either of those, the file may be checksum protected and then fail to run because of the resulting change in checksum.
  
  There are apparently 28 occurrences of LF's in an OpenBSD copy of "yes"...
  
  # cat /usr/bin/yes | tr -cd "\n"| wc -c
  28
  
  Yet counting the number of LF's which have passed through "strings" from "yes", shows that there are probably 8 occurrences which are not part of a string...
  
  # strings /usr/bin/yes | tr -cd "\n"| wc -c
  20
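
The same two measurements can be sketched in Python on an in-memory byte string. Everything here is hypothetical: the sample data is invented, and `printable_runs` only approximates what `strings` does (runs of at least four printable ASCII bytes). As far as I know, `strings` writes each run on its own line, so counting newlines in its output mostly counts the runs it found rather than the LF bytes inside them.

```python
import re

# Runs of at least 4 printable ASCII bytes (space through tilde, plus tab).
PRINTABLE_RUN = re.compile(rb"[\t\x20-\x7e]{4,}")


def count_byte_10(data: bytes) -> int:
    """Count bytes with decimal value 10, wherever they occur."""
    return data.count(b"\n")


def printable_runs(data: bytes) -> list:
    """Rough stand-in for strings(1): return the printable runs found."""
    return PRINTABLE_RUN.findall(data)


# Invented sample: two printable runs, one too-short run, four bytes of value 10.
sample = b"\x00\x0a\x01usage: yes [string]\x0a\x00\x0a\x0aabc\x00hello world\x00"

print(count_byte_10(sample))
print(printable_runs(sample))
```

In the invented sample, only one of the four bytes with value 10 sits directly next to a printable run; the others are just data.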
  
  What are you trying to achieve? If you want to modify text within a binary and can't get your hands on the source code, then you might be better off doing it by hand with a hex editor, as long as you are armed with the knowledge of how that binary file is formatted.
  
  If all you want to do is get a copy of the strings out of a binary, but also remove CR's or LF's from the output, then can you not use "strings"?
  
  Show the strings in a binary...
  # strings /usr/bin/yes
  
  Show the strings in a binary and remove the LF's...
  # strings /usr/bin/yes | tr -d "\n"
  
  Show the strings in a binary and replace the LF's with spaces, to make the text a little more readable...
  # strings /usr/bin/yes | tr "\n" " "
  
  --
  War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
2. Re:There are reasons. by Shanep · 2006-09-15 15:43 · Score: 1
  
  There are apparently 28 occurrences of LF's in an OpenBSD copy of "yes"...
  
  I would like to clarify this statement. I don't really mean "LF's", instead what I mean is "bytes with a decimal value 10". Since a byte with a decimal value of 10 is only considered to be a line-feed if that byte is part of a string. Since it is an ASCII mapping from bytes (10 dec) to characters (LF). That mapping of course only coming into effect if a series of data is interpreted as a string. Not all bytes of 10 decimal are LF's.
  
  I figured that it would be obvious that I believed that, from the rest of what I had written. But I just wanted to clarify, since that sentence is technically wrong as it stands.
  
  --
  War crimes, torture, lies, illegal spying... Would someone give Bush a blowjob, already, so he can be impeached?
Re:Maybe it's just me but isn't 515 pages too much by Darby · 2006-09-14 08:05 · Score: 1

I bet they parse as valid Perl too.

What doesn't?
Re:What is there to master? by mrchaotica · 2006-09-14 10:42 · Score: 1

I think that this discussion might reflect poorly on my school, and that's only partly what I intend. There's a lot they didn't do wrong too, and I'm definitely a much better computer scientist and programmer having gone through it. But, there's a fair amount they do wrong too, so this discussion has pulled out some of those complaints. (I also don't see how you can teach two semesters on C++ programming and cover neither virtual functions nor exceptions, but that's another discussion.) I'd be interested in comparing what I went through with other schools that seem like they should be in the same class.

Wow, you really did go to a bad school! No legitimate Computer Science curriculum has any business having a class "on C++" (or any other language, for that matter). Having a class that uses C++ to teach about OO or something is okay, but teaching technologies/languages instead of concepts is for vocational schools (e.g. DeVry).

--
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Re:What is there to master? by EvanED · 2006-09-14 11:58 · Score: 1

Wow, you really did go to a bad school! No legitimate Computer Science curriculum has any business having a class "on C++" (or any other language, for that matter).

Bull. Just about EVERY CS curriculum has a class I would say is "on" a language. What would you suggest they do, just dump you in a class on data structures and say "program this, oh and by the way, you'll have to go learn this language on your own?"
Re:What is there to master? by mrchaotica · 2006-09-14 12:44 · Score: 1

My first CS class at Georgia Tech was in Scheme, which has about the simplest syntax you can imagine. Therefore, we spent a few weeks on the basic syntax of that, and then spent the rest of the time learning about algorithms and data structures (e.g. recursion, lists and trees of various kinds, etc.). It's the algorithms and data structures that were the primary focus of the course, not merely the syntax of the language.

The second course I took was on Object-Oriented Programming. Again, we spent a few weeks learning Java syntax, and then the rest of the time was spent on OO concepts.

The third class was called "Languages and Translation," and we were expected to understand basic C (i.e., no pointers, preprocessor, etc.), not to mention UNIX and `make` (which didn't even get a lecture devoted to it -- we learned it in the first lab), within a week or two. Then, we learned about the fancier constructs over the semester, in lectures interspersed with the "real" content of the course, which was stuff like how the hardware actually manages memory, RegExes, finite state machines, compilers, etc. (the final project for the class was to write an assembler for Tech's fake MIPS-like ISA, in C).

--
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz
Re:What is there to master? by EvanED · 2006-09-14 13:29 · Score: 1

My first CS class at Georgia Tech was in Scheme, which has about the simplest syntax you can imagine. Therefore, we spent a few weeks on the basic syntax of that, and then spent the rest of the time learning about algorithms and data structures (e.g. recursion, lists and trees of various kinds, etc.). It's the algorithms and data structures that were the primary focus of the course, not merely the syntax of the language.

There are a few schools that use Scheme or ML (as at CMU) in their intro to programming course. I think I'm in favor of this technique*, and it's what I would consider to be an exception to my statement before that CS curriculums have a class on a language.

The second course I took was on Object-Oriented Programming. Again, we spent a few weeks learning Java syntax, and then the rest of the time was spent on OO concepts.

But then again, without seeing your curriculum, I might consider this a class on Java. I know you're covering more than just Java in it, but then again all those CS classes I said do too. Even mine did; there was coverage of the STL and how various data structures were implemented. So it was in some sense a mini data structures class mixed with C++.

To bring this back to what we were talking about before though, I do sort of wonder if my curriculum is too... technicalish, DeVry-like if you will, but there was a fair amount of stuff that wasn't really too. (So you wrote a MIPS-like assembler in C? We wrote a MIPS disassembler in MIPS assembly, and then ran it on a VHDL model of the chip.) However, regardless of whether it's trying to be the start of a computer SCIENCE curriculum or more technical, it's not currently doing its job.

The intro courses are changing starting now, so hopefully that will improve the situation a bit.

* Actually just about a week ago this came up in conversation, and I've sorta been thinking about it since (and a bit before too). It has a few problems, but I think it's the best solution. For instance, you'd probably want an introductory course in a language that would be immediately practical for non-CS majors who still want or need programming experience; currently the intro CS course seems to usually serve both purposes, so this split would require adding a new course.
Re:What is there to master? by mrchaotica · 2006-09-14 13:47 · Score: 1

Actually just about a week ago this came up in conversation, and I've sorta been thinking about it since (and a bit before too). It has a few problems, but I think it's the best solution. For instance, you'd probably want an introductory course in a language that would be immediately practical for non-CS majors who still want or need programming experience; currently the intro CS course seems to usually serve both purposes, so this split would require adding a new course.

Unfortunately, Georgia Tech has changed their curriculum since I took the course. Now, they've got three different "intro to computing" classes: one for "normal" CS majors, which is based on Python; one for engineers (and scientists, etc.), which is based on Matlab, and one for "Computational Media" majors (which falls into the College of Computing, along with Computer Science), which is based on Jython.

Personally, I liked the "Scheme for everybody" approach better, because it focused entirely on the fundamentals of programming rather than on the domain-specific stuff the new classes do. In my opinion, even engineers ought to understand algorithms well, and I'm not convinced a Matlab-based course is sufficient for that.

Anecdote: my Statics professor (I'm a CS/C[ivil]E double major, and he's both a CE prof and the lead programmer for GT STRUDL) continually complains about the poor programming skills of engineering students, because all they know how to use is "Kindergarten Bill [Gates]'s" software. Apparently, what engineers really need to learn is still FORTRAN. : )

--
"[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz