Mastering Regular Expressions
Simon P. Chappell writes "Classics are funny things, especially in the world of books. There are books that people say "should' be classics (I'll refrain from mentioning names to protect the pretentious) and then there are books that people are too busy actually using to get around to listing as classics. Mastering Regular Expressions, now in it's third edition, is in the second group. It's one of those books that you see on desks in computer departments the world over. This is a real "doers" book." Read the rest of Simon's review.
Mastering Regular Expressions
author: Jeffrey E.F. Friedl
pages: 515 (31 page index)
publisher: O'Reilly
rating: 11 out of 10
reviewer: Simon P. Chappell
ISBN: 0596528124
summary: A classic of modern computer literature.
This is a book for programmers; managers, project managers and architects need not apply. If you talk about code instead of writing it and have teams of programmers report to you, then consider buying this book and giving it to them. If you're a technical lead or lead programmer, then shame on you if an earlier edition of this book isn't already on your shelves! The majority of examples are written using Perl, but if you can read basic Perl (Pidgin Perl, perhaps?) then you'll be fine with the examples. Programmers in PHP, Java, .NET and Ruby also have dedicated sections of the book, so it's very inclusive and almost platform agnostic.
The book has ten chapters divided into two parts. Chapters one through six are what Mr. Friedl calls the "story" of regular expressions. Chapters seven through ten are an examination of the specific regular expression capabilities of Perl, Java, .NET and PHP.
Chapter one is an introduction to regular expressions. At only 33 pages, you might think that it would be shallow, but rather, it is knowledge dense. The examples in the first chapter use egrep extensively. This makes a lot of sense as it's an advanced tool, easy to use and freely available for most modern operating systems.
Chapter two builds on this introduction with extended introductory examples. These are written in Perl (again, simple and easy to follow), but there is no doubt that the regular expressions are the stars of the show around here. The examples are small Perl programs, but their benefit is that Mr. Friedl talks the reader through the process of creating each of them. This is more useful than just presenting example programs, because with just pure examples, you are out of luck if your specific problem is not covered. With this approach, you're coached towards thinking in regular expressions and are more equipped to address your personal regular expression needs.
Chapter three provides an overview of regular expression features and flavors. It starts with a historical view of the development of regular expressions, including a few asides about the influence that the earlier versions of the book have had on that development. After that, the chapter uses a search and replace example to demonstrate some of the differences between flavors of regular expression capabilities provided by different programming languages. Strings, Unicode and metacharacters round out this overview.
Strap yourself in for chapter four; it's time to talk about the computer science that makes all of that matching work. If you didn't know the difference between an NFA and a DFA regular expression engine before you start this chapter, you most certainly will by the end of it. At first sight, it might seem that this is a chapter for the pure propeller heads amongst us. While there is much theory here, it's all presented in the light of how your regular expression engine is trying to do what you asked. By understanding the approaches to regular expression processing, we can learn to help ourselves. We help ourselves when we write regular expressions that run faster and use less memory. We write better regular expressions when we understand the consequences of what we write. For example, the oft-written ".*" (dot star) seems like a great way to ignore a bunch of stuff in the middle of an expression, but such simplistic use is just waiting to bite you. This chapter explains why, how to deal with the situations where you'd be tempted to use simplistic expressions, and how just a little extra thought can bring you the behavior you want.
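As a rough illustration of the dot-star trap described above (a minimal sketch in Python rather than the book's Perl; the sample line and variable names are invented for this example), a greedy ".*" between two quotes grabs far more than a reader usually intends, while a negated character class or a lazy quantifier stops at the first closing quote:

```python
import re

# Invented sample text containing two separate quoted strings.
line = 'say "hello" and then "goodbye" to the crowd'

# Greedy: .* runs to the end of the line, then backtracks only as far as
# the LAST closing quote, so both quoted strings are swallowed together.
greedy = re.search(r'"(.*)"', line).group(1)

# Negated character class: it can never cross a quote, so the match ends
# at the first closing quote.
negated = re.search(r'"([^"]*)"', line).group(1)

# Lazy quantifier: expands one character at a time and also stops at the
# first closing quote in this input.
lazy = re.search(r'"(.*?)"', line).group(1)

print(greedy)   # hello" and then "goodbye
print(negated)  # hello
print(lazy)     # hello
```

The negated class is often the safer of the two fixes, because it states what the middle may contain instead of relying on where the engine happens to stop.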
Chapter five is a practical counterpoint to the previous theory chapter. Here, Mr. Friedl discusses practical regular expression techniques. There are a number of short examples, before he works through medium-sized HTML processing examples and finishes up with a look at processing Comma Separated Value (CSV) data.
Chapter six is efficiency. Your regular expression can be as correct as you like, but if it takes what seems like eternity to run, then it's of little use. This chapter mostly addresses NFA based engines, because they have the greatest variability based on how the regular expression is written.
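To make the point about variability concrete, here is a hedged sketch (Python's backtracking `re` engine stands in for the NFA engines the chapter discusses; the patterns and the input length are chosen for this example and are not taken from the book). Two patterns that accept the same strings can do very different amounts of work when the match fails:

```python
import re
import time

def time_search(pattern, text):
    """Run re.search once and return (match_or_None, elapsed_seconds)."""
    start = time.perf_counter()
    result = re.search(pattern, text)
    return result, time.perf_counter() - start

# A run of "a" with no "b" anywhere, so neither pattern can match.
text = "a" * 18 + "c"

# Nested quantifiers: on failure, a backtracking engine may try many ways
# of splitting the run of a's between the inner and outer "+".
nested_result, nested_secs = time_search(r"(a+)+b", text)

# Same language, no nesting: far fewer alternatives to try.
simple_result, simple_secs = time_search(r"a+b", text)

print(nested_result, f"{nested_secs:.4f}s")
print(simple_result, f"{simple_secs:.6f}s")
```

Both searches return None; on a typical backtracking engine the nested form is expected to be noticeably slower, and the gap grows quickly as the run of a's gets longer. Exact timings depend on the machine and engine version.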
Chapters seven through ten cover the specifics of using regular expressions in Perl, Java, .NET and PHP. They're well written and cover everything you need to apply the content of the first six chapters to your programming language of choice.
Everything about this book is great. This is the kind of book that O'Reilly built its reputation with. A master of the subject matter, writing in a clear, easily understood manner, leaving the reader educated and able to operate comfortably with the subject matter. I may not be a regular expression guru, but I feel that I have a much better grasp of the fundamentals that I would need if I did want to be such a guru.
Mr. Friedl is to be commended for his clear explanations of what is, in all reality, much more complex computer science than many of us are used to dealing with. The fact that his explanations are highly readable and enjoyable is a significant bonus.
There is a website for the book, regex.info, and a blog at regex.info/blog, where Mr. Friedl has some wonderful photographs of Japanese gardens with their autumn colors. (Nothing to do with regular expressions, but they appealed to my inner photographer.)
Lastly, while the book is not intended to be an encyclopedia of regular expressions, all of the examples are very relevant to programmers' needs and this book can easily serve that reference role.
At the risk of sounding like some kind of O'Reilly shill or a relative of Mr. Friedl, I must report that I don't think that I found a single thing I didn't like about this book.
This is a classic of the first order. Nail it to your desk unless you want to be constantly retrieving it from your co-workers. If I might be permitted a Spinal Tap reference, this one goes to eleven. If you ever use regular expressions, are thinking of using regular expressions or are in the same room as a regular expression, then you need this book.
You can purchase Mastering Regular Expressions from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
...but it seems funny that someone signing himself as Simon P. Chappell would worry about "protect[ing] the pretentious".
Trolling is a art,
The first two editions were also great books. An indispensable resource for sure and mandatory reading for my devs.
I wish the review would have addressed which edition of the book was being discussed. I assume the 3rd because that's where the bn.com link points. I'm sitting here looking at my 1st and 2nd editions. I wonder if I should spring for the third. Bummer--I just bought the 2nd a few months back.
What did one regex say to the other?
.+
I bought this (along with a few other O'Reilly titles) a couple months back, and I highly recommend Mastering Regular Expressions. Even though it's a dry technical topic, the presentation is awesome.
I read through the whole thing as if it were a novel, and picked up more than a few new things about regexes.
Very handy book, both to read through to really learn how regexes work, and as a day-to-day reference. The score of 11/10 given by the reviewer is bang on.
are always sad
It's a simple concept with a complete and concise specification. Regular expressions are pretty much the first thing you learn in computer science. It doesn't take long and covers much more than you would have to know to simply use them. A whole book about regular expressions might be pushing the page count a little too far...
Read the ./ summary above the actual review.
Third sentence.
"Mastering Regular Expressions, now in it's third edition, is in the second group."
The author and the reviewer are blatantly biased in favour of the regular expressions, ignoring the plight of the millions of downtrodden irregular expressions who are not able to get a platform to voice their grievances. All because they are viewed as somehow deviant or deficient. It is time for the irregular expressions to come out of the closet and assume their role as legitimate members of the syntax.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
I own an older version of this book and it really rocks.
As usual, Amazon has it cheaper than BN ($29.69 vs $35.99).
Expert Java EE Consulting
All that nice formal theory about what Regular Expressions are, and what they're capable of goes out the window with Perl; because what they call "regular expressions" aren't.
Throw away those nice formalisms, because they don't apply. Well, technically they do, but only because your machine is itself secretly a Finite State Machine with wild-eyed dreams of becoming a real Turing Machine someday, if only it could somehow manage the infinite RAM.
From the review:
;-)
Mastering Regular Expressions, now in it's third edition, [...]
I'm keeping my second edition though. My book fetish is already expensive enough without buying every edition and reprint...
!ERR: Signature not found.
I just like to go to http://www.regular-expressions.info/ myself - I seem to find all the stuff I forget from time to time there...
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
There are, and continue to be, a LOT of people who start off dabbling, get a job and become full time programmers without EVER getting a formal education in computer science. You can easily get a 60k job doing that. While someone with a good formal education can get paid a lot more, these people are still programmers.
excitingthingstodo.blogspot.com
Now, we have a regular-expression primer that has 515 pages. Is the publisher earning more money by producing a bigger book?
The only information that most people need is contained on a small web page. Armed with the information on that web page, the beginner can learn best by doing: writing various regular expressions in short Perl programs and determining whether they do what you want them to do.
When I read the 2nd edit of this book I was floored by how much richness I was missing in the regex language (well, in Perl regex, that is).
/endrant
Like a kid at Christmas, I immediately went nuts on my next project with \G and the lookaround operator(s).
Sadly, when a big bundle of code I wrote was delivered to a team in a city on another very large eastern continent, no one could understand what I had written, so they deleted my nifty \G loops and replaced it all with a crappy first-year-college-grad-non-indented parsing state machine using gotos. The complaint was not that I went nuts with regex, but that I was using a NONSTANDARD perl version which supported them (instead of their ancient version!), and that it was my duty to deliver a tool using standard versions. I was most angry at the fact that they just replaced the code with a buggy state machine, and then asked me to debug another problem caused by their mess because it was my tool originally. Ugh!
Anyway, my point is: (perl) regex are a far richer tool than meets the eye, but beware The Boneheads: the people who refuse to learn something new that could make their life easier and cling to the old way. Gawd forbid someone learn something new on the job.
Sigh. I was hoping at least ONE programmer over there would have shared my enthusiasm for \G.
This is slightly offtopic, but it's regex related. Where are the regex training programs for windows/linux? Or even regex tools to parse data and help you design your expressions?
Seems like a typical thing that's always overlooked. I saw regex buddy for PC, but it's missing awk/sed/bash regex.
While reading a book helps, a tool for the inexperienced would help train and get the job done.
Ah yes. There is is. Must have been the incorrect use of "it's" that confused me.
Does anyone know what is new in the 3rd edition? This is missing from the review.
There are books that people say "should' be classics (I'll refrain from mentioning names to protect the pretentious)
I'm not going to refrain.
The Three Musketeers, Alexandre Dumas
Pride and Prejudice, Jane Austen
David Copperfield, Charles Dickens
Look at me, I'm being pretentious!
Qualitas edurus commercium, nullus penitus net rimor, nullus deus beneficium
/me sees the "is is" in my last comment. Argh.
Apparently I need to stop posting today, since I can't seem to get anything right.
Isn't it a given that there is always at least one typo in a post commenting on a typo? ;-)
Well, I'm off to bed, thanks for the heads up on the time...
!ERR: Signature not found.
This book would be a great gift idea for Al Gore.
The theory of relativity doesn't work right in Arkansas.
s/is/it/
A book on regular expressions? What, is the Internet broken?
Damn I'm old. I remember when classics were such books as Ivanhoe, The Virginian, the Iliad and the like. Never ever a technical manual no matter how well written it may be. Not to say one couldn't be. That is if you threw in a bit of sword/gun play and a love interest or so.
IF you can't be famous be infamous. But for GODS sake be something
So, why is ".*" bad, other than that you sometimes want Perl's non-greedy ".*?" instead?
Now I'm curious (but still too cheap to buy the book).
Some people, when confronted with a problem, think I know, I'll use regular expressions. Now they have two problems.
I'll probably be modded down for this...
To quote: "Sometimes a hacker has a problem, and he thinks to himself 'I know, I'll solve it with a regular expression!'. Now he has two problems." -- Jamie Zawinski
Sometimes I doubt your commitment to SparkleMotion!
Spoken like a true recent graduate.
I'm already quite proficient at regexen (people at work come to me for help etc). How much do I stand to gain from this book?
I rarely criticize things I don't care about.
.... wordprocessor and email program with a regular expression!
PS: not really but wouldn't that be feckin' awesome! it was emacs... if I really had done it I mean.
-pyrrho
that is me.. i wrote code for a while but was mainly a sys admin - got a job where i now write code all day.. decided to go to school for it .. on one of the assignments the prof docked me for using a regex for finding links in pages instead of a fsm - because he didn't teach them in class.
that pissed me off..
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
I dunno about the latest edition, but a large percentage of the people I interview have computer science degrees, are total dumbasses, and don't know a regular expression from their own ass.
The more advanced the CS degree, sometimes, the more significant the dumbassery. If you don't know what a regular expression is, at least admit to using a cookbook.
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
Woohoo! Oh, wait... gardens. Umm... no thanks, dude.
Proud neuron in the Slashdot hivemind since 2002.
So regular expressions are evil because they're too hard to maintain? If that's your argument, you need to come up with an alternative that isn't time consuming to code and doesn't require advanced skills that are difficult to master. Good programmers don't hand code fancy solutions any more often than they have to. They rely on well-documented, well-tested language features and APIs. Which describes Perl regular expressions to a T, whatever their shortcomings.
Anyway, Perl regular expressions don't have to be "line noise". That's just the way sloppy people are used to coding. Perl actually allows you to create a clearly formatted regular expression in which the structure is pretty obvious, with a little commenting. It does this by providing high-level metacharacters, and by allowing you to use blanks for formatting instead of representing blanks.
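For readers who want to see what that looks like, here is a small sketch. It uses Python's `re.VERBOSE` flag, which plays roughly the same role as the free-spacing mode the comment describes for Perl; the date pattern and the sample sentence are made up for this example:

```python
import re

# Hypothetical pattern for an ISO-style date. In verbose mode, unescaped
# whitespace in the pattern is ignored and "#" starts a comment.
date_re = re.compile(r"""
    (?P<year>  \d{4} )   # four-digit year
    -
    (?P<month> \d{2} )   # two-digit month
    -
    (?P<day>   \d{2} )   # two-digit day
""", re.VERBOSE)

# Arbitrary sample sentence; the date is not meant to be significant.
m = date_re.search("Reviewed on 2007-05-14 by a reader")
parts = (m.group("year"), m.group("month"), m.group("day"))
print(parts)  # ('2007', '05', '14')
```

The same expression written without the flag would be `(\d{4})-(\d{2})-(\d{2})`, which is shorter but gives a maintainer nothing to go on.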
I cringe every time I hear people using regular expressions to parse HTML or CSV or whatever ... why don't you just use an already made parser to do that job? You'll get it done sooner, it will work faster and it will be more maintainable and extensible than some crazy-ass-convoluted-line-noise that "seems" to do the job for now ...
FSM and regex are equivalent.
Or maybe English is not my first language?
excitingthingstodo.blogspot.com
I recall the 1st Perl Conference in San Jose many years ago. Jeffrey spoke to a standing room only crowd and gave an amazing lecture, despite his late flight in the night before and being sick.
I bought the book (1st edition) and was absolutely amazed at the richness of the book! I imagine most of it is the same, with additions put in for various languages, etc. that have come along since then.
The visual presentation on how the parsing worked was worth the cost of admission to that show. I used principles from it immediately and made Perl that much more powerful for me at the time.
Since then (cue the dramatic organ music), I've become a pointy haired boss...but without the staff (I'm a policy guy now), and I still use vim with some principles from the book to hack/whack through tables, etc.
I'd put this at an 11 if it's anything like the 1st edition.
Anybody try doing regexes on binaries? They invariably muck up linefeed characters, no matter how I try to tell it not to.
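One possible approach, sketched here in Python with an invented byte layout (the "HDR!" tag, the length byte and the payload are all assumptions made for this example), is to match against bytes instead of text and to turn on the flag that lets "." match a newline. When reading from disk, opening the file in binary mode ("rb") avoids any newline translation before the regex ever sees the data:

```python
import re

# Made-up sample: two leading bytes, a 4-byte tag, one length byte, then a
# payload that happens to contain CR and LF bytes, then two trailing bytes.
blob = b"\x00\x01HDR!\x05ab\r\n\x0a\xff\xfe"

# A bytes pattern (rb"...") matches bytes objects. re.DOTALL makes "."
# match every byte, including b"\n", which it would otherwise skip.
pattern = re.compile(rb"HDR!(.)(.{5})", re.DOTALL)

m = pattern.search(blob)
length = m.group(1)[0]   # indexing a bytes object yields an int
payload = m.group(2)

print(length)   # 5
print(payload)  # b'ab\r\n\n'
```

Without `re.DOTALL` the same pattern would fail on this input, because the five-byte payload contains linefeed bytes that a plain "." refuses to match.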
By understanding the approaches to regular expression processing, we can learn to help ourselves.
Which is why i would recommend Assembly Language Step By Step, by Jeff Duntemann for any programmer. It's easy to learn, and is merely a preparation for Assembly, but would be great for all programmers, if only to know the difference between CS and DS, near calls and far calls, and the like.
The only thing i don't understand about regular expressions is why they have to be so cryptic. Wouldn't it be easier to debug if the patterns were a little more clear?
Have you read my journal today?
The last 300 pages are actually a single regular expression.
What, did you get your degree in the 50's? Regular expressions have been part of computer science curricula for a loooong time.
You are obviously a newbie regarding regular expressions, based on your post.
First, 515 is not too much when talking about regular expressions. There is much to be discussed, not to mention tips&tricks to give away.
Also, you are dead wrong about the "small web page". First, it only talks about Perl Regular Expressions. There are other kinds, including the classic (basic?), extended, posix and (from your reference) perl regular expressions. Mastering the different kinds is enough to fill 300 pages of the book.
Where are you going to use REs? sed? VI? perl? php? C? SQL? You need to know what flavor of REs you need for that particular environment.
Regular expressions are a very tricky topic, and understanding them is not something easily accomplished. Come to think about it, 515 might not even be enough.
morcego
what does the flying spaghetti monster have to do with html screen-scraping?
This is a tautology.
Sometimes I'd like to use the power of regexps while still thinking of the file as a binary.
Who needs a book when you can create regular expressions quickly and painlessly with Kodos http://kodos.sourceforge.net/ ?
Thanks for clarifying. I thought your post was a subtle play on RE's, Bill Clinton, and Max Headroom!
Test 1 2 3 4
We've got all three editions of this book in our office and they keep getting better. As the review says, this book will teach you the difference between a DFA and an NFA engine if you want to learn that, or just how to do some simple capturing if that's all you need. Friedl's writing is very approachable and the book's notation for showing what part of a string a regex will select is very helpful.
And this stuff comes up over and over - if you ever need to tweak a JavaCC grammar knowing how to specify a DFA vs a NFA can make a nice performance difference. Great stuff!
The Army reading list
I am glad to see this on Slashdot since regular expressions is an area that geeks could really use help in.
For example, instead of saying the common geek expression "Greetings Program!" try a more regular expression such as "Hello Sir" or the more casual "Wassup?" IRL, Tron references are not considered cool. Another common faux pas is using the expression "Hey n00b, what's your function?" instead of something more regular like "Hey dog, what's your problem?" If someone tries to threaten you, think about their technical skills before saying "Close your port before I pwn j00!" Life is not an FPS. "Shut up before I kick your ass" works very well.
Give a try to my web-based tool, Regex Powertoy. Its interface is all DHTML/CSS/Javascript, but requires a hidden Java (1.5) applet for the advanced and steppable regex engine.
Given that Java core, there are options for adding/removing usual Java literal escaping, which in Java code means lotsa backslashes. Not all Perl advanced features are supported.
I hadn't considered a pick for awk/sed/bash syntax limits/conversion but will consider it. Any handy reference to how their syntax differs from Perl/Java? (The thing that usu. bites me with sed is escaping of parentheses.)
"A classic is something that everybody wants to have read and nobody wants to read."
One of my favorite Mark Twain quotes...
Be a PATRIOT--because the only thing we have to fear is the lack thereof.
Interesting guess, I've been making 75-100K for the past ten years. Not having a degree cost me $5,000/yr on my first job, but I don't think it's been too much of an issue since.
In my experience, going to college has no relationship to being a good programmer. Most of the better programmers I've known that did go to college studied music. Most programmers on teams I've led coming straight out of college have to be retrained anyway--some have taught me a little (First heard of design patterns from one of 'em. Didn't learn any new patterns, but knowing the names helps when someone says "describe the listener pattern" in an interview).
Of course I did have a couple years at a jr college (no degree, no CS course available back then) and a couple years of Navy training.
My brother, however, never graduated highschool or got his GED. He's CTO for a company of a few hundred people in southern california. I assume he makes more than me, but I could be wrong.
As for regular expressions--WHY. You can almost always come up with a better, more expressive way to do something, why not do it that way? It's certainly the most difficult to learn user interface ever invented (Yes, the CLI of Grep is a user interface).
I've virtually never needed them. The few times I have needed them was more because the tools only supported RE's when every task I've ever needed them for could have gotten away with a much simpler interface, but used REs because the programmer didn't understand about UIs and the real world.
It may save you a few keystrokes, perhaps quite a few, but when I need that type of functionality in code I can be much more expressive and make my code more readable if I use more basic tools and loop--and I'll still be more efficient than RE processing.
If you're really into saving a few keystrokes, you should investigate APL, it saves a LOT of keystrokes over nearly every language available, and is just about as readable as Regular Expressions.
Years ago I was calling around to bookstores looking for this book. A few bookstore employees asked me if it had a lot of pictures. They thought it was a book for people who have trouble communicating. Like knowing when to say 'hi' vs. 'hello' or something. Sheesh. Now granted, many people who read this book may be socially challenged, but this book won't help that.
"It's because they're stupid, that's why. That's why everybody does everything." -Homer Simpson
If you wanted to learn or develop some regexes, you sat down with regex(7) open in one terminal and an interactive perl in another window to test them out.
It never occurred to me that I would need or want a tool to generate them. It's not like they're that hard to comprehend. (Although they can be a pain to document... thankfully perl allows you to add whitespace and comments to a regular expression so it can make sense to a third party)
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
I bet they parse as valid Perl too.
I bought this book years ago and still can't STFU about it, sorry.
At my previous job (web-based custom market research) we did hundreds of web surveys which had on the average some 400 data points per survey. These had distinct variable names, etc. and were built 100% by hand when I was hired in the company some time in 2002. My first survey project was a disaster, it took me about 20 hours from the final approved survey document to the dynamic version. The process was riddled with manual steps that created an infinite amount of room for errors.
Enter regular expressions.
While fiddling with BBEdit Pro I finally decided to take a shot at regular expressions. After an hour or so of experimenting I started writing a few filters that allowed me to cut down the turnaround from 20 hours per survey to a little over 10 hours. When I got to the point at which I wasn't able to figure things out from the BBEdit documentation and the web, I convinced the boss to buy me Mastering Regular Expressions.
Within the first 50 pages, I had picked up on additional regular expressions concepts that allowed me to eventually cut down the turnaround per survey to less than 8 hours. That's not 8 hours programming, that's 8 hours from the moment the approved survey is handed over to programming to the moment it passes QA checks and is considered ready to go live.
This was a $50 or so book, and it saved us thousands of dollars over the four years I worked at that company. Of course, my reward for saving the company all that money was to lay me off, and I "forgot" to leave instructions on how to use the text filters, so I imagine my replacement is right now writing surveys by hand.
Some of the things that proved to be killer uses for regular expressions within that context:
1. The approved survey would have specific variables that the analysts would need to keep for importing into SPSS later down the process. A text filter picks up those variables and generates a unique list of every variable needed for the survey. The variables are named with specific patterns, so you know which ones are strings, integers, etc.
2. Now that we have a list of variables, it means we can quickly generate the CREATE TABLE statement for the survey data. What used to be done by copying and pasting 400 times is (was?) now done by highlighting the text and running a macro. The output is the SQL command you need.
3. Since you already have the list of variables, you can generate the 400 statements needed to read each form variable into its proper variable in the ASP code.
4. The same way you can generate the hidden form fields that you need.
5. The same way you can generate the INSERT statement to send your data to the database.
Little things like that. Eliminating all that copying and pasting really cut down on the QA overhead per project.
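Steps 1 and 2 above can be sketched roughly as follows. This is Python rather than a BBEdit text filter, and the naming convention is an assumption made up for the example (a variable written as [name_int] or [name_str], with the suffix giving the type); the real surveys used their own patterns, which are not described here. The table name and the SQL types are likewise hypothetical.

```python
import re

# Assumed naming convention (not from the original post): a variable is
# written as [name_type], where the suffix "_int" or "_str" gives its type.
VAR = re.compile(r"\[(\w+_(int|str))\]")
SQL_TYPES = {"int": "INTEGER", "str": "VARCHAR(255)"}

def unique_variables(survey_text):
    """Return (name, type) pairs in first-seen order, without duplicates."""
    seen = {}
    for name, kind in VAR.findall(survey_text):
        seen.setdefault(name, kind)
    return list(seen.items())

def create_table(table, variables):
    """Build a CREATE TABLE statement from (name, type) pairs."""
    columns = ",\n".join(f"    {name} {SQL_TYPES[kind]}" for name, kind in variables)
    return f"CREATE TABLE {table} (\n{columns}\n);"

survey = "Q1. Age? [age_int]  Q2. Name? [name_str]  Q3. Confirm age [age_int]"
variables = unique_variables(survey)
print(create_table("survey_data", variables))
```

The same list of (name, type) pairs could feed the other generators (form-reading statements, hidden fields, the INSERT statement), which is where most of the copy-and-paste savings would come from.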
Pedro
----
The Insomniac Coder
Well, I hate to say it, but I agree with the Prof. There are really two worlds in computer science: academia and work.
Pretty much _all_ assignments that will be given in CS courses can be solved quite easily by using a library that implements a solution. In the working life, that would be the proper solution, but not so in school.
Of course you can just call a class in your standard library that implements regular expressions and solve a problem that way. But that's not why you're in college. You ALREADY know how to call a library that someone else wrote. Calling libraries is trivial, you can pick that up with a few pages reading and some practice. The Professor isn't there to teach you how to call libraries though. What you're supposed to take away from the class is the understanding of how the class does the work.
Finite state machines are the underlying theory of regular grammars (See: Chomsky hierarchy of languages.) So if the class covers how FSM's work, and what their usefulness is, then you should try to actually apply that knowledge to the problem. The assignment isn't so much one of "find the answer" (nobody cares about the answer) but one of "apply the theory" and learn something new.
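To make "apply the theory" concrete, here is a minimal sketch of a hand-written finite state machine in Python. The language it recognizes (optionally signed integers, i.e. the regular expression [+-]?[0-9]+) is picked purely for illustration and has nothing to do with the actual assignment, which isn't described in this thread.

```python
# Hand-written DFA for the (hypothetical) language of optionally signed
# integers, equivalent to the regular expression [+-]?[0-9]+
START, SIGN, DIGITS, DEAD = range(4)
ACCEPTING = {DIGITS}

def step(state, ch):
    """Transition function: next state for one input character."""
    if state == START and ch in "+-":
        return SIGN
    if state in (START, SIGN, DIGITS) and ch in "0123456789":
        return DIGITS
    return DEAD  # no valid transition; DEAD can never be left

def accepts(text):
    """Run the DFA over the whole input and report acceptance."""
    state = START
    for ch in text:
        state = step(state, ch)
    return state in ACCEPTING

print(accepts("-7"), accepts("1-2"))
```

Each character is examined exactly once and there is no backtracking, which is the property that makes this style attractive when a problem is almost, but not quite, a regex.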
One day you'll come across a problem that is very similar to regular expressions, but not quite like it, and you may remember this assignment and write a FSM to solve it, and you'll be glad for it.
It's like you're learning about sorting algorithms, and then you come along and use Collection.sort() instead of writing your own quicksort (and understanding the algorithm while you do so.)
I agree with you.. But if you are going to be grading someone directly on it then you need to state to use a FSM..
the project did not say you had to do it a certain way.. only that you had to do it and that it worked.
if it had said "Use a FSM" then the prof would have been right but it didn't so it pissed me off
'...if only "Jumping to a Conclusion" was an event in the Olympics.'
at $9.99 a month [$13.50 AUS], I can't lose, always getting stuff to read
"Nondeterministic finite automata" is well defined in comp-sci
and Friedl has it wrong. The set of languages accepted by NFA's
is exactly the same as the set accepted by DFA's.
Perl's engine and its brethren use search-and-backtrack. They
accept a lot of things that are not regular expressions. Such
engines don't have much theory behind them, and it's hard to
reason generally about what they do and how fast they do it.
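One common illustration of the point about backtracking engines, sketched in Python (whose re module is also a backtracking engine): a backreference lets a pattern match strings of the form a^n b a^n, a textbook example of a language that no DFA or NFA can recognize. The function name is made up for the example.

```python
import re

# a^n b a^n is not a regular language, yet a backreference matches it:
# \1 must repeat exactly what the first group captured.
MIRRORED = re.compile(r"^(a*)b\1$")

def is_mirrored(text):
    """True if text is some number of a's, a b, then the same number of a's."""
    return MIRRORED.match(text) is not None

print(is_mirrored("aaabaaa"), is_mirrored("aabaaa"))
```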
You've mastered an entire field of computation by reading a short introduction to one implementation? I think I've fixed your code in about a dozen different companies!
how to invest, a novice's guide
No need for the 3rd edition unless you desperately need the extra 40-odd pages on PHP regexes. That's the only difference between the 3rd and 2nd editions as far as I can tell.
One of my favorite articles on the web about regular expressions is How Regexes Work by Mark-Jason Dominus. It's a great article if you're at the point where you already have some experience using regular expressions, but you want to gain some insight into how they do what they do. I found that after I read this article it was easier for me to come up with cleaner regexps more quickly.
I haven't read the book being discussed. It probably covers the same stuff, but I found M-J D's article easy to read, short, and very informative.
"Some people, when confronted with a problem, think I know, I'll use regular expressions. Now they have two problems."
Stupidity is like nuclear power, it can be used for good or evil. And you don't want to get any on you.
Hint: The book is called Mastering Regular Expressions. It's not meant to be an introductory primer (though it does include one) but a detailed look at uses of regular expressions, variations between systems, and advanced techniques as well. Sysadmins and programmers often have to deal with many Regexp systems and a book like this really helps. I know I use it quite often.
Honest. I'd learned HTML and Dreamweaver 4 had a search and replace facility for using these weird hieroglyphics for specifying patterns. "Dreamweaver 4 Bible" referred to them as regular expressions, citing the 1st edition of Jeffrey Friedl's book and I found a copy in the local (Islington/London) library. I was fascinated by the book and read it day after day. Since Perl seemed to be THE regex language I soon developed a fascination with Perl through Larry Wall's "Programming Perl". My "web design" career soon became a "web development" career when I learned how to build end-to-end web applications with Perl and MySQL. So, thanks to you, Jeffrey Friedl, I now have a much wider skill set.
It sounds like the book covers just about every scripting language but Python. That's a shame, because Python is very good for text processing and has strong regex capabilities. So, a plea from the Pythonistas, if you please--include a section about Python in the 4th edition!
Can someone explain the joke for the humourously challenged?
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
it's all about "perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'". ;-)
jobst
to code or not to code, that is the question.
If you want a good treatment of Regular Expressions, I've never seen a better one than that found in Michael Sipser's "Introduction to the Theory of Computation." It is clear, easy to follow, and quite exhaustive. He also doesn't take 500 pages to get the point across.
I can't believe no one has mentioned The Regex Coach at http://weitz.de/regex-coach/. While I totally agree with the 11/10 rating for MRE (I have the first and second editions), The Regex Coach is an invaluable prototyping and debugging tool for PCRE (Perl Compatible Regular Expressions). It runs on Windows and Linux and is free (but not open source IIRC).
Both the book and the tool are 100% essential to anyone writing any regex more complicated than /^foo\s(\w+)\sbar$/.
Maybe you're also interested in another title by the very same author: "Mastering Line Noise" ...
Save yourself $1.80 by buying the book here: Mastering Regular Expressions. And if you use the "secret" A9.com discount, you can save an extra 1.57%!
The book was previously reviewed on Slashdot. Better just to note what's changed in the new edition.
-- Ed Avis ed@membled.com
To have a right to do a thing is not at all the same as to be right in doing it
Of course, there's no regular expressions in that code. . .
"The problem with internet quotations is that many are not genuine" -Abraham Lincoln
Sounds like he added PHP & updated Java.
I've found the 2nd edition to be a great learning resource; since I do more PHP than Perl, have sprung for the 3rd. Used the readme.doc link off his site.
Would have to concur with what many others have said: This is one of the best written books I have seen, the author has a real gift for explaining things. I usually just dig into code & "figger" it out, but that can be veeeeery daunting when it comes to regex!
Virginia Tech computer science didn't teach them either. Oh, we learned C, LISP, Prolog ... but mostly stuck to C. I've never used any of it since graduating, but I've used regexes a-plenty.
-Clio
Karma: Bad (mostly from not giving a fuck)
Blog: http://clintjcl.wordpress.com
There are two jokes in this thread, so I'll try to explain them both to the best of my knowledge.
Joke: What did one regex say to the other?
Answer: .+
Explanation: In regular expressions (which is what the book is about) . matches any character but the newline. + matches 1 or more of the previous items. Therefore what did one regex say to the other? Basically you fill in the blank.
Joke: You're not a geek if you don't find .* of it funny.
Explanation: In regular expressions, * matches 0 or more of the previous items. Therefore he's saying that you're not a geek unless you find none or all of the previous joke funny.
Does that help? Reminder: I'd never go out and claim that computer scientists that try to tell jokes are actually going to be funny. But they try.
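For anyone who wants to check the behavior of . and the + and * quantifiers described above, here is a quick sketch in Python. The variable names are made up, and the results assume Python's default flags (no DOTALL).

```python
import re

# .+ needs at least one character, so it cannot match the empty string.
plus_on_empty = re.fullmatch(r".+", "")
# .* allows zero characters, so the empty string matches.
star_on_empty = re.fullmatch(r".*", "")
# . does not match a newline by default, so a two-line string fails.
plus_on_newline = re.fullmatch(r".+", "a\nb")

print(plus_on_empty is None, star_on_empty is not None, plus_on_newline is None)
```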
Maybe Simon should read "Mastering Apostrophes" next.
$ echo "ceci n'est pas une pipe" | sed -Ee 's/(eci n|pas )//g'
Consider trying to nuke all CR characters. There's no straightforward way to do this--while a binary regexp would be quite easy.
I bet they parse as valid Perl too.
What doesn't?