(Useful) Stupid Regex Tricks?
careysb writes to mention that, in the same vein as '*nix tricks' and 'VIM tricks', it would be nice to see one on regular expressions and the programs that use them. "What amazingly cool tricks have people discovered with respect to regular expressions in everyday life as a developer or power user?"
To check that a string is a valid IPv4 address, this regexp is quite useful:
/^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/
And this one for MAC addresses:
/^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}$/
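For anyone who wants to poke at the IP and MAC patterns above, here is a minimal Python sketch. The patterns are copied as posted with only the /.../ delimiters dropped; the helper names is_ipv4 and is_mac are made up for illustration, and using fullmatch (so a trailing newline is rejected) is my assumption, not something the poster specified.

```python
import re

# Patterns copied from the comment above, minus the Perl-style delimiters.
IPV4_RE = re.compile(
    r"^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}"
    r"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$"
)
MAC_RE = re.compile(
    r"^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:"
    r"[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}$"
)


def is_ipv4(text):
    """Hypothetical helper: True if text is a dotted-quad IPv4 address."""
    # fullmatch is used because Python's $ also matches before a trailing newline.
    return IPV4_RE.fullmatch(text) is not None


def is_mac(text):
    """Hypothetical helper: True if text is a colon-separated MAC address."""
    return MAC_RE.fullmatch(text) is not None


for candidate in ["192.168.0.1", "256.1.1.1", "01.2.3.4", "1.2.3"]:
    print(candidate, is_ipv4(candidate))
for candidate in ["00:1A:2b:3C:4d:5E", "00-1A-2B-3C-4D-5E"]:
    print(candidate, is_mac(candidate))
```

Note that the IP pattern rejects octets with leading zeros such as "01", and the MAC pattern only accepts colons as separators.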
(Useful) Stupid * Tricks
Yes sir, that will guarantee a front page story. You better head back to the drawing board if it doesn't fit that pattern. Next week: (Useful) Stupid Starcraft Tricks.
Maybe we should have a new section for "Useful Stupid Tricks" on Slashdot.
Beautiful regexp that validates RFC 822 addresses: Mail-RFC822-Address.html
Unselfish actions pay back better
I actually like these. Nice little highly enriched concentrations of geekery on a single page. Think how long it might take to round up the sort of stuff that appears here by Googling.
Turing word: insipid
In a sentence: You find this page insipid but I find it inspiring.
Why couldn't Bill try out his regular expressions?
His mom wouldn't let him play with matches.
Here's a chunk of perl script I wrote (years ago) that determines if $text matches any of the styles of library call number that I've ever encountered.
Slashcode is interestingly interpreting my formatting, but you should get the gist.
$text =~ /
^[A-Z]+ # starts with at least one capital letter
\s? # followed by an optional space
\d+ # followed by one or more digits
/x
or $text =~ /
^\d+ # starts with one or more digits
\. # followed by a single decimal point
/x
or $text =~ /
\d+ # starts with one or more digits
\s # and a space
/x
or $text =~ /
Thesis # starts with "Thesis"
\d{4} # then four numbers - year
\s+ # separated by at least one space
[A-Z]+ # from one or more capital letters
\d+ # followed by one or more numbers
/x
or $text =~ /
\d+ # starts with one or more digits
\- # connected with a dash
\d+ # to one or more following digits
/x
or $text =~ /
\d+ # starts with one or more digits
\s # followed by a space
[A-Z]* # followed by zero or more capital letters
\d+ # followed by one or more digits
/x;
I've never found regexes to be useful at all. I prefer to write my own parsers from scratch in assembly language, or conway's game of life, if I'm feeling m/(ambitious|artistic|autistic|masochistic)/.
But even an artist gets lazy sometimes.
This regex matches a number: integer or float, scientific notation or plain, plus or minus...
[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?
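A quick way to see what the number pattern above picks up is to run it over a sample string. This is only a sketch in Python; the name find_numbers and the sample text are mine, the pattern is as posted.

```python
import re

# The number pattern from the comment above, unchanged.
NUMBER_RE = re.compile(
    r"[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?"
)


def find_numbers(text):
    """Hypothetical helper: return every substring the pattern matches."""
    # All groups are non-capturing, so findall returns whole matches.
    return NUMBER_RE.findall(text)


print(find_numbers("x = -3.14e10, y = .5 and z = 42"))
# ['-3.14e10', '.5', '42']

print(find_numbers("abc123"))
# [] because \b refuses digits glued to the end of a word
```

The word boundaries are what keep it from grabbing digits out of identifiers like abc123.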
Colorless green Cthulhu waits dreaming furiously.
use Regexp::Common qw(URI net);
$text_with_urls =~ m/$RE{URI}/;
$text_with_ips =~ m/$RE{net}{IPv4}/;
Build it, and they will come^Hplain.
I wonder why such FAQs are still posted on a site like Slashdot. We now have a great repository for exactly this kind of questions:
http://stackoverflow.com/questions/tagged?tagnames=regex&sort=votes&pagesize=15
OK, you asked for stupid tricks, but this one's just plain lazy.
Between bash and grep, there are quite a lot of special characters that you have to escape... Or just ignore with dots!
/I.do.this.frequently..(even.with.parenthases).,.because.sometimes.my....backslash..key.is.tired/
A couple neat things happened: The extra dot after frequently is matching an inline paren. The paren in the PATTERN right next to it starts the mark of an atom, closed by its brother. The comma is because I put one outside the paren (here represented as the dot to the left of the comma) as is my style. Also note the literal backslash, just before you see the word backslash in hidden parenthesis.
Why not add quotes to match the spaces easily? I get a word or two in, and I find I naturally switch to using dots. These are throwaways for single tries through grep. For production code, I hone in carefully on the parts that I'm dead sure I can anchor to, escaped by any means needed, before carefully choosing my atom to match as tightly as possible, so it'll error out if my data has gone wrong.
Even in a simple case like this, half the fun is in explaining it. :)
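To make the trade-off concrete, here is a small Python sketch of the same dots-instead-of-escapes habit next to a properly escaped pattern. The original trick is for grep at the shell; the sample strings and variable names below are invented for illustration.

```python
import re

target = "config.d/(old)"

# Lazy version: every awkward character is replaced by a dot.
lazy = re.compile(r"config.d..old.")

# Strict version: re.escape backslashes the special characters for us.
strict = re.compile(re.escape(target))

for text in [target, "configXd__oldZ"]:
    print(text, bool(lazy.search(text)), bool(strict.search(text)))
# config.d/(old) True True
# configXd__oldZ True False
```

The lazy pattern is fine for a throwaway search, but it also matches lookalike text, which is why anchoring and escaping matter in production code.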
SSN
^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$
US phone with or without parentheses
^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$
ISO Date (19th to 21st century only)
^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$
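If you want to sanity-check the SSN, phone, and ISO date patterns above, a rough Python harness could look like this. The patterns are as posted; the matches helper and the sample values are my own, and fullmatch is an assumption made to avoid Python's $ accepting a trailing newline.

```python
import re

# SSN, US phone, and ISO date patterns copied from the comment above.
SSN_RE = re.compile(
    r"^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$"
)
PHONE_RE = re.compile(
    r"^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$"
)
ISO_DATE_RE = re.compile(
    r"^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$"
)


def matches(pattern, text):
    """Hypothetical helper: True if the whole string fits the pattern."""
    return pattern.fullmatch(text) is not None


# \3 forces the second separator to repeat the first one.
print(matches(SSN_RE, "123-45-6789"))    # True
print(matches(SSN_RE, "123-456789"))     # False, separators differ
print(matches(PHONE_RE, "(555) 123-4567"))  # True
print(matches(PHONE_RE, "555 123 4567"))    # False, spaces need the parens form
print(matches(ISO_DATE_RE, "2008-12-31"))   # True
print(matches(ISO_DATE_RE, "2008-02-30"))   # True, shape only, not the calendar
```

The last line is the thing to remember: the date pattern checks the shape of the string, so impossible dates like February 30 still pass.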
Magic stuff like this is not working: /\([FB][ot]o\).*\1/ although that seems to be the closest description of what we wanted.
In perl, I did /([FB][ot][o]).*\1/ and it seemed to work as you wanted. Also, if you're using a regex engine that supports lazy (non-greedy) quantifiers like perl does, I would use them in this case. It reduces backtracking. In perl, put a ? after the *.
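Here is a small sketch of the backreference and the greedy versus lazy difference, in Python rather than Perl (the syntax for this pattern is the same in both). The sample strings are made up.

```python
import re

text = "Foo bar Foo baz Foo"

# Greedy: .* grabs as much as it can, then backs off to the LAST repeat.
greedy = re.search(r"([FB][ot]o).*\1", text)

# Lazy: .*? grows one character at a time and stops at the FIRST repeat.
lazy = re.search(r"([FB][ot]o).*?\1", text)

print(greedy.group(0))  # Foo bar Foo baz Foo
print(lazy.group(0))    # Foo bar Foo

# \1 must repeat exactly what group 1 captured, so Foo ... Boo does not match.
mismatch = re.search(r"([FB][ot]o).*\1", "Foo then Boo")
print(mismatch)  # None
```

Both versions find a match here; they differ in how much text the match covers and in how much work the engine does to get there.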
I like it, but I've got a bookmark folder called "Slash-doc" where I store useful threads that contain a lot of information.
I've got a lot of threads bookmarked.
Best Practices for Process Documentation
How would you make a distributed Office system
Quality Open Source / Calendar / Messaging Systems
and some others.
Some of the information in the threads is out of date, but the ideas are useful and interesting to read. I need to go back through Ask Slashdot and get the more recent threads that seem to act as references.
Check out my sysadmin blog!
#$%^&*(&^%{{}}{/\/\||```
(No, that's not a regex at all. And no, I don't even have a single girlfriend.)
8 of 13 people found this answer helpful. Did you?
(I would quote the final result but /. won't allow that many "junk" characters.. let's hope that doesn't cripple this entire discussion.)
Interesting that a site for nerds doesn't allow a lot of characters commonly used in source code.
I came up with a Regex that can be used to match literally anything (yes, anything!). It is, therefore, the most flexible regex ever concocted. Here it is:
.*
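One small caveat, at least in Python: by default the dot does not match a newline, so "anything" stops at the end of the line unless you ask for DOTALL. A quick sketch with an invented sample string:

```python
import re

sample = "one\ntwo"

# Default: . skips newlines, so .* cannot cover the whole two-line string.
default_match = re.fullmatch(r".*", sample)

# With DOTALL the dot matches newlines too.
dotall_match = re.fullmatch(r".*", sample, flags=re.DOTALL)

# It does happily match nothing at all, though.
empty_match = re.fullmatch(r".*", "")

print(default_match is None)        # True
print(dotall_match.group(0) == sample)  # True
print(empty_match.group(0) == "")   # True
```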
IMHO, this is exactly the way that Slashdot should be going. Threads like this are interesting, add to the reservoirs of internet knowledge, and have the highest quality to noise ratios.
I (and I suspect many others) read Slashdot not for the latest +5 funny comment (though those can be fun to read) but to read the opinions of brilliant minds. And when those minds start trading secrets... Everyone wins.