(Useful) Stupid Regex Tricks?
careysb writes to mention that in the same vein as '*nix tricks' and 'VIM tricks', it would be nice to see one on regular expressions and the programs that use them. What amazingly cool tricks have people discovered with respect to regular expressions in everyday life as a developer or power user?
To filter a string to make sure it's a valid IP address, this regexp is quite useful:
/^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/
And this one for MAC addresses:
/^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}$/
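For anyone who wants to try the two patterns above outside Perl, here is a minimal Python sketch. The helper names (is_ipv4, is_mac) are made up for illustration, and the MAC pattern is written in a shortened but equivalent form. fullmatch is used so that a trailing newline does not sneak past the $ anchor.

```python
import re

# One octet, 0-255, with no leading zeros (same alternation as the post).
OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])"
IPV4_RE = re.compile(r"^(" + OCTET + r"\.){3}" + OCTET + r"$")

# Six colon-separated hex pairs; equivalent to the spelled-out pattern.
MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def is_ipv4(s):
    """True if s is a dotted-quad IPv4 address (hypothetical helper)."""
    return IPV4_RE.fullmatch(s) is not None


def is_mac(s):
    """True if s is a colon-separated MAC address (hypothetical helper)."""
    return MAC_RE.fullmatch(s) is not None
```

Note that the MAC pattern only accepts colons, so dash-separated addresses will be rejected.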
(Useful) Stupid * Tricks
Yes sir, that will guarantee a front page story. You better head back to the drawing board if it doesn't fit that pattern. Next week: (Useful) Stupid Starcraft Tricks.
Maybe we should have a new section for "Useful Stupid Tricks" on Slashdot.
Stupid (Useful) Ask Slashdot tricks?
I'm not sure whether these are legitimate, or just a "I don't know what the hell I'm doing, so let's see if I can get someone else to show me how to do my job, under the guise of sharing information."
I'd like to say the former, but my cynicism is making me lean to the latter.....
"City hall" in German is "Rathaus" Kinda explains a few things......
Beautiful regexp that validates RFC 822 addresses: Mail-RFC822-Address.html
Unselfish actions pay back better
MS Office does support regexps. While not as good as Perl's, they are very helpful.
There is a .bas add-on for regexp, which helped me a lot.
Link to and excel
Don't forget to add the library reference {Tools->References->Microsoft VBScript Regular Expressions 5.5}
http://www.tmehta.com/regexp/using_functions.htm
please validate using the rfc and not your sketchy interpretation of an e-mail address. /.*@.*\..*/ will not cut it.
Try instead
([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x22([^\\x0d\\x22\\x5c\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x22)(\\x2e([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x22([^\\x0d\\x22\\x5c\\x80-\\xff]|\\x5c\\x00-\\x7f)*\\x22))*\\x40([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-\\x5d\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x5d)(\\x2e([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-\\x5d\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x5d))*
See the original at http://www.iamcal.com/publish/articles/php/parsing_email/
Why couldn't Bill try out his regular expressions?
His mom wouldn't let him play with matches.
Here's a chunk of perl script I wrote (years ago) that determines if $text matches any of the styles of library call number that I've ever encountered.
Slashcode is interestingly interpreting my formatting, but you should get the gist.
$text =~ /
^[A-Z]+ # starts with at least one capital letter
\s? # followed by an optional space
\d+ # followed by one or more digits
/x
or $text =~ /
^\d+ # starts with one or more digits
\. # followed by a single decimal
/x
or $text =~ /
^\d+ # starts with one or more digits
\s # and a space
/x
or $text =~ /
^Thesis # starts with "Thesis"
\d{4} # then four numbers - year
\s+ # separated by at least one space
[A-Z]+ # from one or more capital letters
\d+ # followed by one or more numbers
/x
or $text =~ /
^\d+ # starts with one or more digits
\- # connected with a dash
\d+ # to one or more following digits
/x
or $text =~ /
^\d+ # starts with one or more digits
\s # followed by a space
[A-Z]* # followed by zero or more capital letters
\d+ # followed by one or more digits
/x;
I've never found regexes to be useful at all. I prefer to write my own parsers from scratch in assembly language, or conway's game of life, if I'm feeling m/(ambitious|artistic|autistic|masochistic)/.
But even an artist gets lazy sometimes.
This regex matches a number: integer or float, scientific notation or plain, plus or minus...
[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?
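A quick way to see what that pattern picks up is to run it over a line of mixed text. This is only a sketch in Python; find_numbers is a made-up helper name, and the pattern is copied unchanged from the post. Since every group is non-capturing, findall returns the whole match each time.

```python
import re

# Pattern copied from the post: optional sign, digits with optional
# fraction (or a bare fraction), then an optional exponent.
NUMBER_RE = re.compile(
    r"[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?"
)


def find_numbers(text):
    """Return every substring of text that the pattern matches."""
    return NUMBER_RE.findall(text)


if __name__ == "__main__":
    print(find_numbers("x = -3.14, y = 6.02e23, z = .5, n = 42"))
```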
Colorless green Cthulhu waits dreaming furiously.
use Regexp::Common qw(URI net);
$text_with_urls =~ m/$RE{URI}/;
$text_with_ips =~ m/$RE{net}{IPv4}/;
Build it, and they will come^Hplain.
8 of 13 people found this answer helpful. Did you?
I wonder why such FAQs are still posted on a site like Slashdot. We now have a great repository for exactly this kind of questions:
http://stackoverflow.com/questions/tagged?tagnames=regex&sort=votes&pagesize=15
OK, you asked for stupid tricks, but this one's just plain lazy.
Between bash and grep, there are quite a lot of special characters that you have to escape... Or just ignore with dots!
/I.do.this.frequently..(even.with.parenthases).,.because.sometimes.my....backslash..key.is.tired/
A couple neat things happened: The extra dot after frequently is matching an inline paren. The paren in the PATTERN right next to it starts the mark of an atom, closed by its brother. The comma is because I put one outside the paren (here represented as the dot to the left of the comma) as is my style. Also note the literal backslash, just before you see the word backslash in hidden parenthesis.
Why not add quotes to match the spaces easily? I get a word or two in, and I find I naturally switch to using dots. These are throwaways for single tries through grep. For production code, I home in carefully on the parts that I'm dead sure I can anchor to, escaped by any means needed, before carefully choosing my atom to match as tightly as possible, so it'll error out if my data has gone wrong.
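The trade-off described above (dots are quick but loose, proper escaping is tight) can be sketched in Python. dot_out is a hypothetical helper that replaces every non-alphanumeric character with a dot, which is roughly the lazy habit; re.escape is the standard-library way to get the strict version.

```python
import re


def dot_out(literal):
    """Lazy version: turn every non-alphanumeric character into '.'."""
    return re.sub(r"[^A-Za-z0-9]", ".", literal)


def strict(literal):
    """Careful version: escape every special character properly."""
    return re.escape(literal)


if __name__ == "__main__":
    wanted = "f(x) = a[i] * 2"
    impostor = "fAxBCDEaFiGHIJ2"  # same length, no punctuation at all
    print(dot_out(wanted))
    print(bool(re.search(dot_out(wanted), wanted)))    # dots match the real text
    print(bool(re.search(dot_out(wanted), impostor)))  # ...and the impostor too
    print(bool(re.search(strict(wanted), impostor)))   # escaping rejects it
```

Fine for a one-off grep, but the impostor case is why the careful version belongs in production code.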
Even in a simple case like this, half the fun is in explaining it. :)
my $re = '';
$re = qr/
\{ (?:
(?> [^{}]+ ) # non-braces
|
(??{ $re }) # nested brace block
)* \}
/x;
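Python's standard re module has no recursive patterns, so the Perl trick above does not port directly. As a rough stand-in (a different technique, not a regex at all), a depth counter finds the same balanced block. match_braces is a made-up name for this sketch.

```python
def match_braces(text, start=0):
    """Return the balanced {...} block beginning at text[start], or None.

    Plain depth counter standing in for the recursive regex: +1 on '{',
    -1 on '}', done when the depth returns to zero.
    """
    if start >= len(text) or text[start] != "{":
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # ran out of text before the braces balanced
```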
factor 966971: 966971
SSN
^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$
US phone with or without parentheses
^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$
ISO Date (19th to 21st century only)
^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$
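Here is a small Python sketch for trying the three patterns above. The function names are invented for illustration. One thing worth seeing in action: the date pattern only checks the shape, so 2009-02-30 passes it. The is_real_date helper adds a calendar check with datetime.date on top, which is my addition and not part of the original post.

```python
import re
from datetime import date

# Patterns copied unchanged from the post.
SSN_RE = re.compile(
    r"^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$"
)
PHONE_RE = re.compile(
    r"^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$"
)
ISO_DATE_RE = re.compile(
    r"^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$"
)


def looks_like_ssn(s):
    return SSN_RE.fullmatch(s) is not None


def looks_like_phone(s):
    return PHONE_RE.fullmatch(s) is not None


def looks_like_iso_date(s):
    return ISO_DATE_RE.fullmatch(s) is not None


def is_real_date(s):
    """Shape check via the regex, then a calendar check (assumed extra step)."""
    m = ISO_DATE_RE.fullmatch(s)
    if m is None:
        return False
    try:
        # Groups: 1 = year, 3 = month, 4 = day (2 is the century prefix).
        date(int(m.group(1)), int(m.group(3)), int(m.group(4)))
    except ValueError:
        return False
    return True
```

The \3 backreference in the SSN pattern is what forces both separators to be the same character.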
Magic stuff like this is not working: /\([FB][ot]o\).*\1/ although that seems to be the closest description of what we wanted.
In perl, I did /([FB][ot][o]).*\1/ and it seemed to work as you wanted. Also, if you're using a regex engine that supports lazy (non-greedy) quantifiers like perl does, I would use them in this case. It reduces backtracking. In perl, put a ? after the *.
In most regex engines, you should be able to do this with backreferences. I don't use them often, but I think something like this would work:
I think the reason the example you gave using \1 didn't work is because the .* was too greedy, and ate up the rest of the pattern before the \1 got a chance to match. Also, when you're doing full line matching, it's always good to think about ^/$ and whether you're using any multiline modifiers.
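For anyone following along in Python, here is a sketch of the backreference with both a greedy and a lazy quantifier. One thing I am sure of: in Perl-style engines (Python included) \( is a literal parenthesis and plain ( ) does the grouping, while grep's basic syntax is the other way around, so the grep-style pattern from the question has no group 1 to refer back to here. repeated_word and compiles are made-up helper names.

```python
import re

GREEDY = re.compile(r"([FB][ot]o).*\1")   # runs to the LAST repeat
LAZY = re.compile(r"([FB][ot]o).*?\1")    # stops at the FIRST repeat


def repeated_word(text, pattern=LAZY):
    """Return the matched span, or None if the word never repeats."""
    m = pattern.search(text)
    return m.group(0) if m else None


def compiles(pattern):
    """True if Python's re accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    line = "Foo bar Foo baz Foo"
    print(repeated_word(line, GREEDY))
    print(repeated_word(line, LAZY))
    # grep-style escaped parens: rejected, because \1 has no group.
    print(compiles(r"\([FB][ot]o\).*\1"))
```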
have you been seen on slash?
#$%^&*(&^%{{}}{/\/\||```
(No, that's not a regex at all. And no, I don't even have a single girlfriend.)
8 of 13 people found this answer helpful. Did you?
(I would quote the final result but /. won't allow that many "junk" characters.. let's hope that doesn't cripple this entire discussion.)
Interesting that a site for nerds doesn't allow a lot of characters commonly used in source code.
You are a great candidate for the Useless Use of Cat award... specially endearing is your making a comment on the few commands your line uses :D
Dear slashdot editors,
slashdot.org is not stackoverflow.com.
The articles and discussions here are not searchable in a sane way. Your recent attempts to mimic stackoverflow are just a waste of everybody's time because all those little tidbits that people post get lost in the internet noise immediately.
We know you're a bit desperate for traffic these days. But this is not the way to go.
I came up with a Regex that can be used to match literally anything (yes, anything!). It is, therefore, the most flexible regex ever concocted. Here it is:
.*
Your regex doesn't allow + signs in the name part.
Nor, I would suspect, would it handle quoted strings, e.g. "Jeremy P"@example.com is technically a valid RFC 822 address.
And having just looked up the RFC 5322 spec which you quote, I see there are more cases you fail to take account of, e.g.
Jeremy P <jeremyp@example.com>
Also, what makes you think upper case in domain names is invalid? jeremyp@example.COM fails validation.
All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
-- Jamie Zawinski
>> Standing on head makes smile of frown, but rest of face also upside down.
IMHO, this is exactly the way that Slashdot should be going. Threads like this are interesting, add to the reservoirs of internet knowledge, and have the highest quality to noise ratios.
I (and I suspect many others) read Slashdot not for the latest +5 funny comment (though those can be fun to read) but to read the opinions of brilliant minds. And when those minds start trading secrets... Everyone wins.