(Useful) Stupid Regex Tricks?
careysb writes to mention that, in the same vein as '*nix tricks' and 'VIM tricks', it would be nice to see one on regular expressions and the programs that use them. "What amazingly cool tricks have people discovered with respect to regular expressions in everyday life as a developer or power user?"
To filter a string to make sure it's a valid IP address, this regexp is quite useful:
/^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/
And this one for MAC addresses:
/^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}$/
(Useful) Stupid * Tricks
Yes sir, that will guarantee a front page story. You better head back to the drawing board if it doesn't fit that pattern. Next week: (Useful) Stupid Starcraft Tricks.
Maybe we should have a new section for "Useful Stupid Tricks" on Slashdot.
format c:*.*
You see yourself in digg.com. You are likely to be eaten by a grue.
-- Though I walk through the valley of darkness and death, my PowerMac G4 Will Not Crash!!!
I have used regex in the past, mainly for maintaining long SQL scripts. The problem is the lack of full regex support in most editors. IMO the best (for Windows, at least) is EditPad Pro.
Stupid (Useful) Ask Slashdot tricks?
I'm not sure whether these are legitimate, or just an "I don't know what the hell I'm doing, so let's see if I can get someone else to show me how to do my job, under the guise of sharing information."
I'd like to say the former, but my cynicism is making me lean to the latter.....
"City hall" in German is "Rathaus" Kinda explains a few things......
Beautiful regexp that validates RFC 822 addresses: Mail-RFC822-Address.html
Unselfish actions pay back better
I use this to remove formatting that is included in the reports spit out from the mainframe -
cat REPORT_NAME | sed 's/[^a-z0-9,.-]//gi' > REPORT.out
It uses a few commands to accomplish this but I figured I would include the entire command line for completeness. It keeps all letters, numbers, ',', '.', and '-'. If you need other characters you can always add them to the regular expression.
MS Office does support regexps; while not as good as Perl's regexes, they are very helpful.
Link to an Excel .bas add-on for regexps, which helped me a lot:
http://www.tmehta.com/regexp/using_functions.htm
Don't forget to add the library reference {Tools->References->Microsoft VBScript Regular Expressions 5.5}
please validate using the rfc and not your sketchy interpretation of an e-mail address. /.*@.*\..*/ will not cut it.
Try instead
([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x22([^\\x0d\\x22\\x5c\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x22)(\\x2e([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x22([^\\x0d\\x22\\x5c\\x80-\\xff]|\\x5c\\x00-\\x7f)*\\x22))*\\x40([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-\\x5d\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x5d)(\\x2e([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-\\x5d\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x5d))*
See the original at http://www.iamcal.com/publish/articles/php/parsing_email/
Saw this one recently, by Andrew Savige. He did use a Perl module to generate the regex itself, but even so!
http://search.cpan.org/dist/Acme-EyeDrops/lib/Acme/EyeDrops.pm#99_Bottles_of_Beer
(I would quote the final result but /. won't allow that many "junk" characters... let's hope that doesn't cripple this entire discussion.)
Stuff.
Why couldn't Bill try out his regular expressions?
His mom wouldn't let him play with matches.
Example of using a lookahead assertion to parse comma-delimited data (the (?=...) is a positive lookahead; this blanks out fields that contain only \N):
perl -pe 's/(^|,)\\N(?=,|$)/$1/g'
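A minimal Python sketch of the same idea, in case the shell quoting gets in the way. The names NULL_FIELD and blank_nulls are mine, and I'm assuming the goal is to empty out fields that hold only a \N marker. The lookahead peeks at the following comma (or end of line) without consuming it, which is what lets two adjacent \N fields both get matched.

```python
import re

# (^|,) captures the delimiter in front of the field so it can be put back;
# (?=,|$) only checks that a comma or end-of-line follows, consuming nothing.
NULL_FIELD = re.compile(r"(^|,)\\N(?=,|$)")

def blank_nulls(line: str) -> str:
    """Replace fields that are exactly \\N with empty fields."""
    return NULL_FIELD.sub(r"\1", line)

print(blank_nulls(r"\N,foo,\N,\N,bar\N,\N"))
```

Note that bar\N is left alone, since the \N there is not a whole field.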
(?<thing>foo)
Where you can then access the matched substring ("foo" in this case) by the tag/label "thing" (access syntax depends on language).
It's pretty spiffy if you need order independent matching.
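For what it's worth, here is roughly how that looks in Python, which spells named groups (?P<name>...) rather than the (?<name>...) form used by Perl 5.10+ and .NET. The key/value pattern and the sample string are just my own illustration.

```python
import re

# Each group is tagged with a label, so the code below never has to
# care about the position of the group inside the pattern.
pattern = re.compile(r"(?P<key>\w+)=(?P<value>\w+)")

m = pattern.search("colour=blue")
key = m.group("key")
value = m.group("value")
print(key, value)
print(m.groupdict())
```

If the groups are later reordered or another group is added in front, the lookups by name keep working.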
This is what I ended up using:
((?:http|ftp)s?://)?(((([\d]+\.)+){3}[\d]+(/[\w./]+)?)|([a-z]\w*((\.\w+)+){2,})([/][\w.~]*)*)
There may well be something more robust...
bundaegi is good for you
For that annoying user who never shuts up... /[.]//
What's that? Cat got your tongue, troll?
Here's a chunk of perl script I wrote (years ago) that determines if $text matches any of the styles of library call number that I've ever encountered.
Slashcode is interestingly interpreting my formatting, but you should get the gist.
$text =~ /
^[A-Z]+ # starts with at least one capital letter
\s? # followed by an optional space
\d+ # followed by one or more digits
/x
or $text =~ /
^\d+ # starts with one or more digits
\. # followed by a single decimal
/x
or $text =~ /
\d+ # starts with one or more digits
\s # and a space
/x
or $text =~ /
Thesis # starts with "Thesis"
\d{4} # then four numbers - year
\s+ # separated by at least one space
[A-Z]+ # from one or more capital letters
\d+ # followed by one or more numbers
/x
or $text =~ /
\d+ # starts with one or more digits
\- # connected with a dash
\d+ # to one or more following digits
/x
or $text =~ /
\d+ # starts with one or more digits
\s # followed by a space
[A-Z]* # followed by zero or more capital letters
\d+ # followed by one or more digits
/x;
to filter ask slashdot posts for 'tricks' articles.
Good people go to bed earlier.
I've never found regexes to be useful at all. I prefer to write my own parsers from scratch in assembly language, or conway's game of life, if I'm feeling m/(ambitious|artistic|autistic|masochistic)/.
But even an artist gets lazy sometimes.
perl -pe '$IFS="EOF",s/^.*$//' file_with_lots_of_rubbish
Works like a charm! Especially useful for dealing with marketing papers, Czech journalism and Slashdot discussions.
Troll 2.0 Fear my asocial networking!
I like:
scalar s/\|/\|/g;
returns number of vertical bar ('|') characters in the variable $_.
mqh
I just discovered a good regexp used to check file permissions : http://thedailywtf.com/Articles/Now-I-Have-Two-Hundred-Problems.aspx
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski
Here's one I came up with recently:
If you want to get documentation out of both CWEB and Doxygen, write the Doxygen comments in the source files like @=//! Comment for Doxygen.@> to prevent ctangle from stripping the comment out, then use sed 's/@=\/.*@>//g' input.w > output.w to strip those comments out so they don't end up in the output from cweave.
Remember RFC 873!
This regex matches a number: integer or float, scientific notation or plain, plus or minus...
[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?
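A quick way to poke at that pattern is to run it through Python's re.fullmatch. This is only a sketch; the helper name is_number and the sample strings are my own.

```python
import re

# The pattern from the post, unchanged.
NUMBER = re.compile(
    r"[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?"
)

def is_number(s: str) -> bool:
    """True when the whole string is a number according to the pattern."""
    return NUMBER.fullmatch(s) is not None

for sample in ["42", "-3.14", ".5", "6.02e23", "1e-9", "abc", "1e", "--1"]:
    print(repr(sample), is_number(sample))
```

Using fullmatch matters here: the pattern has no anchors of its own, so a plain search would happily find the "1" inside "1e".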
Colorless green Cthulhu waits dreaming furiously.
use Regexp::Common qw(URI net);
$text_with_urls =~ m/$RE{URI}/;
$text_with_ips =~ m/$RE{net}{IPv4}/;
Build it, and they will come^Hplain.
stuff that matters
understand the concept?
if not, try going to this site, it looks like it might be more your speed
buhbye
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
8 of 13 people found this answer helpful. Did you?
s/bush/obama/ig
I wonder why such FAQs are still posted on a site like Slashdot. We now have a great repository for exactly this kind of questions:
http://stackoverflow.com/questions/tagged?tagnames=regex&sort=votes&pagesize=15
I often use sed to split a delimited line into multiple lines. E.g.:
Prov 9:8 Do not rebuke mockers or they will hate you; rebuke the wise and they will love you.
on the daily WTF: http://thedailywtf.com/Articles/Now-I-Have-Two-Hundred-Problems.aspx enjoy!
Cal Henderson's routine is the best RFC compliant regex I have ever found to verify an email address:
http://code.iamcal.com/php/rfc822/
OK, you asked for stupid tricks, but this one's just plain lazy.
Between bash and grep, there are quite a lot of special characters that you have to escape... Or just ignore with dots!
/I.do.this.frequently..(even.with.parenthases).,.because.sometimes.my....backslash..key.is.tired/
A couple neat things happened: The extra dot after frequently is matching an inline paren. The paren in the PATTERN right next to it starts the mark of an atom, closed by its brother. The comma is because I put one outside the paren (here represented as the dot to the left of the comma) as is my style. Also note the literal backslash, just before you see the word backslash in hidden parenthesis.
Why not add quotes to match the spaces easily? I get a word or two in, and I find I naturally switch to using dots. These are throwaways for single tries through grep. For production code, I hone in carefully on the parts that I'm dead sure I can anchor to, escaped by any means needed, before carefully choosing my atom to match as tightly as possible, so it'll error out if my data has gone wrong.
Even in a simple case like this, half the fun is in explaining it. :)
I've been trying to scrape a web page lately and have been trying to get a working regexp going without a huge amount of success....Anyone care to demonstrate their awesome skills by showing me how to write a regexp that will match words in data like this (or at least the first word in a string):
" 1. Word AnotherAgain "
The key point about the data is that there are words which start with exactly 1 capital letter, and may be separated from the subsequent word by a number of spaces or may run directly onto it. So the desired regexp would match Word or Another or Again depending which was the first in the data. There are also numbers e.g. "10." but it doesn't matter for my purposes whether they are discarded or matched as a word.
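One possible answer, sketched in Python and assuming every word is one capital letter followed by one or more lowercase letters (the question does not say whether single-letter words or non-ASCII letters occur). The helper names words and first_word are made up for the example.

```python
import re

# One capital, then lowercase letters; the next capital starts a new word,
# so run-together words such as "AnotherAgain" split on their own.
WORD = re.compile(r"[A-Z][a-z]+")

def words(data: str) -> list:
    """All capitalised words, in order of appearance."""
    return WORD.findall(data)

def first_word(data: str):
    """The first capitalised word, or None when there is none."""
    m = WORD.search(data)
    return m.group(0) if m else None

print(words(" 1. Word AnotherAgain "))
print(first_word(" 1. Word AnotherAgain "))
```

Numbers like "10." are simply skipped, since they never match a capital letter.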
my $re = '';
$re = qr/
\{ (?:
(?> [^{}]+ ) # non-braces
|
(??{ $re }) # nested sub-block of braces
)* \}
/x;
factor 966971: 966971
This was always useful when appropriate: /^[\w.|-]+@(?:[\w.|-]{2,63}\.)+[a-z]{2,6}$/
Validates a valid email address (rfc 5322) -- although not taking into account an IP address (user@192.168.1.2)
You're absolutely right about the crap regex support in most text editors. In my personal opinion, Dreamweaver CS3 is the best code editor I've used for regex and searching. It has standard regular expressions unlike dumbass contenders like Visual Studio or Ultraedit. It's missing some stuff, though, like named groups with named substitution, multiple line search, and (this is the worst part), the $ and ^ anchors don't seem to work. But none of that matters if your search is fucking slow (see: visual studio 2008): Dreamweaver CS3 search is very, very FAST - notepad++ simply chokes on directory searches, and all of the other lite little editors I've tried do too.
Valid Phone Number Validation that allows extensions and virtually all the common ways to list a (US) number:
/^[01]?[-\s\.]?\(?[2-9][0-9]{2}\)?([-\s\.]|(\s-\s))?[0-9]{3}([-\s\.]|(\s-\s))?[0-9]{4}\s?(([xX]\.?|(ext|EXT)\.?|\s)?\s?(?<![0-9])[0-9]{1,4})?$/
valid:
333 444 5555
1-333-444-5555
333.444-555
333444555 4444
333-444-555 ext. 123
1-(340)333 5678
333 444 555 x3456
...and lots more ways to fuck up a number (validly)
SSN
^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$
US phone with or without parentheses
^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$
ISO Date (19th to 21st century only)
^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$
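Worth keeping in mind that the date pattern only checks the shape of the string. A small Python sketch (the helper names looks_like_iso_date and is_real_date are mine) showing that it accepts 30 February, and one way to follow it up with a real calendar check:

```python
import re
from datetime import date

# The ISO date pattern from the post, unchanged.
ISO_DATE = re.compile(
    r"^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$"
)

def looks_like_iso_date(s: str) -> bool:
    """Shape check only: four-digit year, month 01-12, day 01-31."""
    return ISO_DATE.match(s) is not None

def is_real_date(s: str) -> bool:
    """Shape check plus a calendar check via datetime.date."""
    m = ISO_DATE.match(s)
    if not m:
        return False
    try:
        # group 1 = year, group 3 = month, group 4 = day
        date(int(m.group(1)), int(m.group(3)), int(m.group(4)))
    except ValueError:
        return False
    return True

print(looks_like_iso_date("2009-02-30"), is_real_date("2009-02-30"))
```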
I was wondering with my friend someday if it's possible with regex to select a pattern which occurs twice or more times repeatedly in single line but is separated by undefined characters. For example I want to select only lines in which the same pattern "[FB][ot]o" occurs exactly two times (in example below . is any character, for clarity):
...Foo... - is not selected
...Foo...Bto... - is not selected
...Bto...Bto... - is selected
a normal /[FB][ot]o.*[FB][ot]o/ would select the second and third case. But I only want the third case. The first occurrence would define my pattern, and second occurrence must exactly match it. Magic stuff like this is not working: /\([FB][ot]o\).*\1/ although that seems to be the closest description of what we wanted.
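For what it's worth, the backreference idea does seem to be the right one. The \( \) spelling is the POSIX basic syntax used by sed, grep and vim; in Perl-style engines the group is written with bare parentheses. A Python sketch, with has_repeat as a made-up helper name:

```python
import re

# Group 1 captures whichever variant appears first (Foo, Fto, Boo, Bto);
# \1 then demands that exact same text again later on the line.
REPEATED = re.compile(r"([FB][ot]o).*\1")

def has_repeat(line: str) -> bool:
    return REPEATED.search(line) is not None

for line in ["...Foo...", "...Foo...Bto...", "...Bto...Bto..."]:
    print(line, has_repeat(line))
```

This selects lines with two or more occurrences of the same variant; pinning it to exactly two would need an extra check.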
#
#\ @ ? Colonize Mars
#
While I'm not providing any specific trick per se, on topic are a few useful links:
http://www.regular-expressions.info/ - this one is handy for regex info particularly in Javascript which I use so infrequently I need to know how to match, capture, substitute, etc.
http://perldoc.perl.org/perlre.html - plenty of regex info there which is Perl specific, but of course extends to many other similar implementations
I only post comments when someone on the internet is wrong.
The number one trick for regular expressions is not a regular expression at all. It is simply the habit of always using the ignore whitespace flag to format and comment your regular expressions. Code maintenance and general readability is simply a must for any real developer.
One liners are for show, not for actual usage.
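To make that concrete, here is a sketch of the IP address regex from earlier in this thread written with the ignore-whitespace flag, using Python's re.VERBOSE. The names OCTET and IPV4 are mine; the alternatives are the same as in the one-line version.

```python
import re

# One octet, with each alternative on its own commented line.
OCTET = r"""
    (?: 25[0-5]        # 250-255
      | 2[0-4][0-9]    # 200-249
      | 1[0-9][0-9]    # 100-199
      | [1-9][0-9]     # 10-99
      | [0-9]          # 0-9
    )
"""

# Three "octet dot" pairs followed by a final octet.
IPV4 = re.compile(rf"^ (?: {OCTET} \. ){{3}} {OCTET} $", re.VERBOSE)

for candidate in ["192.168.1.2", "256.1.1.1", "1.2.3"]:
    print(candidate, bool(IPV4.match(candidate)))
```

In verbose mode a literal space or # inside the pattern has to be escaped or put in a character class, which is the one thing that tends to bite when converting an existing one-liner.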
Construct additional pylons!
--- Do you believe in the day?
There are no Stupid Starcraft Tricks.
My Starcraft 2 Blog
(Useful) Stupid Slashdot Tricks?
Comment removed based on user account deletion
whatever you have to validate, encode it as a form submission and bounce it against http://ask.slashdot.org/comments.pl, screen scrape the results, and if slashdot's lameness filter doesn't balk, consider it validated
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
Sure, you could try and capture every single possible character, but can you update those when you find a bug?
KISS:
/(
[\w\/\!\#\$\%\^\&\*\+\-\=\?\_\`\{\|\}\~]
[\w\/\!\#\$\%\&\*\+\-\=\?\^\_\`\{\|\}\~\.]{0,62}
[\w\/\!\#\$\%\^\&\*\+\-\=\?\_\`\{\|\}\~]
@
[A-Z0-9]
[A-Z0-9\.\-]{0,246}
[A-Z0-9]\.
[A-Z]{2,6}
)/ix
In order:
Happy hunting.
Is the length of a string of ones not prime? /^1?$|^(11+?)\1+$/
is beautiful.
Perl - $Just @when->$you ${thought} s/yn/tax/ &couldn\'t %get $worse;
Here's a shell command to convert base64 encoded files into javascript strings using regular expressions in VI:
for var in *image*.b64; do vi -c ':1d|$g/^=*$/d|1,$s/^/\t\t"/|1,$s/$/" +/|$s/=*" +/" +/|$s/" +$/";/|wq' $var; done
Here's to losing my Karma Bonus again....
Does anyone know if the Luhn Algorithm can be implemented in regex only?
http://en.wikipedia.org/wiki/Luhn_algorithm
(sorry if I double post this... I swear I posted it 10 minutes ago)
Reviewing just the first hour of video games.
i tried to submit some of my Regex Tricks
but the web post complains about all the irregular characters
the only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-)
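A small Python check of that rule, as a sketch with sample strings of my own: the usual metacharacters are plain literals inside a class, and only the four listed ones need care.

```python
import re

# Inside [...] the dot, star, plus, question mark, dollar, parentheses
# and pipe all match themselves, with no escaping needed.
literal_hits = re.findall(r"[.*+?$(|)]", "a.b*c+d?e$f(g|h)")
print(literal_hits)

# The four special ones: ] and \ are escaped, ^ is kept away from the
# first position, and - is placed last so it cannot form a range.
special_hits = re.findall(r"[\]\\^-]", r"]\^-")
print(special_hits)
```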
/.// - wouldn't work either - it would only remove the first character:
/.*//
Interesting that today's top story at TheDailyWTF is all about regexes, too. Except there, they're showing a case when you should NOT use it. I think a few of the people posting here need to take the quote in that article to heart:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski
http://thedailywtf.com/Articles/Now-I-Have-Two-Hundred-Problems.aspx
God invented whiskey so the Irish would not rule the world.
#$%^&*(&^%{{}}{/\/\||```
(No, that's not a regex at all. And no, I don't even have a single girlfriend.)
8 of 13 people found this answer helpful. Did you?
took the examples posted here and blindly ran them on your production system? how many did it as root?
Just what I needed: the killer must have followed her on vacation but to find them we have to search through 200MB of emails looking for something formatted like an address!
I have to parse files with bash sometimes, and I use these:
^# = line with a leading comment
^$ = empty line
They're simple, but work usually. You can make them a lot more bullet proof by adding in blank checking between the characters, but it seems to work.
cat httpd.conf | grep -v \^\# | grep -v \^\$ | less
makes httpd.conf a lot more readable.
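The "blank checking" hardening mentioned above could look something like this in Python. It is only a sketch: NOISE and strip_noise are made-up names and the sample config text is invented.

```python
import re

# Optional leading whitespace, then either a comment marker or end of line.
# This also catches indented comments and lines holding only spaces.
NOISE = re.compile(r"^\s*(#|$)")

def strip_noise(text: str) -> list:
    """Return only the lines that are neither comments nor blank."""
    return [line for line in text.splitlines() if not NOISE.match(line)]

sample = (
    "# main config\n"
    "\n"
    "Listen 80\n"
    "   # indented comment\n"
    "   \n"
    "ServerName example.org\n"
)
print(strip_noise(sample))
```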
Check out my sysadmin blog!
(Useful) Stupid Tricks
If RegXPChecker("^" & cmdPrefix & "rejoin (?<item>.+)", Message) Then
    re = New Regex("^" & cmdPrefix & "rejoin (?<item>.+)")
    mt = re.Match(Message)
    item = mt.Groups("item").ToString
    SendInfo("PART :" & item, SendBytes, tnSocket)
    SendInfo("JOIN :" & item, SendBytes, tnSocket)
    Return True
End If
Simple vb.net regex capture. same logic can be applied to other captures.
http://www.cushingproductions.com
http://www.rubular.com/ is a great site for checking your regex.
/^\s*(in|)(\S*)\s*(in|)\s*(mx|ns|a|cname)(\s*[0-9]{1,3}|)\s+(.+)$/ should parse most dns records out of a zonefile (it's by no means perfect and I did SOA records separately).
/^1?$|^(11+?)\1+$/ I like it.
Bad filename character for Windows (if it matches, the filename is invalid):
E-mail (use case insensitive):
GUID (use case insensitive):
IP on local private network:
Removes .NET named capture syntax so that a .NET Regex string can be used elsewhere (such as Javascript) (replace with nothing):
Flame away about how horrible it is that I missed some edge case that even nobody on Slashdot has ever heard of, but they work well for me and hopefully for you too.
Now, if you actually find a common case that I missed, I would appreciate the help...
Peter predicted that you would "deliberately forget" creation 2000 years ago...
Regex for this site: /\/\./
Sed is more than just RegEx, but this is the handiest collection of sed regular expressions that I have ever found:
http://sed.sourceforge.net/sed1line.txt
Nevermore.
Here is the crazy regex to detect a valid UTF-8 string. :)
/^(
[\x09\x0A\x0D\x20-\x7E] # ASCII
| [\xC2-\xDF][\x80-\xBF] # non-overlong 2-byte
| \xE0[\xA0-\xBF][\x80-\xBF] # excluding overlongs
| [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte
| \xED[\x80-\x9F][\x80-\xBF] # excluding surrogates
| \xF0[\x90-\xBF][\x80-\xBF]{2} # planes 1-3
| [\xF1-\xF3][\x80-\xBF]{3} # planes 4-15
| \xF4[\x80-\x8F][\x80-\xBF]{2} # plane 16
)*$/x
This can crash perl if the string being checked is too big.
So it's usually better to just let perl attempt to decode anything non-ascii as utf8 and see if it fails or not. (And hope all the utf8 parsing exploits have been fixed :D )
use Encode; # provides decode() and Encode::FB_CROAK
eval { $param = decode( 'utf8', $param, Encode::FB_CROAK) if $param =~ /[^\x00-\x7E]/ };
$param = decode( 'iso-8859-1', $param, Encode::FB_CROAK) if $@; # utf8 decode of non-ascii text failed so treat as latin1
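The same "try to decode, fall back if it fails" approach translates almost directly to Python, where a strict UTF-8 decode raises on invalid input. A sketch, with decode_param as a made-up function name:

```python
def decode_param(raw: bytes) -> str:
    """Decode as UTF-8 when valid, otherwise treat the bytes as Latin-1."""
    try:
        # bytes.decode is strict by default and raises on malformed UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte value is valid ISO-8859-1, so this cannot fail.
        return raw.decode("iso-8859-1")

print(decode_param("café".encode("utf-8")))
print(decode_param(b"caf\xe9"))
```

Both calls print the same word, which is the point: no hand-written regex is involved in the validity check.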
Dear slashdot editors,
slashdot.org is not stackoverflow.com.
The articles and discussions here are not searchable in a sane way. Your recent attempts to mimic stackoverflow are just a waste of everybody's time because all those little tidbits that people post get lost in the internet noise immediately.
We know you're a bit desperate for traffic these days. But this is not the way to go.
I regularly do something like
$ ps -augwwx | grep per[l]
with an extra [] pair around some part of the grep expression, so it doesn't find the grep command.
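The reason this works can be shown in a few lines of Python. The process list here is invented for the demo: the regex per[l] matches the text "perl", but the grep command line contains the text "per[l]", which the regex does not match.

```python
import re

pattern = "per[l]"

# Pretend output of ps: one real perl process plus the grep itself.
process_lines = [
    "perl script.pl",
    "grep per[l]",
]

hits = [line for line in process_lines if re.search(pattern, line)]
print(hits)
```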
Lots of points to anyone who can write the regular expression which returns the useful patterns from all these comments while filtering all the other chatter.
Why don't crappy posts get modded down? A post gets torn apart in 30 replies yet stays +n Insightful?
("1" x $n) !~ /^1?$|^(11+?)\1+$/
Backreferences are fun!
CJ
Ah, arrogance and stupidity, all in the same package. How efficient of you. -- Londo Mollari
My fav: gsub(/^ +| +$/, "", string);
No more leading or trailing blanks in one swell foop.
mark
In Java I use Hamcrest Text Patterns to make regexes more readable.
http://hamcrest-text-patterns.googlecode.com
If you've ever used git-am, and get patches from people that have whitespace at the end of the line, it will complain at you like:
$ git-am some.mbox
.dotest/patch:10: trailing whitespace.
Requirements
warning: 1 line adds whitespace errors
You can fix those patch files with:
/^\(+.\{-}\S*\)\s\+$/\1/
I use this in vim (:%s), but probably 'sed -i' would work also.
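If neither vim nor sed is handy, a rough Python equivalent might look like the following. This is my own sketch, not a translation of the vim pattern character for character: TRAILING and fix_patch are made-up names, and it only touches lines that start with +, i.e. lines the patch adds.

```python
import re

# Lazily grab an added line up to its trailing blanks, then drop the blanks.
# [ \t] is used instead of \s so the match can never run across a newline.
TRAILING = re.compile(r"^(\+.*?)[ \t]+$", re.MULTILINE)

def fix_patch(text: str) -> str:
    """Strip trailing spaces and tabs from added lines in a patch."""
    return TRAILING.sub(r"\1", text)

patch = "+Requirements  \n context \n-removed \n+ok\n"
print(repr(fix_patch(patch)))
```

Context and removed lines are left as they are, since changing those would stop the patch from applying.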
My personal favorite regex trick is the zero-width assertion. I'm particularly fond of zero-width negative lookbehind assertions. Backreferences are also cool.
Cut that out, or I will ship you to Norilsk in a box.
. Will find everything for you.
No ascii art.
[\d]{1,12}(\.\d{0,2})?([\+\-\*\/]{1}[-]?[\d]{1,12}(\.\d{0,2})?)*
a max 12 digits long number followed by optional decimal digits and the possibility to *,/,+,- any number...
all the best,
Xyon
A problem with a topic for regex examples is the Lameness Filter as it sees most regular expressions as having "'junk' characters". Prefixing the regexes with tabs in ecode tags seems to have gotten around it this time.
When used with procmail regex-filtering on the To and Cc headers, these rules match any e-mail carbon-copied to any user at example.com except user@example.com, including Bcc's that don't name only user@example.com in the To or Cc header. I find this to be effective for trapping a lot of domain-blanketing spam. Of course, this does mean you not caring about receiving e-mail shared with anyone else at your ISP. False positives are avoided by using neither a large nor local ISP.
The first two rules you adjust for the length of your actual username. The 3 should be one less than the length of your username and the 5 one more than that same length. The result is that it matches any names that are too short, too long, or don't have the right letters in the right positions.
I have another that matches anything that is to user@example.com but only if the quoted name contains any characters not in the user's full name. E.g. if the user's name is "User Name", \"[^"]*[bcdf-lopqt-z][^"]*\" <user@example\.com> won't let messages to "Sergey" <user@example.com> through.
If you're using case-sensitive regex matches, these would need to be augmented.
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
It's x-platform, Simple to use, super easy to "see" how it works. It feels good, and what's more? It does the fricken' laundry too!
How much is your data worth? Back it up now.
You can do all sorts of word puzzle games with regexps, provided you've got a decent word list to begin with. E.g., find all words that do only contain vowels: ^[aeiou]+$ (this is of course language dependent and in English would fail to recognize the -y- as a vowel, but that's not necessary in this case). Or try to find good names for that killer app you're developing. Say you call it Super Video Program. Then you could search for words containing the letters s.*v.*p.* and come up with "silver plate". Ok, not a great example, but you know what I mean. Or you want to spell "access" differently. We know that a and e can sometimes be interchanged; x, cc, ks, xc, and cs; and s, ss and th. Then you come up with a regexp like '^[ae](x|cc|ks|cs|xc|xs|qs)[ae](s|ss|th)$' and find access, axes, excess, and exes.
And you can search for those impossible entries in cross words, for which you will only need ^, . and $.
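A toy version of those word-list searches in Python. The tiny WORDS list stands in for a real dictionary file and is purely an assumption for the demo, as is the helper name grep_words.

```python
import re

# Stand-in for a proper word list such as a system dictionary file.
WORDS = ["access", "axes", "excess", "exes", "eau", "cat", "cot", "cart", "apple"]

def grep_words(pattern: str, words=WORDS) -> list:
    """Return the words that match the given regex."""
    return [w for w in words if re.search(pattern, w)]

# Words made of vowels only.
vowels_only = grep_words(r"^[aeiou]+$")

# Alternative "spellings" of access, using the pattern from the post.
respellings = grep_words(r"^[ae](x|cc|ks|cs|xc|xs|qs)[ae](s|ss|th)$")

# Crossword entry: three letters, c at the start, t at the end.
crossword = grep_words(r"^c.t$")

print(vowels_only, respellings, crossword)
```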
Here's a regex that will identify your mom: /^'s[:space:][mM]om$/
Colin Dean Go a year without DRM
I'd just like to add my two cents in here. If you are writing a quick and dirty one-time script, go nuts with your regex. If you're writing something that is going to be used long term, please for the sake of the maintainers just use string functions.
It is a hell of a lot easier to redo one string function than redo a complete regex when the data format changes. Also if you use string functions you can actually do real error reporting when things don't work the way you expect them to.
Selling software wont make you money, selling a service will.
I came up with a Regex that can be used to match literally anything (yes, anything!). It is, therefore, the most flexible regex ever concocted. Here it is:
.*
After a whole bunch of research I ended up with this for email validation:
/^[\w!#$%&\'*+\/=?^`{|}~.-]+@(?:[a-z\d][a-z\d-]*(?:\.[a-z\d][a-z\d-]*)?)+\.(?:[a-z][a-z\d-]+)$/iD
Note: the modifiers are designed to work with PHP.
Source: http://www.hm2k.com/posts/what-is-a-valid-email-address
I kept getting the response "Filter error: Please use fewer 'junk' characters."
Is there any way to turn this off or at least bypass the filter, as regular expressions are just made up of 'junk' characters?
I wonder if askslashdot would result in an article on that? Let the +5 Funnies flow...
Usually works for me. :P
I am very small, utmostly microscopic.
I like this regex evaluator for testing things out: http://www.cuneytyilmaz.com/prog/jrx/
Really useful.
Someone here once made a regex to filter out nonauthorized BGP communities from peer announcements. It was three lines long.
Really slick asking people to post regexes on slashdot and then having all the posts get rejected due to "junk" characters...
m/(2b)?/
One of the better ones :)
Don't panic
Your regex doesn't allow + signs in the name part.
Nor, I would suspect would it handle quoted strings e.g. "Jeremy P"@example.com is technically a valid RFC 822 address.
And having just looked up the RFC 5322 spec which you quote, I see there are more cases you fail to take account of, e.g.
Jeremy P <jeremyp@example.com>
Also, what makes you think upper case in domain names is invalid? jeremyp@example.COM fails validation.
All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
Buy Jeffrey Friedl's "Mastering Regular Expressions", O'Reilly. Read it from page 1 to the end until you understand.
Enough said.
".*"
It never misses.
To do list for Windows
Most useful regex ever: .*your\ base.*
Used in:
find / -iregex .*your\ base.* -exec chown us:us {} \;
Is the following:
(.*?)
I don't know what to call it. I don't know how to explain what it does, programmatically. All I know is that it makes it a hell of a lot easier to capture groups. For instance! HTML href-grabbing could be written this way: /href=['"]([^'"]*)['"]/
But it's much easier, I think, to write it this way: /href=['"](.*?)['"]/
This probably doesn't drive home just how handy this is. Basically, any time that you want to grab a pattern from a "hole" and you know what the text surrounding the hole looks like, you can drop in (.*?)
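The name for that construct is a lazy (or non-greedy) quantifier: .*? takes as little as it can and only grows when the rest of the pattern fails. A Python sketch with an invented HTML snippet, comparing it to plain greedy .* and to the negated character class:

```python
import re

html = '<a href="a.html">A</a> <a href="b.html">B</a>'

# Greedy: .* runs to the end of the string and backs up to the LAST quote.
greedy = re.findall(r"""href=['"](.*)['"]""", html)

# Lazy: .*? stops at the FIRST quote that lets the match succeed.
lazy = re.findall(r"""href=['"](.*?)['"]""", html)

# The negated class gets the same result by never crossing a quote.
negated = re.findall(r"""href=['"]([^'"]*)['"]""", html)

print(greedy)
print(lazy)
print(negated)
```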
Hmmm regex p0rn...
I frequently look for tv episodes on the net, and find myself killfiling shows thusly:
S[0-9][0-9]E[0-9][0-9] , which would filter something like friends.s05e10.[etc]
Is there a smaller way? The program I use is case-insensitive, but the only shorter way I can come up with "S[0-9]+E[0-9]+" would risk finding other sets inadvertently. (or would it?)
Any suggestions?
"Sometimes a woman is a kind of religion, she can save your soul & set you free from all your sins" - Bad Examples
perl -wle '$_ = 1; (1 x $_) !~ /^(11+)\1+$/ && print while $_++'
Not really related to regular expressions, but sed, vim, and maybe more accept other characters in place of the / delimiter, which is handy when a / must be matched:
sed sx/home/foox/home/barx
a back reference. More at
http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
I use Java for programming, know nothing of Perl. In Java it works fine both in regexes and replacement expressions. (I think you could try to escape your last backslash?) Here is a regex I use to find words repeated within 30 chars.
String DOUBLE_WORD_DISTANCED_REGEX =
"(?ix) # Turn case insensitive and comments on\n" +
"\\b # Start with a boundary\n" +
"([^\\s\\p{Punct}]++) # Read something that is not whitespace or punctuation\n"+
"(?:\\s++) # Read - don't capture - something that is whitespace\n" +
"(?:[^.,!?:;]{1,30}) # Read - don't capture - something that is not punctuation \n" +
"(?:\\s+) # Read - don't capture - something that is whitespace\n" +
"(\\1) # Repeat the word\n" +
"\\b # End with a word boundary\n";
Given a string of 1's, matches if the length of the string is non-prime:
(This is the blog post http://weblog.raganwald.com/2008/02/so-you-think-you-know-regex-fu.html)
Took me a while to figure out how this works. This is the sort of thing that makes me smile.
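For anyone else puzzling over it, here is a sketch in Python using the pattern quoted elsewhere in this thread, /^1?$|^(11+?)\1+$/. The first branch catches lengths 0 and 1; the second tries every group of two or more 1's and asks whether the whole string is that group repeated at least twice, i.e. whether the length has a divisor. The name is_prime is mine.

```python
import re

# Matches strings of 1's whose length is NOT prime:
#   ^1?$         length 0 or 1
#   ^(11+?)\1+$  a block of d >= 2 ones, repeated two or more times
NON_PRIME = re.compile(r"^1?$|^(11+?)\1+$")

def is_prime(n: int) -> bool:
    """n is prime when a string of n ones fails to match."""
    return NON_PRIME.match("1" * n) is None

primes = [n for n in range(30) if is_prime(n)]
print(primes)
```

It is trial division done by the backtracking engine, so it is only fun for small numbers.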
I blame Bush.
Meep.
...but /. wouldn't let me post it 'cause it had too many "junk" characters. I guess one man's junk is another man's regex, but it didn't seem more complicated than anyone else's expressions.
/^.*/
Ok, as the crafter of more than a few DOS batch files, I frequently like using regex expressions to make my scripts do neat things. What Regex command line applications can the Slashdot community recommend? Which ones have the most flexibility?
Thanks
Is this hi5 now?
this may be more about perl than regular expressions in general, but i've been using a function for several years to insert data into templates. the template contains tags like "${name}", the function is called with the template and a list of substitutions like ("name" => "john", "ip" => "127.0.0.1") and the return value is the "filled out" version of the template.
sub parse_string($;@)
{
    my $string = shift;
    my %rep = @_;
    # replace every ${key} tag with the matching value from %rep
    $string =~ s/\$\{(.+?)\}/$rep{$1}/gms;
    return $string;
}
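The same template-filling trick carries over to other languages. A Python sketch, under the assumption that an unknown tag should become an empty string (which is what the Perl version effectively does with an undefined hash entry); TAG is a made-up name:

```python
import re

# ${name} tags; the lazy .+? stops at the first closing brace.
TAG = re.compile(r"\$\{(.+?)\}", re.DOTALL)

def parse_string(template: str, **rep) -> str:
    """Fill ${key} tags in the template from the keyword arguments."""
    return TAG.sub(lambda m: str(rep.get(m.group(1), "")), template)

print(parse_string("Hello ${name} from ${ip}", name="john", ip="127.0.0.1"))
```

Passing a function as the replacement is what stands in for Perl's interpolated $rep{$1}.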
s/\(Useful\) Stupid (.*) Tricks(\??)/Slashdot is getting Stupid$1Tricksdot$2/
Persian Project Management Software as a Service
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
-- Jamie Zawinski
>> Standing on head makes smile of frown, but rest of face also upside down.
> One liners are for show, not for actual usage.
Mostly true. But it's always a bit of a perverse sense of satisfaction when you come out with an obscure one-liner that does exactly what you want.
Besides, I indulge myself in that way too much these days. It turns out that reading completely illegible one-liners is mostly a matter of habit.
Sometimes, I scare myself.
On a very slightly different matter, I always keep http://sed.sourceforge.net/sed1line.txt closeby, it has proven useful more than once.
This is quick-and-dirty perl (you did say stupid, right?) -- could be done better with something like template toolkit:
$ cat in.file
eval($goodbye = "hello")
You say goodbye and I say: $goodbye
$ cat xx.pl
#!/bin/perl
open(INFILE,"in.file");
while(<INFILE>) {
    print "Before regex: ";
    print;
    if ( /eval\s?(.*)??/ ) {
        s/eval\s?\((.*)??\)/$1/g;
        eval;
    } else {
        s/(\${?\w*}?)/$1/eegs;
        print "After regex : ";
        print;
    }
}
$ perl ./xx.pl
Before regex: eval($goodbye = "hello")
Before regex: You say goodbye and I say: $goodbye
After regex : You say goodbye and I say: hello
--
What's it take to be considered a low UID?
Matthew P. Barnson
I learn what I think when I read what I write
IMHO, this is exactly the way that Slashdot should be going. Threads like this are interesting, add to the reservoirs of internet knowledge, and have the highest quality to noise ratios.
I (and I suspect many others) read Slashdot not for the latest +5 funny comment (though those can be fun to read) but to read the opinions of brilliant minds. And when those minds start trading secrets... Everyone wins.
No such thing. Stupid? Sure. Lotsofem.
But to be considered a trick and not just a normal use of the syntax, it is by definition a case where you should really be writing a parser. It's not that hard, and a helluva lot easier to discover and fix bugs in.
I love regexes too. Where they're useful, which is for automating tedious but trivial matching. But they have done no end of harm to the competency of the average geek by allowing them to indefinitely postpone learning how to write a simple parser, which is really something every programmer should know. I'm not even talking yacc/lex (but you should know when to go read up on that too); even just a simple char-by-char or tokeniser-based little state thing will sort out a lot of stuff in a clear, simple and above all understandable way that would otherwise require a totally unreadable mess of a regex that probably contains five bugs you will never spot.
Sure, it might take a page of code instead of a 128-character line, but you will be well into understanding it and customising it to your needs before you're even halfway through reading that monster regex.
Yes, I can write those too if I have to, and have, but it's about as maintainable as self-modifying machine code. The complex cases are simply lousy at explaining what they do, and in most cases less efficient than a well-written string parsing routine. If you haven't grown out of the illusion that compact code is faster to execute or even to code (let alone maintain), you probably can't write a foolproof regex for a complex syntax anyway.
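For what it's worth, here is a rough Python sketch of the kind of "simple char by char little state thing" described above. The task (pulling out double-quoted strings with backslash escapes) and the name quoted_strings are assumptions picked for illustration, not something from the post.

```python
def quoted_strings(text):
    """Return the contents of double-quoted strings, honouring backslash escapes."""
    results = []
    current = []        # characters of the string currently being read
    in_quotes = False   # True while between an opening and a closing quote
    escaped = False     # True right after a backslash inside quotes
    for ch in text:
        if not in_quotes:
            if ch == '"':
                in_quotes = True
                current = []
        elif escaped:
            # Whatever follows a backslash is taken literally.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            results.append("".join(current))
            in_quotes = False
        else:
            current.append(ch)
    # An unterminated string at the end of the input is silently dropped.
    return results


print(quoted_strings('a "b c" d "e \\"f\\" g"'))
```

Each branch handles exactly one state, so adding a rule (single quotes, say) means adding a branch rather than rereading the whole pattern.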
sudo ergo sum
I have a logical answer. It looks good at first. You'd have to go back after seeing all
the tearing apart, which is a bother. But also, if you DO bother to go back after seeing
it shredded, and mod it down then, the meta moderators (who judge your modding) don't see all
that shredding either. So you look like a jerk who modded down a perfectly good-looking-at-first
post.
And finally the overriding question, a post that looks good at first and is then shredded
can be a vital trigger for an interesting discussion (the shredding itself).
Hide all sigs: Click HELP+Prefs (top), VIEWING (last on right), DISABLE SIGS (3rd on left) and SAVE (hidden at bottom).
I use these a lot for several RSS feeds and Torrentflux ;)
Negative Lookahead -> Match only those that exclude the string
(?!.*?720p)
Positive Lookahead -> Match only those that include the string
(?=.*PDTV)
Eg
(?=.*PDTV)(?!.*720p)^Non.Copyrighted.TV.Show.s01e[\d].*
Can add as many positive/negative look aheads as you want, very handy for weeding out the half dozen copies you'd get of each one!
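A quick Python sketch of the same filter, using the pattern above verbatim. The release names in the list are made up for the example, and the helper name wanted is hypothetical.

```python
import re

# The pattern from the comment above, unchanged.
PATTERN = re.compile(r"(?=.*PDTV)(?!.*720p)^Non.Copyrighted.TV.Show.s01e[\d].*")

# Hypothetical release names, invented for this example.
titles = [
    "Non.Copyrighted.TV.Show.s01e05.PDTV.XviD",
    "Non.Copyrighted.TV.Show.s01e05.720p.PDTV.x264",
    "Non.Copyrighted.TV.Show.s01e05.HDTV.XviD",
    "Some.Other.Show.s01e05.PDTV.XviD",
]


def wanted(title):
    """True when the title contains PDTV, lacks 720p, and has the show prefix."""
    return PATTERN.match(title) is not None


kept = [t for t in titles if wanted(t)]
print(kept)
```

Both lookaheads are zero-width, so they are checked at the start of the string and consume nothing before the rest of the pattern runs.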
# cat
Damn, my RAM is full of cats. MEOW!!
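http://www.noulakaz.net/weblog/2007/03/18/a-regular-expression-to-check-for-prime-numbers/ describes a way to use regex to check if a string contains a prime number of ones.
Here is the actual regex: /^(11+?)\1+$/
It matches when the number of ones is composite, so for a string of two or more ones a failed match means the count is prime.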
Ironically, this isn't even a "regular" (in the CS sense) language at all, which just shows how powerful regex really is.
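A rough Python sketch of how that regex can be put to use, assuming the reading that a match means the number of ones is composite. The name is_prime and the explicit guard for n < 2 are additions for this example; the regex by itself does not reject 0 or 1.

```python
import re

# Matches a run of ones whose length is composite: a group of two or
# more ones (11+?) repeated at least once more (\1+).
COMPOSITE = re.compile(r"^(11+?)\1+$")


def is_prime(n):
    """Treat n as prime when a string of n ones does not match COMPOSITE."""
    if n < 2:
        return False  # the pattern alone would let 0 and 1 through
    return COMPOSITE.match("1" * n) is None


print([n for n in range(30) if is_prime(n)])
```

The backreference \1 is what takes this outside regular languages: the engine has to remember how long the first group was.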
cause it's an interesting discussion of a common (mis)understanding. did you know the RFC specifies leading-zero-for-octal and leading-0x-for-hex? i knew those were commonly used conventions in some places but didn't know that included IP addresses.
if the mods do their job, the posts correcting the GP's mistaken understanding will also score high marks.
http://xkcd.com/208/
They're all stupid regex tricks, however useful they may be.
kaens.blogspot.com
sub betterchomp { $_[0] =~ s/[\n\r]//g; }
I went to the U of Iowa, which has 128.255.x.x. When I worked in IT there for a while we occasionally ran into stupid software that didn't believe our IP address was a valid one (like the first couple of versions of the Solaris 2.x installer).
So please fix your regex to be [0-5] and not [0-4].
I'm not sure whether the first octet can be 255, which you allow, but I'll leave it for someone else to correct if necessary...
When the U of I hospitals got their own IP address range they were given 129.255, which I assume was because we'd already experienced whatever pain there was to be found with that 255 octet and knew how to work our way around it!
did you know the RFC specifies leading-zero-for-octal and leading-0x-for-hex?
yes, and I assume most other people who have studied IP at all did as well.
if the mods do their job
don't hold your breath.
c'mon, nobody remembered the handy skirt and shirt over at xkcd? skirt's regex, shirt's linux and regex.
not only is time travel possible, it's irrelevant.
And this proves that sed is somewhat of a mongrel (but I love it all the same): http://sed.sourceforge.net/grabbag/tutorials/hanoi.htm
First time posting a comment here so cut me some slack. I think that the regex library (http://regexlib.com/) is a very good source of regexes for all kinds of things.
Most of the regexes that have been posted here can be found there.
At a voice command control demo, one member of the audience stood up and yelled: "C: ENTER DEL *.* ENTER". About 6 seconds later the person demoing the software exclaimed "OMG".
Email address validation is very poorly implemented in most software. Take a look at this email validation regex. You will be surprised at how complex it can be:
http://ex-parrot.com/~pdw/Mail-RFC822-Address.html
I often use something like this:
This way if I have unusual characters in filenames it handles it fine. I have unusual characters since I get document files to process from non-programmer types with 'friendly' names that contain spaces, apostrophes, parens, quotes, etc.
You can use a similar trick with sed in bash to build an escapified command line to eval later, one that keeps each argument as a single argument and prevents the shell from interpreting special characters.
It is better to use a regexp to make sure the string is in the format X.Y.Z.W; afterwards use numerical comparison to verify that the numbers are between 0 and 255, etc. The proposed regexp is long and difficult to maintain.
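A minimal Python sketch of that two-step approach: a short regex checks the X.Y.Z.W shape, then plain integer comparison checks the range. The function name is_valid_ipv4 is hypothetical, and treating leading zeros as ordinary decimal digits is an assumption of this example.

```python
import re

# Shape check only: four groups of one to three ASCII digits, dot separated.
DOTTED_QUAD = re.compile(r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})")


def is_valid_ipv4(text):
    """Check the X.Y.Z.W shape with a regex, then the 0-255 range numerically."""
    match = DOTTED_QUAD.fullmatch(text)
    if match is None:
        return False
    # Assumption: leading zeros are read as decimal, not octal.
    return all(0 <= int(part) <= 255 for part in match.groups())


print(is_valid_ipv4("128.255.10.1"), is_valid_ipv4("256.1.1.1"))
```

The regex stays short enough to read at a glance, and the range rule lives in one comparison instead of five alternations per octet.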
The saddest poem
This beautiful line of Perl code:
$line =~ s/([^\t]*)\t/$1." "x(8-length($1)%8)/ge;
replaces tabs with the appropriate number of spaces, respecting the tab stops. Its author Phiroze Parakh rocks
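For anyone who doesn't read Perl, here is a rough Python rendering of the same substitution. The name expand_tabs and the TAB_WIDTH constant are mine, and the sketch assumes a single line with no embedded newlines.

```python
import re

TAB_WIDTH = 8


def expand_tabs(line):
    """Replace each tab with spaces up to the next multiple of TAB_WIDTH."""
    def pad(match):
        text = match.group(1)
        # Every earlier replacement ends exactly on a tab stop, so the
        # length of this segment alone decides how many spaces to add.
        return text + " " * (TAB_WIDTH - len(text) % TAB_WIDTH)

    return re.sub(r"([^\t]*)\t", pad, line)


print(repr(expand_tabs("a\tbc\td")))
```

For single lines this should agree with the built-in str.expandtabs(8), which is the thing to use in real code.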
SO YOU'RE GOING TO DIE: The Comic for Dealing with Death
From vi,
1,$s/\(.\)\(.\)/\2\1/g
will yield a copy of your file which looks disturbingly different.
Doing the command again will yield the original file.
For even more confusion, try:
1,$s/\(.\)\(.\)\(.\)/\3\2\1/g
Repeat and rinse as necessary...
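The same two substitutions sketched in Python, mostly to show why running them twice gives the original back. The function names are made up for the example.

```python
import re


def swap_pairs(text):
    """Swap each pair of adjacent characters; a leftover single is kept."""
    return re.sub(r"(.)(.)", r"\2\1", text)


def reverse_triples(text):
    """Reverse each group of three characters; leftovers are kept."""
    return re.sub(r"(.)(.)(.)", r"\3\2\1", text)


scrambled = swap_pairs("hello world")
print(scrambled)
print(swap_pairs(scrambled))
```

The groups always fall at the same offsets in each line, so the second pass undoes the first exactly.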
There's a gorilla from Manilla whose a fella that stinks of vanilla and has salmonella.
In the borderline stupid category, but I find this quite useful.
When grepping the output from ps, bracket the first character.
ps ax | grep [s]sh
If you don't, the grep command may show up in the results.
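Why it works, sketched in Python: the pattern [s]sh matches the text "ssh", but grep's own command line contains the literal text "[s]sh", which has no "ssh" in it. The ps lines below are invented for the example.

```python
import re

PATTERN = re.compile(r"[s]sh")

# Hypothetical lines of `ps ax` output, invented for this example.
ps_lines = [
    " 812 ?      Ss  0:00 /usr/sbin/sshd -D",
    "4242 pts/0  S+  0:00 grep [s]sh",
    "4243 pts/0  S+  0:00 grep ssh",
]

matches = [line for line in ps_lines if PATTERN.search(line)]
for line in matches:
    print(line)
```

The third line shows what a plain grep ssh would catch. In a real shell it is safer to quote the pattern (grep '[s]sh') so the shell cannot expand the brackets as a filename glob.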
More interesting than useful, but I think the idea of writing a regex to do integer division is awesome.
http://bmm6o.blogspot.com/2008/03/divisibility-testing-and-pattern_27.html
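To give a flavour of the idea (this is not taken from the linked post), here is a small Python sketch using a textbook pattern for binary numerals divisible by three. The function name is hypothetical.

```python
import re

# Derived from a three-state automaton that tracks the value modulo 3
# while reading binary digits from left to right.
DIVISIBLE_BY_3 = re.compile(r"(0|1(01*0)*1)*")


def divisible_by_three(n):
    """Test divisibility by 3 using only a regex on the binary digits of n."""
    return DIVISIBLE_BY_3.fullmatch(format(n, "b")) is not None


print([n for n in range(20) if divisible_by_three(n)])
```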
Your favorite
I realise nobody is still reading this, and as this is my first ever post it'll come in way below most readers' filters anyway. However, I didn't see it posted (though I'm reading at quite a high filter level too), and it's pretty useful:
To match something between quotation marks, for example, do:
m/"[^"]+"/
Which says, match a quote, followed by at least one character which isn't a quote, followed by a quote.
There are further refinements to this, and it doesn't allow for quotes within quotes, or for empty quotes (""), but it does what you want most of the time.
This and other gems are to be found in Jeffrey Friedl's Mastering Regular Expressions.
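A short Python sketch of the quote-matching pattern above, including the empty-quotes case it deliberately doesn't handle. The sample text is made up.

```python
import re

# A quote, one or more non-quote characters, then a closing quote.
QUOTED = re.compile(r'"[^"]+"')

text = 'say "hello" then "good bye" now'
found = QUOTED.findall(text)
print(found)

# Empty quotes do not match, because [^"]+ needs at least one character.
print(QUOTED.search('""'))
```

Changing + to * would accept empty quotes as well.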
Do you know other tools?
"Use cases are fairy tales..." I. S. 2005