Slashdot Mirror


(Useful) Stupid Regex Tricks?

careysb writes to mention that, in the same vein as '*nix tricks' and 'VIM tricks', it would be nice to see one on regular expressions and the programs that use them. What amazingly cool tricks have people discovered with respect to regular expressions in everyday life as a developer or power user?

58 of 516 comments

  1. IP and Hardware addresses by rallymatte · · Score: 5, Insightful

    To filter a string to make sure it's a valid ip address this regexp is quite useful.
    /^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/

    And this one for mac addresses
    /^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}$/
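
For anyone who wants to try the IPv4 pattern outside Perl, here is a minimal Python sketch. It assumes Python's standard `re` module; the names `OCTET`, `IPV4` and `is_dotted_quad` are made up for illustration and are not part of the original post.

```python
import re

# Same octet alternation as the pattern above: 250-255, 200-249, 100-199, 10-99, 0-9.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])"
IPV4 = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")


def is_dotted_quad(text: str) -> bool:
    """Return True if text is four decimal octets (0-255) joined by dots."""
    # fullmatch anchors both ends, so no ^ or $ is needed in the pattern.
    return IPV4.fullmatch(text) is not None


print(is_dotted_quad("192.168.0.1"))  # True
print(is_dotted_quad("256.1.1.1"))    # False
```

Note that this only checks the dot-decimal shape; it says nothing about whether the address is routable or sensible.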

    1. Re:IP and Hardware addresses by Poltras · · Score: 3, Insightful

      So if I get this right, 0.0.0.0 is a valid ip address? I know the real regex would take a full post, but yes, it is possible to check with a single regex if it is valid, if it makes sense (127.0.0.1, 10.*, 169.254.*, etc etc) and if it's not a broadcast or a network address (not taking netmask into account).

    2. Re:IP and Hardware addresses by Richard_J_N · · Score: 5, Informative

      Of course, you can do better still. For mac addresses, try:
          ^([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}$
      [:xdigit:] is short for hexadecimal digits, i.e. a-fA-F0-9
      We can also loop 5 times over the 'XX:' sections.
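
The same "loop over XX:" idea can be sketched in Python. One assumption to flag: Python's `re` module does not support POSIX classes such as `[[:xdigit:]]`, so the hex class is spelled out here. `MAC` and `is_mac` are illustrative names only.

```python
import re

# Five "XX:" groups followed by a final "XX".
# [0-9A-Fa-f] stands in for [[:xdigit:]], which Python's re does not provide.
MAC = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def is_mac(text: str) -> bool:
    """Return True if text looks like a colon-separated MAC address."""
    return MAC.fullmatch(text) is not None


print(is_mac("00:1A:2b:3C:4d:5E"))  # True
print(is_mac("00:1A:2b:3C:4d"))     # False
```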

    3. Re:IP and Hardware addresses by Speare · · Score: 5, Informative

      For pretty much any useful stock problem solved by regular expressions, see Perl's Regexp::Common module. A lot of these patterns are fiendishly complicated to deal with edge-cases properly.

      --
      [ .sig file not found ]
    4. Re:IP and Hardware addresses by rallymatte · · Score: 5, Funny

      Not only are you showing off with a lower member id than me, do you also have to come up with a cooler regexp than me?

    5. Re:IP and Hardware addresses by alta · · Score: 4, Funny

      I can easily beat you on the UID, but I couldn't regex the a out of an apple.

      --
      Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
    6. Re:IP and Hardware addresses by Bazzargh · · Score: 3, Informative

      /^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/

      Try this: /^((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])(\.|$)){4}/

      And similarly: /^(([0-9a-fA-F]{2})(:|$)){6}$/

      (term(delimiter|$)){n} is the generic stupid regex trick here. Works in perl, ymmv elsewhere.

      -Baz
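
One thing worth checking with the (term(delimiter|$)){n} trick is what happens with a trailing delimiter. The sketch below, in Python rather than Perl, compares the pattern as posted with a hypothetical tightened variant; `LOOSE` and `STRICT` are names invented here, and the lookahead is my own assumption about how one might close the gap, not something from the post.

```python
import re

OCTET = r"(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])"

# The trick as posted: each term is followed by a delimiter or the end of input.
LOOSE = re.compile(rf"^(?:{OCTET}(?:\.|$)){{4}}")

# Hypothetical tightening: a dot only counts if more input follows it,
# and the whole pattern is anchored at the end.
STRICT = re.compile(rf"^(?:{OCTET}(?:\.(?!$)|$)){{4}}$")

for candidate in ("1.2.3.4", "1.2.3.4.", "1.2.3.4.5", "1.2.3"):
    print(candidate, bool(LOOSE.match(candidate)), bool(STRICT.match(candidate)))
```

With these inputs the posted form also accepts "1.2.3.4." and the prefix of "1.2.3.4.5", while the tightened form accepts only "1.2.3.4".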

    7. Re:IP and Hardware addresses by plumby · · Score: 3, Insightful

      So if I get this right, 0.0.0.0 is a valid ip address?

      If you mean "Is it an address that you can send IP traffic to?", then the answer is no. If you mean "Is it a valid value that can end up in an IP address field (e.g., in the response to the ipconfig command)?" then the answer is yes - it means that you've not got a connection.

    8. Re:IP and Hardware addresses by nschubach · · Score: 4, Interesting

      There's a really cool little "real time" regex analyzer written in Flex: (if you're not one of them scared to death by Flash content)

      http://gskinner.com/RegExr/

      Maybe you can monkey your way into "regexing" the a out of apple :p

      --
      Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
    9. Re:IP and Hardware addresses by fbjon · · Score: 4, Informative

      It seems both Opera and ping in Windows interpret individual parts with leading zeros as octal. More interestingly, Opera also accepts hexadecimal. That makes constructing a regexp that validates any arbitrary IP address, and not just a valid dot-decimal, a bit more cumbersome.

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
    10. Re:IP and Hardware addresses by Arkaic · · Score: 3, Informative

      Also, when configuring ACLs 0.0.0.0 usually means all ip addresses.

    11. Re:IP and Hardware addresses by sqldr · · Score: 4, Funny

      Not only are you showing off with a lower member id than me

      Low ID = old fart. He may be a regexp wizard, but he probably looks like gandalf too :-D

      --
      I wrote my first program at the age of six, and I still can't work out how this website works.
    12. Re:IP and Hardware addresses by wertigon · · Score: 3, Funny

      Write a Regex for them! :D

      --
      systemd is not an init system. It's a GNU replacement.
    13. Re:IP and Hardware addresses by Dagger2 · · Score: 4, Funny

      That also fails beautifully with an address like "2001:db8:3c4d:48:a00:20ff:feb9:4c54", which is perfectly valid.

      Unless you know you're going to be dealing with numeric IPv4 addresses in a specific format, it would be best to pass them to getaddrinfo() (with AI_NUMERICHOST if you want to avoid DNS) and let somebody else worry about validating them properly.
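
The "let somebody else worry about it" approach can be sketched in Python as well. This swaps in the standard library's `ipaddress` module instead of calling getaddrinfo() directly; `parse_ip` is a hypothetical helper name.

```python
import ipaddress


def parse_ip(text: str):
    """Return an IPv4Address or IPv6Address, or None if text is not a numeric address."""
    try:
        return ipaddress.ip_address(text)
    except ValueError:
        return None


print(parse_ip("2001:db8:3c4d:48:a00:20ff:feb9:4c54"))
print(parse_ip("256.1.1.1"))  # None
```

No DNS lookup is involved, so hostnames are rejected rather than resolved.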

    14. Re:IP and Hardware addresses by akozakie · · Score: 3, Informative

      According to the RFC leading zeros specify octal and 0x is hexadecimal. Both are standard, but rarely used and not all programs support them. There are even more ways to write an IP address, including dword and different mixes, but they are usually only used for obfuscation in malware.
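
To make the "dword" form concrete, here is a small Python sketch using the standard `ipaddress` module. It only illustrates the arithmetic; whether a given program accepts these spellings is a separate question. The variable names are illustrative.

```python
import ipaddress

loopback = ipaddress.IPv4Address("127.0.0.1")
as_dword = int(loopback)                      # 127 * 2**24 + 1
back_again = ipaddress.IPv4Address(as_dword)  # integer form back to dotted decimal
print(as_dword, back_again)

# The octal and hexadecimal spellings of the first octet denote the same number.
print(int("0177", 8), int("0x7f", 16))
```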

    15. Re:IP and Hardware addresses by squallbsr · · Score: 3, Informative

      Also, typically binding a service to ip 0.0.0.0 connects it to all available interfaces on the machine.

      i.e: starting a development server for a django app on all interfaces (instead of the default 127.0.0.1)
      python manage.py runserver 0.0.0.0:8000

      --
      Sleep: A completely inadequate substitution for Caffeine.
    16. Re:IP and Hardware addresses by L4t3r4lu5 · · Score: 4, Funny

      That last bit is the perlre for a zero-width negative look-behind assertion

      It certainly looks like English, but I have no idea what that means. Whatever it is, it sure seems to help cure insomnia.

      --
      Finally had enough. Come see us over at https://soylentnews.org/
    17. Re:IP and Hardware addresses by josecanuc · · Score: 3, Interesting

      Folks who think a low ID means an old person: get real. Slashdot hasn't been around forever. It started in 1997. Accounts were added later.

      Folks with a low ID just happened to register within the few months following the addition of accounts. Must have been 1998 or 1999. I was in college at the time. I'm currently not yet 30 years old. Is that old to you?

    18. Re:IP and Hardware addresses by kimba · · Score: 3, Insightful

      Why isn't 0.0.0.0 or 10.* a valid IP address? Since when is the definition of IP address to be unicast and globally routable?

      I'd rather take issue with the fact it completely fails on IPv6 addresses.

    19. Re:IP and Hardware addresses by Vadim Grinshpun · · Score: 4, Funny

      Hmmm... until recently I didn't even realize that low ID's were in vogue :)

    20. Re:IP and Hardware addresses by tamyrlin · · Score: 4, Informative

      I personally like the regex-builder mode in Emacs as well. This one allows you to build a regexp while highlighting all matches in the current buffer.

      Of course, this should probably have been posted in the emacs thread earlier, but I think it is probably a good match for this thread as well :)

      To start it, just use M-x regexp-builder

    21. Re:IP and Hardware addresses by mikiN · · Score: 4, Funny

      You must be new h... (looks at PP's ID, gasps)
      Nevermind.

      --
      The Hacker's Guide To The Kernel: Don't panic()!
    22. Re:IP and Hardware addresses by Kymermosst · · Score: 3, Funny

      So, would anyone like to buy my new T-shirt, it says "There is no place like 2130706433."

      --
      "Alcohol, Tobacco, Firearms, and Explosives" should be a convenience store, not a government agency.
  2. Here's One for Slashdot Stories! by Anonymous Coward · · Score: 4, Funny

    (Useful) Stupid * Tricks

    Yes sir, that will guarantee a front page story. You better head back to the drawing board if it doesn't fit that pattern. Next week: (Useful) Stupid Starcraft Tricks.

    1. Re:Here's One for Slashdot Stories! by Malevolent Tester · · Score: 4, Funny

      Next week: (Useful) Stupid Starcraft Tricks.

      You can assign a building, building add-on, or a group of up to 12 units to a single key. To do this, select what you want to assign, then hold down Control and select a number on the keyboard between 0-9. Then, when you want to select what you assigned, simply press the number of the group that you want. Pressing a group number twice will center the screen on the group.

      --
      If you haven't made a developer cry, you've wasted a day.
    2. Re:Here's One for Slashdot Stories! by McWilde · · Score: 4, Funny

      That doesn't look right...
      Try:

      /^\(Useful\) Stupid \w+ Tricks$/

      Also, I noticed that the previous stupid tricks stories ended with a question mark, but this one doesn't. So:

      /^\(Useful\) Stupid \w+ Tricks\??$/

      --
      Maybe
  3. New Slashdot Section by Frankie70 · · Score: 5, Interesting

    Maybe we should have a new section for "Useful Stupid Tricks" on Slashdot.

  4. How about by cbiltcliffe · · Score: 3, Funny

    Stupid (Useful) Ask Slashdot tricks?

    I'm not sure whether these are legitimate, or just a "I don't know what the hell I'm doing, so let's see if I can get someone else to show me how to do my job, under the guise of sharing information."

    I'd like to say the former, but my cynicism is making me lean to the latter.....

    --
    "City hall" in German is "Rathaus" Kinda explains a few things......
    1. Re:How about by Anonymous Coward · · Score: 5, Interesting

      I actually like these. Nice little highly enriched concentrations of geekery on a single page. Think how long it might take to round up the sort of stuff that appears here by Googling.

      Turing word: insipid
      In a sentence: You find this page insipid but I find it inspiring.

    2. Re:How about by Bandman · · Score: 4, Interesting

      I like it, but I've got a bookmark folder called "Slash-doc" where I store useful threads that contain a lot of information.

      I've got a lot of threads bookmarked.

      Best Practices for Process Documentation

      How would you make a distributed Office system

      Quality Open Source / Calendar / Messaging Systems

      and some others.

      Some of the information in the threads is out of date, but the ideas are useful and interesting to read. I need to go back through Ask Slashdot and get the more recent threads that seem to act as references

  5. Regexp-based address validation by mutende · · Score: 5, Informative

    Beautiful regexp that validates RFC 822 addresses: Mail-RFC822-Address.html

    --
    Unselfish actions pay back better
    1. Re:Regexp-based address validation by neoform · · Score: 4, Funny

      Best part of that Regex? It's easy to modify too!

      --
      MABASPLOOM!
    2. Re:Regexp-based address validation by xenocide2 · · Score: 4, Insightful

      The regex is beautiful in the sense that it lets you not be one of those assholes who refuses valid email addresses.

      --
      I Browse at +4 Flamebait

      Open Source Sysadmin

  6. Windows by jgtg32a · · Score: 3, Informative

    MS Office does support regexps; while not as good as Perl's, they are very helpful.

    Link to an Excel .bas add-on for regexps, which helped me a lot.
    Don't forget to add the library reference {Tools->References->Microsoft VBScript Regular Expressions 5.5}

    http://www.tmehta.com/regexp/using_functions.htm

  7. is it an rfc-822 compliant e-mail address? by Anonymous Coward · · Score: 3, Interesting

    please validate using the rfc and not your sketchy interpretation of an e-mail address. /.*@.*\..*/ will not cut it.

    Try instead
    ([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x22([^\\x0d\\x22\\x5c\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x22)(\\x2e([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x22([^\\x0d\\x22\\x5c\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x22))*\\x40([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-\\x5d\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x5d)(\\x2e([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-\\x5d\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x5d))*

    See the original at http://www.iamcal.com/publish/articles/php/parsing_email/

    1. Re:is it an rfc-822 compliant e-mail address? by Timmmm · · Score: 3, Insightful

      Mmmmm readable.

    2. Re:is it an rfc-822 compliant e-mail address? by Ken D · · Score: 3, Interesting

      The problem is that email addresses are not suitable for regex based validation.
      There are too many legacy formats, too many variations, that are legal addresses.

      Why, back in the old days, you could send mail to things like "bob%example.com@example.org" which would shoot the email off to example.org, whose mail server would then shoot the email off to example.com. A way to hand route your email around a broken network link in the old days. Throw in a few UUCP hops, maybe getting final delivery to a BITNET connected system. Ah, those were the days!

  8. Regex Bill by Anonymous Coward · · Score: 5, Funny

    Why couldn't Bill try out his regular expressions?

    His mom wouldn't let him play with matches.

  9. Match a library call number by Gulthek · · Score: 4, Interesting

    Here's a chunk of perl script I wrote (years ago) that determines if $text matches any of the styles of library call number that I've ever encountered.

    Slashcode is interestingly interpreting my formatting, but you should get the gist.


    $text =~ /
            ^[A-Z]+ # starts with at least one capital letter
            \s?     # followed by an optional space
            \d+     # followed by one or more digits
        /x
        or $text =~ /
            ^\d+    # starts with one or more digits
            \.      # followed by a single decimal
        /x
        or $text =~ /
            \d+     # starts with one or more digits
            \s      # and a space
        /x
        or $text =~ /
            Thesis  # starts with "Thesis"
            .+      # with one or more characters of any kind
            \d{4}   # then four numbers - year
            \s+     # separated by at least one space
            [A-Z]+  # from one or more capital letters
            \d+     # followed by one or more numbers
        /xi         # case ignored here in case we run into THESIS or thesis
        or $text =~ /
            \d+     # starts with one or more digits
            \-      # connected with a dash
            \d+     # to one or more following digits
        /x
        or $text =~ /
            \d+     # starts with one or more digits
            \s      # followed by a space
            [A-Z]*  # followed by zero or more capital letters
            \d+     # followed by one or more digits
        /x
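
The commented, free-spacing style carries over to other engines. As a rough illustration, here are the first two styles in Python, whose `re.VERBOSE` flag plays the role of Perl's /x. The sample call numbers in the comments are hypothetical, and `CALL_NUMBER` and `looks_like_call_number` are names made up for this sketch.

```python
import re

# Two of the styles above, rewritten for Python's re.VERBOSE (the analogue of /x).
CALL_NUMBER = re.compile(
    r"""
      ^[A-Z]+ \s? \d+   # capital letters, optional space, digits (say, QA76 or QA 76)
    | ^\d+ \.           # digits followed by a decimal point (say, 005.133)
    """,
    re.VERBOSE,
)


def looks_like_call_number(text: str) -> bool:
    """Return True if text starts like one of the two call number styles."""
    return CALL_NUMBER.search(text) is not None


print(looks_like_call_number("QA 76"))        # True
print(looks_like_call_number("hello world"))  # False
```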

    1. Re:Match a library call number by mgbastard · · Score: 4, Funny

      holy crap. You DOCUMENTED your regular expression? You shall be thrown into the pit!

      --
      Anyone seen my low uid? last seen 10 years ago while panning the #@$# out of Taco's 'web based discussion system'
  10. Nope, not useful by darkvizier · · Score: 4, Funny

    I've never found regexes to be useful at all. I prefer to write my own parsers from scratch in assembly language, or conway's game of life, if I'm feeling m/(ambitious|artistic|autistic|masochistic)/.

    But even an artist gets lazy sometimes.

  11. One regex to match them all by gzipped_tar · · Score: 4, Informative

    This regex matches a number: integer or float, scientific notation or plain, plus or minus...

    [-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?
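
A quick way to see what the pattern picks up is to run it over some text. This is a minimal Python sketch using the pattern unchanged; the sample sentence and the name `NUMBER` are made up.

```python
import re

NUMBER = re.compile(r"[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?")

# All groups are non-capturing, so findall returns the whole matches.
print(NUMBER.findall("mass 6.02e23 delta -1 half .5"))
# ['6.02e23', '-1', '.5']
```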

    --
    Colorless green Cthulhu waits dreaming furiously.
  12. use Regexp::Common; by oneiros27 · · Score: 4, Insightful

    use Regexp::Common qw(URI net);
    $text_with_urls =~ m/$RE{URI}/;
    $text_with_ips =~ m/$RE{net}{IPv4}/;

    --
    Build it, and they will come^Hplain.
  13. Remove trailing whitespace by cerberusss · · Score: 3, Interesting
    To remove trailing whitespace from a textfile (vim regex, don't know if the \s will work in other regex dialects):

    :%s/\s\+$//e
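
For comparison, a rough Python equivalent of the same substitution. One deliberate difference, flagged as my own choice: it uses [ \t]+ instead of \s+, because in Python \s also matches newlines and would swallow blank lines in multi-line mode. `strip_trailing_whitespace` is a hypothetical helper name.

```python
import re


def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line."""
    # [ \t]+ rather than \s+ so the newline itself is never consumed.
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


print(repr(strip_trailing_whitespace("a  \nb\t\n\nc ")))
# 'a\nb\n\nc'
```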

    --
    8 of 13 people found this answer helpful. Did you?
  14. Do these questions really belong here? by DerCed · · Score: 5, Informative

    I wonder why such FAQs are still posted on a site like Slashdot. We now have a great repository for exactly this kind of questions:
    http://stackoverflow.com/questions/tagged?tagnames=regex&sort=votes&pagesize=15

  15. Be lazy! by subreality · · Score: 4, Interesting

    OK, you asked for stupid tricks, but this one's just plain lazy.

    Between bash and grep, there are quite a lot of special characters that you have to escape... Or just ignore with dots!

    /I.do.this.frequently..(even.with.parenthases).,.because.sometimes.my....backslash..key.is.tired/

    A couple neat things happened: The extra dot after frequently is matching an inline paren. The paren in the PATTERN right next to it starts the mark of an atom, closed by its brother. The comma is because I put one outside the paren (here represented as the dot to the left of the comma) as is my style. Also note the literal backslash, just before you see the word backslash in hidden parenthesis.

    Why not add quotes to match the spaces easily? I get a word or two in, and I find I naturally switch to using dots. These are throwaways for single tries through grep. For production code, I hone in carefully on the parts that I'm dead sure I can anchor to, escaped by any means needed, before carefully choosing my atom to match as tightly as possible, so it'll error out if my data has gone wrong.

    Even in a simple case like this, half the fun is in explaining it. :)

  16. recursive regexp to match {} block by doti · · Score: 3, Informative


    my $re = '';
    $re = qr/
            \{ (?:
                    (?> [^{}]+ )    # non-brace characters
            |
                    (??{ $re })     # nested brace block
            )* \}
        /xs;
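
Engines without recursive patterns (Python's standard `re` is one) need a different approach for the same job. The sketch below swaps in a plain depth counter instead of a regex; it is not a translation of the Perl above, and `match_braces` is a hypothetical name.

```python
def match_braces(text: str, start: int = 0):
    """Return the index just past the '}' that closes the '{' at text[start], or None."""
    if start >= len(text) or text[start] != "{":
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return None  # ran out of input with a brace still open


source = "{a{b}c}d"
end = match_braces(source)
print(source[:end])  # {a{b}c}
```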

    --
    factor 966971: 966971
  17. some that I've used ... by ianare · · Score: 4, Interesting

    SSN
    ^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$

    US phone with or without parentheses
    ^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$

    ISO Date (19th to 21st century only)
    ^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$
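
Worth noting for the date pattern: it checks the shape and the ranges of each field, not the calendar, so something like 2009-02-31 gets through. A minimal Python sketch of pairing the regex with a real date parser follows; `ISO_DATE` and `is_real_date` are illustrative names, and the calendar check is an addition of mine, not part of the post.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])")


def is_real_date(text: str) -> bool:
    """Shape check with the regex, then a calendar check with datetime."""
    if ISO_DATE.fullmatch(text) is None:
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


print(bool(ISO_DATE.fullmatch("2009-02-31")))  # True
print(is_real_date("2009-02-31"))              # False
```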

  18. Re:Not a trick, but a question. by natebarney · · Score: 4, Informative

    Magic stuff like this is not working: /\([FB][ot]o\).*\1/ although that seems to be the closest description of what we wanted.

    In perl, I did /([FB][ot][o]).*\1/ and it seemed to work as you wanted. Also, if you're using a regex engine that supports lazy (non-greedy) quantifiers like perl does, I would use them in this case. It reduces backtracking. In perl, put a ? after the *.
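
To see the effect of the ? after the *, here is a small Python sketch (Python's `re` uses the same syntax for backreferences and lazy quantifiers). The sample string is made up. Both forms find a match; they differ in how much text the match spans.

```python
import re

text = "Foo x Foo y Foo"

greedy = re.search(r"([FB][ot]o).*\1", text)
lazy = re.search(r"([FB][ot]o).*?\1", text)

print(greedy.group(0))  # Foo x Foo y Foo
print(lazy.group(0))    # Foo x Foo

# The backreference must repeat the same text, so "Foo ... Boo" does not match.
print(re.search(r"([FB][ot]o).*\1", "Foo then Boo"))  # None
```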

  19. Re:Not a trick, but a question. by Moebius Loop · · Score: 3, Informative

    In most regex engines, you should be able to do this with backreferences. I don't use them often, but I think something like this would work:

    /^(.*?)([FB][ot]o)((.+?)\2)+(.*?)$/

    I think the reason the example you gave using \1 didn't work is because the .* was too greedy, and ate up the rest of the pattern before the \1 got a chance to match. Also, when you're doing full line matching, it's always good to think about ^/$ and whether you're using any multiline modifiers.

    --
    have you been seen on slash?
  20. Search through phone numbers by cerberusss · · Score: 4, Funny
    This regex goes through my enormous list of girlfriends' telephone numbers, and makes a selection based on the area code I'm currently in!

    #$%^&*(&^%{{}}{/\/\||```

    (No, that's not a regex at all. And no, I don't even have a single girlfriend.)

    --
    8 of 13 people found this answer helpful. Did you?
  21. Re:99 Bottles of Beer on the wall by Culture20 · · Score: 4, Insightful

    (I would quote the final result but /. won't allow that many "junk" characters.. let's hope that doesn't cripple this entire discussion.)

    Interesting that a site for nerds doesn't allow a lot of characters commonly used in source code.

  22. Re:Mainframe Formatting by msuarezalvarez · · Score: 3, Insightful

    You are a great candidate for the Useless Use of Cat award... especially endearing is your making a comment on the few commands your line uses :D

  23. (Useful) Stupid useless articles by Kent Recal · · Score: 3, Insightful

    Dear slashdot editors,

    slashdot.org is not stackoverflow.com.
    The articles and discussions here are not searchable in a sane way. Your recent attempts to mimic stackoverflow are just a waste of everybody's time because all those little tidbits that people post get lost in the internet noise immediately.

    We know you're a bit desperate for traffic these days. But this is not the way to go.

  24. The most useful regex there is! by ShatteredArm · · Score: 4, Funny

    I came up with a Regex that can be used to match literally anything (yes, anything!). It is, therefore, the most flexible regex ever concocted. Here it is:

    .*

  25. Re:email validation... FAIL by jeremyp · · Score: 3, Insightful

    Your regex doesn't allow + signs in the name part.

    Nor, I would suspect would it handle quoted strings e.g. "Jeremy P"@example.com is technically a valid RFC 822 address.

    And having just looked up the RFC 5322 spec which you quote, I see there are more cases you fail to take account of e.g.

    Jeremy P <jeremyp@example.com>

    Also, what makes you think upper case in domain names is invalid? jeremyp@example.COM fails validation.
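
For cases like the display-name form and the upper-case domain, a parser is usually less painful than a pattern. Here is a minimal Python sketch using the standard library's `email.utils.parseaddr`; it is an alternative technique, not a fix for the regex under discussion, and lower-casing only the domain is my own assumption about a reasonable normalization.

```python
from email.utils import parseaddr

# Split the "Display Name <address>" form into its two parts.
display_name, address = parseaddr("Jeremy P <jeremyp@example.COM>")

# Domain names are case-insensitive, so normalize only the part after the last "@".
local_part, _, domain = address.rpartition("@")
normalized = f"{local_part}@{domain.lower()}"

print(display_name)  # Jeremy P
print(normalized)    # jeremyp@example.com
```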

    --
    All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
  26. It must be said by IorDMUX · · Score: 3, Funny

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

    -- Jamie Zawinski

    --
    >> Standing on head makes smile of frown, but rest of face also upside down.
  27. Opposite by Christopher_Olah · · Score: 4, Insightful

    IMHO, this is exactly the way that Slashdot should be going. Threads like this are interesting, add to the reservoirs of internet knowledge, and have the highest quality to noise ratios.

    I (and I suspect many others) read Slashdot not for the latest +5 funny comment (though those can be fun to read) but to read the opinions of brilliant minds. And when those minds start trading secrets... Everyone wins.