Slashdot Mirror


(Useful) Stupid Regex Tricks?

careysb writes to mention that in the same vein as '*nix tricks' and 'VIM tricks', it would be nice to see one on regular expressions and the programs that use them. What amazingly cool tricks have people discovered with respect to regular expressions in everyday life as a developer or power user?

29 of 516 comments

  1. Regexp-based address validation by mutende · · Score: 5, Informative

    Beautiful regexp that validates RFC 822 addresses: Mail-RFC822-Address.html

    --
    Unselfish actions pay back better
    1. Re:Regexp-based address validation by Daas · · Score: 2, Informative

      It matches the entire RFC grammar, not just simple addresses like you@slashie.com. For example:

        <You @ Slashie> you@slashie.com

      should be valid, if I remember correctly.

  2. Windows by jgtg32a · · Score: 3, Informative

    MS Office does support regexps. While not as good as Perl's, they are very helpful.

    Here is a link to an Excel .bas add-on for regexps, which helped me a lot.
    Don't forget to add the library reference (Tools -> References -> Microsoft VBScript Regular Expressions 5.5).

    http://www.tmehta.com/regexp/using_functions.htm

  3. Re:IP and Hardware addresses by Richard_J_N · · Score: 5, Informative

    Of course, you can do better still. For MAC addresses, try:
        ^([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}$
    [:xdigit:] is the POSIX class for hexadecimal digits, i.e. a-fA-F0-9.
    The {5} repeats the 'XX:' group five times, and the final [[:xdigit:]]{2} matches the last octet.
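    A minimal Python sketch of the same MAC check. It assumes the standard re module, which has no POSIX [[:xdigit:]] class, so the hex digits are spelled out. It uses re.fullmatch instead of ^...$ because in Python, $ also matches just before a trailing newline.

```python
import re

# Python's re has no [[:xdigit:]], so spell out the hex-digit class.
MAC_PATTERN = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def is_mac(text: str) -> bool:
    # fullmatch anchors at both ends and, unlike $, rejects a trailing "\n".
    return MAC_PATTERN.fullmatch(text) is not None


print(is_mac("00:1A:2b:3C:4d:5E"))  # True
print(is_mac("00:1A:2B:3C:4D"))     # False: only five octets
```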

  4. Re:IP and Hardware addresses by Speare · · Score: 5, Informative

    For pretty much any useful stock problem solved by regular expressions, see Perl's Regexp::Common module. Many of these patterns are fiendishly complicated because they handle edge cases properly.

    --
    [ .sig file not found ]
  5. One regex to match them all by gzipped_tar · · Score: 4, Informative

    This regex matches a number: integer or float, scientific notation or plain, plus or minus...

    [-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?
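    Here is the pattern tried from Python as a quick sketch. All of its groups are non-capturing, so re.findall returns whole matches. The sample string is made up for illustration.

```python
import re

# The number pattern from the comment above; every group is non-capturing,
# so findall returns whole matches rather than group contents.
NUMBER = re.compile(r"[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?")

sample = "x = -3.5e10, y = .25, z = 42"
print(NUMBER.findall(sample))  # ['-3.5e10', '.25', '42']
```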

    --
    Colorless green Cthulhu waits dreaming furiously.
  6. news for nerds. NERDS by circletimessquare · · Score: 2, Informative

    stuff that matters

    understand the concept?

    if not, try going to this site, it looks like it might be more your speed

    buhbye

    --
    intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
  7. Do these questions really belong here? by DerCed · · Score: 5, Informative

    I wonder why such FAQs are still posted on a site like Slashdot. We now have a great repository for exactly this kind of question:
    http://stackoverflow.com/questions/tagged?tagnames=regex&sort=votes&pagesize=15

  8. Just read this by Anonymous Coward · · Score: 2, Informative
  9. RFC 822 email validation by gpuk · · Score: 2, Informative

    Cal Henderson's routine is the best RFC-compliant regex I have ever found to verify an email address:

    http://code.iamcal.com/php/rfc822/

  10. recursive regexp to match {} block by doti · · Score: 3, Informative


          my $re;   # declared first so (??{ $re }) can refer to it
          $re = qr/
                  \{ (?:
                          (?> [^{}]+ )   # non-brace characters (atomic group)
                  |
                          (??{ $re })    # nested {...} block, matched recursively
                  )* \} /xs;
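    Python's standard re module has no equivalent of Perl's (??{ ... }) recursion. So this sketch swaps in a different technique: repeatedly delete innermost {...} blocks (those with no braces inside) until nothing changes. It only answers "are the braces balanced?". It does not extract the blocks the way the Perl pattern does, and the function name is made up for the example.

```python
import re

# An innermost block: "{", any run of non-brace characters, then "}".
INNERMOST_BLOCK = re.compile(r"\{[^{}]*\}")


def braces_balanced(text: str) -> bool:
    previous = None
    # Each pass removes the innermost blocks, exposing the next level out.
    while previous != text:
        previous = text
        text = INNERMOST_BLOCK.sub("", text)
    # Anything left over is an unmatched "{" or "}".
    return "{" not in text and "}" not in text


print(braces_balanced("if (x) { a { b } c }"))  # True
print(braces_balanced("{ a { b }"))             # False
```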

    --
    factor 966971: 966971
  11. Re:PCRE and perl 5.10 offer "tagged" captures by gzipped_tar · · Score: 2, Informative

    That's also one of my favorites. Python has this feature too.
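    For comparison, Python spells a named ("tagged") capture as (?P<name>...). The groups are then available by name, including in replacement strings via \g<name>. The date pattern and sample text are only an illustration.

```python
import re

# Named groups use the (?P<name>...) syntax in Python's re.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = DATE.search("released on 2009-03-14, patched later")
if match:
    print(match.group("year"))  # 2009
    print(match.groupdict())    # {'year': '2009', 'month': '03', 'day': '14'}

# Named groups can be referenced in the replacement with \g<name>.
print(DATE.sub(r"\g<day>/\g<month>/\g<year>", "2009-03-14"))  # 14/03/2009
```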

    --
    Colorless green Cthulhu waits dreaming furiously.
  12. Handy links by Kozz · · Score: 2, Informative

    While I'm not providing any specific trick per se, here are a few useful on-topic links:
    http://www.regular-expressions.info/ - handy for regex info, particularly for JavaScript, which I use so infrequently that I need to look up how to match, capture, substitute, etc.
    http://perldoc.perl.org/perlre.html - plenty of Perl-specific regex info, though much of it carries over to other, similar implementations

    --
    I only post comments when someone on the internet is wrong.
    1. Re:Handy links by Pope · · Score: 2, Informative

      I was recently trying to come up with a regex for a file-renaming thingy, and I found I could easily state in pseudo-code what I wanted to do. But the regex sites/tips/FAQs I looked through quickly went from "here's a very simple match test" to "going to the moon", with little of the in-between material I was after.

      However, I eventually found Reggy for OS X, a handy little tool for testing regexes, so all was not lost.

      --
      It doesn't mean much now, it's built for the future.
  13. Re:IP and Hardware addresses by Bazzargh · · Score: 3, Informative

    /^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/

    Try this: /^((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])(\.|$)){4}/

    And similarly: /^(([0-9a-fA-F]{2})(:|$)){6}$/

    (term(delimiter|$)){n} is the generic stupid regex trick here. Works in Perl; YMMV elsewhere.
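    Here is a Python sketch of the (term(delimiter|$)){n} trick for IPv4 and MAC addresses. Two adjustments are assumptions about what a strict check should do. First, \Z replaces $, because in Python $ also matches before a trailing newline. Second, a separate test rejects a trailing delimiter: the repeated group can end its last repetition with the delimiter, so "1.2.3.4." and "00:11:22:33:44:55:" also satisfy the bare trick.

```python
import re

# One octet, 0-255, no leading zeros (same alternation as the comment above).
OCTET = r"(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])"
# (term(delimiter|end)){4}: each octet is followed by a dot or the end of the string.
IPV4 = re.compile(rf"(?:{OCTET}(?:\.|\Z)){{4}}")
# Same trick for six hex pairs separated by colons.
MAC = re.compile(r"(?:[0-9a-fA-F]{2}(?::|\Z)){6}")


def is_ipv4(text: str) -> bool:
    # The trick alone accepts a trailing delimiter ("1.2.3.4."), so reject it here.
    return IPV4.fullmatch(text) is not None and not text.endswith(".")


def is_mac(text: str) -> bool:
    return MAC.fullmatch(text) is not None and not text.endswith(":")


print(is_ipv4("10.0.0.1"), is_ipv4("1.2.3.4."))  # True False
```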

    -Baz

  14. Blasphemy by Shohat · · Score: 2, Informative

    There are no Stupid Starcraft Tricks.

  15. Re:Not a trick, but a question. by natebarney · · Score: 4, Informative

    Magic stuff like this is not working: /\([FB][ot]o\).*\1/ although that seems to be the closest description of what we wanted.

    In Perl, I used /([FB][ot][o]).*\1/ and it seemed to work as you wanted. Also, if your regex engine supports lazy (non-greedy) quantifiers as Perl does, I would use one here, since it reduces backtracking. In Perl, put a ? after the *.
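    Here is a small Python sketch of the same idea, with a lazy .*? before the backreference. The pattern and test lines are illustrative. \1 refers to whatever the first group captured, so the same three-letter tag must appear twice.

```python
import re

# Group 1 captures a tag such as "Foo" or "Bto"; \1 demands the same tag again later.
REPEATED_TAG = re.compile(r"([FB][ot]o).*?\1")

for line in ["Foo bar Foo", "Foo bar Boo", "xBto and Bto"]:
    match = REPEATED_TAG.search(line)
    print(repr(line), "->", match.group(1) if match else None)
# 'Foo bar Foo' -> Foo
# 'Foo bar Boo' -> None
# 'xBto and Bto' -> Bto
```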

  16. Re:IP and Hardware addresses by fbjon · · Score: 4, Informative

    It seems both Opera and ping in Windows interpret individual parts with leading zeros as octal. More interestingly, Opera also accepts hexadecimal. That makes constructing a regexp that validates any arbitrary IP address, and not just a valid dot-decimal, a bit more cumbersome.

    --
    True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
  17. Re:IP and Hardware addresses by Arkaic · · Score: 3, Informative

    Also, when configuring ACLs, 0.0.0.0 usually means all IP addresses.

  18. Re:Not a trick, but a question. by Moebius+Loop · · Score: 3, Informative

    In most regex engines, you should be able to do this with backreferences. I don't use them often, but I think something like this would work:

    /^(.*?)([FB][ot]o)((.+?)\2)+(.*?)$/

    I think the reason the example you gave using \1 didn't work is because the .* was too greedy, and ate up the rest of the pattern before the \1 got a chance to match. Also, when you're doing full line matching, it's always good to think about ^/$ and whether you're using any multiline modifiers.

    --
    have you been seen on slash?
  19. Re:very topical! by bpjk · · Score: 2, Informative
    You didn't say whether digits can be part of a word that starts with a capital. Below I assume they cannot; changing that would be easy.

    This is easy once you re-word your definition of a word: in your case, a word starts with a capital followed by a run of non-capital letters.

    The regex: /[A-Z][a-z]*/

    This will match the first such word in a string. It will also match single-character words; change * to + if you don't want that. Make sure your matching is case-sensitive for this to work. Many regex engines have an abbreviation for [A-Z] and [a-z] that you can use instead.

    To get the second of such words in string: /[A-Z][a-z]*[^A-Z]*([A-Z][a-z]*)/

    The second word will be in the first sub-match (\1). The [^A-Z]* will gobble up everything between the last letter of the first word and the start of the second word. If there is no second word, this match will fail.

    Repeat the first part of this (everything up to the open parenthesis) to get the third word, fourth word, etc. Rather than repeating that part of the expression, most engines let you wrap it in parentheses and add a count (usually {n,m}).
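    Here are the steps above, sketched in Python with a made-up sample string. re.findall collects every capitalized word at once. A {2} count stands in for copying the first part of the pattern.

```python
import re

text = "theQuickBrownFox jumps"

# Every word that starts with a capital followed by lowercase letters.
print(re.findall(r"[A-Z][a-z]*", text))  # ['Quick', 'Brown', 'Fox']

# Second word: skip one word plus any non-capitals, then capture.
second = re.search(r"[A-Z][a-z]*[^A-Z]*([A-Z][a-z]*)", text)
print(second.group(1))  # Brown

# Third word: the skipped part is repeated with a count instead of copy-and-paste.
third = re.search(r"(?:[A-Z][a-z]*[^A-Z]*){2}([A-Z][a-z]*)", text)
print(third.group(1))  # Fox
```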

  20. Re:IP and Hardware addresses by akozakie · · Score: 3, Informative

    According to the RFC, leading zeros specify octal and 0x specifies hexadecimal. Both are standard but rarely used, and not all programs support them. There are even more ways to write an IP address, including a single 32-bit number (dword) and various mixed forms, but these are usually only used for obfuscation in malware.
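    A pure regex is awkward for the octal/hex forms, because the 0-255 range check depends on the base. In this hedged Python sketch, a regex classifies each dotted part and int() does the arithmetic. It assumes the four-part form only (no dword or mixed forms), and the name parse_dotted_quad is made up for the example.

```python
import re
from typing import Optional, Tuple

# One dotted part: 0x-prefixed hex, 0-prefixed octal (a lone "0" lands here too), or decimal.
PART = re.compile(r"0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*")


def parse_dotted_quad(text: str) -> Optional[Tuple[int, ...]]:
    parts = text.split(".")
    if len(parts) != 4:
        return None
    values = []
    for part in parts:
        if not PART.fullmatch(part):
            return None
        if part[:2] in ("0x", "0X"):
            value = int(part, 16)
        elif part.startswith("0"):
            value = int(part, 8)  # "0" and "00" are both 0 in octal
        else:
            value = int(part, 10)
        if value > 255:
            return None
        values.append(value)
    return tuple(values)


print(parse_dotted_quad("0x7f.0.0.01"))  # (127, 0, 0, 1)
print(parse_dotted_quad("010.0.0.1"))    # (8, 0, 0, 1)
print(parse_dotted_quad("08.0.0.1"))     # None: 8 is not an octal digit
```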

  21. Re:IP and Hardware addresses by squallbsr · · Score: 3, Informative

    Also, binding a service to IP 0.0.0.0 typically makes it listen on all available interfaces on the machine.

    E.g., starting a development server for a Django app on all interfaces (instead of the default 127.0.0.1):
    python manage.py runserver 0.0.0.0:8000
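    The same "all interfaces" meaning shows up in ordinary socket code. This is a minimal Python sketch using only the standard socket module. Port 0, which asks the OS for any free port, is just a choice for this example.

```python
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    # 0.0.0.0 = listen on every IPv4 interface, not just loopback (127.0.0.1).
    server.bind(("0.0.0.0", 0))
    server.listen()
    host, port = server.getsockname()
    print(host, port)  # 0.0.0.0 and whatever port the OS picked
```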

    --
    Sleep: A completely inadequate substitution for Caffeine.
  22. OK, I'll play... by PRMan · · Score: 2, Informative

    Bad filename characters for Windows (if it matches, the filename is invalid):

    [<>:"/\\|?*\x00-\x1F]

    E-mail (use case insensitive):

    ^\s*[\w~&$+'-]+(\.[\w-]+)*@(?<domain>[\w-]+\.)+(?<tld>[0-9]{1,3}|aero|arpa|biz|com|coop|edu|gov|info|int|museum|net|org|[a-z]{2})\s*$

    GUID (use case insensitive):

    ^\{?[0-9a-f]{8}-?([0-9a-f]{4}-?){3}[0-9a-f]{12}\}?$

    IP on a local or private network (loopback, private, and link-local ranges):

    ^127\.|^10\.|^192\.168\.|^172\.1[6-9]\.|^172\.2\d\.|^172\.3[01]\.|^169\.254\.

    Remove .NET named-capture syntax so that a .NET regex string can be used elsewhere, such as in JavaScript (replace matches with nothing):

    \?\<\w+\>
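    This hedged Python sketch exercises three of the patterns above with the standard re module, using made-up sample strings. The filename check uses the characters Microsoft documents as reserved in Windows file names (< > : " / \ | ? * and control characters). The private-network pattern has every dot escaped, so "." only matches a literal dot.

```python
import re

# Characters that may not appear in a Windows file name (plus ASCII control chars).
BAD_WINDOWS_CHAR = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Loopback, RFC 1918 private ranges, and link-local, matched by prefix only.
LOCAL_IP = re.compile(
    r"^127\.|^10\.|^192\.168\.|^172\.1[6-9]\.|^172\.2\d\.|^172\.3[01]\.|^169\.254\."
)

# .NET-style "?<name>" inside "(?<name>...)"; deleting it leaves a plain group.
DOTNET_NAME = re.compile(r"\?\<\w+\>")


def valid_windows_name(name: str) -> bool:
    return BAD_WINDOWS_CHAR.search(name) is None


def is_local_ip(addr: str) -> bool:
    return LOCAL_IP.match(addr) is not None


def strip_dotnet_names(pattern: str) -> str:
    return DOTNET_NAME.sub("", pattern)


print(valid_windows_name("report.txt"), valid_windows_name("a?b.txt"))  # True False
print(is_local_ip("172.20.1.5"), is_local_ip("172x20.1.5"))           # True False
print(strip_dotnet_names(r"(?<year>\d{4})-(?<month>\d{2})"))          # (\d{4})-(\d{2})
```

    Because = and ! are not \w characters, the named-group stripper leaves lookbehinds such as (?<=...) and (?<!...) alone. The private-network check looks only at the prefix and does not validate the rest of the address.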

    Flame away about how horrible it is that I missed some edge case that not even anyone on Slashdot has heard of, but these work well for me and hopefully will for you too.

    Now, if you actually find a common case that I missed, I would appreciate the help...

    --
    Peter predicted that you would "deliberately forget" creation 2000 years ago...
  23. Re:Regex Support by jonadab · · Score: 2, Informative

    Emacs has pretty good regex support, and is available for Windows. HTH. HAND.

    --
    Cut that out, or I will ship you to Norilsk in a box.
  24. Re:Not a trick, but a question. by Dracorat · · Score: 2, Informative

    The only reason /\([FB][ot]o\).*\1/ doesn't work is that you escaped the parens. It works in my compiler as /([FB][ot]o).*\1/

    HTH
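    The difference is easy to see from Python. Like Perl, Python's re module treats \( and \) as literal parentheses; it is POSIX basic regexes, as in grep and sed, that use \( \) for grouping. With the parens escaped, the pattern has no group 1, so \1 has nothing to refer to and re.compile raises re.error. The helper compiles() exists only for this demonstration.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles(r"([FB][ot]o).*\1"))    # True: (...) is capture group 1
print(compiles(r"\([FB][ot]o\).*\1"))  # False: \( \) are literal, so there is no group 1

match = re.search(r"([FB][ot]o).*\1", "Foo then Foo")
print(match.group(0))  # Foo then Foo
```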

  25. Re:is it an rfc-822 compliant e-mail address? by trjonescp · · Score: 2, Informative

    The parent's link gives a more readable version of the same thing, written as a PHP function.

    --
    Only speak when it improves the silence.
  26. Re:IP and Hardware addresses by tamyrlin · · Score: 4, Informative

    I personally like the regex-builder mode in Emacs as well. This one allows you to build a regexp while highlighting all matches in the current buffer.

    Of course, this should probably have been posted in the emacs thread earlier, but I think it is probably a good match for this thread as well :)

    To start it, just use M-x regexp-builder

  27. Re:Your re is overrated by scotsghost · · Score: 2, Informative

    cause it's an interesting discussion of a common (mis)understanding. did you know the RFC specifies leading-zero-for-octal and leading-0x-for-hex? i knew those were commonly used conventions in some places but didn't know that included IP addresses.

    if the mods do their job, the posts correcting the GP's mistaken understanding will also score high marks.