Slashdot Mirror


(Useful) Stupid Regex Tricks?

careysb writes to mention that in the same vein as '*nix tricks' and 'VIM tricks', it would be nice to see one on regular expressions and the programs that use them. What amazingly cool tricks have people discovered with respect to regular expressions in everyday life as a developer or power user?

16 of 516 comments

  1. IP and Hardware addresses by rallymatte · · Score: 5, Insightful

    To check that a string is a valid IPv4 address, this regexp is quite useful:
    /^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/

    And this one for MAC addresses:
    /^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}$/
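A minimal Python sketch of how the two patterns above might be used. The IPv4 regex is copied verbatim; the MAC regex is folded into an equivalent shorter form with a repeated group. The function names `is_ipv4` and `is_mac` are made up for illustration.

```python
import re

# IPv4 pattern copied verbatim from the comment above.
IPV4_RE = re.compile(
    r"^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}"
    r"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$"
)

# Same language as the MAC pattern above, written with a repeated group.
MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def is_ipv4(text: str) -> bool:
    """True if text is a dotted quad with each octet in 0..255."""
    # fullmatch also rejects a trailing newline, which a bare `$` would allow.
    return IPV4_RE.fullmatch(text) is not None


def is_mac(text: str) -> bool:
    """True if text is six colon-separated pairs of hex digits."""
    return MAC_RE.fullmatch(text) is not None
```

Note that the IPv4 pattern rejects octets with leading zeros (such as `01.2.3.4`) and that the MAC pattern accepts only the colon-separated notation.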

    1. Re:IP and Hardware addresses by Poltras · · Score: 3, Insightful

      So if I get this right, 0.0.0.0 is a valid IP address? I know the real regex would take a full post, but yes, it is possible to check with a single regex if it is valid, if it makes sense (127.0.0.1, 10.*, 169.254.*, etc.) and if it's not a broadcast or a network address (not taking netmask into account).
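The "does it make sense" categories listed here can also be checked without a regex. The following is a sketch that swaps in Python's standard `ipaddress` module; the function name `classify_ipv4` and its category labels are invented for illustration, and, like the comment, it ignores netmasks.

```python
import ipaddress


def classify_ipv4(text: str) -> str:
    """Return a rough category for a dotted-quad address.

    Raises ValueError if text is not a valid IPv4 address.
    """
    addr = ipaddress.IPv4Address(text)
    if addr.is_unspecified:          # 0.0.0.0
        return "unspecified"
    if addr.is_loopback:             # 127.0.0.0/8
        return "loopback"
    if addr.is_link_local:           # 169.254.0.0/16
        return "link-local"
    if addr.is_multicast:            # 224.0.0.0/4
        return "multicast"
    if addr == ipaddress.IPv4Address("255.255.255.255"):
        return "limited broadcast"
    if addr.is_private:              # 10.*, 172.16-31.*, 192.168.*, ...
        return "private"
    return "other"
```

Subnet broadcast and network addresses cannot be detected this way, because they depend on the netmask.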

    2. Re:IP and Hardware addresses by plumby · · Score: 3, Insightful

      So if I get this right, 0.0.0.0 is a valid ip address?

      If you mean "Is it an address that you can send IP traffic to?", then the answer is no. If you mean "Is it a valid value that can end up in an IP address field (e.g., in the response to the ipconfig command)?" then the answer is yes - it means that you've not got a connection.

    3. Re:IP and Hardware addresses by kimba · · Score: 3, Insightful

      Why isn't 0.0.0.0 or 10.* a valid IP address? Since when is the definition of IP address to be unicast and globally routable?

      I'd rather take issue with the fact it completely fails on IPv6 addresses.
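One way to cover IPv6 as well is to skip the regex and let a parser decide. This sketch assumes Python's standard `ipaddress` module is acceptable; `parse_ip` is a hypothetical helper name.

```python
import ipaddress


def parse_ip(text: str):
    """Return an IPv4Address or IPv6Address object, or None if text is neither."""
    try:
        return ipaddress.ip_address(text)
    except ValueError:
        return None
```

The returned object's `version` attribute is 4 or 6, so callers can still tell the two families apart.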

  2. Re:is it an rfc-822 compliant e-mail address? by Timmmm · · Score: 3, Insightful

    Mmmmm readable.

  3. use Regexp::Common; by oneiros27 · · Score: 4, Insightful

    use Regexp::Common qw(URI net);
    $text_with_urls =~ m/$RE{URI}/;
    $text_with_ips =~ m/$RE{net}{IPv4}/;

    --
    Build it, and they will come^Hplain.
  4. Re:ARGH!!!! by Anonymous Coward · · Score: 2, Insightful

    So clearly, Slashdot's shit never stank?

    No, seriously, why the bitching? Did you expect the site to just keep reporting dry stories about incremental Linux kernel upgrades for its entire existence? You expected a website to never change and never update with the times? Just because it's old doesn't mean it's sacred.

  5. Re:Do these questions really belong here? by Moebius+Loop · · Score: 2, Insightful

    I like stackoverflow a lot and have been tangentially involved in other tech knowledge base-type sites, but they suffer from one typical problem.

    People who already *have* certain knowledge don't often spend much time reading sites dedicated to dispensing that information.

    --
    have you been seen on slash?
  6. Re:99 Bottles of Beer on the wall by Culture20 · · Score: 4, Insightful

    (I would quote the final result but /. won't allow that many "junk" characters.. let's hope that doesn't cripple this entire discussion.)

    Interesting that a site for nerds doesn't allow a lot of characters commonly used in source code.

  7. Re:Mainframe Formatting by msuarezalvarez · · Score: 3, Insightful

    You are a great candidate for the Useless Use of Cat award... especially endearing is your making a comment on the few commands your line uses :D

  8. Re:Regexp-based address validation by xenocide2 · · Score: 4, Insightful

    The regex is beautiful in the sense that it lets you not be one of those assholes who refuses valid email addresses.

    --
    I Browse at +4 Flamebait

    Open Source Sysadmin

  9. (Useful) Stupid useless articles by Kent+Recal · · Score: 3, Insightful

    Dear slashdot editors,

    slashdot.org is not stackoverflow.com.
    The articles and discussions here are not searchable in a sane way. Your recent attempts to mimic stackoverflow are just a waste of everybody's time because all those little tidbits that people post get lost in the internet noise immediately.

    We know you're a bit desperate for traffic these days. But this is not the way to go.

  10. Re:is it an rfc-822 compliant e-mail address? by ais523 · · Score: 2, Insightful

    Does that thing allow nested comments, and escaping inside them? It doesn't look like it, it isn't recursive. (I have some in the email address I typically put online, ais523(524\)(525)x)@bham.ac.uk; that could be a good test for your email client, and is useful because I've never come across a spambot that can parse it.)

    Recent versions of Perl and Python regices allow you to write recursively; that probably qualifies as a stupid regex trick, especially as it makes them more computationally powerful so they can handle things like email addresses. Or you could just sit wondering why email addresses allow nested comments anyway...
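For comparison, nested comments with escapes can also be handled without a recursive pattern by counting parenthesis depth. As far as I know, Python's standard `re` module has no recursion syntax (the third-party `regex` package does), so this sketch uses a plain loop. The name `strip_comments` is invented, and quoted strings are deliberately ignored, so it is a simplification and not a full RFC parser.

```python
def strip_comments(address: str) -> str:
    """Remove (possibly nested) parenthesized comments from an address.

    A backslash escapes the next character, so an escaped parenthesis inside a
    comment does not change the nesting depth. Quoted strings are NOT handled.
    """
    out = []
    depth = 0
    chars = iter(address)
    for ch in chars:
        if ch == "\\":
            escaped = next(chars, "")
            if depth == 0:
                # Outside a comment, keep the escape sequence as written.
                out.append(ch + escaped)
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')'")
            depth -= 1
        elif depth == 0:
            out.append(ch)
    if depth:
        raise ValueError("unclosed comment")
    return "".join(out)
```

Applied to the address quoted above, written as the raw string `r"ais523(524\)(525)x)@bham.ac.uk"`, this returns `ais523@bham.ac.uk`.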

    --
    (1)DOCOMEFROM!2~.2'~#1WHILE:1<-"'?.1$.2'~'"':1/.1$.2'~#0"$#65535'"$"'"'&.1$.2'~'#0$#65535'"$#0'~#32767$#1"
  11. Re:email validation... FAIL by jeremyp · · Score: 3, Insightful

    Your regex doesn't allow + signs in the name part.

    Nor, I would suspect, would it handle quoted strings, e.g. "Jeremy P"@example.com is technically a valid RFC 822 address.

    And having just looked up the RFC 5322 spec which you quote, I see there are more cases you fail to take account of, e.g.

    Jeremy P <jeremyp@example.com>

    Also, what makes you think upper case in domain names is invalid? jeremyp@example.COM fails validation.
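Some of these cases can be exercised with a parser in place of a regex. This sketch leans on `email.utils.parseaddr` from Python's standard library and lowercases only the domain, since domain names are case-insensitive; `split_address` and the `user+tag` sample in the usage note are made up for illustration, and the quoted-string case is not covered here.

```python
from email.utils import parseaddr


def split_address(text: str):
    """Split text into (display_name, local_part, domain), or None if no '@'.

    Only the domain is lowercased; the local part is left untouched.
    """
    display_name, addr = parseaddr(text)
    local, sep, domain = addr.rpartition("@")
    if not sep or not local or not domain:
        return None
    return display_name, local, domain.lower()
```

With this, `Jeremy P <jeremyp@example.com>` and `jeremyp@example.COM` both yield the local part `jeremyp` and the domain `example.com`, and a hypothetical `user+tag@example.com` keeps its `+`.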

    --
    All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
  12. Opposite by Christopher_Olah · · Score: 4, Insightful

    IMHO, this is exactly the way that Slashdot should be going. Threads like this are interesting, add to the reservoirs of internet knowledge, and have the highest quality to noise ratios.

    I (and I suspect many others) read Slashdot not for the latest +5 funny comment (though those can be fun to read) but to read the opinions of brilliant minds. And when those minds start trading secrets... Everyone wins.

  13. Re:Validating credit card numbers by phantomfive · · Score: 2, Insightful

    Yes, actually, (despite what the other posters have said), you can, but it will be very complicated since you will be implementing something like your own multiplier in regex.

    The simplest way to do it, of course, is to just list all valid Luhn algorithm numbers, something like (.....384848583 | 938484845 | 8383838383......). Of course, this is probably not what you are looking for, because you will be listing a lot of numbers, and if your Luhn number is too big, then it won't be in your list.

    So, as for a more general solution, it is possible because at each digit you can know whether your number matches so far or not. What you will be basically implementing is a regular expression that checks each digit and says, "does this digit move me to a state that is a valid number or an invalid number?" I could be wrong, but my initial estimate is that this will take less than a thousand states in a state machine (of course, the easiest way to do this is to design a state machine and then translate it to a regular expression).

    To give an idea of what you are up against (and to help me find the answer to your question myself!) I implemented here a simple regular expression to determine if any binary addition will have an overflow at the last digit or not:

    ((0+1)+(1|(0+11)1+)+

    You can do something similar, although much much longer, with the Luhn algorithm.

    Hope that helps.
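The state-machine idea can be sketched directly. This is one possible construction, not necessarily the one the poster had in mind: since the total length is not known while reading left to right, the state keeps two running sums mod 10, one for each possible parity of the doubled positions, which gives 100 states. The function names are invented for illustration.

```python
def luhn_double(d: int) -> int:
    """Double a digit and add the digits of the result (same as 2*d - 9 when 2*d > 9)."""
    d *= 2
    return d - 9 if d > 9 else d


def luhn_dfa_accepts(number: str) -> bool:
    """Check the Luhn condition in one left-to-right pass with a finite state.

    State (a, b), both mod 10:
      a = Luhn sum if the digits read so far were the whole number
      b = the same sum with doubled and undoubled positions swapped
    Reading a new digit d shifts every earlier digit one place to the left,
    which swaps the roles of a and b.
    """
    if not number:
        return False
    a = b = 0
    for ch in number:
        if ch not in "0123456789":
            return False
        d = int(ch)
        a, b = (b + d) % 10, (a + luhn_double(d)) % 10
    return a == 0
```

Turning these 100 states into a single regular expression is possible in principle, but the result would be very long, which matches the estimate above.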

    --
    Qxe4