Slashdot Mirror


(Useful) Stupid Regex Tricks?

careysb writes to mention that, in the same vein as '*nix tricks' and 'VIM tricks', it would be nice to see a thread on regular expressions and the programs that use them. "What amazingly cool tricks have people discovered with respect to regular expressions in everyday life as a developer or power user?"

7 of 516 comments

  1. New Slashdot Section by Frankie70 · · Score: 5, Interesting

    Maybe we should have a new section for "Useful Stupid Tricks" on Slashdot.

  2. Re:How about by Anonymous Coward · · Score: 5, Interesting

    I actually like these. Nice little highly enriched concentrations of geekery on a single page. Think how long it might take to round up the sort of stuff that appears here by Googling.

    Turing word: insipid
    In a sentence: You find this page insipid but I find it inspiring.

  3. Match a library call number by Gulthek · · Score: 4, Interesting

    Here's a chunk of perl script I wrote (years ago) that determines if $text matches any of the styles of library call number that I've ever encountered.

    Slashcode is interestingly interpreting my formatting, but you should get the gist.


    $text =~ /
            ^[A-Z]+ # starts with at least one capital letter
            \s?     # followed by an optional space
            \d+     # followed by one or more digits
        /x
        or $text =~ /
            ^\d+    # starts with one or more digits
            \.      # followed by a single decimal point
        /x
        or $text =~ /
            \d+     # starts with one or more digits
            \s      # and a space
        /x
        or $text =~ /
            Thesis  # starts with "Thesis"
            .+      # with one or more characters of any kind
            \d{4}   # then four numbers - year
            \s+     # separated by at least one space
            [A-Z]+  # from one or more capital letters
            \d+     # followed by one or more numbers
        /xi         # case ignored here in case we run into THESIS or thesis
        or $text =~ /
            \d+     # starts with one or more digits
            \-      # connected with a dash
            \d+     # to one or more following digits
        /x
        or $text =~ /
            \d+     # starts with one or more digits
            \s      # followed by a space
            [A-Z]*  # followed by zero or more capital letters
            \d+     # followed by one or more digits
        /x
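
    For anyone who wants to poke at the same idea outside Perl, here is a rough Python sketch using re.VERBOSE, which plays the role of Perl's /x. The function name and the sample strings are made up for illustration, and the sketch assumes a whitespace token (\s) belongs where the last alternative's comment says "followed by a space". Note that only the first two alternatives are anchored with ^, so the others can match anywhere in the text.

```python
import re

# Python port of the six alternatives. re.VERBOSE is Python's counterpart
# to Perl's /x: whitespace and "#" comments inside the pattern are ignored.
CALL_NUMBER_PATTERNS = [
    re.compile(r"""
        ^[A-Z]+   # anchored: at least one capital letter
        \s?       # an optional space
        \d+       # one or more digits
    """, re.VERBOSE),
    re.compile(r"""
        ^\d+      # anchored: one or more digits
        \.        # a literal decimal point
    """, re.VERBOSE),
    re.compile(r"""
        \d+       # one or more digits (not anchored, matches anywhere)
        \s        # a whitespace character
    """, re.VERBOSE),
    re.compile(r"""
        Thesis    # the word "Thesis" (any case, see flags)
        .+        # one or more characters of any kind
        \d{4}     # four digits, the year
        \s+       # at least one whitespace character
        [A-Z]+    # one or more letters
        \d+       # one or more digits
    """, re.VERBOSE | re.IGNORECASE),
    re.compile(r"""
        \d+       # one or more digits (not anchored)
        \-        # a dash
        \d+       # one or more digits
    """, re.VERBOSE),
    re.compile(r"""
        \d+       # one or more digits (not anchored)
        \s        # ASSUMPTION: the whitespace token the comment describes
        [A-Z]*    # zero or more capital letters
        \d+       # one or more digits
    """, re.VERBOSE),
]


def looks_like_call_number(text: str) -> bool:
    """Return True if any alternative matches; search() mirrors Perl's =~."""
    return any(p.search(text) for p in CALL_NUMBER_PATTERNS)


# Made-up sample strings, not real catalogue data.
for sample in ["QA76.73", "813.54", "Thesis 2004 AB12", "12-345", "no digits here"]:
    print(sample, looks_like_call_number(sample))
```

    Keeping the alternatives in a list rather than one giant alternation makes it easy to report which style matched, should that ever be needed.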

  4. Be lazy! by subreality · · Score: 4, Interesting

    OK, you asked for stupid tricks, but this one's just plain lazy.

    Between bash and grep, there are quite a lot of special characters that you have to escape... Or just ignore with dots!

    /I.do.this.frequently..(even.with.parenthases).,.because.sometimes.my....backslash..key.is.tired/

    A couple neat things happened: The extra dot after frequently is matching an inline paren. The paren in the PATTERN right next to it starts the mark of an atom, closed by its brother. The comma is because I put one outside the paren (here represented as the dot to the left of the comma) as is my style. Also note the literal backslash, just before you see the word backslash in hidden parenthesis.

    Why not add quotes to match the spaces easily? I get a word or two in, and I find I naturally switch to using dots. These are throwaways for single tries through grep. For production code, I home in carefully on the parts that I'm dead sure I can anchor to, escaped by any means needed, before carefully choosing my atom to match as tightly as possible, so it'll error out if my data has gone wrong.

    Even in a simple case like this, half the fun is in explaining it. :)
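
    The trade-off between the lazy dots and careful escaping can be sketched in Python. The log line and the lookalike string below are invented for the example; re.escape stands in for the "escaped by any means needed" approach.

```python
import re

# Hypothetical line full of characters that are special in a regex.
line = "cost (total): $4.50 [approx]"

# Lazy version: every awkward character (space, paren, colon, dollar,
# bracket) is replaced by a dot, which matches any single character.
lazy = re.compile(r"cost..total....4.50..approx.")

# Strict version: let re.escape() backslash everything special.
strict = re.compile(re.escape(line))

# A different string that happens to have the same shape.
lookalike = "cost--total----4x50--approx-"

print(bool(lazy.search(line)))         # the lazy pattern finds the real line
print(bool(lazy.search(lookalike)))    # ...and also the lookalike
print(bool(strict.search(line)))       # the escaped pattern finds the real line
print(bool(strict.search(lookalike)))  # ...but rejects the lookalike
```

    So the dots are fine for a one-off search where a false positive is easy to spot by eye, and the escaped form is the safer choice when the match feeds anything automated.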

  5. some that I've used ... by ianare · · Score: 4, Interesting

    SSN
    ^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$

    US phone with or without parentheses
    ^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$

    ISO Date (19th to 21st century only)
    ^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$
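
    A quick way to see what these three patterns accept and reject is to run them against a few strings. This is only a sketch in Python; the constant names and all sample values are made up. The patterns are copied unchanged from above.

```python
import re

SSN = re.compile(
    r"^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$"
)
US_PHONE = re.compile(
    r"^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$"
)
ISO_DATE = re.compile(
    r"^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$"
)

# Invented samples: (pattern name, compiled pattern, candidate string).
samples = [
    ("SSN", SSN, "123-45-6789"),    # same separator twice: accepted
    ("SSN", SSN, "123456789"),      # no separator at all: accepted
    ("SSN", SSN, "123-45 6789"),    # \3 forces both separators to be equal
    ("SSN", SSN, "000-12-3456"),    # (?!000) rejects an all-zero area
    ("SSN", SSN, "773-12-3456"),    # areas above 772 are rejected as written
    ("phone", US_PHONE, "(555) 123-4567"),
    ("phone", US_PHONE, "555-123-4567"),
    ("phone", US_PHONE, "555 123 4567"),  # spaces only work after "(NNN)"
    ("date", ISO_DATE, "2008-06-15"),
    ("date", ISO_DATE, "2008-13-01"),     # month 13 is rejected
    ("date", ISO_DATE, "2008-02-30"),     # shape only: no calendar check
]

for name, pattern, candidate in samples:
    print(f"{name:5} {candidate!r:18} {bool(pattern.match(candidate))}")
```

    The backreference \3 in the SSN pattern is the neat part: whatever the first separator was (space, dash, or nothing), the second has to be identical. The date pattern checks only the shape, so February 30th passes; a real calendar check needs a date library on top.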

  6. Re:IP and Hardware addresses by nschubach · · Score: 4, Interesting

    There's a really cool little "real time" regex analyzer written in Flex: (if you're not one of them scared to death by Flash content)

    http://gskinner.com/RegExr/

    Maybe you can monkey your way into "regexing" the a out of apple :p

    --
    Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.

  7. Re:How about by Bandman · · Score: 4, Interesting

    I like it, but I've got a bookmark folder called "Slash-doc" where I store useful threads that contain a lot of information.

    I've got a lot of threads bookmarked.

    Best Practices for Process Documentation

    How would you make a distributed Office system

    Quality Open Source / Calendar / Messaging Systems

    and some others.

    Some of the information in the threads is out of date, but the ideas are useful and interesting to read. I need to go back through Ask Slashdot and get the more recent threads that seem to act as references.