Domain: regular-expressions.info
Stories and comments across the archive that link to regular-expressions.info.
Comments · 33
-
Do they hate Perl or something else?
I write a lot of Perl, and almost all of it has regexes throughout (monitoring Oracle logs, data mining, etc.). The last time I heard, Perl and Java had the most comprehensive regular expression engines of any language, and Perl is well known for its data-munging abilities. So I wonder: is it Perl that people hate, or the regexes that Perl supports?
-
Re:You could speed up your current solution
Write something that uses a regular expression library (RE2 would be ideal, if your expressions are actually regular), and keeps the compiled patterns resident. Most of your time is likely spent parsing the patterns.
I'm probably going to get shat on by kids who don't know any better, but....
Use Perl. If a complex set of regular expressions is taking 15 seconds per email, then there's clearly something wrong with the implementation. I suspect you're doing too much backtracking. I've been guilty of the same in the past. In one case, simply anchoring my regular expressions to the start and end of the string reduced running time literally by two orders of magnitude. Just glom the whole message into a string and go nuts.
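The same advice (compile once, keep the patterns resident, anchor them) carries over to other engines. Here is a minimal sketch using Java's java.util.regex. It assumes the whole message is already one String, and the two rules and the class name are hypothetical:

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical rule set: each pattern is compiled once, when the class loads,
// and reused for every message instead of being re-parsed on each call.
final class ResidentPatterns {
    private static final List<Pattern> RULES = List.of(
        // ^ and $ anchor each rule to a whole line (MULTILINE), so an attempt
        // that does not start at a line boundary fails immediately.
        Pattern.compile("^From: .+@example\\.com$", Pattern.MULTILINE),
        Pattern.compile("^Subject: .*\\bfree money\\b.*$",
                        Pattern.MULTILINE | Pattern.CASE_INSENSITIVE));

    // Counts how many rules match somewhere in the message.
    static int countHits(String message) {
        int hits = 0;
        for (Pattern rule : RULES) {
            if (rule.matcher(message).find()) {
                hits++;
            }
        }
        return hits;
    }
}
```

With MULTILINE set, ^ and $ match at line boundaries inside the message. Drop the flag if a rule should be anchored to the whole string instead.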
And before someone makes a 'write-only' joke about Perl regular expressions, I'd suggest you take a look at Perl 6 regex grammars, which let you lay out complex rulesets with ease and make them vastly easier to read.
As with any programming issue, it's horses for courses, and when it comes to parsing text with regular expressions, Perl is still at the head of its class.
-
Re:Interesting but...
I would also like to know Open Source Advocates attitude towards ???.
That's a regular expression (regexp) denoting a one- to four-character string ending in a period. I'm in favour of regexps.
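Strictly speaking, in most conventional regex flavors a leading ? has nothing to quantify, so the joke doesn't compile as written. Spelled out conventionally, "one to four characters ending in a period" is up to three of anything followed by an escaped dot. A small Java sketch (the class name is made up):

```java
import java.util.regex.Pattern;

// Up to three arbitrary characters, then a literal period, and nothing else.
final class ShortDotted {
    private static final Pattern P = Pattern.compile("^.{0,3}\\.$");

    static boolean test(String s) {
        return P.matcher(s).matches();
    }
}
```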
-
Re:That long? Optimistic, aren't we?
You can see a table of the different flavors of regex here -- VB regex is very similar to Perl and others.
I do a lot of coding in VB because of an existing codebase and FoxPro databases. I had a superior attitude about it and intended to port everything to a "better" language once I put out all the fires, but as I got proficient with it, I ended up leaving it as is. Now I'm more productive than I was in my C/C++/Java days.
I don't think the language is very relevant anymore. Use whatever you're good at.
-
Re:Guess who's not taking part?
-
Re:Finally
It's harder than you'd think.
Granted, you could just copy/paste from that page, or find a library to do it for you. But validating it with a regex, in a meaningful way, is non-trivial.
-
Re:Sure. Anybody...
.tv is a country-code domain whose operators just take advantage of the suffix to make lots of money.
http://en.wikipedia.org/wiki/Tuvalu
.info... well, whatever. http://regular-expressions.info/ rocks, though.
-
Re:brilliant or dangerous?
I'm sorry, but that's just wrong.
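package binary;

public class Finder {
    // Binary search over a sorted array. Invariant: keys[low] <= target < keys[high],
    // with low = -1 and high = keys.length acting as sentinels outside the array.
    public static int find(String[] keys, String target) {
        int high = keys.length;
        int low = -1;
        while (high - low > 1) {
            // Unsigned shift gives the correct midpoint even if low + high overflows.
            int probe = (low + high) >>> 1;
            if (keys[probe].compareTo(target) > 0)
                high = probe;
            else
                low = probe;
        }
        // low is now the last index whose key is <= target, or -1 if there is none.
        if (low == -1 || keys[low].compareTo(target) != 0)
            return -1;
        else
            return low;
    }
}
This example from Beautiful Code, published by O'Reilly, is complicated (not really, but it does require some basic comp-sci knowledge you wouldn't expect a 10-year-old to have). Or how about: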
(?<=,|^)([^,]*)(,\1)+(?=,|$)
(example from http://www.regular-expressions.info/, which collapses runs of identical adjacent items in a comma-delimited list)
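The commas in that pattern show that it works on comma-delimited items, not lines. Here is a minimal Java sketch of applying it by replacing each match with the first captured item ($1). The class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

final class Dedup {
    // (?<=,|^)  the item must start the string or follow a comma
    // ([^,]*)   capture one item
    // (,\1)+    followed by one or more copies of that same item
    // (?=,|$)   and the last copy must end at a comma or the end of the string
    private static final Pattern RUN =
        Pattern.compile("(?<=,|^)([^,]*)(,\\1)+(?=,|$)");

    // Replaces each run of identical adjacent items with a single copy.
    static String collapse(String list) {
        return RUN.matcher(list).replaceAll("$1");
    }
}
```

Only adjacent duplicates are collapsed, so "a,b,a" is left alone. The lookbehind and lookahead stop a partial item from matching, which is why "a,ab" is not turned into "ab".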
I'd argue that in both cases the "simple" and "readable by a ten-year-old" versions would be much more bug-prone, complex and inefficient.
Programming does require skill and some education, and it is perfectly reasonable to expect that someone who is trying to understand some code is capable of spotting a binary search and of interpreting some regex. If someone doesn't have that capacity, then they should go work for HR and not expect the rest of us to dumb down to their level.
-
Re:Regex Support
Note that the person who makes EditPad also runs the website http://www.regular-expressions.info/, which happens to be a very useful tutorial and educational regexp site.
-
Re:Regexp-based address validation
I find the ones on this page more useful: http://www.regular-expressions.info/email.html
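For reference, a deliberately simple check in the spirit of the "good enough" patterns discussed on that page might look like this in Java. The exact character sets are an assumption, and this is not a full RFC 5322 validator:

```java
import java.util.regex.Pattern;

final class EmailCheck {
    // Local part, @, domain labels, then a top-level domain of 2+ letters.
    private static final Pattern ADDRESS = Pattern.compile(
        "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,}", Pattern.CASE_INSENSITIVE);

    // matches() requires the entire input to fit the pattern,
    // so no explicit ^ and $ anchors are needed.
    static boolean looksValid(String s) {
        return ADDRESS.matcher(s).matches();
    }
}
```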
-
that expression is a run-on sentence maker
the only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^) and the hyphen (-)
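Here is a quick Java check of that rule (the class and method names are made up). Java is a partial exception: it also treats [ (nested classes) and && (intersection) as special inside a class.

```java
import java.util.regex.Pattern;

final class ClassLiterals {
    // Inside brackets, . * + ? $ and | lose their special meaning,
    // so this class matches any one of those characters literally.
    static final Pattern PUNCT = Pattern.compile("[.*+?$|]");
    // A hyphen between two characters forms a range...
    static final Pattern RANGE = Pattern.compile("[a-c]");
    // ...but placed last it is just a literal hyphen.
    static final Pattern LITERAL_HYPHEN = Pattern.compile("[ac-]");

    // True if s is exactly one character accepted by the class.
    static boolean one(Pattern p, String s) {
        return p.matcher(s).matches();
    }
}
```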
/.// wouldn't work either, since it would only remove the first character:
/.*//
-
Handy links
While I'm not providing any specific trick per se, here are a few useful on-topic links:
http://www.regular-expressions.info/ - this one is handy for regex info, particularly for JavaScript, which I use so infrequently that I need to look up how to match, capture, substitute, etc.
http://perldoc.perl.org/perlre.html - plenty of Perl-specific regex info there, though of course much of it extends to many other similar implementations
-
Re:Twitter troll, mod down
s/ONE/TWO is from the text editor vi (and now vim). It's one way to replace text with other text when in escape mode. vim is a popular text editor.
Buh? s/ONE/TWO is a REGULAR EXPRESSION. It's not "from Vim". Educate yourself: http://www.regular-expressions.info/ http://en.wikipedia.org/wiki/Regular_expression
-
Re:Excellent Post
Here's what I got, so far. Sorry it's not tabbed and cross-referenced...
http://ask.slashdot.org/article.pl?sid=08/09/17/224230 -- in case anyone wants this page, too
http://www.quickref.org/
http://gotapi.com/
http://www.regular-expressions.info/ -- regular expressions
http://www.perlmonks.org/
http://www.rosettacode.org/wiki/Main_Page
http://perldoc.perl.org/
http://www.perlbuzz.com/
http://java.sun.com/reference/
http://forums.sun.com/index.jspa
http://developer.mozilla.org/ -- javascript
http://www.w3.org/MarkUp/Guide/
http://www.w3.org/MarkUp/Guide/Advanced.html
http://www.w3.org/TR/html4/
http://www.w3.org/TR/xhtml1/
http://www.w3.org/Style/Examples/007/
http://www.w3.org/Style/Examples/011/firstcss
http://www.w3.org/Style/CSS/learning
http://en.wikibooks.org/wiki/Programming:Tcl
http://www.acm.uiuc.edu/webmonkeys/book/c_guide/
http://cprogramming.com/
http://www.cplusplus.com/
http://cm.bell-labs.com/cm/cs/cbook/
http://www.parashift.com/c++-faq-lite/
http://en.wikibooks.org/
http://developer.apple.com/
http://cocoadev.com/
http://www.cocoabuilder.com/
-
Re:gExp
http://www.regular-expressions.info/
I'll just leave this here.
-
Re:Sometimes the correct answer is the simplest
So, at least something that ignored returns and whitespace and thus allowed the programmer to structure the expression hierarchically would be nice.
You mean like the /x modifier in Perl? http://www.regular-expressions.info/freespacing.html
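Java offers the same free-spacing mode through the Pattern.COMMENTS flag, or an inline (?x) at the start of the pattern. Here is a sketch with a made-up date pattern and class name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FreeSpacing {
    // In COMMENTS mode, whitespace in the pattern is ignored and '#'
    // starts a comment that runs to the end of the line.
    private static final Pattern ISO_DATE = Pattern.compile(String.join("\n",
        "(\\d{4})   # year",
        "-          # separator",
        "(\\d{2})   # month",
        "-          # separator",
        "(\\d{2})   # day"), Pattern.COMMENTS);

    // Returns {year, month, day}, or null if the input is not YYYY-MM-DD.
    static String[] parts(String s) {
        Matcher m = ISO_DATE.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }
}
```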
They'd need to be clearly differentiated from literals -- maybe even require literals to be quoted.
No. That's exactly the opposite of what we need. It should be harder for something to not be a literal. If I were designing a regexp language, I would make it so that the contents of variables are always automatically quoted---none of this \Q and \E crap. You should have to add additional stuff around a variable to get the non-quoted behavior. This is the most common bug I've found in Perl code---user-entered text treated as part of an expression when it shouldn't be.
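Java has the same default, but it gives you an opt-in equivalent of \Q...\E in Pattern.quote, plus Matcher.quoteReplacement for the replacement side. Here is a minimal sketch with a hypothetical helper name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralReplace {
    // Replaces every occurrence of user-supplied `find` with `replacement`,
    // treating both strictly as literal text.
    static String replaceLiteral(String text, String find, String replacement) {
        // Pattern.quote wraps the input in \Q...\E so characters such as
        // '.', '(' or '$' are matched as themselves.
        Pattern p = Pattern.compile(Pattern.quote(find));
        // quoteReplacement escapes '$' and '\', which would otherwise be read
        // as group references or escapes in the replacement string.
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }
}
```

Unquoted, the "." in the first test below would match every character, and the "$" would be read as the start of a group reference.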
The problem is that regular expressions are often used for things that they aren't suited for. As soon as you add look-behind and crap like that, you're already beyond usable regular expressions. At that point, you need to take a step back and look at the problem again. Chances are, you'll find that the problem would be solved better programmatically with a few lines of code rather than a jumbled, cryptic regexp that no one can understand without staring at it for five minutes. (Of course, it doesn't help that half the time, there's no comment before that snarled mess of a massive, bloated regexp to tell what it does....)
Regexp is good for quick text processing. As soon as you're writing things in it that would be better served by a proper parser, you're just asking for maintenance problems of the "Oh, this declaration formatting code doesn't indent correctly when the user puts a struct nested in another struct nested in a typedef struct:" variety.
-
Re:Worst idea ever
that sucks.
http://www.regular-expressions.info/ is actually quite a useful site.
-
Re:Where can I get a list of these TLD to block ou
-
Re:And Microsoft was the biggest offender.
Trying to strip out HTML you don't want users to use without mangling the output is very very hard.
Not really. Add a checkbox to enable HTML. If it's not enabled, escape those less-than symbols for them -- and detect URLs, and other things.
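The escaping half of that suggestion is a single pass over the text. Here is a minimal Java sketch. URL detection is not shown, and the class name is made up:

```java
// Replaces the characters HTML treats as markup with character references.
final class HtmlEscape {
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;  // would otherwise start an entity
                case '<': out.append("&lt;"); break;   // would otherwise start a tag
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break; // matters inside attribute values
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

Because the work happens in one pass, an existing & is escaped exactly once, so there is no double-escaping to worry about.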
Preview, blah blah, whatever.
Do the preview in Javascript. Not Ajax, just straight Javascript, client-side, as they type. Gracefully degrade to a preview button.
Unless you're running an intelligent auto-correcting validator like Tidy, or you're parsing the document into a valid object model and then deleting nodes that way (both quite CPU expensive options, compared to running some regular expressions against a string
Regular expressions can be both CPU intensive and wrong. Just look at a real email validator, which I would paste here, but the lameness filter won't let me.
Tell me that isn't error-prone, or at least CPU-intensive. (And remember, you're dealing with individual comments, most of them short -- and it's a massively parallelizable problem.)
The second reason is convenience features -- instead of making the user write out the HTML blockquote for "evanbd said: It's a web site. You use HTML." by hand, you can just have them write [quote=evanbd]It's a web site. You use HTML.[/quote], and the parser will convert that intelligently into valid HTML.
Or you could just make the blockquote by itself, and rely on the fact that a properly threaded view will show who you were replying to, anyway.
There's also many better choices for convenience, and most BBCode is going to be generated by the wanna-be-WYSIWYG buttons on the forum.
If you decide down the line that you want to change the code that's outputted for whatever reason, all you need to do is change the application logic and clear out the caches.
Or apply CSS.
And to be fair to the poster, before this new comment system, Slashdot used to say below the post box what HTML could be used.
Oh, they got rid of that? I didn't realize... I'm deliberately still using the old comment system.
-
Re:General introductions to regex?
Well, according to the same site, there is a more standards-compliant version as well (but it takes up 426 characters). They explain why the simpler regex is "good enough" in most cases.
-
Re:General introductions to regex?
http://www.regular-expressions.info/
And this book doesn't seem any better than this site, which I've used as a reference for the last 3-4 years or so. Plus, there's an additional advantage to using regular-expressions.info over this book: You can't grep dead trees!
-
Re:General introductions to regex?
You mean like:
http://www.regular-expressions.info/
-
Re:They should make me the editor
It's a regular expression substitution. s is for substitution, / is the delimiter, the first set is the text to find, and the second set is the text to replace it with. It can be followed by options as well (s/search/replace/i, for a case-insensitive search, being the most common), and it is the means for doing a search and replace in vi (ESC:s/search/replace/). Every nerd should learn at least the basics of regular expressions; they are just too handy.
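Outside vi, the same substitution maps onto java.util.regex fairly directly. Here is a rough sketch. The helper name and boolean parameters are made up, with `global` standing in for the g option:

```java
import java.util.regex.Pattern;

final class Substitute {
    // Rough analogue of s/search/replace/ with optional i and g options.
    // Without g only the first match is replaced, as in sed and vi.
    static String substitute(String line, String search, String replace,
                             boolean ignoreCase, boolean global) {
        Pattern p = Pattern.compile(search, ignoreCase ? Pattern.CASE_INSENSITIVE : 0);
        return global ? p.matcher(line).replaceAll(replace)
                      : p.matcher(line).replaceFirst(replace);
    }
}
```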
-
Re:Cryptic? Complex!?
Well most languages have these things called functions, you can use them to do your own string functions.
Yay! Let's reinvent the wheel by writing 10, 20, or more lines of code for something regular expressions would be able to handle in one. Furthermore, let's claim this is done for the sake of keeping the code 'pretty,' because it's far too embarrassing to admit that we don't really understand how to use regular expressions!
Other Languages have a bunch of well performing string functions so I don't need to program them myself.
Hmm, like string functions that allow the use of regular expressions to make your string manipulation quick, efficient, and useful?
Yes, regex can be an odd concept to deal with at first, as they tend to be quite a bit more succinct than the languages you're more familiar with. Are you aware, however, that regular expressions can contain comments and extra whitespace?
Maybe you're paid by the line of code, or are attempting to squeeze in every extraneous hour of programming to inflate your consultant fee. If that's the case, I would certainly recommend avoiding regular expressions; they save far too much time and work entirely too well.
-
Check All User Input w/Regular Expressions
It is well known amongst the more experienced software developers out there that all user input to ANY software system should be considered suspect and therefore must be checked for invalid inputs, boundaries, and special cases. The solution has been around for decades, but it is really surprising how many developers out there have NOT heard of regular expressions or do not know how to use them properly. There are some cases, usually when widely variant free-form input is required, that are difficult to handle with regular expressions, but for the most part they have proven to be remarkably effective in my own experience, and I use them regularly (pun intended) in my website and application development. If you have not gotten in on the regular expression game, then consider picking up the O'Reilly Mastering Regular Expressions book or visiting Regular-Expressions.info before building your next project. The project you save might be your own!
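Here is a minimal Java sketch of that kind of check, for a hypothetical username rule of 3 to 16 ASCII letters, digits or underscores:

```java
import java.util.regex.Pattern;

final class InputCheck {
    // Hypothetical rule: accept only a known set of characters and lengths.
    private static final Pattern USERNAME = Pattern.compile("[A-Za-z0-9_]{3,16}");

    static boolean isValidUsername(String input) {
        // matches() must consume the whole string; find() would accept
        // "ok_name<script>" because a valid substring exists inside it.
        return input != null && USERNAME.matcher(input).matches();
    }
}
```

The key detail is matching the whole input rather than searching inside it.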
-
Re:Personally...
Let's see where the AUTHOR of that website learned from
....
http://www.regular-expressions.info/books.html
shows that he learned from Jeff Friedl too:
http://www.regular-expressions.info/hipowls.html
Ahh, again THAT book.
jobst
-
Personally...
I just like to go to http://www.regular-expressions.info/ myself - I seem to find all the stuff I forget from time to time there...
-
Re:Finally a chance to user my adblocker on Google
You may find http://www.regular-expressions.info/ helpful. Adblock supports regex strings. I yearn for the day when a search engine does as well.
-
Re:Unacceptable mistakes
I didn't know about [[:alpha:]], thanks. \w varies between implementations, apparently - this screenshot shows it matching foreign characters with accents and stuff.
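Java shows this variation within a single engine. \w is ASCII-only by default but follows Unicode when the UNICODE_CHARACTER_CLASS flag (Java 7 and later) is set, while an explicit [A-Za-z0-9_] stays ASCII. A small sketch (the class name is made up):

```java
import java.util.regex.Pattern;

final class WordChars {
    // Default: \w is equivalent to [a-zA-Z_0-9].
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");
    // With UNICODE_CHARACTER_CLASS, \w also covers accented and other
    // non-ASCII letters.
    static final Pattern UNICODE_WORD =
        Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);
    // The explicit class is unaffected by UNICODE_CHARACTER_CLASS.
    static final Pattern EXPLICIT = Pattern.compile("[A-Za-z0-9_]+");
}
```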
Though I would use [A-Za-z0-9_] just to be on the safe side.
-
Re:Regular expressions in a cookbook?
This is a cool article on catastrophic backtracking. I remember the first time that got me. It would occasionally cause severe issues on a production server we had. I swung and missed with my reg ex on that one.
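The usual textbook illustration is a nested quantifier like (x+x+)+y, which can take exponential time to fail when no y follows. Here is a Java sketch comparing it with an equivalent pattern that has only one way to match. The tests use only short inputs so the nested version still finishes quickly:

```java
import java.util.regex.Pattern;

final class Backtracking {
    // Nested quantifiers: a run of x's can be split between the inner and
    // outer + in exponentially many ways, and the engine tries them all
    // before reporting failure when no 'y' follows.
    static final Pattern NESTED = Pattern.compile("(x+x+)+y");
    // Same language (two or more x's, then y) with only one way to match,
    // so a failure is detected after a linear number of steps.
    static final Pattern FLAT = Pattern.compile("x{2,}y");

    static boolean nested(String s) { return NESTED.matcher(s).matches(); }
    static boolean flat(String s) { return FLAT.matcher(s).matches(); }
}
```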
-
Here is my attempt to render an explanation...
s/is/eir
It is a regex substitution. Essentially, the string typically instructs a language interpreter (Perl, for instance) to search for a pattern and replace it.
In this case, it replaces the first instance of "is" with "eir", so the following alteration is made:
Before: but it's always sad to watch someone stoop to this level
After: but it's always sad to watch someone stoop to their level