Java Regular Expressions
Simon P. Chappell writes "Regular expressions (regex to their friends) are an incredibly powerful addition to most programmer's personal toolkit of techniques. Programming using a language that doesn't support them can be frustrating if you need to do any amount of non-trivial string handling. Java was just such a language until the release of the 1.4.x series. Sure, there were libraries like ORO that would provide regex support, but it wasn't built in and not many companies allow the use of 3rd party libraries. With version 1.4.x, the corporate Java developer in the trench, received the power of regular expression pattern matching." Read the rest of Simon's review.
Java Regular Expressions
author: Mehran Habibi
pages: 255 (7 page index)
publisher: Apress
rating: 8/10
reviewer: Simon P. Chappell
ISBN: 1590591070
summary: A great starter for using regular expressions in Java
The book seems targeted at those who have solid Java programming skills but have not yet used the java.util.regex package. I see two types of Java programmers who might not have used the regex package: those who do not know about regular expressions, and those who know them but have not yet used them within Java. This book should satisfy both sets of users. The first group will benefit from the general introduction to regular expressions and the gentle introduction to using them within Java. The latter group will benefit from the more advanced material in the book.
The book is nicely structured and progresses easily through its subject matter. The first chapter is an introduction to regular expressions. While this is most obviously for the readers new to the subject, it will be useful for those more experienced, because not all regex engines are created equal and this chapter lays out the particular dialect of regular expressions used by the Java 1.4.x regex engine. The second chapter introduces the object model used by java.util.regex. This gives detailed explanations of the Pattern and Matcher objects as well as the new regular expression methods added to the standard String class.
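For readers who have not met the object model the second chapter covers, here is a minimal sketch of Pattern, Matcher and the String convenience methods. It is not taken from the book; the class name RegexBasics and the sample text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexBasics {
    // Compile once; a Pattern is immutable and can be shared between callers.
    static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Count the matches by walking the input with Matcher.find().
    static int countWords(String text) {
        Matcher m = WORD.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "regex to their friends";
        System.out.println(countWords(text));

        // Convenience methods added to String in 1.4; each one compiles
        // the pattern again on every call.
        System.out.println("2004-03-15".matches("\\d{4}-\\d{2}-\\d{2}"));
        System.out.println(text.replaceAll("\\s+", "_"));
        System.out.println(text.split("\\s+").length);
    }
}
```

The String methods are handy for one-off calls; the explicit Pattern is the better fit when the same expression is applied many times.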
The third chapter takes the reader into advanced regular expressions. While there is much that can be done using just the Pattern and Matcher objects, the path to the full power of regex travels through an understanding of groups (and subgroups) and qualifiers. Regex groups are hard to explain until you've seen them in action, whereupon you may find yourself wondering how you ever managed without them. Mr. Habibi does an excellent job, both explaining them and introducing us to the unusual noncapturing subgroups. (I'd never heard of these before.) Qualifiers are the other side of the same coin. While it's one thing to define a group, whether it's expected and whether it is to be captured, it's equally important to be able to describe the expected occurrence of those groups using qualifiers.
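A short sketch of the difference between capturing and noncapturing groups may help here. The pattern and the GroupDemo class are illustrative assumptions, not examples from the book: the date pattern captures the day number and the month name but uses (?:...) for the ordinal suffix, so the suffix is grouped without taking up a group number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Group 1: day number. (?:st|nd|rd|th) groups the suffix without capturing it.
    // Group 2: month name.
    static final Pattern ORDINAL_DATE =
            Pattern.compile("(\\d+)(?:st|nd|rd|th) (\\w+)");

    // Returns {day, month}, or null when the input does not match.
    static String[] parse(String input) {
        Matcher m = ORDINAL_DATE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    // A repetition count applied to a whole noncapturing group.
    static boolean twoOrThreeAbs(String input) {
        return input.matches("(?:ab){2,3}");
    }
}
```

Only two groups are reported by groupCount() for this pattern, because the noncapturing group is not numbered.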
Chapter four tackles the interesting challenges of using regex in an object-oriented language. Mr. Habibi describes the general principles of use of regex as similar to those used with SQL through the JDBC interface. These principles are the optimising of connections, batching reads and writes, storing patterns externally, Just In Time compilation of patterns and remembering that not every piece of String handling code needs to be written as a regex. All very useful advice.
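One way to read the "Just In Time compilation" advice is to compile each pattern on first use and keep the result, much as a prepared statement is reused. That reading is an assumption, and the PatternCache class below is a hypothetical sketch rather than the book's code. It also uses ConcurrentHashMap.computeIfAbsent with a method reference, which needs Java 8, well after the 1.4 release under review.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

class PatternCache {
    // Maps regex source text to its compiled form.
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // Compile on first request, then hand back the same Pattern instance.
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    static boolean matches(String regex, String input) {
        return get(regex).matcher(input).matches();
    }
}
```

Because Pattern objects are immutable and thread-safe, sharing one instance between callers is safe; each caller still gets its own Matcher.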
Chapter five is the big examples chapter. All of the examples are intended to be practical; the kind of thing you might have to address at the day job. With examples covering Zip codes, telephone numbers, dates, searching text files and even validating an EDI document, he seems to have delivered on that assertion. There are further examples in Appendix C, if the aforementioned patterns aren't enough.
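To give a flavour of that kind of example, here is a minimal Zip code check. The pattern is a common one for five-digit US codes with an optional four-digit extension; it is an assumption for illustration and not necessarily the pattern the book arrives at.

```java
import java.util.regex.Pattern;

class ZipCode {
    // Five digits, optionally followed by a hyphen and four more digits.
    private static final Pattern ZIP = Pattern.compile("\\d{5}(?:-\\d{4})?");

    static boolean isValid(String candidate) {
        // matches() requires the whole input to fit the pattern.
        return candidate != null && ZIP.matcher(candidate).matches();
    }
}
```

This checks only the shape of the input, not whether the code is actually assigned.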
The writing and progression of material are good. The examples are very well thought out and explained. Many of the examples are built from first principles. Mr. Habibi seems to want to not only teach you how to use regular expressions, but also how to design them. He does this by working up from an understanding of the data until he has a working regex.
While it doesn't make any promises about being an encyclopedia of regex patterns, this book does contain enough of the normal business patterns to be a useful initial reference work, before turning to the Internet to search for patterns.
If you want an encyclopedic reference work on regex, then buy Jeffrey Friedl's Mastering Regular Expressions, which is published by O'Reilly. This is not that book, preferring to stick with the practical usage of regex.
This is a great starter book for developers who are new to using regular expressions in Java.
You can purchase Java Regular Expressions from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
However, like many things in computer science, speed gains come at a price. In this case, the regular expression language supported is not quite as rich as the JDK implementation.
Sigs cause cancer.
Me: I'll have a Grande Cafe au Lait please.
/me hands over cash, takes careful first sip.
Starbucks Employee: That'll be an hour's wages please.
Me: Thanks!
That's when you get to see my java regular expression.
Generally it will be me wincing in pain because I just burned my tongue. Sometimes, if it's cooled enough, you'll hear a quiet "MmmMmmm" in the style of Family Guy's Herbert.
I tried to do a bit of recursion in regexes once, like ((\d+)\.)+, but that didn't work. It's too bad, because I don't think there's another way to dynamically match data in regexes. Other than this, they've served me very well all these years.
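For what it's worth, a pattern like ((\d+)\.)+ does match in java.util.regex, but a repeated group only remembers its last iteration. A common workaround, sketched below with made-up names, is to match one element at a time with find() and collect the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroups {
    // With a repeated group, group(2) holds only the final iteration's text.
    static String lastCapture(String input) {
        Matcher m = Pattern.compile("((\\d+)\\.)+").matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    // Matching a single element in a loop recovers every value.
    static List<String> allNumbers(String input) {
        List<String> numbers = new ArrayList<>();
        Matcher m = Pattern.compile("(\\d+)\\.").matcher(input);
        while (m.find()) {
            numbers.add(m.group(1));
        }
        return numbers;
    }
}
```

For the input "1.2.3." the first method yields only "3", while the second collects "1", "2" and "3".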
Send email from the afterlife! Write your e-will at Dead Man's Switch.
Regular expressions (regex to their friends) are an incredibly powerful addition to most programmer's personal toolkit of techniques. Programming using a language that doesn't support them can be frustrating if you need to do any amount of non-trivial string handling.
Er, no. It is only for trivial string handling that the regex approach is useful.
For non-trivial string handling (particularly if you feel like giving the authors of erroneous strings helpful error messages!!) I'll write a proper lexical analyser and a proper parser every time.
Of course, if you're using a language that doesn't have built-in regular expressions, you might
still have good regular expression libraries available to you. Boost::Regex is a great choice
for C++, for instance.
You sir, have obviously not programmed in C++...if at all...
Are you serious? What kind of company would do that? It's madness!
dominionrd.blogspot.com - Restaurants on
My main complaint about java regexps is that all the backslashes have to be quoted with a backslash, making them completely unreadable compared to a language that supports regular expressions natively, like perl (no, a standard library is not technically native support). "\d" becomes "\\d" and so forth. Does anyone know a simple way around this? We just started using java regexp's at work, so the extra backslashes don't bother most people, but they are extremely annoying to those of us with a lot of perl experience.
P.S. How many slashdotters thought they'd be rolling in their graves by the time they heard an example of where perl is more readable than java?
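A small sketch of what the doubling actually does, in case it helps newcomers at work. The class and constant names are invented for the example. It does not remove the backslashes from source code; it only shows that the regex engine sees the single-backslash form, and that Pattern.quote saves hand-escaping when the text to match is a plain literal. Patterns read at run time from outside the source, a plain text file for instance, never go through the compiler's escape processing and so need no doubling.

```java
import java.util.regex.Pattern;

class Escaping {
    // Written "\\d+\\.\\d+" in source; the string the engine receives is \d+\.\d+
    static final String DECIMAL_REGEX = "\\d+\\.\\d+";
    static final Pattern DECIMAL = Pattern.compile(DECIMAL_REGEX);

    // Pattern.quote makes every character of the literal match itself.
    static boolean containsLiteral(String haystack, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(haystack).find();
    }
}
```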
This space intentionally left blank.
The missing Regular Expressions is what kept me off Java and on Perl for a looong while. I started using ORO and since their introduction into Java itself I almost completely switched over. I really do hope Perl 6 will be released and live up to its expectations.
Having said that I really don't see why you have to devote a complete book on regex. A small tutorial does just fine.
I hadn't the slightest objection to his spending his time planning massacres for the bourgeoisie... (P.G. Wodehouse)
Slightly off-topic, but...
Back when my only experience was development on Windows I was very frustrated with the lack of good string handling in Microsoft languages (VB, T-SQL). If you didn't find a third-party library you had to write a lot of expensive code to do fancy string searches. Try writing recursion in VB6 without bringing your computer to a screeching halt.
Then when I switched to linux and open source I was shocked to learn that something as useful as regex had already been around for many years. Most of the Windows developers I knew never even heard of it. It was tricky to learn but has paid off many times over in utility.
Every developer is better off for knowing it. Even if they never use regex, the thought process in understanding it is quite interesting and educational.
Developers: We can use your help.
C is an incredibly powerful addition to most programmer's personal toolkit of techniques.
..oh ..we are talking about CS students who discover the joys of the likes of Java on their long path from..
...to...
...
...???
;-)
10 print "hello world"
20 goto 10
struct filter {
int (*open) (void *);
int (*close) (void *);
};
Nevermind then... come back in 10 years... (if you're still a programmer by then)
Sure, there were libraries like ORO that would provide regex support, but it wasn't built in and not many companies allow the use of 3rd party libraries
Who's boneheaded enough to do this? I want to know so I can avoid buying anything from them, because their products are going to be overpriced by at least 50% due to the wasted effort.
I can understand restricting third-party libraries to those of a certain license, like BSD or LGPL, but a blanket ban without any exceptions for something as essential as regular expressions? That's just stupid.
One of the biggest advantages of Java is the enormous number of high-quality third-party libraries available.
Is this just something the submitter dreamed up to fill space, or do companies actually do this?
This space intentionally left blank.
Who are these companies and what can possibly be their justification for such a blanket policy. I can understand for some ultra-high security/uptime systems with incredibly strict standards and processes who would need to put third party code through an extensive and expensive audit. But for the rest of us? No jUnit? log4j? Is Boost allowed? Good lord, I can't imagine programming in such a world.
I hope I never work for one of these firms.
Taft
I believe fear is the primary culprit here. Many places I've worked for/with only allow internally developed library use... And I'm sure half of it is swiped, stolen, or 'inspired' by popular, free, open source, 3rd party libraries.
Your introduction to OOL was in Java? Boy, must that have sucked, java's probably the most static and limiting OO language out there...
"The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
Java...Regular expressions? Error!!!
Regular expressions belong in a real programming language like Perl where they
seamlessly blend in with the arcane chaos that looks like when the Dyslexic Liberation Front
blew up the alphabet spaghetti factory.
Why would Java "programmers" sully their nice looking suburban code with ascii vomit?
It's just not natural.
I'm at one right now (hence why I'm posting as an AC), and my previous employer was like that as well (except we were allowed minimal use of Struts on one project). It's typical "not invented here" reasoning, usually from "software architects" convinced their own home-grown platform/library/framework is better than anything else out there.
In my experience, it leads to systems with too long of a ramp-up time for new hires to start working on and delays to tweak the library for every new thing the developers are trying to accomplish. But it doesn't matter that a simple project took months to accomplish, as long as there's a perfect (in their eyes) foundation they can sneak out the back door when they finally leave.
Somebody hasn't worked for "many companies." _Every_ company I've worked for allowed 3rd party libraries. (Sure, there are processes to make sure you don't do something stupid like ship a GPL library with a closed-source product, but that's just common sense.)
I spoke about the "regex coach" tool from http://weitz.de/regex-coach/ on my podcast (shameless plug!) http://webdevradio.com/ - it's a great tool for helping visually walk through the regex creation process, especially for complex needs.
creation science book
Save yourself $14.80 by buying the book here: Java Regular Expressions. And if you use the "secret" A9.com discount, you can save an extra 1.57%! That's a total savings of $15.20, or 38.58%!
One of the reasons we as programmers write code is to take a very complex idea, like a software application, and write something that a human engineer can understand. The KISS principle especially applies to coders.
:)
As I get older, my code has gotten more and more straightforward, because I consider the maintenance cycle of code to be more than 95% of the puzzle. And these days, I have more than one security analyst who is not a senior software engineer poking around my code.
RegEx's are not-so-readable and not-very-maintainable programming abstracts that should be avoided whenever possible. I prefer using string manipulation abstraction classes (such as my own version of StringTokenizer). They are not as fast and furious as other methods like lexical analysis, and the code is more bloated, but the code is Straight Forward And Easy To Read. There is a power in code of this nature, and my clients have thanked me more than once for not focusing on writing "cool code" but on writing "clean and simple" code. I just tried to paste in a few ugly regex samples, but slashdot blocked me, calling them "junk characters". I agree!
For example, take XPATH, this is a clean and simple way to address XML objects. Sure, there is an additional level of abstraction, but you can look at an XPATH query, even from a layman's point of view, and have a clear understanding as to what it is doing.
Horns are really just a broken halo.
Come on:
"Some String".replaceAll("Java", "Bloated piece of shit")
And FYI PatternSyntaxException is a runtime exception so no need to catch it and rethrow as a RuntimeException.
so to write it your way:
String theTruth(String s){
    // matcher() takes the input; replaceAll() takes the replacement text
    return Pattern.compile("Java").matcher(s).replaceAll("Bloated piece of shit");
}
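The two spellings in this exchange can be put side by side. The sketch below uses hypothetical names and passes the replacement through Matcher.quoteReplacement (available since Java 5), so that a "$" or backslash in the replacement text is treated literally rather than as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDemo {
    // Compiled once and reused by viaPattern().
    private static final Pattern JAVA = Pattern.compile("Java");

    // One-liner: compiles the regex on every call.
    static String viaString(String s, String replacement) {
        return s.replaceAll("Java", Matcher.quoteReplacement(replacement));
    }

    // Same result with an explicit, precompiled Pattern.
    static String viaPattern(String s, String replacement) {
        return JAVA.matcher(s).replaceAll(Matcher.quoteReplacement(replacement));
    }
}
```

Both throw the unchecked PatternSyntaxException only if the regex itself is malformed, so no try/catch is needed around a constant pattern like this one.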
Oh, I think you're hardly being fair to Java - your example was artificially bloated. I can easily do this in one line in Java:
Runtime.getRuntime( ).exec( "perl -e 'sub theTruth($) { shift; $_ =~ s/Java/Not so bad now/; return $_; }" );
I think you owe Java an apology.
Proud neuron in the Slashdot hivemind since 2002.
Try:Still not as compact but at least there aren't any tildes in there. I wonder if there would be a more compact way to do it. This seems terribly heavy weight for such a simple example. Oh, wait! There is!So now we compare:To:So the Java code ends up being a handful of characters longer and much easier to read. I'm not saying that Java is the ideal Regex language, but your example sucked.
Really, Java is not meant to be a string processing utility. It is honestly too slow with too much overhead for this type of functionality. Regular expressions were meant for the occasional light bout of string processing in Java. If you really need some string processing, like over a large dataset, stick with something like Python, whose standard implementation is written in C. It is fast with some very cool tools, such as regex, dictionary use, etc. Even if you need a light GUI, you could always interlace some Python with Tk.
Development notes at http://devscribbles.blogspot.com
Java is one of the closest languages to the original object-oriented language, Simula. It's kinda like a crocodile... not so much elegant or refined, but still so successful for so long that you have to wonder what it is doing right to last so long.
Fans of Ruby/Python/Smalltalk/Lisp/etc will do things like add a calculate-average method to Array and claim that is object-oriented, but it is not really. What if the array contains for instance regular expressions, wtf does an 'average' mean for that? It's a useful practice, but in theory it is nonsense. Teaching OO in a language that allows and encourages that sort of abuse just makes it harder to understand why we use objects in the first place.
I'm glad that it's there, and I suppose it was useful during my prototype phase, but a little profiling revealed that my app was spending half its time parsing input. Dumping out the input to String and sometimes char[] and doing the parsing myself in hand tooled code almost completely erased the speed hit I was taking on load.
Start Running Better Polls
Any company that doesn't allow, nay, embrace third party jarballs is missing 98% of the point of Java. The language is so-so, the built in libraries are nice, but not infinite - but the ability to load componentized, versioned, packaged third-party tools is priceless.
If I were to ask everyone to start programming in assembly language, I suspect that I would be laughed at. Yet with regular expressions that is exactly what we are doing. If you take a look at the history of regular expressions, you will find staring right back at you the guts of compiler theory with state machines, finite state automata, etc. Instead of asking for regular expressions, programmers should be asking for higher level pattern matching facilities. Something as simple as finding the balanced parentheses in the string: (a+b)/((c-d)+e) using a regular expression is difficult. Yet there have been languages that have advanced string matching capabilities around since the 60's (start looking at Snobol -- which is still alive -- and some of its descendants).
Slashdot: Where anecdotes and generalizations can be freely substituted for facts, logic, or intelligence
And I find it much easier to follow.
Rediculous: A word indicating the writer is ridiculously ignorant.
That depends whether searching for content in a string is "trivial" or not. More likely, it comes from only encountering complex problems in one of the two subsets and only trivial problems in the other. There are non-trivial problems in both subsets.
...
:-(
The subset of problems you use a regex for are those where there are non-trivial patterns in the text that you wish to extract. The subset of problems you use a parser/lexer for are those where there is some formal model that describes the syntax the input is expected to have.
These two problem sets do NOT often overlap. If you're using the wrong tool for the problem, you're in for a world of hurt. You do NOT want to parse XML/HTML/etc. with regexes (you can do a few things, but you open yourself up to a world of well-deserved pain when you realize the true evils of nesting and how they affect regexes).
Similarly, there's no way in hell you want to search unstructured text with a parser/lexer. Yes, *unstructured* data. Programmers actually deal with that from time to time. It's when we use regexes. You know, when searching for *patterns*
I've written both. I've used both. They're both great problem solving approaches, but using the wrong one invites pain. Sure, maybe you can get by with a half-assed system that has bugs your users will never find (e.g. they won't nest anything too deeply for the XML regex to find), but it's still a bad idea.
So please, please use the right tool for the job. With my luck, I'll get stuck maintaining your code if you don't
I recently wrote a small app based on "Filter Builder" by ActiveState. It's called Pattern Sandbox and has helped me rapidly prototype regexes for both Java and Perl (because the Java dialect is very similar to Perl's). I made Pattern Sandbox because it was so annoying to write a regex, compile, get to that part of the code/interface, and then finally try it just to find that it does not work correctly so I have to repeat this process until I get it right. If you are using Java regexes on a regular basis, Pattern Sandbox or similar tools are indispensable. Try it out and feel free to give me some feedback. I hope this is not too much of a plug, but I thought it to be very appropriate.
The bottom line is quite simple; that small handful of code accumulates very quickly until it is no longer a small handful, at least when compared to something that uses better (re:shorter) identification names for libraries, and that has sufficient mechanisms to cut down on function and variable names such as scope control, (and I do admit, I absolutely love java's handling of brackets and static brackets ... you can validly place those suckers in the weirdest locations), enumerations, (which java finally added a few years back, albeit in their own ugly way) and much more-importantly, symbolic operator-overloading :-p You know I was heading there.
At any rate, yes operator-overloading can provide you with multiple ways to shoot yourself in the foot, but Java is already ready for these puppies; think interfaces! All you have to do is look at the most basic and commonly-implemented interfaces that java recognizes, and then say, "Okay, which operators should be overloaded to match these interfaces?," (i.e. the commonly-overloaded operators for queues, lists, stacks, and comparable types, etc) implement those into the virtual machine, and boom, you've got backwards-compatible operator-overloading in Java. No biggie, right? Makes SENSE, right?! Well, heck, at that point you've almost got a dynamic-by-default version of C++ without a macro preprocessor! In my book, that's progressive!
But Java's had a very crappy version of regular-expression support for ages. I wasn't able to understand it for a very long time and in fact I learned many other regular expression engines for various scripting/programming languages in far shorter periods of time (Perl, Python, PHP, JavaScript, egrep, f/lex for C/++, etc). But with this newfangled magic era of software libraries (Java and VB and .NET) and toy languages (Primarily VB), if I can create a program that mimics tetris (C#.NET) in just about 7 hours, and the executable code is more than a few times larger than the source code, I call that a toy language - of course, in java, it took weeks for pacman - well, at least for us to do that in a three-student team for pac-man, but that was almost five years ago, in my first java class ever 8-B
:-p )
I was such a die-hard C++ fan then. Now I say bring on the libraries and new languages, but save the Visual Basic software for your children to play with, just like almost all of us must have done, at some point.
--I gots 99 problems but a new machine ain't one!
AMD! Asus! Whoot! 6 years!
It is just that you should not use a fork to hammer a nail.
Balancing parentheses was just the first example my teacher told the class when explaining that regular expressions were not suited for everything and that sometimes you had to use grammars.
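The balanced-parentheses case is a good illustration of reaching for something other than a regex. A minimal sketch, with an invented class name: a depth counter does the job in one pass, which a classical regular expression cannot do for arbitrary nesting.

```java
class Parens {
    // Returns true when every ')' closes an earlier '(' and none are left open.
    static boolean balanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // a ')' with nothing to close
                }
            }
        }
        return depth == 0;
    }
}
```

Anything richer than a single bracket type, such as reporting where the mismatch is or handling several kinds of bracket, is where a stack or a real grammar starts to pay off.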
Why can't
Apart from the fact that your code is the worst that you can write when using RegEx in Java (as pointed out by another post, RTFApi doc if you want to use Java properly), it amuses me that you are complaining that Java (a language designed for using strong OO and being multiplatform) is slower than Perl (a language designed for processing regular expressions).
You could have said also that the Fire Department sucks because they are not good at catching burglars, or that the Police Department is full of losers because they can not put down a fire. Myself, I will keep using the FD to deal with fire and the PD to deal with crimes.
Why can't
Great things about the Java 1.4+ regex support, from my perspective, include that (1) it's nearly as full-featured as Perl's regexes (and thus far better than Javascript's); and (2) it's usable in web browsers and via embedded applets.
Those were both key to helping me create Regex Powertoy, an interactive visual regex tester, much like others mentioned in this discussion -- but fully implemented in a browser. It's in JavaScript and DHTML, with a Java applet for the full-featured and step-controlled regex matching -- requires FF1.5+/IE6+ & Java 1.5+.
Check it out, break it (it's still got some rough edges under heavy input), let me know how it could be improved.
Gah, and to think I passed that class :P I just hadn't realised that all that theory about automata and K* and whatnot applied to the real world!
Send email from the afterlife! Write your e-will at Dead Man's Switch.
You'd have to implement some useless ArithmeticCollection for that in Java. In these (can't talk for lisp) other languages, you just define the method and throw something when a member doesn't have the + message. How is one worse practice than the other?
private final Pattern methodPattern = Pattern.compile("^(.*) .* HTTP/.*$");
private final Pattern versionPattern = Pattern.compile("^.* .* HTTP/(.*)$");
private final Pattern resourcePattern = Pattern.compile("^.* (.*) HTTP/.*$");
Happy days.
There was some weirdness with GCJ not behaving like Sun's Java, but that seems to have gone away with the last update to GCJ I did.
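As an alternative to three separate patterns, a single pattern with three capturing groups can pull out the method, resource and version in one pass. This is a sketch under the assumption that the fields are separated by single spaces and contain no whitespace themselves; the RequestLine class is hypothetical and is not a full HTTP parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RequestLine {
    // Group 1: method, group 2: resource, group 3: protocol version.
    private static final Pattern REQUEST_LINE =
            Pattern.compile("^(\\S+) (\\S+) HTTP/(\\S+)$");

    final String method;
    final String resource;
    final String version;

    private RequestLine(String method, String resource, String version) {
        this.method = method;
        this.resource = resource;
        this.version = version;
    }

    // Returns null when the line does not look like a request line.
    static RequestLine parse(String line) {
        Matcher m = REQUEST_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new RequestLine(m.group(1), m.group(2), m.group(3));
    }
}
```

Using \S+ instead of .* also avoids the backtracking that greedy .* runs into when it first swallows the whole line.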
C-x C-s C-x k
Actually, you won't get any output from that, you need to hook up the InputStream from the Process object to the standard out of your own java process and run it in a separate thread or a while loop. I've also found that running interactive processes (both on windows and Unix in java 1.4) to be nearly impossible, as I can't seem to actually send data on the input stream of the other process. There are also platform dependent differences, which can be a pain. Generally I've found exec to be lacking.
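A minimal sketch of the plumbing described above, using ProcessBuilder (Java 5 and later) rather than Runtime.exec. The ExecDemo class and its method names are made up for the example. It merges stderr into stdout so a single loop can drain both, and closes the child's stdin so a program waiting for input sees end-of-file; it does not attempt interactive use.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ExecDemo {
    // Read a stream to the end and decode it as UTF-8 text.
    static String drain(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Run a command and return everything it wrote to stdout and stderr.
    static String run(String... command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // one stream to drain instead of two
                .start();
        process.getOutputStream().close(); // child sees EOF on its stdin
        String output;
        try (InputStream in = process.getInputStream()) {
            output = drain(in);
        }
        process.waitFor();
        return output;
    }
}
```

Draining before waitFor() matters: a child that fills the pipe buffer blocks until someone reads it, so waiting first can hang.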
If you only program in Java, and you have yet to use regexes, then I could see why you might possibly want this book. But how is it that much better than a general purpose regex book (of which there are several). I would think it would be more useful to have a book that covers regexes as a computing concept and then talks about the differences/limitations of different implementations (grep, sed, Java, JavaScript, Perl, etc.) Is Java still a big enough buzzword to sell books?
If you can read this sig, you're too close.
Too bad Perl is better at being multiplatform than Java, too — and that Ruby is better at being strongly OO than Java, despite having a strong Perl heritage.
Unfetter your ideas. Copyfree your mind.
Let's light this book on fire? What else can Java do half right that's already been perfected.
You need to create two sets of FIFOs, one to talk to your child, and one for it to talk back.
You fork, then dup2 the child's STDIN to the "far end" of the former pipe,
then you dup2 the child's STDOUT onto the "far end" of the latter pipe.
Finally, you exec() in your child.
You hold onto the two near ends and use them as separate Input/Output streams for control.
You're going to need to:
1) Catch SIGPIPE for when the spawned process closes its reading end of the pipe.
2) Catch SIGCHLD so you know when the process exited.
3) Set your near OutputStream to autoflush mode.
On top of all this, your remote program has to be able to work in an unbuffered mode. Most command line programs don't. They are designed to work with files, and STDIN/STDOUT that are already "in the right mode", having inherited them from a program that had them attached to a TTY.
That is probably the issue you are having.
Some programs like 'cat' have a -u option which basically sets autoflush on their end so that you receive data to read as soon as it's available, and not when the fifo decides to flush.
You can stick that into the beginning of a pipeline and it should encourage the others to flow if they don't have an explicit unbuffered mode themselves.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
...or like enums or any other "magic strings" that you need to make your code actually DO SOMETHING besides act as a framework passing data around...
1) POSIX classes are your friends
2) Build large regexes out of small regexes
3) Compile and name your regexes
4) Hide regex matching details inside of class methods when appropriate
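A rough sketch of how those four tips might look with java.util.regex. The class name `LogLine`, the log-line format, and the method names are made up for illustration; only the `Pattern`/`Matcher` calls and the `\p{...}` POSIX classes come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; the "date LEVEL message" format is an assumption.
public class LogLine {
    // Tips 1 and 2: small named pieces, written with POSIX character classes.
    private static final String DATE = "(\\p{Digit}{4}-\\p{Digit}{2}-\\p{Digit}{2})";
    private static final String LEVEL = "(\\p{Upper}+)";
    private static final String MESSAGE = "(.*)";
    private static final String GAP = "\\p{Space}+";

    // Tip 3: the large regex is assembled from the pieces, compiled once, and named.
    private static final Pattern LOG_LINE =
            Pattern.compile(DATE + GAP + LEVEL + GAP + MESSAGE);

    // Tip 4: callers never see the regex, only these methods.
    public static boolean isLogLine(String line) {
        return LOG_LINE.matcher(line).matches();
    }

    // Returns the level (group 2), or null if the line does not match.
    public static String levelOf(String line) {
        Matcher m = LOG_LINE.matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(levelOf("2004-03-15 ERROR disk full"));
    }
}
```

If the format changes, only the constants at the top need touching; the callers of `isLogLine` and `levelOf` stay the same.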
I mean, what would you do if you needed a recursive descent parser? Or do we do everything via XML now?
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
...is a parser. Invented about the same time. But those are typically based on transformation rules and regular expressions to tokenize your input.
You could always build your own regular expression compiler. It's not unheard of. But I submit that the "language" is small enough that it's not worth it.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
Don't make me golf that. Perl gets much more succinct, and this is just my first (somewhat lazy) attempt:
sub theTruth($) { shift; s/.+/bloated piece of crap/; $_; }
If I don't have to do it as a sub, I've got this off the top of my head:
$_ = 'Java'; s/.+/bloated piece of crap/; print;
Unfetter your ideas. Copyfree your mind.
. . . and, frankly, you should have some code to do something with your replacement string in the shortened Java example, else I could eliminate the print line in my shorter Perl example. In either case, the Perl example has fewer than half as many characters as the Java example, despite the fact I haven't even started throwing out whitespace that exists only for clarity purposes.
Unfetter your ideas. Copyfree your mind.
You might enjoy the novel way regular expressions are implemented in Scsh, the Scheme Shell.
http://www.scsh.net/
And now to celebrate this new-found ability to manipulate strings easily:
s/trench,/trench/;
Ah, I knew that would make me feel better.
Uh, not for regexes, but basically, from my experience, anything less than 100 lines of Java runs noticeably faster in about 20 lines of Perl, unless you're doing some strange voodoo with the Java API.
Don't believe in miracles -- rely on them.
not many companies allow the use of 3rd party libraries
I assume the review author hasn't worked for many companies then. I have yet to find any company that doesn't use third party packages. Logging, XML parsing and unit testing are just the first three things that spring to mind when I consider what might require a third party package. As for the "DLL hell" that someone alleges in a post to this thread, it's virtually nonexistent. You ship the third party packages with your application (as a single JAR or WAR file), and rely on the accepted good practice that people don't set a default CLASSPATH these days.
Man, that's why I don't use Java. I mean - you need a whole book to learn how to use regular expressions in Java? In Perl =~ s/hard/easy/ ;-)
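For comparison, a minimal sketch of the Java counterpart, assuming Java 1.4 or later where `String` gained regex methods. The class and method names here are hypothetical; `replaceFirst` and `replaceAll` are the real `String` methods.

```java
// Hypothetical helper class, for illustration only.
public class Substitute {
    // Roughly Perl's s/hard/easy/ : replaces the first match only.
    static String makeEasy(String text) {
        return text.replaceFirst("hard", "easy");
    }

    // Roughly Perl's s/hard/easy/g : replaces every match.
    static String makeAllEasy(String text) {
        return text.replaceAll("hard", "easy");
    }

    public static void main(String[] args) {
        System.out.println(makeEasy("regexes are hard, really hard"));
        System.out.println(makeAllEasy("regexes are hard, really hard"));
    }
}
```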
Zen tips: Pay attention. Don't take it personally. Believe nothing.
Aside from Boost being horrid bloatware, what exactly is wrong with the standard POSIX regexp functions? Look up regcomp(), regexec() etc., which have been part of the standard C API for years.
Isn't it creepy how D programmers, PCLinuxOS users, and Scientologists all seem to have the same bizarre sort of cultish eagerness to them?
Unfetter your ideas. Copyfree your mind.
First, take out that ($) prototype. Perl doesn't use them that way. In Perl, prototypes do not serve the same purpose as they do in other languages. They're for type coercion between scalar and array contexts; in this case you're saying "if they give me an array like a stupid git, please coerce it into a scalar context for me, thanks." If they pass a 26-element array, coercing it to a scalar context ends up giving you a numerical 26. Without the prototype, you'd process the first element of the array.
Second, the default argument for s/// is the $_ variable, so you don't need to say $_ =~ s///.
Third, modifying the global $_ in a sub is a recipe for odd bugs in the caller. Localize the damage with local $_, or use a my lexical.
The return keyword in the last statement is optional. It's up to you to decide what's more readable. In a one-liner, I would omit it.
Every Java application you will ever see has a lib directory and in it are the jar (library) files it needs. The script or shortcut you use to start the app will ensure they are on the classpath.
No set up, no messing, no conflicts with anything.
Most commercial apps (esp. on Linux) will come with their own JRE too.
I figured I badly mangled that Perl one-liner. Oh well. When you get stuck programming in Java, repetition and redundancy get all too normal.
"Regular expressions (regex to their friends) are an incredibly powerful addition to most programmer's personal toolkit of techniques"
Can you cite a source?
Final 2006 "Proof of Global Warming" US Hurricane Count -> 0
The C-based approach is necessary because it's a "unix thing" and the issues you have with external process + (x language) are OS-dependent, not language-dependent.
I don't know what the equivalent of "dup2" is in Java. Ultimately it's the system call you want your language to use to make the rubber meet the road. I'm sure there's a POSIX class or something you can leverage.
(Example: In Perl you'd use open with the ">&=" prefix. But that lulls you into a false sense of portability. I prefer to "use POSIX qw(dup2)" and just dup2 directly.)
And I noticed that "cat -u" is useless on Linux after submitting the post. Instead, check out "Expect" and the utility programs that come with it; specifically "unbuffer". It takes its arguments and runs them with the stdout flushed for you. Unfortunately you have to use it in each stage of your pipeline. So like:
unbuffer tail -f /some/fifo | unbuffer od -t x1a | less
I thought maybe you only had to do the first one to "prime the pump", but I was wrong. The only one you don't have to do is the last one.
And in your case, since you are the final reader (and you already autoflush your writing pipe), you don't need unbuffer since you are already doing it, so to speak.
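On the Java side, a hedged sketch: Java has no direct dup2, but `ProcessBuilder` sets up the pipes and does the fork/exec for you, handing back the two near ends as streams. The class and method names below are made up, and the example assumes a `cat` binary on the PATH as a stand-in child.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
public class ChildPipes {
    // Sends one line to the child's stdin and returns the first line it prints.
    static String roundTrip(String line, String... command) {
        try {
            Process child = new ProcessBuilder(command).start();
            try (BufferedWriter toChild = new BufferedWriter(new OutputStreamWriter(
                         child.getOutputStream(), StandardCharsets.UTF_8));
                 BufferedReader fromChild = new BufferedReader(new InputStreamReader(
                         child.getInputStream(), StandardCharsets.UTF_8))) {
                toChild.write(line);
                toChild.newLine();
                toChild.flush(); // flush our near end explicitly
                // Simplification: closing stdin sends EOF, so even a child that
                // block-buffers its output will flush and exit.
                toChild.close();
                String reply = fromChild.readLine();
                child.waitFor(); // reap the child, the job SIGCHLD handling does in C
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello", "cat"));
    }
}
```

For a long-lived conversation you would keep the child's stdin open instead of closing it, and that only works if the child flushes its output promptly, which is the buffering problem described above.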
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON