Slashdot Mirror


Java Regular Expressions

Simon P. Chappell writes "Regular expressions (regex to their friends) are an incredibly powerful addition to most programmers' personal toolkit of techniques. Programming in a language that doesn't support them can be frustrating if you need to do any amount of non-trivial string handling. Java was just such a language until the release of the 1.4.x series. Sure, there were libraries like ORO that would provide regex support, but it wasn't built in and not many companies allow the use of 3rd party libraries. With version 1.4.x, the corporate Java developer in the trenches received the power of regular expression pattern matching." Read the rest of Simon's review.

Java Regular Expressions
author: Mehran Habibi
pages: 255 (7 page index)
publisher: Apress
rating: 8/10
reviewer: Simon P. Chappell
ISBN: 1590591070
summary: A great starter for using regular expressions in Java

The book seems targeted towards those who have a solid level of Java programming skills, but who have not yet used the java.util.regex package. I see two types of Java programmers who might not have used the regex package: those who do not know about regular expressions and those who know them but have not yet used them within Java. This book should satisfy both sets of users. The first group will benefit from the general introduction to regular expressions and the gentle introduction to using them within Java. The latter group will benefit from the more advanced material in the book.

The book is nicely structured and progresses easily through its subject matter. The first chapter is an introduction to regular expressions. While this is most obviously for the readers new to the subject, it will be useful for those more experienced, because not all regex engines are created equal and this chapter lays out the particular dialect of regular expressions used by the Java 1.4.x regex engine. The second chapter introduces the object model used by java.util.regex. This gives detailed explanations of the Pattern and Matcher objects as well as the new regular expression methods added to the standard String class.
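As a rough illustration of the object model described above (not taken from the book), a minimal sketch of Pattern, Matcher and the regex-aware String methods might look like this. The class name, method name and sample strings are invented for the example, and the code is written against a current JDK rather than 1.4.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexBasics {
    // Compile once; a Pattern is immutable and can be shared.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Collect every run of letters by stepping through matches with find().
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("regex, to their friends"));
        // Convenience methods on String that take a regex:
        System.out.println("2004-03-15".matches("\\d{4}-\\d{2}-\\d{2}"));
        System.out.println("a1b22c".replaceAll("\\d+", "#"));
        System.out.println("a, b,c".split(",\\s*").length);
    }
}
```

The String methods compile their pattern on every call, so the explicit Pattern is usually the better choice for anything run in a loop.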

The third chapter takes the reader into advanced regular expressions. While there is much that can be done using just the Pattern and Matcher objects, the path to the full power of regex travels through an understanding of groups (and subgroups) and qualifiers. Regex groups are hard to explain until you've seen them in action, whereupon you may find yourself wondering how you ever managed without them. Mr. Habibi does an excellent job, both explaining them and introducing us to the unusual noncapturing subgroups. (I'd never heard of these before.) Qualifiers are the other side of the same coin with groups. While it's one thing to define a group and whether it's expected and to be captured, it's equally important to be able to describe the expected occurrence of those groups using qualifiers.
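For readers who have not met groups yet, here is a small sketch of capturing and noncapturing groups together with counted repetition. It is not an example from the book; the phone-number format, class name and method name are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupDemo {
    // Group 1 captures the area code, group 2 the local number.
    // (?:...) groups the optional separator without capturing it.
    // {3} and {4} state how many digits are expected.
    private static final Pattern PHONE =
            Pattern.compile("\\((\\d{3})\\)(?:\\s|-)?(\\d{3}-\\d{4})");

    // Returns {areaCode, localNumber}, or an empty array if there is no match.
    static String[] parts(String input) {
        Matcher m = PHONE.matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        return new String[] { m.group(1), m.group(2) };
    }
}
```

Because the separator group is noncapturing, the numbering of the two captured groups stays at 1 and 2 regardless of whether a separator is present.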

Chapter four tackles the interesting challenges of using regex in an object-oriented language. Mr. Habibi describes the general principles for using regex as similar to those used with SQL through the JDBC interface. These principles are the optimising of connections, batching reads and writes, storing patterns externally, Just In Time compilation of patterns and remembering that not every piece of String handling code needs to be written as a regex. All very useful advice.
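One possible reading of "Just In Time compilation of patterns" is to compile each pattern lazily, on first use, and then reuse it. The sketch below shows that idea only; the PatternCache class and its methods are hypothetical and are not the book's code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

final class PatternCache {
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    // Compile each distinct regex at most once, the first time it is asked for.
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    // Not every string check needs a regex at all.
    static boolean isBlank(String s) {
        return s.trim().isEmpty();
    }
}
```

The regex strings passed to get() could just as well come from a configuration file, which is one way to keep patterns outside the source code.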

Chapter five is the big examples chapter. All of the examples are intended to be practical; the kind of thing you might have to address at the day job. With examples covering Zip codes, telephone numbers, dates, searching text files and even validating an EDI document, he seems to have delivered on that assertion. There are further examples in Appendix C, if the aforementioned patterns aren't enough.
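To give a flavour of that kind of day-job pattern, here is a minimal ZIP code check. The pattern is a common textbook form and an assumption on my part, not necessarily the one the book arrives at; the class and method names are invented.

```java
import java.util.regex.Pattern;

final class ZipCheck {
    // Five digits, optionally followed by a hyphen and four more (ZIP+4).
    private static final Pattern ZIP = Pattern.compile("\\d{5}(?:-\\d{4})?");

    // matches() requires the whole input to fit the pattern.
    static boolean isZip(String s) {
        return ZIP.matcher(s).matches();
    }
}
```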

The writing and progression of material are good. The examples are very well thought out and explained. Many of the examples are built from first principles. Mr. Habibi seems to want to not only teach you how to use regular expressions, but also how to design them. He does this by working up from an understanding of the data until he has a working regex.

While it doesn't make any promises about being an encyclopedia of regex patterns, this book does contain enough of the normal business patterns to be a useful initial reference work, before turning to the Internet to search for patterns.

If you want an encyclopedic reference work on regex, then buy Jeffrey Friedl's Mastering Regular Expressions, which is published by O'Reilly. This is not that book, preferring to stick with the practical usage of regex.

This is a great starter book for developers who are new to using regular expressions in Java.

You can purchase Java Regular Expressions from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

181 comments

  1. When speed matters by SIGALRM · · Score: 3, Informative
    there were libraries like ORO that would provide regex support, but it wasn't built in and not many companies allow the use of 3rd party libraries
    For those who can utilize third-party libs, consider evaluating this DFA/NFA automaton, a regexp package that is significantly faster than java.util.regex.

    However, like many things in computer science, speed gains come at a price. In this case, the regular expression language supported is not quite as rich as the JDK implementation.
    --
    Sigs cause cancer.
    1. Re:When speed matters by Anonymous Coward · · Score: 0, Troll

      And after >10y of development, Java "Five" 1.5 even has support for the amazing 30 year old sprintf-technology. Hooray for the Java-monkeys! Hooray for the enterprise!

    2. Re:When speed matters by Anonymous Coward · · Score: 0, Redundant
      For those who can utilize third-party libs, consider evaluating this DFA/NFA automaton, a regexp package that is significantly faster than java.util.regex.


      Crikey, I just finished this guy's course on regular languages in June. Using that package as part of the exercises, no less.

      I don't think I have anything to add, but just felt the need to comment.
    3. Re:When speed matters by Anonymous Coward · · Score: 1, Informative

      And var-args! But not in that sane way that just adds more data to the stack, instead it wastes gobs of heap space and requires allocation/deallocation because making real var-args might involve thought!

      And, to promote object-oriented programming, the printf functionality is all located in a final class, so you can't inherit the printf functionality in any other class! Instead you have to wrap another object! Yay object oriented design!

      One of my favorite features coming up in Java 6 is the support for scripting languages. It's getting added in exactly the same way regular expressions were: as an external library. Now, instead of having to waste 200MB on a JRE, you'll get to waste 300MB! Yay, Java!

    4. Re:When speed matters by SSCGWLB · · Score: 0, Flamebait

      use a real language, hopefully one with a footprint smaller than Jabba the Hutt.

    5. Re:When speed matters by Ryan+Amos · · Score: 3, Insightful
      When speed matters

      ...you don't use Java.

      (I know, let the flames commence! :)

    6. Re:When speed matters by pjt33 · · Score: 0, Redundant
      In this case, the regular expression language supported is not quite as rich as the JDK implementation.
      Since Java's regular expressions are provably not regular, this seems fair.
    7. Re:When speed matters by The+Snowman · · Score: 5, Informative

      Here is the class I assume the parent is referencing: Formatter class.

      Essentially what happens is you don't have C-style varargs; the JRE silently creates an array for you when you pass the arguments. This doesn't waste "gobs of heap space" like the parent says; it uses the same amount as it would using the stack. Remember, these are objects, and Java never passes objects by value -- always by reference. So each argument wastes one machine word (usually 32 bits). Whoop de fucking doo. And, since it uses references, the only allocation/deallocation is the temporary array. And in 1.5 if not previous versions, this is very very fast. With a JIT compiler you'll hardly notice it. I do agree that the decision to make the class "final" is shitty, but honestly, I don't see how subclassing it would be a huge advantage. It would be like subclassing the java.lang.String class. Sure, you could add some nifty stuff, but it's not a big deal.
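A small sketch of the array point, for anyone who wants to see it: a varargs method receives its trailing arguments as a single Object[], and a call with separate arguments behaves the same as a call with an explicit array. The class and method names here are made up.

```java
final class VarargsDemo {
    // The trailing arguments arrive as one Object[] holding references.
    static int count(Object... args) {
        return args.length;
    }

    // Same result as String.format("%s has %d items", new Object[] { name, n }).
    static String describe(String name, int n) {
        return String.format("%s has %d items", name, n);
    }
}
```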

      As a person who earns his living off of J2EE, I know its strengths and weaknesses. I am not a fanboy, however. I am more than willing to give Java hell when it deserves it. I think string handling in general is not as well-organized or easy to use as it could be, but it is certainly capable. I rarely use sprintf() style string formatting anyway, even in C++. I find it much easier to use iostreams, which are typesafe and almost as fast as sprintf(). In Java I just use string concatenation, and the formatting classes when I need it. It isn't perfect, but it works well enough and sure isn't slow.

      --
      24 beers in a case, 24 hours in a day. Coincidence? I think not!
    8. Re:When speed matters by CompSciStud4U · · Score: 5, Informative

      I'll take the bait. When Java was introduced in 1995 almost all compiler research had been on static compilation, such as in C or Fortran. When the popularity of Java started to rise a lot of research effort, such as at IBM, was switched over to Just In Time (JIT) compilers. This was a pretty raw field at the time so Java was horribly slow compared to C.

      Fast forward 11 years and the situation is quite different. I'm not sure about the Java compiler that comes distributed with the SDK, but a JIT compiler and virtual machine from another commercial source (I'll just stick with IBM) is now incredibly optimized compared to 1995. Large amounts of research have been done to catch up with the fact that statically compiled languages had a 30+ year headstart. And JIT compiled languages could one day be faster than a statically compiled one due to new dynamic compilation techniques that use system resource data, such as cache misses, collected by the VM to continuously reoptimize portions of the byte code.

      And even the overhead of garbage collection may soon be lowered dramatically due to research at the University of Massachusetts http://www.cs.umass.edu/~emery/pubs/f034-hertz.pdf

      I'm not going to say that Java is faster than C (or in this case Perl, a language specifically designed for parsing regular expressions), but the speed gap between the two is constantly closing to the point where it doesn't really matter that much anymore.

    9. Re:When speed matters by Anonymous Coward · · Score: 0

      the point of regex is not speed.

      Nor is regex or speed the point of Java.

      But regex enhances string handling no matter what the language.

      What's with creating a language flame war when regex is just a feature?

    10. Re:When speed matters by nickos · · Score: 1

      Anyone know if there's a Java implementation of "structural regular expressions" as seen in the Sam editor on Plan 9?

    11. Re:When speed matters by Barrel+O'Lard · · Score: 1

      Jabba the Hutt has feet?

      --
      Sig-O-Matic: License expired
    12. Re:When speed matters by vadim_t · · Score: 1

      Well, I'd love to see at least one application written in Java that is fast, but so far I haven't seen any.

      I've got a dual Athlon MP2000+, and Azureus still is horribly slow compared to everything else I run on it.

    13. Re:When speed matters by rjhubs · · Score: 1

      you mean a free software application most likely developed by some guy in his garage runs slow!?!?! heaven forbid!

    14. Re:When speed matters by Marc2k · · Score: 1

      Are you using the default Sun HotSpot JVM? If so, you're not following the criteria provided by the parent poster. Lots of people can write their own C compilers, but they're not going to be as time- or space-optimized as gcc or Intel's compilers.

      --
      --- What
    15. Re:When speed matters by brianary · · Score: 1

      I'm not sure I'd call it horribly slow, but yes, noticeably slower than the non-Java stuff developed by some guy in his garage.

      Plus, have you ever written regular expressions to match Windows file paths in Java? "^C:\\\\Windows\\\\system32\\\\.*\\.dll$" Sheesh! I haven't seen so many backslashes since The Hills Have Eyes.
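One way to thin out the backslashes for the literal part of such a path is Pattern.quote (available since Java 5), which leaves only the Java string escaping to deal with. This is a sketch with an invented class name, not a claim that it is the preferred idiom.

```java
import java.util.regex.Pattern;

final class WindowsPathDemo {
    // Pattern.quote makes the directory prefix a literal, so its backslashes
    // only need the usual Java string doubling, not a second regex doubling.
    private static final Pattern DLL = Pattern.compile(
            "^" + Pattern.quote("C:\\Windows\\system32\\") + ".*\\.dll$");

    static boolean isSystemDll(String path) {
        return DLL.matcher(path).matches();
    }
}
```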

    16. Re:When speed matters by Anonymous Coward · · Score: 0

      Yes, but not nearly as slow as Eclipse.

    17. Re:When speed matters by Pieroxy · · Score: 1

      Azureus is slow? Azureus is Java? So Java is slow.

      Thanks for your insight.

  2. Regular Expression? by silicon-pyro · · Score: 2, Funny

    Me: I'll have a Grande Cafe au Lait please.

    Starbucks Employee: That'll be an hour's wages please.

    Me: Thanks! /me hands over cash, takes careful first sip.

    That's when you get to see my java regular expression.

    Generally it will be me wincing in pain because I just burned my tongue. Sometimes, if it's cooled enough, you'll hear a quiet "MmmMmmm" in the style of Family Guy's Herbert.

    1. Re:Regular Expression? by eln · · Score: 1

      You can afford a Starbucks coffee with only an hour's wages? You must be rich!

    2. Re:Regular Expression? by apotheon · · Score: 1

      Maybe he meant "before taxes".

      --
      Unfetter your ideas. Copyfree your mind.
  3. Recursion? by Poromenos1 · · Score: 1

    I tried to do a bit of recursion in regexes once, like ((\d+)\.)+, but that didn't work. It's too bad, because I don't think there's another way to dynamically match data in regexes. Other than this, they've served me very well all these years.

    --
    Send email from the afterlife! Write your e-will at Dead Man's Switch.
    1. Re:Recursion? by SIGALRM · · Score: 5, Interesting
      Regular expressions aren't really meant for recursive solutions, but if we have recursive regular expressions, we can define our balanced-paren expression like this: first match an opening paren; then match a series of things that can be non-parens or another balanced-paren group; then a closing paren. Turned into Perl code, this becomes:

      $paren = qr/\(([^()]+|(??{ $paren }))*\)/x;
      When this is run on some text like
      (lambda (x) (append x '(hacker)))
      the following happens: we see our opening paren, so all is well. Then we see some things which are not parens (lambda ) and all is still well. Now we see (, which definitely is a paren. Our first alternative fails, we try the second alternative. Now it's finally time to interpolate what's inside the double-secret operator, which just happens to be $paren. And what does $paren tell us to match? First, an open paren - ooh, we seem to have one of those handy. Then some things which are not parens, such as x, and then we can finish this part of the match by matching a close paren. This polishes off the sub-expression, so we can go back to looking for more things that aren't parens, and so on.
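java.util.regex has no counterpart to Perl's (??{ ... }) construct, so in Java the balanced-paren check is usually written as ordinary code. A minimal sketch with a depth counter, using invented names:

```java
final class BalancedParens {
    // True if every '(' has a matching ')' and no ')' appears before its '('.
    static boolean isBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // closed more than was opened
                }
            }
        }
        return depth == 0;
    }
}
```

The counter plays the role of the stack that a finite automaton lacks.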
      --
      Sigs cause cancer.
    2. Re:Recursion? by addaon · · Score: 3, Informative

      Of course, things like those presented are not regular expressions, no matter how loose perl might be with the term.

      --

      I've had this sig for three days.
    3. Re:Recursion? by kfg · · Score: 2, Funny

      Be kind to your parens, though they don't deserve it. . .

      KFG

    4. Re:Recursion? by Reverend528 · · Score: 2, Informative
      I tried to do a bit of recursion in regexes once, like ((\d+)\.)+, but that didn't work.

      By definition, Regular Expressions are limited to regular languages, thus can be expressed by Finite Automata. This prohibits them from supporting recursion, but generally makes them easy to optimize.

    5. Re:Recursion? by Anonymous Coward · · Score: 4, Informative

      Regular expressions are only for regular languages. They are the simplest type of language and use a simple state machine (automaton) to do their language recognition.
      Context free languages may have recursion. They use a state machine (pushdown automaton) and a stack to recognize their languages.
      http://en.wikipedia.org/wiki/Context-free_language
      This also contains links to other families of language and info on the automaton that can recognize them.
      Welcome to Theory of Computing!

    6. Re:Recursion? by Anonymous Coward · · Score: 0

      Though to be fair, the most common implementations of RegExs can express more than a proper regular language. IIRC, Perl backreferences would not be allowed in a strict implementation of regular expressions.

    7. Re:Recursion? by Almahtar · · Score: 1

      Isn't that the purpose of a Grammar?

    8. Re:Recursion? by Anonymous Coward · · Score: 0
      I may be misunderstanding your comment about recursion, but the regexp you gave
      ((\d+)\.)+
      matches exactly what one would expect - one or more digits followed by a period, at least once.

      You can verify that Java regexps support this at http://www.fileformat.info/tool/regex.htm and similarly so does Perl (unsurprisingly, since Java regexps are extremely close to Perl 5 in functionality and feature support).
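The same check can be run locally. The sketch below (invented names) also shows a detail that may be behind the original complaint, though that is only a guess: a repeated capturing group keeps just the text from its last iteration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatedGroupDemo {
    private static final Pattern DOTTED = Pattern.compile("((\\d+)\\.)+");

    // Returns what group 2 holds after a full match, or null if no match.
    static String lastNumber(String input) {
        Matcher m = DOTTED.matcher(input);
        return m.matches() ? m.group(2) : null;
    }
}
```

To collect every number rather than only the last one, a loop over find() with the simpler pattern (\d+)\. is the usual approach.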
    9. Re:Recursion? by Abcd1234 · · Score: 1

      And people wonder what a computing science degree is useful for in the real world...

    10. Re:Recursion? by Anonymous Coward · · Score: 0

      Considering all regular languages are context free http://en.wikipedia.org/wiki/Context-free_grammar, I fail to see the connection between your two statements.

      You should also realize that by definition, a context-free grammar/language ensures that it is possible to write a single regular expression to syntactically validate what the grammar can validate.

      Proof:
      Context-free means that any production can be replaced by its definition (a sequence of non-terminals and terminals) and still specify the same grammar (as opposed to something like requiring n a's followed by n b's followed by n c's which requires semantic actions which are lost when a production is replaced by its definition).

      All non-terminals can be expanded to their definition while still specifying the same grammar, thus any non-terminal can be expressed as a sequence of terminals.

      All terminals are regular expressions, thus any non-terminal can be specified as a sequence of regular expressions (assuming the expressiveness of regular expressions is powerful enough).

      Think of a simple example (using pseudo-lex & yacc syntax) to validate simple arbitrary length addition/subtraction:

      \d+ { return NUMBER_TOK; }
      [+] { return ADD_TOK; }
      [-] { return MINUS_TOK; }

      YACC:

      Expression: Expression Operator Expression
              | NUMBER_TOK

      Operator: ADD_TOK | MINUS_TOK

      Now doing the expansion:

      Expression: NUMBER_TOK | NUMBER_TOK (ADD_TOK|MINUS_TOK) NUMBER_TOK

      Once again:

      Expression: \d+|\d+([+]|[-])\d+

      Thus the regular expression \d+|\d+([+]|[-])\d+ is the exact same as the provided CFG.

    11. Re:Recursion? by Abcd1234 · · Score: 1

      Considering all regular languages are context free http://en.wikipedia.org/wiki/Context-free_grammar, I fail to see the connection between your two statements.

      Perhaps you should have read that page more closely. Or maybe taken a class in theory of computation.

      Regular grammars are *not* the same thing as regular *languages*, which are what is under discussion here.

      First off, it is true that regular *grammars* can express context-free *languages*. Of course, this also means that they can express regular languages, as regular languages are a proper subset of context-free languages. However, one only needs a subset of regular grammars (only left-regular or right-regular rules are needed) in order to express a regular expression.

      However, in order to express a context-free language, you need both left-regular *and* right-regular rules. As such, it is most certainly *not* true that a regular expression can be used to express a context-free language.

      To swipe an example from Wikipedia (though we covered this in my theory of computation class, as well), try converting this to a proper regular expression:

      A -> xAy

      This grammar generates all strings with n x's followed by n y's.

      Good luck!
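For contrast, a hand-written recognizer for that language is only a few lines once counting is allowed. The sketch assumes the grammar also has an empty base case (A -> empty), which the rule as quoted leaves out; the class name is invented.

```java
final class XnYn {
    // Accepts n x's followed by exactly n y's (n >= 0) by counting both runs.
    static boolean accepts(String s) {
        int i = 0;
        while (i < s.length() && s.charAt(i) == 'x') {
            i++;
        }
        int xs = i;
        while (i < s.length() && s.charAt(i) == 'y') {
            i++;
        }
        int ys = i - xs;
        return i == s.length() && xs == ys;
    }
}
```

The nearest plain regular expression, x*y*, accepts strings such as "xxy" that this recognizer rejects.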

    12. Re:Recursion? by glwtta · · Score: 0

      Regular expressions are only for regular languages.

      Not true - Perl is an awesome language.

      --
      sic transit gloria mundi
    13. Re:Recursion? by Poromenos1 · · Score: 1

      I don't know if I made a mistake I can't see, but that RE should match 312.132.123.123. if I was king of the world, only, for the reason the other posters have so eloquently explained, it doesn't. It's not \d+\.+ (is that what you meant?).

      --
      Send email from the afterlife! Write your e-will at Dead Man's Switch.
    14. Re:Recursion? by Anonymous Coward · · Score: 0

      Regular expressions are only for regular languages.

      Wrong (in practice, not in theory). Since just about all modern regex implementations allow for backreferencing (eg. m/(.*) foo \1/), non-regular languages can be recognized. This is really old news.
      No, I don't have a CS degree. I just know my formal language theory.
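The same backreference works in java.util.regex. A minimal sketch of the pattern from the comment, with an invented class name:

```java
import java.util.regex.Pattern;

final class BackrefDemo {
    // \1 must repeat exactly the text that group 1 captured.
    private static final Pattern SAME_BOTH_SIDES = Pattern.compile("(.*) foo \\1");

    static boolean matches(String s) {
        return SAME_BOTH_SIDES.matcher(s).matches();
    }
}
```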

    15. Re:Recursion? by TwentyLeaguesUnderLa · · Score: 1

      Um, I'm not sure what you were trying, but I typed in that exact regular expression and it worked exactly how you said it should work, it matched 312.123.321.123. just fine. When you put in ^ and $ at the beginning and end of the regexp, respectively, is when it starts refusing things like "123.1"
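The difference the anchors make can be sketched in Java as follows; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class AnchorDemo {
    private static final Pattern LOOSE = Pattern.compile("((\\d+)\\.)+");
    private static final Pattern STRICT = Pattern.compile("^((\\d+)\\.)+$");

    // find() succeeds if any substring fits the pattern.
    static boolean containsMatch(String s) {
        return LOOSE.matcher(s).find();
    }

    // With ^ and $ the pattern has to account for the whole input.
    static boolean wholeMatch(String s) {
        return STRICT.matcher(s).find();
    }
}
```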

    16. Re:Recursion? by timeOday · · Score: 1

      No, they're better. Being able to count is a good thing.

    17. Re:Recursion? by SauroNlord · · Score: 0

      I wouldn't trust any non-computer-science / mathematician / engineer to build my space crafts, nuclear reactors, x-rays..etc... You need to be able to prove that your algorithm meets the run time, and space complexity required to be a 100% flawless algorithm.

    18. Re:Recursion? by Anonymous Coward · · Score: 0

      Ok, thanks for correcting me. I'm taking a compilers course right now but I haven't been paying attention to the theory part of the course since midterms (too much coursework and not enough time and I don't believe that it goes in depth about context-free languages and regular languages). I guess time to study before my final next week.

    19. Re:Recursion? by fishbowl · · Score: 1


      >>Regular expressions are only for regular languages.

      >Not true - Perl is an awesome language.

      No disagreement, but many perl "regular" expressions are not regular.

              "'[R]egular expressions' [...] are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I'm not going to try to fight linguistic necessity here. I will, however, generally call them "regexes" (or "regexen", when I'm in an Anglo-Saxon mood)."

      - Larry Wall

      --
      -fb Everything not expressly forbidden is now mandatory.
  4. Wrong way round by Tim+Ward · · Score: 1, Interesting

    Regular expressions (regex to their friends) are an incredibly powerful addition to most programmer's personal toolkit of techniques. Programming using a language that doesn't support them can be frustrating if you need to do any amount of non-trivial string handling.

    Er, no. It is only for trivial string handling that the regex approach is useful.

    For non-trivial string handling (particularly if you feel like giving the authors of erroneous strings helpful error messages!!) I'll write a proper lexical analyser and a proper parser every time.

    1. Re:Wrong way round by SIGALRM · · Score: 1
      For non-trivial string handling (particularly if you feel like giving the authors of erroneous strings helpful error messages!!) I'll write a proper lexical analyser and a proper parser every time.
      You can outfit a regexp functor with error message handling, or exceptions, and if your project is embedded (certainly not trivial) or performance-dependent, I'm not sure that I'd write a lex/parser "every time". I guess it boils down to this: "trivial string handling" is semantic nonsense.
      --
      Sigs cause cancer.
    2. Re:Wrong way round by smallfries · · Score: 2, Informative

      I'm not sure if you got the parent's point (apologies if you did). By trivial string handling he's talking about recursive structures, and the erroneous strings he's mentioning are probably programs as input to a compiler. The 'non-trivial' strings are the class of strings that you would need a full grammar in order to parse, rather than a reg-exp. But yeah, not every time - horses for courses and all that.

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    3. Re:Wrong way round by smittyoneeach · · Score: 3, Insightful

      I would assert that if your input data are sufficiently irregular that you require a parser/lexical analyzer, you may have exceeded the bounds of "regular" expressions.

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
    4. Re:Wrong way round by fm6 · · Score: 1

      "Trivial" is relative. The sort of string processing you do with String.indexOf and other simple matching functions is trivial compared to what you do with regular expressions. Besides, once you've started using lexers and parsers, you've graduated from "string handling" to artificial linguistics.

    5. Re:Wrong way round by Abcd1234 · · Score: 1

      Absolutely. Why this should be surprising, I don't know. The very nature of DFAs is that they don't support counting. Thus, the minute you find yourself dealing with recursion (ie, tags, brackets, etc), regular expressions break down.

      However, if you're just doing vanilla text parsing with data that's not overly complex, regexs are an absolute godsend, and are far easier to use than a full lexer/parser package.

    6. Re:Wrong way round by iggymanz · · Score: 1

      not to mention, most lexers are going to use regular expressions to identify tokens for the higher-order parsing operations. So RE are a good first step for anyone getting into lexer/parser wares anyway.

  5. Library Support is also nice by Anonymous Coward · · Score: 0

    Of course, if you're using a language that doesn't have built-in regular expressions, you might
    still have good regular expression libraries available to you. Boost::Regex is a great choice
    for C++, for instance.

  6. Re:Java sucks by Anonymous Coward · · Score: 0

    You sir, have obviously not programmed in C++...if at all...

  7. Not many companies allow 3rd party libraries? by LadyLucky · · Score: 2, Funny

    Are you serious? What kind of company would do that? It's madness!

    --
    dominionrd.blogspot.com - Restaurants on
    1. Re:Not many companies allow 3rd party libraries? by Canthros · · Score: 2, Informative

      It does, however, simplify the legal mess involved.

      --
      Canthros
    2. Re:Not many companies allow 3rd party libraries? by Anonymous Coward · · Score: 0

      I have to agree. 3rd party libraries are one of Java's greatest strengths. You can get libraries under BSD or LGPL licences for just about anything. In the pre-1.4 days, it was the only way to do things.

    3. Re:Not many companies allow 3rd party libraries? by Fnkmaster · · Score: 1

      Shhh, please don't tell those companies why they can't compete with my company.

  8. My main complaint by kbielefe · · Score: 4, Informative

    My main complaint about java regexps is that all the backslashes have to be quoted with a backslash, making them completely unreadable compared to a language that supports regular expressions natively, like perl (no, a standard library is not technically native support). "\d" becomes "\\d" and so forth. Does anyone know a simple way around this? We just started using java regexp's at work, so the extra backslashes don't bother most people, but they are extremely annoying to those of us with a lot of perl experience.

    P.S. How many slashdotters thought they'd be rolling in their graves by the time they heard an example of where perl is more readable than java?

    --
    This space intentionally left blank.
    1. Re:My main complaint by Kesch · · Score: 4, Funny
      P.S. How many slashdotters thought they'd be rolling in their graves by the time they heard an example of where perl is more readable than java?


      I'm still amazed to find 'readable' and 'regular expressions' in the same context.
      --
      If this signature is witty enough, maybe somebody will like me.
    2. Re:My main complaint by Pxtl · · Score: 3, Interesting

      Well, does Java have a facility similar to C#'s @strings? In C#, a string prefixed with @ is literal, much like Python's """ strings - no escape characters. Very handy for regular expressions.

      In general, C#'s regular expression package is very nice, except for the whole "groups" and "captures" thing.

    3. Re:My main complaint by happyfrogcow · · Score: 2, Insightful

      two slashes "\\" is nothing. the real PITA begins when you need to do "\\\\"

      effing java.

    4. Re:My main complaint by bigbadbuccidaddy · · Score: 1

      I haven't tried this, but I suppose you could stick the regex's in a .properties file.

    5. Re:My main complaint by Anonymous Coward · · Score: 0

      Python solves this problem by providing "raw strings". Does Java have something equivalent? It should...

    6. Re:My main complaint by masklinn · · Score: 3, Informative

      Actually, Python's literal strings are NOT """.

      """ is for multiline strings (' and " only accept one-line strings or backslash line-continuation escapes). Literal Python strings are raw strings, created by prefixing any string (be it ', " or """) with the "r" character (as in r"this is a raw string" but "this is not").

      --
      "The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
    7. Re:My main complaint by _xeno_ · · Score: 4, Informative

      Backslashes in a .properties file have to be escaped with (guess what?) a backslash.

      So it, unfortunately, solves nothing.

      If you don't mind XML, you can use the XML properties format, but you're still adding a lot of extra code just so you don't have to deal with escape characters. There's, unfortunately, no good solution in Java. (There are no raw strings in Java.)
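The .properties behaviour is easy to check with java.util.Properties reading from a string. In this sketch the key name "zip" and the class name are invented; the point is only how the loader treats backslashes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesEscapeDemo {
    // Parses text in .properties syntax and returns the value stored under "zip".
    static String load(String fileText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(fileText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty("zip");
    }
}
```

A file line written as zip=\\d{5} yields the regex \d{5}, while zip=\d{5} loses its backslash and yields d{5}.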

      --
      You are in a maze of twisty little relative jumps, all alike.
    8. Re:My main complaint by s2jcpete · · Score: 1

      Sure, load them from a properties file, db, xml or whatever you like. You only need the extra slashes to get around the compiler. You don't need them if you load them from someplace at runtime.

    9. Re:My main complaint by value_added · · Score: 1

      My main complaint about java regexps is that all the backslashes have to be quoted with a backslash, making them completely unreadable compared to a language that supports regular expressions natively, like perl ...

      You're asking about Java regexps, but similar problems extend to other languages where the syntax, features and usage are different enough that anyone with a background in Perl is similarly annoyed, if not dumbfounded by the awkwardness and limitations. Any systems administrator will tell you sed and awk are still alive and well and very much in use, as are other shell tools, even where Perl is available. Then there's .NET, C#, Python and PCRE, which you mentioned. I use Vim, for example, all day long and still trip up on the differences and find myself using workarounds where I can use Perl directly.

    10. Re:My main complaint by Deef · · Score: 2, Interesting

      I sometimes do this:

      Pattern foo = Pattern.compile("c:/foo/bar".replace('/','\\'));

      or just put the above in a library method that does it automatically:

      Pattern foo = PatternUtils.compile("c:/foo/bar");

      which is handy if other replacements are made by that library method also:

      Pattern foo = PatternUtils.compile("({number}):{number}:({identifier})-{number}");
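
      PatternUtils is not a real library, so what follows is only a guess at what such a helper might look like. The expansions chosen for {number} and {identifier} are assumptions, as is the rule that every '/' stands in for a backslash. A rough sketch:

      ```java
      import java.util.regex.Pattern;

      // Hypothetical helper: not part of the JDK or any published library.
      class PatternUtils {
          // Expands the assumed placeholders, then swaps every '/' for a backslash
          // so the regex can be written in Java source without doubled backslashes.
          static Pattern compile(String shorthand) {
              String regex = shorthand
                      .replace("{number}", "/d+")
                      .replace("{identifier}", "[A-Za-z_][A-Za-z0-9_]*")
                      .replace('/', '\\');
              return Pattern.compile(regex);
          }

          public static void main(String[] args) {
              Pattern p = compile("({number}):{number}:({identifier})-{number}");
              System.out.println(p.pattern());
              System.out.println(p.matcher("12:34:foo-56").matches());
              System.out.println(compile("/w+@/w+/./w+").matcher("a@b.c").matches());
          }
      }
      ```

      The obvious cost of this design is that the pattern can no longer contain a literal forward slash, so a different stand-in character may suit some projects better.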

    11. Re:My main complaint by Anonymous Coward · · Score: 0

      There's, unfortunately, no good solution in Java.

      And so say all of us.

    12. Re:My main complaint by sinewalker · · Score: 1
      Nothing in actual Java or the Sun libraries fixes this gripe. But, have a look at the Jakarta Commons project's org.apache.commons.lang.StringEscapeUtils class, particularly the StringEscapeUtils.escapeJava() methods.

      It may be helpful, I haven't tried it. Would be particularly interesting to see if it'll correctly convert, say "\t" into "\\t" instead of a TAB. If it does, then you could use it to wrap the strings for the regexp pattern methods.

      --
      “Our opponent is an alien starship packed with nuclear bombs. We have a protractor.” — Neal Stepnenso
    13. Re:My main complaint by owlstead · · Score: 1

      Put them in comments and use a tool to generate and test regular expressions. For the Eclipse IDE, there is QuickREx, it includes a paste function that automatically escapes the escapes.

      http://eclipse-plugins.2y.net/eclipse/rating_details_plugin.jsp?plugin_id=964

      It's a good idea to include the regular expressions in a comment as well. Creating and testing a regular expression usually takes most of the time anyway. If you really hate the escaped regular expressions, just put them in a resource (e.g. a .ini file containing Java properties) instead of source code.

    14. Re:My main complaint by sinewalker · · Score: 1
      Okay, I've tested: should be good to use. Try this code out:

      import org.apache.commons.lang.StringEscapeUtils;

      public class EscapeTest {
          public static void main(String[] a) {
              // escapeJava() turns the TAB back into the two characters \t
              System.out.println(StringEscapeUtils.escapeJava("\tXXX"));
              // printed as-is: a real TAB followed by XXX
              System.out.println("\tXXX");
          }
      }
      --
      “Our opponent is an alien starship packed with nuclear bombs. We have a protractor.” — Neal Stepnenso
    15. Re:My main complaint by Chris+Pimlott · · Score: 2, Informative
      If you use Jakarta Commons-Configuration, there's basically no extra code to use XML configuration files.

      For example, the regex defined here:
      <foo>
          <bar>
              <regex>...</regex>
          </bar>
      </foo>
      becomes simply "foo.bar.regex", just like a standard properties file.
    16. Re:My main complaint by cnettel · · Score: 1
      Overkill, noun:

      1. Beating something that's already dead.

      2. Using an Apache-licensed software package and creating an external file dependency to solve the fact that your language doesn't support raw strings.

    17. Re:My main complaint by sco08y · · Score: 1

      Does anyone know a simple way around this?

      I've done this with C on Windows when I had one library that borked whenever you tried to use / in pathnames.

      Pick unicode characters for your special strings, e.g. . Next, map some handy keystroke to that in your editor. Then write a script to replace that with a standard Java string. Since it's not standard java, give it a special extension and add the script and extension to your makefile or ant or whatever you use.

    18. Re:My main complaint by romanr · · Score: 1

      Yes, but perl compensates for this readability by making the rest of your program look like line noise :)

    19. Re:My main complaint by Chris+Pimlott · · Score: 1

      Yes, Java doesn't support raw strings. Oops. So using a liberally-licensed library to get around it is overkill?

  9. Perl by SpaghettiPattern · · Score: 1

    The missing Regular Expressions is what kept me off Java and on Perl for a looong while. I started using ORO and since their introduction into Java itself I have almost completely switched over. I really do hope Perl 6 will be released and lives up to its expectations.

    Having said that I really don't see why you have to devote a complete book on regex. A small tutorial does just fine.

    --

    I hadn't the slightest objection to his spending his time planning massacres for the bourgeoisie... (P.G. Wodehouse)
    1. Re:Perl by tcopeland · · Score: 1

      > Having said that I really don't see why you
      > have to devote a complete book on regex.
      > A small tutorial does just fine

      I think it depends on how deep you want to go into regular expressions. Mastering Regular Expressions by Jeffrey Friedl is almost 500 pages but is an excellent treatment of the subject - by the time you're done reading it you'll feel comfy even with such madness as negative lookbehind.
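
      For the curious, negative lookbehind is less scary than it sounds. Here is a small sketch in Java (the class, method and sample text are made up for illustration): it picks out numbers that are not prices by refusing any digit run that has a dollar sign or another digit right before it.

      ```java
      import java.util.ArrayList;
      import java.util.List;
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      // Hypothetical demo class for a negative lookbehind.
      class LookbehindDemo {
          // (?<![$\d]) asserts that the previous character is neither '$' nor a digit;
          // \d+ then matches the run of digits that starts at that position.
          private static final Pattern PLAIN_NUMBER = Pattern.compile("(?<![$\\d])\\d+");

          // Returns every digit run in the text that is not part of a dollar amount.
          static List<String> plainNumbers(String text) {
              List<String> found = new ArrayList<>();
              Matcher m = PLAIN_NUMBER.matcher(text);
              while (m.find()) {
                  found.add(m.group());
              }
              return found;
          }

          public static void main(String[] args) {
              System.out.println(plainNumbers("costs $40 for 12 items, 3 left")); // [12, 3]
          }
      }
      ```

      The digit in the lookbehind matters: without it, the engine would skip the "4" in "$40" and then happily match the "0".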

  10. Microsoft and regex by truthsearch · · Score: 3, Interesting

    Slightly off-topic, but...

    Back when my only experience was development on Windows I was very frustrated with the lack of good string handling in Microsoft languages (VB, T-SQL). If you didn't find a third-party library you had to write a lot of expensive code to do fancy string searches. Try writing recursion in VB6 without bringing your computer to a screeching halt.

    Then when I switched to linux and open source I was shocked to learn that something as useful as regex had already been around for many years. Most of the Windows developers I knew never even heard of it. It was tricky to learn but has paid off many times over in utility.

    Every developer is better off for knowing it. Even if they never use regex the thought process in understanding it is quite interesting and educational.

    1. Re:Microsoft and regex by keltex · · Score: 1

      Microsoft's DotNet (including VB.NET) has had native regular expressions since 1.0 (circa 2002). Also dotnet has the @ string literal prefix (such as @"\d{1,4}") that eliminates the double-backslash issues.

    2. Re:Microsoft and regex by Anonymous Coward · · Score: 0

      Then when I switched to linux and open source I was shocked to learn that something as useful as regex had already been around for many years. Most of the Windows developers I knew never even heard of it. It was tricky to learn but has paid off many times over in utility.

      Many years? More like many decades.

    3. Re:Microsoft and regex by the-matt-mobile · · Score: 1

      VB has had regular expressions available to it for about 8 years now. In VBScript, they're built-in, and in VB you just make a COM reference to "Microsoft VBScript Regular Expressions 5.5". See this article for details - http://support.microsoft.com/default.aspx?scid=kb;en-us;818802

      And don't let the date of the article fool you - though it was written in 2006, you've been able to use regexes in VB since the late 90's. That being said, I've always found Perl's implementation to be faster and easier to use.

  11. Yeah... yeah... by Anonymous Coward · · Score: 0

    C is an incredibly powerful addition to most programmer's personal toolkit of techniques.

    ..oh ..we are talking about CS students who discover the joys of the likes of Java on their long path from..

    10 print "hello world"
    20 goto 10


    ...to...

     
    struct filter {
        int (*open) (void *);
        int (*close) (void *); ...
    };

    ...???

    Nevermind then... come back in 10 years... (if you're still a programmer by then ;-)

    1. Re:Yeah... yeah... by Anonymous Coward · · Score: 0
      struct filter {
              int (*open) (void *);
              int (*close) (void *); ...
      };


      I hope you are kidding...
  12. What? by avalys · · Score: 2, Interesting

    Sure, there were libraries like ORO that would provide regex support, but it wasn't built in and not many companies allow the use of 3rd party libraries
    Who's boneheaded enough to do this? I want to know so I can avoid buying anything from them, because their products are going to be overpriced by at least 50% due to the wasted effort.

    I can understand restricting third-party libraries to those of a certain license, like BSD or LGPL, but a blanket ban without any exceptions for something as essential as regular expressions? That's just stupid.

    One of the biggest advantages of Java is the enormous number of high-quality third-party libraries available.

    Is this just something the submitter dreamed up to fill space, or do companies actually do this?

    --
    This space intentionally left blank.
    1. Re:What? by masklinn · · Score: 1

      One of the biggest advantages of Java is the enormous number of high-quality third-party libraries available...

      ... that make up for the lack of high-quality useful first-party packages.

      --
      "The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
    2. Re:What? by FooAtWFU · · Score: 1
      I'm working at a three-letter acronym this summer for an internship, doing development on another three-letter acronym. We use a third-party open-source (GPL- hey, that's a TLA too! or three!) somethingorother - not strictly a library, not really Java, but, well, kinda similar... anyway. We can't ship this open source somethingorother with the product, or our lawyers will explode (no, that's not a good thing; when lawyers explode, they get everywhere). Apparently, we can't even mirror the somethingorother. In our case, we've decided that the customer will have to obtain, set up and install the somethingorother themselves, though this is nontrivial and a little tedious for our particular scenario.

      Last assignment I did use Java. With regular expressions! Ended up targeting an embedded J2ME device with only Java 1.3. Oops. Had to yank them out.

      (And no, don't simply tell us to get better lawyers. If these ones aren't good enough, well, you're not going to just be able to go off and find better ones.)

      --
      The World Wide Web is dying. Soon, we shall have only the Internet.
    3. Re:What? by VGR · · Score: 1
      Sure, there were libraries like ORO that would provide regex support, but it wasn't built in and not many companies allow the use of 3rd party libraries
      Who's boneheaded enough to do this? I want to know so I can avoid buying anything from them, because their products are going to be overpriced by at least 50% due to the wasted effort.

      It's DLL Hell all over again. Every time you use a third-party library, the user has to make sure it's installed. And in the classpath, unless they installed it as root/admin and placed it in the JRE's extensions directory.

      If your program is going to be in a shared JVM with many other programs (which is common in some government systems), all of whom are contributing their own "favorite" third-party jars, it's very easy to end up with multiple versions of the same third-party library. Now, ideally, third-party vendors would gracefully evolve their own libraries so there are no runtime conflicts across versions, but... well, even my mentioning that probably has a lot of readers snickering derisively.

      There are ways to ameliorate this, most notably Java Web Start and specifying the required extensions (and versions) in a jar's manifest, but few software authors seem to want to bother with those.

      I realize that the other extreme is Not Invented Here Syndrome, but I often encounter people who are far too quick to jump on a third-party solution when there is an adequate but maybe not as whiz-bang-kewl solution built into Java. Premature optimization and all that.

      What it comes down to is that any external dependency is a burden and makes installation significantly more of a chore. Dependencies are a big deal and developers need to consider the penalties they bring, not just the benefits.

      A final consideration is that Sun APIs undergo considerably more review than most third-party APIs. Read the progress of any JSR at jcp.org to see what I mean. API usability has a direct effect on productivity.

      --
      The Internet is full. Go away.
    4. Re:What? by kalirion · · Score: 2, Funny

      Pssst, Mr. Secrecy, your blog is showing.

    5. Re:What? by heinousjay · · Score: 1

      I hope you're not a Java programmer. If you are, I recommend you read up on java.lang.ClassLoader and start realizing just how it can solve the problem you're worried about, completely.

      --
      Slashdot - where whining about luck is the new way to make the world you want.
    6. Re:What? by chorltonian · · Score: 1
      It's DLL Hell [wikipedia.org] all over again. Every time you use a third-party library, the user has to make sure it's installed. And in the classpath, unless they installed it as root/admin and placed it in the JRE's extensions directory. If your program is going to be in a shared JVM with many other programs (which is common in some government systems), all of whom are contributing their own "favorite" third-party jars, it's very easy to end up with multiple versions of the same third-party library. Now, ideally, third-party vendors would gracefully evolve their own libraries so there are no runtime conflicts across versions, but... well, even my mentioning that probably has a lot of readers snickering derisively.
      Actually, shared JVM environments, i.e. J2EE application servers, have multiple class loaders so that component processes (EJBs or Webapps) can satisfy their own dependencies at runtime, via the manifest. It's a non-argument to say that "few software authors want to bother with those", the fact is the facility is there and it's not difficult to use. Having multiple versions of the same library installed is no more significant than the amount of additional disk space required. This does not correspond in any way with "DLL hell", the situation in Windows where multiple native-code applications have differing version requirements of specific shared libraries (DLLs) but are forced to use the same version by the operating environment.
    7. Re:What? by VGR · · Score: 1

      A new ClassLoader was the first solution considered, obviously.

      Guess what? Any ClassLoader is required to query its parent before it attempts to resolve a class. And no, you can't get around it (for security reasons which become obvious if you think about it).

      If a particular version is already in the class path at JVM startup, you can't override it.

      --
      The Internet is full. Go away.
    8. Re:What? by VGR · · Score: 1

      I agree completely. J2EE provides an excellent solution to this.

      But the government systems to which I referred are not J2EE environments (yet).

      I also agree that J2EE and manifests are not difficult to use, but that doesn't seem to be the prevailing opinion among most of the other developers I meet.

      --
      The Internet is full. Go away.
    9. Re:What? by nodrogluap · · Score: 1

      java -Xbootclasspath/p:<directories and zip/jar files separated by :>

      Prepends the given libraries in front of bootstrap class path

    10. Re:What? by VGR · · Score: 1

      When I stated it is a "shared JVM" I was implying that it is a JVM whose command-line invocation parameters are not under our control.

      --
      The Internet is full. Go away.
  13. Wha-wha-what? by mrtrumbe · · Score: 1
    ...and not many companies allow the use of 3rd party libraries.

    Who are these companies and what can possibly be their justification for such a blanket policy. I can understand for some ultra-high security/uptime systems with incredibly strict standards and processes who would need to put third party code through an extensive and expensive audit. But for the rest of us? No jUnit? log4j? Is Boost allowed? Good lord, I can't imagine programming in such a world.

    I hope I never work for one of these firms.

    Taft

    1. Re:Wha-wha-what? by FooAtWFU · · Score: 1

      If you're developing software for someone else and you use a third-party library, you need to a) ship it with your product or b) require the user get it separately. The latter is a hassle (and hassling customers is bad). The former will make your lawyers explode.

      --
      The World Wide Web is dying. Soon, we shall have only the Internet.
    2. Re:Wha-wha-what? by mrtrumbe · · Score: 1
      Licensing is a valid concern, but one that most third party libraries handle quite neatly. Also, the vast majority of third party libraries I personally use are open source (for instance, junit, log4j, boost, anything from apache, etc.). I wonder how prevalent this is...

      Taft

    3. Re:Wha-wha-what? by JoshDM · · Score: 2, Insightful

      ...and not many companies allow the use of 3rd party libraries.
      Who are these companies and what can possibly be their justification for such a blanket policy.

      Actually, there are a number of firms wrapped in so much red tape that their employees are left to get things done with only the barest of tools. I have witnessed major separations of "church and state" with these larger companies. This includes the company that did not allow the developers access to the servers, resulting in a system administrator who refused to allow a Java web server more powerful than JServ because he didn't know how to properly install Apache/Tomcat/JBoss/Whatever on Linux.

      More recently, it's a concern with larger companies that want "someone to blame" and "someone to call for support." These places use "Websphere" instead of "Eclipse and Tomcat" or "Oracle JDeveloper" instead of "Borland JBuilder". Wherever there is a "free" version of something that is supported by a community effort, there is a "pay" edition of that same item (usually 1-2 versions behind the curve) hosted by a company that sells support and takes the blame.

  14. Fear! by weasello · · Score: 1

    I believe fear is the primary culprit here. Many places I've worked for/with only allow internally developed library use... And I'm sure half of it is swiped, stolen, or 'inspired' by popular, free, open source, 3rd party libraries.

    1. Re:Fear! by Anonymous Coward · · Score: 0

      Get over yourself. Coding and development policies have been around a much longer time than the emergence of OSS concerns. You OSS fags don't have any real basis in reality.

    2. Re:Fear! by weasello · · Score: 1

      I meant to say "open, free, OR 3rd party" as in, all-inclusive of non-company-made libraries. Guess I should have hit "preview."

    3. Re:Fear! by Dark_MadMax666 · · Score: 1

      "Half"??? Jeez man you probably work with some uber coders. The code and places I've seen are more like 90% wrappers around stolen code and 3rd party libraries. And of course the thin veneer of "proprietary business logic" and UI - usually the most horrible and ugly parts of applications.

  15. Re:Java sucks by masklinn · · Score: 0, Flamebait

    Your introduction to OOL was in Java? Boy, must that have sucked, java's probably the most static and limiting OO language out there...

    --
    "The way we can tell it's C# instead of Haskell is because it's nine lines instead of two." -- wadler
  16. Does not compute by Anonymous Coward · · Score: 0

    Java...Regular expressions? Error!!!

    Regular expressions belong in a real programming language like Perl where they
    seamlessly blend in with the arcane chaos that looks like when the Dyslexic Liberation Front
    blew up the alphabet spaghetti factory.

    Why would Java "programmers" sully their nice looking suburban code with ascii vomit?

    It's just not natural.

  17. I've worked for two by Anonymous Coward · · Score: 0
    Is this just something the submitter dreamed up to fill space, or do companies actually do this?

    I'm at one right now (hence why I'm posting as an AC), and my previous employer was like that as well (except we were allowed minimal use of Struts on one project). It's typical "not invented here" reasoning, usually from "software architects" convinced their own home-grown platform/library/framework is better than anything else out there.

    In my experience, it leads to systems with too long of a ramp-up time for new hires to start working on and delays to tweak the library for every new thing the developers are trying to accomplish. But it doesn't matter that a simple project took months to accomplish, as long as there's a perfect (in their eyes) foundation they can sneak out the back door when they finally leave.

  18. "not many companies allow ... 3rd party libraries" by jbellis · · Score: 1

    Somebody hasn't worked for "many companies." _Every_ company I've worked for allowed 3rd party libraries. (Sure, there are processes to make sure you don't do something stupid like ship a GPL library with a closed-source product, but that's just common sense.)

  19. regex coach by mgkimsal2 · · Score: 3, Informative

    I spoke about the "regex coach" tool from http://weitz.de/regex-coach/ on my podcast (shameless plug!) http://webdevradio.com/ - it's a great tool for helping visually walk through the regex creation process, especially for complex needs.

    1. Re:regex coach by sickofthisshit · · Score: 2, Informative

      This tool, by the way, was written in Common Lisp, using Edi's own library

      CL-PPCRE - portable Perl-compatible regular expressions for Common Lisp

      A library which typically outperforms Perl's own regex engine.

  20. Save $14.80 by buying the book here! by Anonymous Coward · · Score: 0, Informative

    Save yourself $14.80 by buying the book here: Java Regular Expressions. And if you use the "secret" A9.com discount, you can save an extra 1.57%! That's a total savings of $15.20, or 38.58%!

  21. RegEx not so maintainable... by Heembo · · Score: 2, Interesting

    One of the reasons we as programmers write code is to take a very complex idea, like a software application, and write something that a human engineer can understand. The KISS principle especially applies to coders.

    As I get older, my code has gotten more and more straightforward, because I consider the maintenance cycle of code to be more than 95% of the puzzle. And these days, I have more than one security analyst who is not a senior software engineer poking around my code.

    RegEx's are not-so-readable and not-very-maintainable programming abstracts that should be avoided whenever possible. I prefer using string manipulation abstraction classes (such as my own version of StringTokenizer). They are not as fast and furious as other methods like lexical analysis, and the code is more bloated, but the code is Straight Forward And Easy To Read. There is a power in code of this nature, and my clients have thanked me more than once for not focusing on writing "cool code" but on writing "clean and simple" code. I just tried to paste in a few ugly regex samples, but slashdot blocked me, calling them "junk characters". I agree! :)

    For example, take XPATH, this is a clean and simple way to address XML objects. Sure, there is an additional level of abstraction, but you can look at an XPATH query, even from a layman's point of view, and have a clear understanding as to what it is doing.

    --
    Horns are really just a broken halo.
    1. Re:RegEx not so maintainable... by Abcd1234 · · Score: 1

      RegEx's are not-so-readable and not-very-maintainable programming abstracts that should be avoided whenever possible.

      If a regex isn't quickly comprehensible to you, either a) the regex is badly written, or b) you need more practice with regex's.

      Seriously, it's very rare for me to come across a regex I'm unable to comprehend. And for more complex ones, Perl certainly allows you to intersperse the regex with comments (I don't recall if Java allows this, though it does support a significant subset of Perl regex syntax).

    2. Re:RegEx not so maintainable... by Anonymous Coward · · Score: 0

      I haven't done Java regexes, but in Perl there are ways to comment them (using the /x operator). Obviously an uncommented regex can be tough to decipher, but a commented regex should be fine.
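
      For what it's worth, java.util.regex has an equivalent of Perl's /x: the Pattern.COMMENTS flag (or the embedded (?x) form). A small sketch, with a made-up class name and a date pattern picked purely as an example:

      ```java
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      // Hypothetical demo class; the date format is just an example.
      class CommentedRegexDemo {
          // With Pattern.COMMENTS, whitespace in the pattern is ignored and
          // everything from # to the end of the line is treated as a comment.
          private static final Pattern DATE = Pattern.compile(
                  "(\\d{4})  # year\n" +
                  "-         # separator\n" +
                  "(\\d{2})  # month\n" +
                  "-         # separator\n" +
                  "(\\d{2})  # day",
                  Pattern.COMMENTS);

          // Returns the month part of a yyyy-mm-dd string, or null if it does not match.
          static String monthOf(String isoDate) {
              Matcher m = DATE.matcher(isoDate);
              return m.matches() ? m.group(2) : null;
          }

          public static void main(String[] args) {
              System.out.println(monthOf("2006-07-14")); // 07
              System.out.println(monthOf("14/07/2006")); // null
          }
      }
      ```

      One catch: in this mode a literal space or # in the pattern has to be escaped or put in a character class.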

    3. Re:RegEx not so maintainable... by mongus · · Score: 2, Insightful

      I used to think the same thing. Back in '99 a guy I was working with would produce a regex and I had no idea what that strange looking thing did. I got a book on Perl and spent quite a bit of time wrapping my head around regular expressions. That's probably the only thing I retained from Perl because I really don't like the language. I started using the ORO package in Java to do regular expressions and switched to the standard library when it was introduced in 1.4. Java's syntax is nearly identical to Perl's.

      If you'll take the time to understand them you'll never go back to parsing strings yourself. They can make your code MUCH easier to maintain. There is a steep learning curve but they're well worth learning. Your code will be much more readable with a regular expression instead of lines and lines of code. Debugging is much easier too.

      Maybe you should give the reviewed book a shot. I can't comment on it as I've never read it but I do highly recommend learning regular expressions.

    4. Re:RegEx not so maintainable... by UtucXul · · Score: 1
      RegEx's are not-so-readable and not-very-maintainable programming abstracts that should be avoided whenever possible.
      I've found my regex stuff has changed as I've done more too, but I don't think they are bad things. What I learned is that adding a comment or two and maybe splitting up a monster regex into a 2 line thing (which I know hurts my perl credibility) can turn a complex regex into something totally readable later.

      So don't blame the very powerful tool. Blame the misuse of it.

  22. Re:Java sucks by vingilot · · Score: 2, Informative

    Come on:
    "Some String".replaceAll("Java", "Bloated piece of shit")

    And FYI PatternSyntaxException is a runtime exception so no need to catch it and rethrow as a RuntimeException.

    so to write it your way:

    String theTruth(String s){
            return Pattern.compile("Java").matcher(s).replaceAll("Bloated piece of shit");
    }

  23. Re:Java sucks by computational+super · · Score: 3, Funny

    Oh, I think you're hardly being fair to Java - your example was artificially bloated. I can easily do this in one line in Java:

    Runtime.getRuntime( ).exec( "perl -e 'sub theTruth($) { shift; $_ =~ s/Java/Not so bad now/; return $_; }" );

    I think you owe Java an apology.

    --
    Proud neuron in the Slashdot hivemind since 2002.
  24. Re:Java sucks by Derkec · · Score: 4, Informative
    You don't have to throw anything there, you should just have one clear return in your method. You also probably shouldn't be compiling your pattern every time.

    Try:
    // PatternSyntaxException is unchecked, so no try/catch is needed here.
    private static final Pattern pattern = Pattern.compile("Java");
     
    public String theTruth(String string) {
      Matcher matcher = pattern.matcher(string);
      return matcher.replaceAll("something I don't know jack shit about");
    }
    Still not as compact but at least there aren't any tildes in there. I wonder if there would be a more compact way to do it. This seems terribly heavyweight for such a simple example. Oh, wait! There is!
    public String theTruth(String string) {
      return string.replaceAll("Java", "this is really easy");
    }
    So now we compare:
    public String theTruth(String s) { return s.replaceAll("Java", "this is easy"); }
    To:
    sub theTruth($) { shift; $_ =~ s/Java/Bloated piece of shit/; return $_; }
    So the Java code ends up being a handful of characters longer and much easier to read. I'm not saying that Java is the ideal Regex language, but your example sucked.
  25. Java for String Processing by Kazrael · · Score: 0, Offtopic

    Really, Java is not meant to be a string processing utility. It is honestly too slow with too much overhead for this type of functionality. Regular expressions were meant to be used for the occasional light bit of string processing in Java. If you really need some string processing, like over a large dataset, stick with something like Python, which is implemented in C. It is fast with some very cool tools, such as regex, dictionary use, etc. Even if you need a light GUI, you could always interlace some Python with Tk.

    --
    Development notes at http://devscribbles.blogspot.com
  26. Re:Java sucks by Anonymous Coward · · Score: 0

    Java is one of the closest languages to the original object-oriented language, Simula. It's kinda like a crocodile... not so much elegant or refined, but still so successful for so long that you have to wonder what it is doing right to last so long.

    Fans of Ruby/Python/Smalltalk/Lisp/etc will do things like add a calculate-average method to Array and claim that is object-oriented, but it is not really. What if the array contains for instance regular expressions, wtf does an 'average' mean for that? It's a useful practice, but in theory it is nonsense. Teaching OO in a language that allows and encourages that sort of abuse just makes it harder to understand why we use objects in the first place.

  27. java.util.regex speed sucks by Soong · · Score: 1

    I'm glad that it's there, and I suppose it was useful during my prototype phase, but a little profiling revealed that my app was spending half its time parsing input. Dumping out the input to String and sometimes char[] and doing the parsing myself in hand tooled code almost completely erased the speed hit I was taking on load.

    --
    Start Running Better Polls
    1. Re:java.util.regex speed sucks by Heembo · · Score: 1

      HOORAY - another reason to avoid the ugliness and maintenance nightmare that is RegEx. Thank you for your wise post. You are a paragon of godly inspired wisdom, my Java son! :)

      --
      Horns are really just a broken halo.
    2. Re:java.util.regex speed sucks by mongus · · Score: 1

      I'm curious, were you reusing Patterns or were you creating a new one each time you wanted to use it? They're pretty expensive to create.

    3. Re:java.util.regex speed sucks by Soong · · Score: 1

      Some of each. String.split(String) is awfully convenient, and awfully wasteful because of the regex backend. Other places where I had more complex regex I did break them out and keep the patterns. I still wound up replacing most of those uses also.
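
      On the reuse point, a minimal sketch of the difference (class and field names are made up, and the delimiter is only an example). For a regex like this one, String.split has to compile the pattern on each call, while a static Pattern is compiled once and shared.

      ```java
      import java.util.Arrays;
      import java.util.regex.Pattern;

      // Hypothetical demo class; the delimiter regex is an assumption for this sketch.
      class SplitDemo {
          // Compiled once and reused; Pattern instances are immutable and thread-safe.
          private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

          // Splits a line on commas, ignoring whitespace around each comma.
          static String[] fields(String line) {
              return COMMA.split(line);
          }

          public static void main(String[] args) {
              // Convenient, but the regex is handled from scratch on every call.
              System.out.println(Arrays.toString("a , b,c".split("\\s*,\\s*")));
              // Same result from the precompiled pattern.
              System.out.println(Arrays.toString(fields("a , b,c")));
          }
      }
      ```

      Whether this gets anywhere near hand-written parsing is something only a profiler can say for a given app.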

      --
      Start Running Better Polls
  28. 3rd party libraries by Julian+Morrison · · Score: 1

    Any company that doesn't allow, nay, embrace third party jarballs is missing 98% of the point of Java. The language is so-so, the built in libraries are nice, but not infinite - but the ability to load componentized, versioned, packaged third-party tools is priceless.

    1. Re:3rd party libraries by shelterpaw · · Score: 0

      When I worked at DevX, our primary focus was reviewing third party products and selling third party development tools to corporations.

  29. Why regular expressions... by cwills · · Score: 1

    If I were to ask everyone to start programming in assembly language, I suspect that I would be laughed at. Yet with regular expressions that is exactly what we are doing. If you take a look at the history of regular expressions, you will find staring right back at you the guts of compiler theory with state machines, finite state automata, etc. Instead of asking for regular expressions, programmers should be asking for higher level pattern matching facilities. Something as simple as finding the balanced parentheses in the string: (a+b)/((c-d)+e) using a regular expression is difficult. Yet there have been languages that have advanced string matching capabilities around since the 60's (start looking at Snobol -- which is still alive -- and some of its descendants).

    1. Re:Why regular expressions... by Abcd1234 · · Score: 1

      Something as simple as finding the balanced parentheses in the string: (a+b)/((c-d)+e) using a regular expression is difficult.

      It's not difficult. It's impossible. Perhaps you should start off by using the right tool for the right job.

    2. Re:Why regular expressions... by cgibbard · · Score: 1

      Personally, I like monadic parser combinators, like those provided by the Parsec library for Haskell. You can parse arbitrary context free grammars, and even many sensible context-sensitive ones with little difficulty, you get to write your parsers in the language (Haskell) and you get meaningful parse error reports for free.

      A major downside to the approach is that Parsec itself lacks a symmetric choice combinator, having only left-biased conjunction, together with a combinator which causes a parser not to consume input when it fails. Though other libraries, like Koen Claessen's ReadP rectify this, the associated performance costs tend to be higher.

      I tend to use Parsec even for some tasks where many people would use regular expressions. It might not be quite as fast as statically building your parser, but it's possible to get really quite decent performance out of it, and the convenience level is quite high.

      Another interesting thing to look at are arrow-based parser combinators, like PArrows -- these allow for a greater level of optimisation at runtime, so you can get really good performance while allowing for things like symmetric choice. They also can allow for cool features like the ability to inspect the parser and emit code in various languages for that parser. (The one I linked to has the ability to compile parsers to JavaScript code in fact.) The downside is that arrows tend to be a little more inconvenient to program with than monads.

      While all these libraries are in Haskell, there's no strict reason that the technique couldn't work in another language. The only trouble is that most other languages haven't jumped on the monad bandwagon yet, so programming with monads in something like Java can be somewhat awkward (though one could make the claim that this isn't only true of monads. ;) However, it can be done in Java as well as in Python and (very roughly, not quite monadic) in C
    3. Re:Why regular expressions... by nuzak · · Score: 1

      > Something as simple as finding the balanced parentheses in the string: (a+b)/((c-d)+e) using a regular expression is difficult.

      It's in fact impossible in true regular expressions since it requires you to maintain a stack.
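      To make the "stack" point concrete: with only one kind of bracket, the stack degenerates into a depth counter, which is still state that a true regular expression cannot carry. A rough sketch in plain Java (class and method names are made up for illustration):

```java
final class BalancedParens {
    // Returns true if every '(' is closed by a later ')' and no ')' closes early.
    static boolean isBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // a ')' with nothing open
                }
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(a+b)/((c-d)+e)"));
        System.out.println(isBalanced("(a+b))"));
    }
}
```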

      > Yet there have been languages that have advanced string matching capabilities around since the 60's (start looking at Snobol -- which is still alive -- and some of it's descendants).

      Advanced matching is coming in Perl6 (which is runnable right now, http://www.pugscode.org/), along with syntax noise that makes perl5 look like python in comparison... alas. I do love perl but as they say, the goggles, they do nothing. At any rate, care to let us know what some of Snobol's modern descendants are?

      --
      Done with slashdot, done with nerds, getting a life.
    4. Re:Why regular expressions... by cwills · · Score: 1
      To name one, the most direct descendant of Snobol is Icon, though to some even that is "old", there is an OO version of Icon, Unicon, being actively developed, as well as an implementation of Icon that "compiles" down to the Java bytecode (jcon).

      Yes -- the point was that a regular expression doesn't handle such things as searching for balanced parentheses. However, even old Snobol had the facility for dealing with balanced parentheses without getting into full grammars and parsers.

      --- [ full snobol example] ---
      s = '((abc) def)'
      s '(' bal . data ')'
      output = data
      end

      ---

      would produce

      (abc) def
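      For comparison, roughly the same extraction written by hand in Java. This is only an approximation of what the Snobol pattern does (it does not backtrack to a later parenthesis if the first one is never closed), and the class and method names are invented for the sketch.

```java
final class BalExtract {
    // Returns the text between the first '(' and its matching ')',
    // or null if there is no '(' or it is never closed.
    static String firstBalanced(String s) {
        int open = s.indexOf('(');
        if (open < 0) {
            return null;
        }
        int depth = 0;
        for (int i = open; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    return s.substring(open + 1, i);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstBalanced("((abc) def)")); // prints (abc) def
    }
}
```

      That is about twenty lines against Snobol's one pattern statement, which is rather the point being made.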

      Another tack is the Parse statement from the Rexx language. Rexx's approach is: why use a Swiss army knife when a butter knife is all that is needed? Rexx's parsing templates are not as advanced as regular expressions, nor do they approach the power of Snobol's, however they are fairly easy to read (and to write), and for about (wild guess) 90% of the tasks are sufficient. Yes -- there is that 10%, and for that 10% one ends up writing specific pattern matching code.

      The point that I was trying to make is that there is a lot of effort in adding or enhancing regular expressions to everything. Instead there should be more effort in taking pattern matching to a much higher level, and I don't mean simply adding "parser generating support".

  30. Re:Java sucks by Senzei · · Score: 0, Flamebait
    Fans of Ruby/Python/Smalltalk/Lisp/etc will do things like add a calculate-average method to Array and claim that is object-oriented, but it is not really.
    Without knowing all of the languages in that list I feel safe, because of the languages I do know, in asserting that you have no fucking clue what you are talking about.
    --
    Slashdot: Where anecdotes and generalizations can be freely substituted for facts, logic, or intelligence
  31. Re:Java sucks: MODS, that was funny by Bill+Kilgore · · Score: 1

    And I find it much easier to follow.

    --
    Rediculous: A word indicating the writer is ridiculously ignorant.
  32. Use the right tool for the job... by Anonymous Coward · · Score: 0

    That depends whether searching for content in a string is "trivial" or not. More likely, it comes from only encountering complex problems in one of the two subsets and only trivial problems in the other. There are non-trivial problems in both subsets.

    The subset of problems you use a regex for are those where there are non-trivial patterns in the text that you wish to extract. The subset of problems you use a parser/lexer for are those where there is some formal model that describes the syntax the input is expected to have.

    These two problem sets do NOT often overlap. If you're using the wrong tool for the wrong problem, you're in for a world of hurt. You do NOT want to parse XML/HTML/etc. with regexes (you can do a few things, but you open yourself up to a world of well-deserved pain when you realize the true evils of nesting and how they affect regexes).

    Similarly, there's no way in hell you want to search unstructured text with a parser/lexer. Yes, *unstructured* data. Programmers actually deal with that from time to time. It's when we use regexes. You know, when searching for *patterns* ...

    I've written both. I've used both. They're both great problem solving approaches, but using the wrong one invites pain. Sure, maybe you can get by with a half-assed system that has bugs your users will never find (e.g. they won't nest anything too deeply for the XML regex to find), but it's still a bad idea.

    So please, please use the right tool for the job. With my luck, I'll get stuck maintaining your code if you don't :-(
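    A small illustration of the nesting point, as a sketch only: the XML snippet and the "item" element name are invented, and the parser is the DOM one that ships with the JDK (javax.xml.parsers). Nesting depth is the parser's problem here, not yours.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NestedXml {
    // Counts <item> elements at any nesting depth using the JDK's DOM parser.
    static int countItems(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("item").getLength();
        } catch (Exception e) {
            // Malformed input or parser setup failure.
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<list><item>a<item>b<item>c</item></item></item></list>";
        System.out.println(countItems(xml));
    }
}
```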

  33. Rapid Java Regex Prototyping by cerelib · · Score: 1

    I recently wrote a small app based on "Filter Builder" by ActiveState. It's called Pattern Sandbox and has helped me rapidly prototype regexes for both Java and Perl (because the Java dialect is very similar to Perl's). I made Pattern Sandbox because it was so annoying to write a regex, compile, get to that part of the code/interface, and then finally try it just to find that it does not work correctly so I have to repeat this process until I get it right. If you are using Java regexes on a regular basis, Pattern Sandbox or similar tools are indispensable. Try it out and feel free to give me some feedback. I hope this is not too much of a plug, but I thought it to be very appropriate.

    1. Re:Rapid Java Regex Prototyping by cowboy76Spain · · Score: 1

      Well, I usually just test my regexp (Java or Perl) in a mini program with just the regexp and the strings to match and when I decide it is the right one I just copy it. Unless it offers some additional help (like a message saying "we didn't accept String "myStringToCheck" because the third character could have been only "s"; the parser was processing the "mys+" element").
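      Such a mini program might look something like the following sketch. The class name, the report strings and the sample inputs are placeholders; swap in the regexp and strings you actually care about.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexScratch {
    // Reports whether the whole sample matches, only part of it does, or nothing does.
    static String check(Pattern p, String sample) {
        if (p.matcher(sample).matches()) {
            return "MATCH";
        }
        Matcher m = p.matcher(sample);
        if (m.find()) {
            return "PARTIAL at " + m.start() + "-" + m.end();
        }
        return "NO MATCH";
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("mys+");
        for (String s : new String[] {"myss", "xmysx", "myStringToCheck"}) {
            System.out.println(s + ": " + check(p, s));
        }
    }
}
```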

      --
      Why can't /. have a rich-text editor? Editing your own HTML is so XXth century.
    2. Re:Rapid Java Regex Prototyping by Abcd1234 · · Score: 2, Insightful

      Am I the only one that finds it quite easy to get regexs right just by, you know, typing them in? If a regex fails for me, 99% of the time, it's because my input data is in a different format from what I expected. But I've almost never needed any kind of "explorer" tool... that smacks of "tweak it until it works", which is never a good idea, IMHO...

    3. Re:Rapid Java Regex Prototyping by nuzak · · Score: 1

      > Am I the only one that finds it quite easy to get regexs right just by, you know, typing them in?

      Nope. But I develop spam filter rules all the live long day. These sometimes demand 10 or so very hairy regexes (zero-width assertions and all) all fire in conjunction, then they have to be tweaked slightly to work whenever the spam mutates slightly. You have no idea how convenient it is to have a tool like Pattern Sandbox that will light up the matches when you incrementally tweak a rule expression so you know your new rule is good.

      (Anyone suggesting better ways of fighting spam is welcome to it, just realize I'm not telling the whole story. In the end it still comes down to matching as many of the Nine Billion names of \/!AGR4 as you can)

      --
      Done with slashdot, done with nerds, getting a life.
    4. Re:Rapid Java Regex Prototyping by Anonymous Coward · · Score: 0

      Depends on the regexp, but in general, if you can type in a nontrivial chunk of code and have it all Just Work the first time, then yes, you're the only one.

      Incremental development has been around for decades. You can try to make it sound bad by calling it "tweak it until it works", but that doesn't change the fact that it's how most good systems are built. It's how all open-source software is done, for example. (Paul Graham observed this by saying that open-source software is higher quality because it admits the possibility of errors.)

      REs or not, who are you that you can write perfect code the first time?

  34. Re:Java sucks by Jtoxification · · Score: 1

    The bottom line is quite simple; that small handful of code accumulates very quickly until it is no longer a small handful, at least when compared to something that uses better (re:shorter) identification names for libraries, and that has sufficient mechanisms to cut down on function and variable names such as scope control, (and I do admit, I absolutely love java's handling of brackets and static brackets ... you can validly place those suckers in the weirdest locations), enumerations, (which java finally added a few years back, albeit in their own ugly way) and much more-importantly, symbolic operator-overloading :-p You know I was heading there.

    At any rate, yes operator-overloading can provide you with multiple ways to shoot yourself in the foot, but Java is already ready for these puppies; think interfaces! All you have to do is look at the most basic and commonly-implemented interfaces that java recognizes, and then say, "Okay, which operators should be overloaded to match these interfaces?," (i.e. the commonly-overloaded operators for queues, lists, stacks, and comparable types, etc) implement those into the virtual machine, and boom, you've got backwards-compatible operator-overloading in Java. No biggie, right? Makes SENSE, right?! Well, heck, at that point you've almost got a dynamic-by-default version of C++ without a macro preprocessor! In my book, that's progressive!

    But Java's had a very crappy version of regular-expression support for ages. I wasn't able to understand it for a very long time and in fact I learned many other regular expression engines for various scripting/programming languages in far shorter periods of time (Perl, Python, PHP, JavaScript, egrep, f/lex for C/++, etc). But with this newfangled magic era of software libraries (Java and VB and .NET) and toy languages (Primarily VB), if I can create a program that mimics tetris (C#.NET) in just about 7 hours, and the executable code is more than a few times larger than the source code, I call that a toy language - of course, in java, it took weeks for pacman - well, at least for us to do that in a three-student team for pac-man, but that was almost five years ago, in my first java class ever 8-B

    I was such a die-hard C++ fan then. Now I say bring on the libraries and new languages, but save the Visual Basic software for your children to play with, just like almost all of us must have done, at some point. :-p )

    --
    --I gots 99 problems but a new machine ain't one!
    AMD! Asus! Whoot! 6 years!
  35. Maybe... by cowboy76Spain · · Score: 1

    It is just that you should not use a fork to hammer a nail.

    Balancing parentheses was just the first example my teacher told the class when explaining that regular expressions were not suited for everything and that sometimes you had to use grammars.

    --
    Why can't /. have a rich-text editor? Editing your own HTML is so XXth century.
  36. Re:Java sucks by cowboy76Spain · · Score: 2, Insightful

    Apart from the fact that your code is the worst that you can write when using RegEx in Java (as pointed out by another post, RTFApi doc if you want to use Java properly), it amuses me that you are complaining that Java (a language designed for using strong OO and being multiplatform) is slower than Perl (a language designed for processing regular expressions).

    You could have said also that the Fire Department sucks because they are not good at catching burglars, or that the Police Department is full of losers because they can not put down a fire. Myself, I will keep using the FD to deal with fire and the PD to deal with crimes.

    --
    Why can't /. have a rich-text editor? Editing your own HTML is so XXth century.
  37. Topical plug: Regex Powertoy by gojomo · · Score: 1

    Great things about the Java 1.4+ regex support, from my perspective, include that (1) it's nearly as full-featured as Perl's regexes (and thus far better than Javascript's); and (2) it's usable in web browsers and via embedded applets.

    Those were both key to helping me create Regex Powertoy, an interactive visual regex tester, much like others mentioned in this discussion -- but fully implemented in a browser. It's in JavaScript and DHTML, with a Java applet for the full-featured and step-controlled regex matching -- requires FF1.5+/IE6+ & Java 1.5+.

    Check it out, break it (it's still got some rough edges under heavy input), let me know how it could be improved.

  38. Theory of Computing by Poromenos1 · · Score: 1

    Gah, and to think I passed that class :P I just hadn't realised that all that theory about automata and K* and whatnot applied to the real world!

    --
    Send email from the afterlife! Write your e-will at Dead Man's Switch.
  39. Re:Java sucks by Anonymous Coward · · Score: 0

    You'd have to implement some useless ArithmeticCollection for that in Java. In these (can't talk for lisp) other languages, you just define the method and throw something when a member doesn't have the + message. How is one worse practice than the other?

  40. HTTP by ilikejam · · Score: 1
    I'm building a toy Java webserver, so I needed a way to parse HTTP request lines...

    private final Pattern methodPattern = Pattern.compile("^(.*) .* HTTP/.*$");
    private final Pattern versionPattern = Pattern.compile("^.* .* HTTP/(.*)$");
    private final Pattern resourcePattern = Pattern.compile("^.* (.*) HTTP/.*$");

    Happy days.
    There was some weirdness with GCJ not behaving like Sun's Java, but that seems to have gone away with the last update to GCJ I did.

    --
    C-x C-s C-x k
    1. Re:HTTP by Anonymous Coward · · Score: 0

      Umm... am I the only one that notices you can just combine all three like this: ^(.*) (.*) HTTP/(.*)$ or does Java only support one captured match (shudders)? I'm glad I spend most of my time in Perl...
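      For the record, java.util.regex does support multiple capturing groups via Matcher.group(int). A sketch using the combined pattern exactly as written above (the class name and the sample request line are made up):

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RequestLine {
    // One pattern, three capturing groups: method, resource, HTTP version.
    private static final Pattern REQUEST = Pattern.compile("^(.*) (.*) HTTP/(.*)$");

    // Returns {method, resource, version}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = REQUEST.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("GET /index.html HTTP/1.1")));
    }
}
```

      The greedy .* groups lean on backtracking to find the spaces; something like \S+ would be stricter, but that is a change to the original pattern rather than part of it.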

    2. Re:HTTP by ilikejam · · Score: 1

      I get paid by the line.

      --
      C-x C-s C-x k
  41. Re:Java sucks by philci52 · · Score: 1

    Actually, you won't get any output from that, you need to hook up the InputStream from the Process object to the standard out of your own java process and run it in a separate thread or a while loop. I've also found that running interactive processes (both on windows and Unix in java 1.4) to be nearly impossible, as I can't seem to actually send data on the input stream of the other process. There are also platform dependent differences, which can be a pain. Generally I've found exec to be lacking.
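    The "while loop" variant might be sketched as below. Note the assumptions: it uses ProcessBuilder, which arrived in Java 5, rather than the Runtime.exec() of 1.4, and it runs the JVM's own launcher with -version purely as a convenient child process. Class and method names are invented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.file.Paths;

final class ReadChildOutput {
    // Runs a command and returns everything it wrote to stdout and stderr.
    static String run(String... command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout so one loop drains both
                    .start();
            p.getOutputStream().close(); // this sketch sends no input to the child
            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            p.waitFor();
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the running JVM's launcher lives at <java.home>/bin/java.
        String javaBin = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        System.out.print(run(javaBin, "-version"));
    }
}
```

    Merging stderr into stdout avoids needing a second reader thread, at the cost of not being able to tell the two streams apart.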

  42. 255 pages about Java regex? by JourneyExpertApe · · Score: 1

    If you only program in Java, and you have yet to use regexes, then I could see why you might possibly want this book. But how is it that much better than a general purpose regex book (of which there are several)? I would think it would be more useful to have a book that covers regexes as a computing concept and then talks about the differences/limitations of different implementations (grep, sed, Java, JavaScript, Perl, etc.) Is Java still a big enough buzzword to sell books?

    --
    If you can read this sig, you're too close.
  43. Re:Java sucks by apotheon · · Score: 1

    Too bad Perl is better at being multiplatform than Java, too — and that Ruby is better at being strongly OO than Java, despite having a strong Perl heritage.

    --
    Unfetter your ideas. Copyfree your mind.
  44. "Matcher" as in... by Big+Stick · · Score: 1

    Let's light this book on fire? What else can Java do half right that's already been perfected.

  45. Communicating to an external process... by Ayanami+Rei · · Score: 1

    You need to create two sets of FIFOs, one for you to talk to your child, and one for it to talk back.
    You fork, then dup2 the child's STDIN to the "far end" of the former pipe,
    then you dup2 the child's STDOUT onto the "far end" of the latter pipe.
    Finally, you exec() in your child.
    You hold onto the two near ends and use them as separate Input/Output streams for control.

    You're going to need to:
    1) Catch SIGPIPE for when the spawned process closes its reading end of the pipe.
    2) Catch SIGCHLD so you know when the process exited.
    3) Set your near OutputStream to autoflush mode.

    On top of all this, your remote program has to be able to work in an unbuffered mode. Most command line programs don't. They are designed to work with files, and STDIN/STDOUT that are already "in the right mode", having inherited them from a program who had them attached to a TTY.

    That is probably the issue you are having.

    Some programs like 'cat' have a -u option which basically sets autoflush on their end so that you receive data to read as soon as it's available, and not when the fifo decides to flush.
    You can stick that into the beginning of the pipeline and it should encourage the others to flow if they don't have an explicit unbuffered mode themselves.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
    1. Re:Communicating to an external process... by philci52 · · Score: 1

      Interesting... Seems like you are talking about more than just Runtime.exec() in Java, but sending input to other processes in general, regardless of the language. I'm making this assumption based on the 'C' based approach of your comments.

      I'll have to try the cat -u, although on my machine (linux) the man page for cat specifies that -u is ignored.

  46. Treat regexes like bitmasks... by Ayanami+Rei · · Score: 1

    ...or like enums or any other "magic strings" that you need to make your code actually DO SOMETHING besides act as a framework passing data around...

    1) POSIX classes are your friends
    2) Build large regexes out of small regexes
    3) Compile and name your regexes
    4) Hide regex matching details inside of class methods when appropriate
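    The four points above could be sketched like this in Java. The fragment names and the toy "name = number" grammar are invented for the example; the \p{Alpha}, \p{Alnum} and \p{Digit} classes are the POSIX character classes java.util.regex supports.

```java
import java.util.regex.Pattern;

final class NamedRegexes {
    // 1) POSIX character classes, 2) small named fragments.
    private static final String IDENT = "\\p{Alpha}\\p{Alnum}*";
    private static final String NUMBER = "\\p{Digit}+";

    // 2) + 3) A larger regex built from the fragments, compiled once and named.
    private static final Pattern ASSIGNMENT =
            Pattern.compile("(" + IDENT + ")\\s*=\\s*(" + NUMBER + ")");

    // 4) Callers see a method, not the regex.
    static boolean isAssignment(String s) {
        return ASSIGNMENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAssignment("x1 = 42"));
        System.out.println(isAssignment("1x = 42"));
    }
}
```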

    I mean, what would you do if you needed a recursive descent parser? Or do we do everything via XML now?

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  47. So what you want... by Ayanami+Rei · · Score: 1

    ...is a parser. Invented about the same time. But those are typically based on transformation rules and regular expressions to tokenize your input.

    You could always build your own regular expression compiler. It's not unheard of. But I submit that the "language" is small enough that it's not worth it.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
    1. Re:So what you want... by cnettel · · Score: 1

      While technically correct, I think it's more to the point to say that parsers are based on tree structures or expansion rules. True transformation rules are so much more powerful, but with far less appealing computational properties.

  48. Re:Java sucks by apotheon · · Score: 1

    Don't make me golf that. Perl gets much more succinct, and this is just my first (somewhat lazy) attempt:

    sub theTruth($) { shift; s/.+/bloated piece of crap/; $_; }

    If I don't have to do it as a sub, I've got this off the top of my head:

    $_ = 'Java'; s/.+/bloated piece of crap/; print;
    --
    Unfetter your ideas. Copyfree your mind.
  49. Re:Java sucks by apotheon · · Score: 1

    . . . and, frankly, you should have some code to do something with your replacement string in the shortened Java example, else I could eliminate the print line in my shorter Perl example. In either case, the Perl example has fewer than half as many characters as the Java example, despite the fact I haven't even started throwing out whitespace that exists only for clarity purposes.

    --
    Unfetter your ideas. Copyfree your mind.
  50. Scsh allows you to embed regexs within regexs by Anonymous Coward · · Score: 0

    You might enjoy the novel way regular expressions are implemented in Scsh, the Scheme Shell.

    http://www.scsh.net/

  51. and now, to celebrate... by adrianmonk · · Score: 1

    And now to celebrate this new-found ability to manipulate strings easily:

    s/trench,/trench/;

    Ah, I knew that would make me feel better.

    1. Re:and now, to celebrate... by jilles · · Score: 1
      --

      Jilles
    2. Re:and now, to celebrate... by adrianmonk · · Score: 1
      You mean like Matcher.replace(String s)?

      Actually I meant like correcting the punctuation error with the newfound power of regular expressions. The fact that I used a sed-like (or perl-like) expression was just incidental and was only because that's the syntax I knew off the top of my head.
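      For what it's worth, the Java spelling of that fix would be something along these lines. As far as I know the replacement methods on Matcher are replaceAll and replaceFirst, and String gained a replaceFirst convenience method in 1.4; the class name here is just for the sketch.

```java
final class FixComma {
    // Java equivalent of s/trench,/trench/ : replaces the first occurrence only.
    static String fix(String s) {
        return s.replaceFirst("trench,", "trench");
    }

    public static void main(String[] args) {
        System.out.println(fix("the corporate Java developer in the trench, received the power"));
    }
}
```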

  52. Re:Java sucks by kuitang · · Score: 1

    Uh, not for regexes, but basically, from my experience, anything less than 100 lines of Java runs noticeably faster as about 20 lines of Perl, unless you're doing some strange voodoo with the Java API.

    --
    Don't believe in miracles -- rely on them.
  53. Third party packages by LizardKing · · Score: 1

    not many companies allow the use of 3rd party libraries

    I assume the review author hasn't worked for many companies then. I have yet to find any company that doesn't use third party packages. Logging, XML parsing and unit testing are just the first three things that spring to mind when I consider what might require a third party package. As for the "DLL hell" that someone alleges in a post to this thread, it's virtually non-existent. You ship the third party packages with your application (as a single JAR or WAR file), and rely on the accepted good practice that people don't set a default CLASSPATH these days.

  54. A whole book on this? by hutchike · · Score: 1

    Man, that's why I don't use Java. I mean - you need a whole book to learn how to use regular expressions in Java? In Perl =~ s/hard/easy/ ;-)

    --
    Zen tips: Pay attention. Don't take it personally. Believe nothing.
  55. Boost? No thanks by Viol8 · · Score: 1

    Aside from Boost being horrid bloatware, what exactly is wrong with the standard POSIX regexp functions? Look up regcomp(), regexec(), etc., which have been part of the standard C API for years.

    1. Re:Boost? No thanks by Anonymous Coward · · Score: 0

      The reg*() functions are POSIX, not standard C.

    2. Re:Boost? No thanks by Viol8 · · Score: 1

      For all intents and purposes the POSIX API *is* the C API given that C was
      developed on Unix. If other OSes don't support various portions of it then
      that's a failing on their part, but on OSes that do support it there's no reason
      to use Boost unless you really like obfuscated code.

  56. Re:JAVA SUCKS by apotheon · · Score: 1

    Isn't it creepy how D programmers, PCLinuxOS users, and Scientologists all seem to have the same bizarre sort of cultish eagerness to them?

    --
    Unfetter your ideas. Copyfree your mind.
  57. Re:Java sucks by Speare · · Score: 1
    sub theTruth($) { shift; $_ =~ s/Java/Bloated piece of shit/; return $_; }

    First, take out that ($) prototype. Perl doesn't use them that way. In Perl, a prototype is not for the same purpose as they are in other languages. They're for type coercion between scalar and array contexts; in this case you're saying "if they give me an array like a stupid git, please coerce it into a scalar context for me, thanks." If they pass a 26-element array, coercing it to a scalar context ends up giving you a numerical 26. Without the prototype, you'd process the first element of the array.

    Second, the default argument for s/// is the $_ variable, so you don't need to say $_ =~ s///.

    Third, modifying the global $_ in a sub is a recipe for odd bugs in the caller. Localize the damage with local $_, or use a my lexical.

    The return keyword in the last statement is optional. It's up to you to decide what's more readable. In a one-liner, I would omit.

    sub deAcronymify { local $_ = shift; s/\bPERL\b/Perl/; $_ }
    --
    [ .sig file not found ]
  58. Moronic Gibberish by Anonymous Coward · · Score: 0
    Seriously have any of these people actually used Java?

    It's DLL Hell all over again. Every time you use a third-party library, the user has to make sure it's installed. And in the classpath, unless they installed it as root/admin and placed it in the JRE's extensions directory.


    Every Java application you will ever see has a lib directory and in it are the jar (library) files it needs. The script or shortcut you use to start the app will ensure they are on the classpath.
    No set up, no messing, no conflicts with anything.

    Most commercial apps (esp. on Linux) will come with their own JRE too.
  59. Re:Java sucks by Anonymous Coward · · Score: 0

    I figured I badly mangled that Perl one-liner. Oh well. When you get stuck programming in Java, repetition and redundancy gets all too normal.

  60. Your friends != MostProgrammers by ArtStone · · Score: 1

    "Regular expressions (regex to their friends) are an incredibly powerful addition to most programmer's personal toolkit of techniques"

    Can you cite a source?

    --
    Final 2006 "Proof of Global Warming" US Hurricane Count -> 0
  61. Yes. by Ayanami+Rei · · Score: 1

    The C based approach is necessary because it's a "unix thing" and the issues you have with external process + (x language) are OS-dependent, not language dependent.
    I don't know what the equivalent to "dup2" is in java. Ultimately it's the system call you want your language to use to make the rubber meet the road. I'm sure there's a POSIX class or something you can leverage.
    (Example: In perl you'd use open with the ">&=" prefix. But that lulls you into a false sense of portability. I prefer to "use POSIX qw(dup2)" and just dup2 directly.)

    And I noticed that "cat -u" is useless on linux after submitting the post. Instead, check out "Expect" and the utility programs that come with it; specifically "unbuffer". It takes its arguments and runs them with the stdout flushed for you. Unfortunately you have to use it in each stage of your pipeline. So like:

    unbuffer tail -f /some/fifo | unbuffer od -t x1a | less

    I thought maybe you only had to do the first one to "prime the pump", but I was wrong. The only one you don't have to do is the last one.

    And in your case, since you are the final reader (and you already autoflush your writing pipe), you don't need unbuffer since you are already doing it, so to speak.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
    1. Re:Yes. by philci52 · · Score: 1

      Thanks for the info. Seems like all I was missing was tacking on a newline character and then a flush of my output stream (ie the input stream of the exec'd process).

      FYI, in java there is no "posix" class. The only way to interact with another process is by using Runtime.exec() and then using the Process Object ( http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Process.html ) returned to interact with the process. Interacting properly actually requires starting 3 threads. I still seem to have a problem getting the prompt to be sent over the stdout from the other process, say for example, if I launch ssh or ftp. It's not that it's buffered, it just never shows up. I guess it's not entirely necessary. Ahh well. Thanks for the help.
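      In case it helps anyone else, the newline-plus-flush fix looks roughly like this. It is a sketch with assumptions: it uses ProcessBuilder (Java 5+) instead of Runtime.exec(), it assumes a Unix-like system with "cat" on the PATH as a stand-in child, and it only exchanges a single line, so no extra threads are involved. Names are made up.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;

final class TalkToChild {
    // Sends one line to the child's stdin and returns the first line it writes back.
    static String echoThrough(String message, String... command) {
        try {
            Process p = new ProcessBuilder(command).start();
            try (BufferedWriter toChild = new BufferedWriter(
                         new OutputStreamWriter(p.getOutputStream()));
                 BufferedReader fromChild = new BufferedReader(
                         new InputStreamReader(p.getInputStream()))) {
                toChild.write(message);
                toChild.newLine(); // line-oriented children wait for the terminator
                toChild.flush();   // push the bytes out of our buffer into the pipe
                return fromChild.readLine();
            } finally {
                p.destroy();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: a Unix-like system with "cat" available on the PATH.
        System.out.println(echoThrough("hello", "cat"));
    }
}
```

      Without the flush the line can sit in the BufferedWriter indefinitely, which looks exactly like the child ignoring you.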