Slashdot Mirror


Java Static Analysis And Custom Bug Detectors

An anonymous reader writes "Java static analysis and custom bug detectors can be a very cost-effective way to improve software quality. By creating a detector for a known bug pattern, we can search for that bug pattern not only in the current code base for a specific project, but in any project, current or future. This article looks at how static analysis tools can change the way you manage software quality."

24 of 157 comments

  1. PMD and JLINT by Hoolala · · Score: 4, Interesting

    We develop Java-based vertical products, and we have found that PMD and JLINT, when integrated with an appropriate development process, can be highly effective in preventing serious bugs. That said, both PMD and JLINT incorporate "religious" issues, and it is important to determine what the religious issues are and steer clear of them, lest the good rules get lost among the rules that are non-essential from the project's perspective.

  2. FindBugs is awesome by tcopeland · · Score: 5, Informative

    As the lead guy on a "competing" static analysis framework - PMD - I can say that FindBugs is definitely a great piece of work. It catches all sorts of complicated problems with concurrency, does forwards/backwards data flow analysis, etc, etc. It's pretty sweet. Dr Pugh, who runs the project at the University of Maryland, did a JavaPosse interview that has more good info on the project and where it's going.

    Of course, if you really want to do source code analysis (vs bytecode analysis, which is what FindBugs does), then go for PMD, and [plug] get the book! [/plug]

  3. Searching for future bugs? by noidentity · · Score: 2, Funny

    "By creating a detector for a known bug pattern, we can search for that bug pattern not only in the current code base for a specific project, but in any project, current or future."

    Does it require a 1.21 gigawatt lightning bolt to power the future search feature?

  4. Re:Why not use OCaml or Haskell? by Anonymous Coward · · Score: 2, Insightful

    The point is not to use another piece of software/language to solve bugs. In the industry, you're not always given the choice and, even worse, the people who make the decisive choices are not always knowledgeable enough to make proper decisions (hello, managers!). We get to deal with the chosen tools and that's it.

    Your alternatives are probably better with respect to quality due to their formalism, but you'd have to convince your manager or boss to sign up for these solutions. It might be easier to stick with Java and use a Java bug-finder tool.

  5. Not just bugs, but lint for bad practices... by Speare · · Score: 2, Insightful

    You should run whatever LINT-like tools you can find. Developers should agree as a group on which warnings are spurious and which are legitimate, and adjust any lint policy configurations to suit.

    You can find far more than simple bugs: you can also decide on best practices and consistency standards which should be adhered to. These can vary in importance, but they really help keep a codebase clean and searchable. For a trivial example, if coding in C, decide as a group whether to use *p = '\0' or *p = 0 when writing into a char string. For a more involved example, regularly scan the codebase for regular expressions like (>)\s*(8|16|24) to find possible Intel/PPC endian issues lurking where you don't expect them.
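    A minimal sketch of that kind of regex scan, written in Java. The pattern in the comment looks truncated, so the one below is only a guess at its intent: it flags a left or right shift by a whole number of bytes. ShiftScanner and findSuspectLines are hypothetical names, and a hit is just a line worth reading, not a confirmed endian bug.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ShiftScanner {
    // Assumed pattern: <<, >> or >>> followed by 8, 16 or 24.
    private static final Pattern BYTE_SHIFT =
            Pattern.compile("(<<|>>>?)\\s*(8|16|24)\\b");

    /** Returns the 1-based numbers of the lines that contain a byte-sized shift. */
    static List<Integer> findSuspectLines(List<String> lines) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int i = 0; i < lines.size(); i++) {
            if (BYTE_SHIFT.matcher(lines.get(i)).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "int hi = (word >> 24) & 0xff;",
                "int sum = a + b;",
                "int packed = lo | (hi << 8);");
        // Lines 1 and 3 contain byte-sized shifts.
        System.out.println(findSuspectLines(source)); // prints [1, 3]
    }
}
```

    In practice the lines would come from the files in the source tree, and the scan would run as part of the regular build so that new hits show up early.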

    The adage goes, if you find you're doing something more than once, see if you can automate it, so you can pay more attention to the things which can't be automated. This goes for coding and debugging too.

    --
    [ .sig file not found ]
  6. Re:Why not use OCaml or Haskell? by Umbral+Blot · · Score: 2

    You do realize that, for the most part, Java has static typing? (For example, compare its types to those in Lisp, Lua, etc.)

  7. are these actually worthwhile? by buddyglass · · Score: 2, Insightful

    Not having used any static analysis tools, but having worked on several java projects, I question how useful these tools are. In my experience, most bugs that could be detected by static analysis are usually caught relatively quickly anyway. The trickiest (and potentially most damaging) ones are usually non-general enough to slip past a general-purpose tool. Am I mistaken?

    1. Re:are these actually worthwhile? by ashtophoenix · · Score: 2

      Although what you say may be true, in that complex or not-so-straightforward bugs are more difficult to find, I have many times seen myself or another developer struggle with a bug for a good 2-3 hours, and as soon as another set of eyes looks at the code, he/she points out something obvious and simple that was wrong with it. This I have seen more than a few times, surely. So I think there's still merit to a static analyzer/bug-finder, even though it may not be able to find complex problems.

      --
      Life is about being a Phoenix!
    2. Re:are these actually worthwhile? by Llywelyn · · Score: 2, Informative

      "In my experience, most bugs that could be detected by static analysis are usually caught relatively quickly anyway."

      In my experience the *opposite* is true, at least for code that I am not writing myself.

      For instance, since I started using FindBugs on our project (which is fairly large and complex as these things go, with ~5 development teams working on it and with many threads running at the same time), I've caught several potential deadlock issues that would have probably been uncaught until a deadlock happened (most likely after this is deployed), a small host of synchronization (e.g., inconsistent synchronization) and locking problems (e.g., running a bit of code outside of a try block but after the lock is acquired), some memory/performance problems (e.g., inner classes that should have been declared static), and other things of that nature.
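      One of the locking problems mentioned above (code running after the lock is acquired but outside the try block) can be sketched as follows. This is an illustration only, not code from the project described; LockedCounter and its methods are hypothetical names.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int count;

    // Risky shape (shown as a comment only): the check runs after lock()
    // but before the try block, so if it throws, nothing releases the lock.
    //
    //     lock.lock();
    //     checkDelta(delta);        // may throw while the lock is held
    //     try { count += delta; } finally { lock.unlock(); }

    /** Safer shape: the try block starts immediately after lock(). */
    int add(int delta) {
        lock.lock();
        try {
            if (delta < 0) {
                throw new IllegalArgumentException("delta must be non-negative");
            }
            count += delta;
            return count;
        } finally {
            lock.unlock(); // runs even when the check above throws
        }
    }

    boolean isLocked() {
        return lock.isLocked();
    }

    public static void main(String[] args) {
        LockedCounter counter = new LockedCounter();
        System.out.println(counter.add(2));
        try {
            counter.add(-1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected, lock still held? " + counter.isLocked());
        }
    }
}
```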

      I might, if I went through all of the code by hand, catch all of these issues and a few more, but a tool such as FindBugs gives me a better idea of where to look, and allows me to quickly make a bunch of useful changes without combing through each file that uses synchronization by hand. Sure, a dedicated review of the code would be best, but these are usually changes I can make quickly and easily, and some of these problems might have been difficult to find otherwise (e.g., inconsistent synchronization).

      There is also the benefit in that, while giving me an idea of where to look, it helps me catch other issues that FindBugs does not directly detect.

      --
      Integrate Keynote and LaTeX
  8. Another nifty static analysis project by tcopeland · · Score: 2, Informative

    ...is Sun's Jackpot, headed up by Tom Ball. What's neat about Jackpot is that it does problem fixing, too, using a domain specific language. From the interview:

        $object.show() => $object.setVisible(true) ::
                $object instanceof java.awt.Component;

    Feeding that DSL snippet to Jackpot will transform all Component.show() calls to Component.setVisible(true). Very, very cool stuff. Of course, you don't always want to make the transformation, but in the cases where you do, Jackpot looks like a great solution.

  9. Re:Why not use OCaml or Haskell? by bunions · · Score: 3, Funny

    OCaml and Haskell you say? Excellent. I was looking around for a magical silver bullet the other day, and these look like just the ticket!

    --
    there is no need to sign your posts. this isn't usenet. your username is right there above your post. stop it.
  10. doesn't findbugs do this by josepha48 · · Score: 2, Interesting

    I think findbugs does this. I've started using it and it found lots of bugs in my code. As a result I have learned a few things about java, just by using it and fixing my bugs.

    --

    Only 'flamers' flame!
    Does slashdot hate my posts?

  11. Re:What a strange thing from IBM by TheSunborn · · Score: 2

    Well, Java only allows casts that could be legal, and it throws a ClassCastException if the cast fails at runtime. So if you have
    public class A { }
    public class B { }
    public void f(B myB) { A myA=(A)myB; }
    This will not compile at all, because there is no way to cast a B to an A.

    If you have
    public class Base { }
    public class Derived extends Base { }
    public void g(Base myBase) { Derived myDerived=(Derived)myBase; }
    The function g is OK, and if myBase is in fact an instance of Derived, then all is well. If it is NOT an instance of Derived, then the cast throws a ClassCastException before myDerived is ever assigned. You can't cheat the java type system.

    This is unlike c++, which will accept any cast you ask it to do.
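    For reference, a small sketch of how a downcast behaves at run time in Java: a failed cast throws ClassCastException rather than producing null, and only a null input comes out as null. The class and method names here are hypothetical.

```java
class CastDemo {
    static class Base { }
    static class Derived extends Base { }

    /** Describes what the downcast to Derived did for the given value. */
    static String tryDowncast(Base value) {
        try {
            Derived d = (Derived) value; // checked by the JVM at run time
            return d == null ? "null passed through" : "cast succeeded";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    /** instanceof lets the caller test first and avoid the exception. */
    static boolean isDerived(Base value) {
        return value instanceof Derived;
    }

    public static void main(String[] args) {
        System.out.println(tryDowncast(new Derived()));
        System.out.println(tryDowncast(new Base()));
        System.out.println(tryDowncast(null));
        System.out.println(isDerived(new Base()));
    }
}
```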

  12. Re:That's great... by Decaff · · Score: 4, Informative

    ... until Sun releases a new JRE and all your old applications do not work at all anymore when users install the new JRE. Unmaintained applications die altogether or require constantly uninstalling and installing various JREs to run them as well as new ones. That's the biggest bug of all in Java and makes any bug tracking useless, and programming in Java pointless.

    C/C++ applications tend to work for decades and can be written to be far more reliably cross-platform.


    Odd. I have found exactly the opposite. Java is very well known for the excellence of its backward compatibility, and to say 'all your old applications don't work anymore' is just plain false. Java would not have had the huge success it has had if this were not the case, so your statement is plainly wrong.

    On the other hand, C/C++ version bugs are well known and well documented - just think of the issues involved with gcc versions and linux kernel compilations. I have a very simple C++ app that compiled and ran fine on one version of gcc, but broke on another.

    If you simply exchange C/C++ for Java, and vice versa, throughout your post, it then makes sense.

  13. Re:Why not use OCaml or Haskell? by IamTheRealMike · · Score: 2, Informative

    The type system of Haskell doesn't let you prove anything radically more interesting than that of Java or C++, to be honest. Also, Haskell mixes up a bunch of other random ideas with that type system, so you have to take the bad with the good - e.g. laziness and the unusual syntax.

  14. Re:What a strange thing from IBM by maraist · · Score: 2, Informative

    You can't cheat the java type system.
    You're kidding right.

    Bar b = new Bar();
    Foo f = (Foo)(Object)b;

    Works just fine for me... Until you get the ClassCast Runtime Exception.

    Now you might call this a contrived example.. Except that it's not.

    How many thousands of function calls take Serializable or worse "Object" as a parameter? Virtually every IPC related activity does at some point. That includes all of j2ee, which are considered "enterprise" level coding frameworks.

    Generics was a step in the right direction with compile-time enforcement of "many" of these opaque "Object" APIs.. But it definitely didn't penetrate some of the more important areas; just collections (which was at least the most [mis]used form of generalized types).

    But Generics doesn't have any means of enforcement.

    Collection<Foo> myFoos = new ArrayList<Foo>();
    Collection myUnsafeFoos = myFoos;
    Bar bar = new Bar();
    myUnsafeFoos.add(bar);
    Foo foo = myFoos.iterator().next();

    will throw a ClassCast Runtime Exception.
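    A self-contained version of that scenario, as a sketch with the type parameters written out. Foo and Bar are empty placeholder classes, and HeapPollutionDemo is a hypothetical name. The raw alias only produces a compiler warning; the failure shows up later, at the cast the compiler inserts when the element is read back as a Foo.

```java
import java.util.ArrayList;
import java.util.Collection;

class HeapPollutionDemo {
    static class Foo { }
    static class Bar { }

    /** Adds a Bar through a raw alias, then reads it back as a Foo. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String run() {
        Collection<Foo> myFoos = new ArrayList<Foo>();
        Collection myUnsafeFoos = myFoos; // raw alias of the same list
        myUnsafeFoos.add(new Bar());      // accepted: element types are erased at run time
        try {
            Foo foo = myFoos.iterator().next(); // compiler-inserted cast to Foo fails here
            return "got " + foo;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```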

    Now it's semi-trivial to write collections to enforce type-safety (just like synchronization).. But this is as effective as cooperative multi-tasking was in the 90's at reducing bugs.
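    One existing form of such an enforcing collection is the JDK's Collections.checkedCollection wrapper, which checks the element type on every insertion. The sketch below (CheckedCollectionDemo and addThroughRawAlias are hypothetical names) shows that it moves the failure to the add() call, but only if whoever constructs the collection opts in, which is the "cooperative" limitation described above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

class CheckedCollectionDemo {
    /** Tries to add the element to a checked Collection<String> via a raw alias. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String addThroughRawAlias(Object element) {
        Collection<String> names =
                Collections.checkedCollection(new ArrayList<String>(), String.class);
        Collection raw = names; // same object, viewed without its type parameter
        try {
            raw.add(element);   // the wrapper checks the runtime type here
            return "accepted";
        } catch (ClassCastException e) {
            return "rejected at add()";
        }
    }

    public static void main(String[] args) {
        System.out.println(addThroughRawAlias("a string"));
        System.out.println(addThroughRawAlias(Integer.valueOf(42)));
    }
}
```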

    Java has a lot of historical baggage that keeps it from being a refined and bug-resistant language.. And the proliferation of XML-configured reflective programming is really getting out of hand. That being said, I am not aware of any other development platform that is as versatile. .NET had a trifle of potential (being a rewrite of Java), except that it's got Big [Corporate] Brother to keep it from reaching its full potential.

    --
    -Michael
  15. No libraries? by SuperKendall · · Score: 2, Insightful

    until Sun releases a new JRE and all your old applications do not work at all anymore when users install the new JRE.

    That hasn't been a problem so far as even Java 1.1 applications will still work just fine today in Java 1.5.

    That is because unlike other languages, Java has taken a lot of care to keep things working through revisions. Libraries going into disuse are deprecated, not removed - so you have a long time while a library or method call still exists before going away.

    Even the design of Java's Generics system was made so that older code would be able to work in harmony with it.

    When you're thinking of language revisions breaking code, you must be thinking of C#... it's easy to get confused since it's a direct clone of Java.

    C/C++ applications tend to work for decades and can be written to be far more reliably cross-platform.

    They tend to work for decades because they are still running on the same 386 box they were originally installed on (a testament to Linux). Now if you are trying to move that complex C/C++ app to a more recent platform, like say Fedora 5 - all of a sudden your glib isn't quite what the program was expecting.

    C/C++ is great for cross platform compatibility along with performance, but you should not pretend that it doesn't take work to maintain and keep things in sync with libraries and system calls.

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  16. Re:What a strange thing from IBM by NoOneInParticular · · Score: 2, Informative

    ... and given that the many useful classes give stuff back in the form of Object, this is all very helpful ...

    In any case, C++ has all but abandoned the C-style form of casting, which forms the syntactical basis for Java's casting mechanisms: currently C++ sports dynamic_cast (Java-style cast with dynamic type check, returns 0 if a pointer cast fails), static_cast (does not do run-time type checking, but still does a basic compile-time check like Java. It is there for when there's no way that the cast can fail; at least if the programmer thinks that is the case), and reinterpret_cast (interprets bit patterns as anything you want, the most liberal form of casting).

    As usual in C++, you don't pay for what you don't use, if you don't need a runtime type-check, the language doesn't insist you should use one.

  17. Re:What a strange thing from IBM by roman_mir · · Score: 2, Informative

    Could you expand upon what you mean? I'm not sure I understand you. What is ATM? I haven't heard of it before. - Accelerating (universal) Turing Machine, ATM is a class of TM that is capable of solving complex problems, more precisely ATM = {<M, w> | M is a TM and M accepts w}. This means that this machine will test input w on ALL possible Turing Machine configurations M with the assumption of finding a Halt (accept/reject state). ATM is undecidable and I am not going into Oracle TM, which could supposedly decide ATM. ATM cannot decide HALT, that's the main point.

    proving termination of reasonably coded functions is quite practical for everyday programs you write - agreed, that is what complexity and discrete math is all about. But automatic induction will take the same amount of time to run as the actual code that is being tested, which means that for all inputs, there is no polynomial solution. Besides, real-life code may depend on states of other external components, such as user input/databases/network input/interrupts etc., which just multiply the number of total possible inputs.

    If it's hard (or impossible) to prove that a function terminates, how can you yourself as a programmer be sure it terminates (you must have some idea why it does if you wrote it)? For this reason, it really isn't that important that the halting problem is undecidable. - I, as a programmer, have an understanding of the base case and of the induction, but in reality there can always be an input to the function that will go out of the boundaries of the function. You believe that such input is possible to find with an automated induction machine, I know it is possible to find, but I know that there is no feasible solution for all inputs. Basically your inference engine will have to use heuristics to rule out less likely input subtrees, but this means that there is no guarantee that the engine has covered every single possibility.

    I understand that we can write code to detect some deadlocks and some infinite loops in compile-time. I also understand that the code that detects dead-locks and infinite loops in runtime always works better, because it can catch conditions, for which the input could not be tested by a compiler.

    --
    Again, practically speaking, I would rather see people write good unit-tests, which will catch many more problems than these bug-detectors.

    If these bug-detectors actually become good enough to be incorporated into compilers, then go nuts, use the compiler directive to try and find these bugs. But again, on my projects I wouldn't recommend going with bug-detectors over unit-tests and given the simple fact that projects have limited resources (limited time, money and people) there is always a compromise that needs to be made.

  18. Re:Why a seperate tool? by roman_mir · · Score: 2, Informative

    Of-course unit-tests can't find all race conditions. But this bug-detector won't find all race conditions either.

    Again, I wouldn't bother with it in most real-life situations. We all have deadlines and resource limitations. Besides, Java is mostly used on the back-end today and it is mostly used within some J2EE container. Manual thread management should be avoided as a matter of principle in these situations and the resources that are shared must be thread safe. The best thing to do is to avoid complexity where possible rather than try and solve an already existing problem.

    However I had to work on various projects where I had to manipulate threads and shared resources by myself outside of any container. More than that, I had to coordinate Java threads with C++ threads that were used as middle-tier and connected driver functionality of C code to Java front-end. And in that situation there was no way any automated tool could help me with all the complications, I just had to think my way through the problems and debug them and as I was debugging them I created the necessary unit-tests. The bottom line is I think these tools are too primitive for really complex situations and at the same time they are too much for most of coding that is done in Java and J2EE. So again, I would rather see people write and maintain good unit-tests.

  19. Re:That's great... by Jagungal · · Score: 2, Funny

    Clearly you are a youngin that has been indoctrinated by educators because it seems that java is all they will teach nowadays. I fail to understand why. I assume it's laziness, lack of funding and loss of touch with the real world.

    I find it amusing that all of the people posting about their positive experiences with Java have user id's less than 50000, meaning they have been around here quite a while.

    The trolls are all anonymously spouting FUD .... maybe the anonymous trolls need to come out of their holes and visit the real world now and then.

  20. Re:Why a seperate tool? by roman_mir · · Score: 2, Insightful

    I mean, it can find bugs without all this tedious unit-test writing. - this 'tedious unit-test writing' is the only way to make sure that the business rules are asserted. Concurrency problem is a non-problem for most applications. In case of Java, J2EE container handles the concurrency issues. Configuring another tool just for the sake of configuring just another tool is a waste of resources and cannot be always justified. I would rather see my people spend more time writing unit-tests than configure pointless tools.

    Good day.

  21. Custom bug detector I wrote for FindBugs last week by wpugh · · Score: 2, Interesting

    As an example of turning bug instances into bug patterns, I always read through the list of bugs fixed in each version of the jdk1.6.0 builds. In build 89, a bug was fixed in the serialization of ArrayBlockingQueue.

    I wrote a FindBugs bug detector to look for similar cases: a class with transient fields, but no readObject or readResolve method to restore the field. I had to tune the detector a bit (for example, raise the priority if it is set to a non-default value in the constructor). I'm still doing some tuning, but at the moment the new detector reports warnings in 47 jdk 1.6 b89 classes, 18 of which are confirmed to be bugs. This took me a total of 5 hours of work.
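    The bug pattern itself (a transient field set up at construction time, with no readObject to restore it) can be reproduced in a few lines. This is a sketch of the pattern only, not the detector and not the JDK classes involved; Stamp, FixedStamp and roundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientFieldDemo {
    /** Buggy shape: the -1 sentinel is lost when the object is deserialized. */
    static class Stamp implements Serializable {
        private static final long serialVersionUID = 1L;
        final String label;
        transient int cachedHash = -1; // -1 means "not computed yet"

        Stamp(String label) { this.label = label; }

        @Override public int hashCode() {
            if (cachedHash == -1) {
                cachedHash = label.hashCode();
            }
            return cachedHash; // after deserialization this is the default 0
        }
    }

    /** Fixed shape: readObject restores the sentinel. */
    static class FixedStamp extends Stamp {
        private static final long serialVersionUID = 1L;

        FixedStamp(String label) { super(label); }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            cachedHash = -1; // what the field initializer would have set
        }
    }

    /** Serializes the value to a byte array and reads it back. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new Stamp("a")).hashCode());      // prints 0
        System.out.println(roundTrip(new FixedStamp("a")).hashCode()); // prints 97
    }
}
```

    Field initializers and constructors of a Serializable class do not run during deserialization, which is why the transient field comes back as 0 unless readObject (or readResolve) puts it right.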

    Bugs listed below (these have been reported to Sun); this detector isn't in the current 1.0 release of FindBugs, but is available in the latest CVS snapshot, and will be in the next release.

      Bill Pugh

    -----

    java.security.Timestamp and java.security.CodeSigner:
            they have a transient myhash field used to cache the hashCode that is
            initialized to -1. If you serialize/deserialize one of these
            and invoke hashCode on the result, you'll get an incorrect hashCode of 0.

    javax.management.AttributeList
            has a transient boolean field tainted. If you add something other than an Attribute
            to an AttributeList, serialize/deserialize it, and then invoke asList(), you get back
            a List that contains something that isn't an Attribute. If you call asList() on
            the original AttributeList, you get an exception.

    javax.management.relation.RoleList
    javax.management.relation.RoleUnresolvedList
            problems isomorphic to the above problem

    sun.util.BuddhistCalendar
            has a transient field yearOffset that is initialized in the constructor. If you
            serialize/deserialize a BuddhistCalendar, you get back a broken BuddhistCalendar
            that computes dates incorrectly (off by 543 years)

    javax.swing.DefaultDesktopManager
            has a transient field floatingItems that is initialized to an empty array of Rectangles, and
            it sure looks like the code assumes that floatingItems is nonnull, so
            if you serialize/deserialize it, it will be broken (of course, I can never be sure if
            anybody seriously intends for awt/swing objects to be serialized).

    com.sun.rowset.CachedRowSetImpl
    com.sun.rowset.FilteredRowSetImpl
    com.sun.rowset.JdbcRowSetImpl
    com.sun.rowset.JoinRowSetImpl
    com.sun.rowset.WebRowSetImpl
    com.sun.rowset.internal.CachedRowSetReader
    com.sun.rowset.internal.CachedRowSetWriter
    com.sun.rowset.internal.InsertRow
    com.sun.rowset.internal.SyncResolverImpl
    com.sun.rowset.internal.WebRowSetXmlReader
    com.sun.rowset.internal.WebRowSetXmlWriter
    com.sun.rowset.providers.RIOptimisticProvider
            all initialize in their constructors transient fields pointing to resource bundles
            for providing localized error messages, and the resource bundle will be null if
            an object is serialized and deserialized.

    javax.smartcardio.CommandAPDU
            has 3 transient fields (nc, ne and dataOffset) that are computed by the call to parse in the constructor
            from the apdu array. However, if the object is serialized/deserialized, the fields will have their
            default values.

  22. Re:What a strange thing from IBM by maraist · · Score: 2, Interesting

    That's not the correct way to be doing things, anyway. Try this instead

    You missed the part where I compared it to cooperative multi-tasking.. You are wrapping the collection at constructor time, BUT half (and I do mean half) of the time you don't have control over the constructor to an object.. Especially if you are writing middle-ware code, which is most of what Java does - at least good application designs write most of their code in the form of middle-ware.

    Take Sort for example... It can't depend on the fact that even though it asks for a generified collection that there is any sort of type-safety involved.. The only thing it can do is pre-validate the data-type of the existing items in the collection that is passed to it. But that's a performance hit, and the ONLY thing that this will do is produce a more meaningful runtime exception.. i.e. instead of an exception in the comparator you get one in the sorter with an explicit "element in collection of type X was really Y" RuntimeException.
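    The pre-validation idea can be sketched like this. It is an illustration under assumptions, not how any JDK sort works: ValidatingSorter, sortChecked and pollutedList are hypothetical names, and the caller is assumed to be able to supply the expected element class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ValidatingSorter {
    /** Checks every element's runtime type up front, then sorts a copy. */
    static <T extends Comparable<T>> List<T> sortChecked(List<T> items, Class<T> type) {
        int index = 0;
        for (Object item : items) { // read as Object so the check below runs first
            if (!type.isInstance(item)) {
                String actual = (item == null) ? "null" : item.getClass().getSimpleName();
                throw new IllegalArgumentException("element " + index
                        + " in collection of type " + type.getSimpleName()
                        + " was really " + actual);
            }
            index++;
        }
        List<T> copy = new ArrayList<T>(items);
        Collections.sort(copy);
        return copy;
    }

    /** Builds a List<String> that secretly holds an Integer, via a raw type. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List<String> pollutedList() {
        List raw = new ArrayList();
        raw.add("b");
        raw.add(Integer.valueOf(1));
        return raw;
    }

    public static void main(String[] args) {
        System.out.println(sortChecked(Arrays.asList("b", "a"), String.class));
        try {
            sortChecked(pollutedList(), String.class);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

    As the comment says, the extra pass costs time on every call and all it buys is a clearer exception; the bad element is still only discovered at run time.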

    Caches, Marshallers / Serializers, IPC services, persistence services, etc. They are all middleware applications which sometimes throw really, really confusing errors because they aren't passed the expected classes. And when I say confusing, I mean they don't often throw class-cast exceptions, but instead meta-data mismatch exceptions.. But that leads you to believe that you've missed an attribute in the XML configuration, when in fact you've added an object of the wrong type to the middle-layer.

    Ideally, generics would be fully enforced by the VM. What we currently have (even with the smattering of Collections.unmodifiable, synchronized, checked, etc.) is weak enforcement, which at best provides spot cleanness of code.. But any static analyzer tool could have detected inappropriate local bugs.. The more critical bugs are inter-module bugs. And APIs of that sort tend to be littered with inappropriate parameter-checking. If you don't think this is a problem, then what is the number 1 security loophole in most older C libraries? gets(). This function (for performance purposes) didn't verify the size of the string, so it allowed overflowing the buffer and overwriting user-space memory. Most other languages handle strings in a less performant but more robust manner, so this type of bug has mostly disappeared.

    In java, with the advent of dynamic proxies and aspect-oriented-programming, the situation is even worse, because inter-module libraries can proxy objects which don't even match the appropriate prototype/interface. So you actually wouldn't get a class-cast-exception, but instead an arbitrary exception (most likely an NPE) inside the InvocationHandler.
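    A small sketch of that failure mode with java.lang.reflect.Proxy. The Greeter interface and the wrap and callThrough helpers are hypothetical; the point is only that a proxy built around an object of the wrong type fails inside the InvocationHandler with something other than a ClassCastException (here an IllegalArgumentException from Method.invoke).

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class ProxyMismatchDemo {
    interface Greeter {
        String greet(String name);
    }

    /** Wraps any object in a Greeter proxy without checking that it is a Greeter. */
    static Greeter wrap(final Object target) {
        // The handler blindly forwards each call to the wrapped object.
        InvocationHandler handler = (proxy, method, args) -> method.invoke(target, args);
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    /** Calls greet() through the proxy and reports the result or the failure type. */
    static String callThrough(Object target) {
        try {
            return wrap(target).greet("world");
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Greeter real = name -> "hello " + name;
        System.out.println(callThrough(real));
        System.out.println(callThrough("not a greeter")); // fails inside the handler
    }
}
```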

    This is mostly a rant, but it's based on my growing frustration with the lack of type-safety in java frameworks... Yes, you're certainly free to not use those frameworks.. But with the increasing movement into container-managed services (tomcat, jboss, or even spring/pico-container), this type-looseness is becoming a growing problem.

    --
    -Michael