Java Static Analysis And Custom Bug Detectors


Posted by ryuzaki0 on Monday July 3, 2006 @04:06AM from the developerworks-ftl dept.

An anonymous reader writes "Java static analysis and custom bug detectors can be a very cost-effective way to improve software quality. By creating a detector for a known bug pattern, we can search for that bug pattern not only in the current code base for a specific project, but in any project, current or future. This article looks at how static analysis tools can change the way you manage software quality."

FindBugs is awesome by tcopeland · 2006-07-03 04:18 · Score: 5, Informative

As the lead guy on a "competing" static analysis framework - PMD - I can say that FindBugs is definitely a great piece of work. It catches all sorts of complicated problems with concurrency, does forwards/backwards data flow analysis, etc. It's pretty sweet. Dr Pugh, who runs the project at the University of Maryland, did a JavaPosse interview that has more good info on the project and where it's going.

Of course, if you really want to do source code analysis (vs bytecode analysis, which is what FindBugs does), then go for PMD, and [plug] get the book! [/plug]

--
The Army reading list
Another nifty static analysis project by tcopeland · 2006-07-03 05:02 · Score: 2, Informative

...is Sun's Jackpot, headed up by Tom Ball. What's neat about Jackpot is that it does problem fixing, too, using a domain specific language. From the interview:
$object.show() => $object.setVisible(true) :: $object instanceof java.awt.Component;
Feeding that DSL snippet to Jackpot will transform all Component.show() calls to Component.setVisible(true). Very, very cool stuff. Of course, you don't always want to make the transformation, but in the cases where you do, Jackpot looks like a great solution.
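
To make the rule concrete, here is a minimal Java sketch of one call site before and after the rewrite, done by hand. It does not use Jackpot itself; the class name ShowRewriteDemo and its helper methods are made up for illustration. The only real API used is java.awt.Component, whose show() method is deprecated in favour of setVisible(true).

```java
import java.awt.Component;

final class ShowRewriteDemo {
    // Call site as it might look before the rewrite (show() is deprecated).
    @SuppressWarnings("deprecation")
    static void before(Component c) {
        c.show();
    }

    // The same call site after applying the rule by hand.
    static void after(Component c) {
        c.setVisible(true);
    }

    // A bare lightweight component that starts out hidden.
    // Hypothetical helper; no window is created.
    static Component hiddenComponent() {
        Component c = new Component() {};
        c.setVisible(false);
        return c;
    }

    public static void main(String[] args) {
        Component a = hiddenComponent();
        Component b = hiddenComponent();
        before(a);
        after(b);
        System.out.println("old style visible: " + a.isVisible());
        System.out.println("new style visible: " + b.isVisible());
    }
}
```

The instanceof guard in the rule matters because an unrelated class may also have a show() method, and a purely textual search-and-replace could not tell the two apart.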

--
The Army reading list
Re:That's great... by Decaff · 2006-07-03 05:22 · Score: 4, Informative

... until Sun releases a new JRE and all your old applications do not work at all anymore when users install the new JRE. Unmaintained applications die altogether or require constantly uninstalling and installing various JREs to run them as well as new ones. That's the biggest bug of all in Java and makes any bug tracking useless, and programming in Java pointless.

C/C++ applications tend to work for decades and can be written to be far more reliably cross-platform.

Odd. I have found exactly the opposite. Java is very well known for the excellence of its backward compatibility, and to say 'all your old applications don't work anymore' is just plain false. Java would not have had the huge success it has had if this were the case.

On the other hand, C/C++ version bugs are well known and well documented - just think of the issues involved with gcc versions and linux kernel compilations. I have a very simple C++ app that compiled and ran fine on one version of gcc, but broke on another.

If you simply exchange C/C++ for Java, and vice versa, throughout your post, it then makes sense.
Re:Why not use OCaml or Haskell? by IamTheRealMike · 2006-07-03 06:02 · Score: 2, Informative

The type system of Haskell doesn't let you prove anything radically more interesting than that of Java or C++, to be honest. Also, Haskell mixes up a bunch of other random ideas with that type system, so you have to take the bad with the good - e.g. laziness and the unusual syntax.
Re:What a strange thing from IBM by maraist · 2006-07-03 06:03 · Score: 2, Informative

You can't cheat the java type system.
You're kidding, right?

Bar b = new Bar();
Foo f = (Foo)(Object)b;

Compiles just fine for me... until you get the ClassCastException at run time.
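
A self-contained version of the two-line cast, as a sketch. Foo and Bar are stand-in classes with no relationship to each other, and CastDemo is a made-up wrapper so that the whole thing compiles and runs on its own.

```java
final class CastDemo {
    // Two unrelated stand-in types (hypothetical, for illustration only).
    static class Foo {}
    static class Bar {}

    // Returns true if the Bar -> Object -> Foo cast is rejected at run time.
    static boolean castViaObjectFails() {
        Bar b = new Bar();
        try {
            // Compiles, because an Object reference could point at a Foo.
            Foo f = (Foo) (Object) b;
            return f == null; // only reached if the cast succeeded
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("cast rejected at run time: " + castViaObjectFails());
    }
}
```

Writing (Foo) b directly would be a compile-time error, since the compiler can see the two classes are unrelated. Going through Object throws that knowledge away, which is the point of the complaint.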

Now you might call this a contrived example.. Except that it's not.

How many thousands of function calls take Serializable or worse "Object" as a parameter? Virtually every IPC related activity does at some point. That includes all of j2ee, which are considered "enterprise" level coding frameworks.

Generics was a step in the right direction, with compile-time enforcement of "many" of these opaque "Object" APIs. But it definitely didn't penetrate some of the more important areas; just collections (which was at least the most [mis]used form of generalized types).

But Generics doesn't have any means of enforcement.

Collection<Foo> myFoos = new ArrayList<Foo>();
Collection myUnsafeFoos = myFoos;
Bar bar = new Bar();
myUnsafeFoos.add(bar);
Foo foo = myFoos.iterator().next();

will throw a ClassCastException at run time.
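
Here is a runnable sketch of that snippet. It assumes myFoos was meant to be declared as Collection<Foo> (otherwise the last line would not compile without a cast) and that the iterator call is iterator(). Foo, Bar and HeapPollutionDemo are made-up names.

```java
import java.util.ArrayList;
import java.util.Collection;

final class HeapPollutionDemo {
    // Two unrelated stand-in types (hypothetical, for illustration only).
    static class Foo {}
    static class Bar {}

    // Returns true if reading the polluted collection throws.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean readingPollutedCollectionFails() {
        Collection<Foo> myFoos = new ArrayList<Foo>();
        Collection myUnsafeFoos = myFoos; // raw alias of the same list
        myUnsafeFoos.add(new Bar());      // unchecked warning only; succeeds
        try {
            // The compiler inserts a cast to Foo here, and that cast fails.
            Foo foo = myFoos.iterator().next();
            return foo == null; // only reached if the cast succeeded
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("read failed: " + readingPollutedCollectionFails());
    }
}
```

Note that the add() succeeds silently. The exception only shows up later, at the read, which can be far away from the line that actually did the damage.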

Now it's semi-trivial to write collections to enforce type-safety (just like synchronization). But this is as effective as cooperative multi-tasking was in the 90's at reducing bugs.
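
For what it's worth, the JDK ships a wrapper of this kind already: Collections.checkedCollection (in java.util since Java 5). A minimal sketch, again with made-up Foo, Bar and CheckedCollectionDemo names:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

final class CheckedCollectionDemo {
    // Two unrelated stand-in types (hypothetical, for illustration only).
    static class Foo {}
    static class Bar {}

    // Returns true if the bad insert is rejected and nothing was stored.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean badInsertIsRejected() {
        Collection<Foo> myFoos =
                Collections.checkedCollection(new ArrayList<Foo>(), Foo.class);
        Collection myUnsafeFoos = myFoos; // raw alias, as before
        try {
            myUnsafeFoos.add(new Bar());  // checked wrapper throws here
            return false;
        } catch (ClassCastException e) {
            return myFoos.isEmpty();
        }
    }

    public static void main(String[] args) {
        System.out.println("bad insert rejected: " + badInsertIsRejected());
    }
}
```

The failure now happens at the insert instead of at some later read. It is still opt-in, though: every creator of a collection has to remember to wrap it, which is the "cooperative" weakness described above.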

Java has a lot of historical baggage that keeps it from being a refined and bug-resistant language. And the proliferation of XML-configured reflective programming is really getting out of hand. That being said, I am not aware of any other development platform that is as versatile. .NET had a trifle of potential (being a rewrite of Java), except that it's got Big [Corporate] Brother to keep it from reaching its full potential.

--
-Michael
Re:What a strange thing from IBM by NoOneInParticular · 2006-07-03 06:05 · Score: 2, Informative

... and given that the many useful classes give stuff back in the form of Object, this is all very helpful ...
In any case, C++ has all but abandoned the C-style form of casting, which forms the syntactic basis for Java's casting mechanisms. C++ currently sports dynamic_cast (a Java-style cast with a dynamic type check; returns 0 if a pointer cast fails), static_cast (no run-time type check, but still a basic compile-time check like Java's; it is meant for cases where the cast cannot fail, or at least where the programmer thinks so), and reinterpret_cast (interprets bit patterns as anything you want, the most liberal form of casting).
As usual in C++, you don't pay for what you don't use: if you don't need a run-time type check, the language doesn't insist that you use one.
Re:are these actually worthwhile? by Llywelyn · 2006-07-03 07:01 · Score: 2, Informative

"In my experience, most bugs that could be detected by static analysis are usually caught relatively quickly anyway."

In my experience the *opposite* is true, at least for code that I am not writing myself.

For instance, since I started using FindBugs on our project (which is fairly large and complex as these things go, with ~5 development teams working on it and with many threads running at the same time), I've caught several potential deadlock issues that would have probably been uncaught until a deadlock happened (most likely after this is deployed), a small host of synchronization (e.g., inconsistent synchronization) and locking problems (e.g., running a bit of code outside of a try block but after the lock is acquired), some memory/performance problems (e.g., inner classes that should have been declared static), and other things of that nature.
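
To illustrate the locking problem mentioned above (code that runs after the lock is acquired but outside the try block), here is a minimal sketch. LockDemo and its method names are made up; ReentrantLock is the real java.util.concurrent class.

```java
import java.util.concurrent.locks.ReentrantLock;

final class LockDemo {
    private final ReentrantLock lock = new ReentrantLock();

    // Risky shape: setup runs after lock() but before the try block.
    // If setup throws, unlock() is never reached and the lock stays held.
    void risky(Runnable setup, Runnable work) {
        lock.lock();
        setup.run();
        try {
            work.run();
        } finally {
            lock.unlock();
        }
    }

    // Safer shape: everything after lock() sits inside try/finally.
    void safe(Runnable setup, Runnable work) {
        lock.lock();
        try {
            setup.run();
            work.run();
        } finally {
            lock.unlock();
        }
    }

    boolean isLocked() {
        return lock.isLocked();
    }

    public static void main(String[] args) {
        Runnable boom = () -> { throw new IllegalStateException("setup failed"); };
        Runnable nothing = () -> { };

        LockDemo a = new LockDemo();
        try { a.risky(boom, nothing); } catch (IllegalStateException expected) { }
        System.out.println("risky leaves lock held: " + a.isLocked());

        LockDemo b = new LockDemo();
        try { b.safe(boom, nothing); } catch (IllegalStateException expected) { }
        System.out.println("safe leaves lock held: " + b.isLocked());
    }
}
```

The two methods look almost identical in a code review, which is why this is the sort of thing that is easy to miss by eye and cheap for a tool to flag.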

I might, if I went through all of the code by hand, catch all of these issues and a few more, but a tool such as FindBugs gives me a better idea of where to look, and allows me to quickly make a bunch of useful changes without combing through each file that uses synchronization by hand. Sure, a dedicated review of the code would be best, but these are usually changes I can make quickly and easily, and some of these problems might have been difficult to find otherwise (e.g., inconsistent synchronization).

There is also the benefit that, while giving me an idea of where to look, it helps me catch other issues that FindBugs does not directly detect.

--
Integrate Keynote and LaTeX
Re:What a strange thing from IBM by roman_mir · 2006-07-03 08:33 · Score: 2, Informative

Could you expand upon what you mean? I'm not sure I understand you. What is ATM? I haven't heard of it before. - Accelerating (universal) Turing Machine, ATM is a class of TM that is capable of solving complex problems, more precisely ATM = {<M, w> | M is a TM and M accepts w}. This means that this machine will test input w on ALL possible Turing Machine configurations M with the assumption of finding a Halt (accept/reject state). ATM is undecidable and I am not going into Oracle TM, which could supposedly decide ATM. ATM cannot decide HALT, that's the main point.

proving termination of reasonably coded functions is quite practical for everyday programs you write - agreed, that is what complexity and discrete math is all about. But automatic induction will take the same amount of time to run as the actual code that is being tested, which means that for all inputs, there is no polynomial solution. Besides, real-life code may depend on states of other external components, such as user input/databases/network input/interrupts etc., which just multiply the number of total possible inputs.

If it's hard (or impossible) to prove that a function terminates, how can you yourself as a programmer be sure it terminates (you must have some idea why it does if you wrote it)? For this reason, it really isn't that important that the halting problem is undecidable. - I, as a programmer, have an understanding of the base case and of the induction, but in reality there can always be an input to the function that will go out of the boundaries of the function. You believe that such input is possible to find with an automated induction machine. I know it is possible to find, but I know that there is no feasible solution for all inputs. Basically your inference engine will have to use heuristics to rule out less likely input subtrees, but this means that there is no guarantee that the engine has covered every single possibility.

I understand that we can write code to detect some deadlocks and some infinite loops in compile-time. I also understand that the code that detects dead-locks and infinite loops in runtime always works better, because it can catch conditions, for which the input could not be tested by a compiler.
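
As an example of the run-time side, the JVM can report monitor deadlocks through ThreadMXBean.findMonitorDeadlockedThreads() in java.lang.management. A minimal sketch follows; DeadlockProbe and its helper methods are made up, and the deliberate deadlock uses daemon threads so the program can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockProbe {
    // Number of threads the JVM currently reports as monitor-deadlocked.
    static int deadlockedThreadCount() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findMonitorDeadlockedThreads(); // null when none
        return ids == null ? 0 : ids.length;
    }

    // Starts two daemon threads that take two monitors in opposite order.
    static void startDeadlock() {
        final Object a = new Object();
        final Object b = new Object();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = new Thread(() -> grab(a, b, bothHoldFirst));
        Thread t2 = new Thread(() -> grab(b, a, bothHoldFirst));
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void grab(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until both threads hold their first lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                // never reached: the other thread holds this monitor
            }
        }
    }

    // Polls until at least two deadlocked threads are reported or time runs out.
    static boolean waitForDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (deadlockedThreadCount() >= 2) {
                return true;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        startDeadlock();
        System.out.println("deadlock detected: " + waitForDeadlock(5000));
    }
}
```

This only reports a deadlock after it has actually happened on some run, so it complements a static check rather than replacing it.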

--
Again, practically speaking, I would rather see people write good unit-tests, and this will catch many more problems than these bug-detectors.

If these bug-detectors actually become good enough to be incorporated into compilers, then go nuts, use the compiler directive to try and find these bugs. But again, on my projects I wouldn't recommend going with bug-detectors over unit-tests, and given the simple fact that projects have limited resources (limited time, money and people), there is always a compromise that needs to be made.

--
You can't handle the truth.
Re:Why a separate tool? by roman_mir · 2006-07-03 09:01 · Score: 2, Informative

Of course unit-tests can't find all race conditions. But this bug-detector won't find all race conditions either.

Again, I wouldn't bother with it in most real-life situations. We all have deadlines and resource limitations. Besides, Java is mostly used on the back-end today, and it is mostly used within some J2EE container. Manual thread management should be avoided as a matter of principle in these situations, and the resources that are shared must be thread safe. The best thing to do is to avoid complexity where possible rather than try and solve an already existing problem.

However, I have had to work on various projects where I had to manipulate threads and shared resources by myself outside of any container. More than that, I had to coordinate Java threads with C++ threads that were used as a middle tier and connected driver functionality of C code to a Java front-end. In that situation there was no way any automated tool could help me with all the complications. I just had to think my way through the problems and debug them, and as I was debugging them I created the necessary unit-tests. The bottom line is I think these tools are too primitive for really complex situations, and at the same time they are too much for most of the coding that is done in Java and J2EE. So again, I would rather see people write and maintain good unit-tests.

--
You can't handle the truth.