Java Static Analysis And Custom Bug Detectors
An anonymous reader writes "Java static analysis and custom bug detectors can be a very cost-effective way to improve software quality. By creating a detector for a known bug pattern, we can search for that bug pattern not only in the current code base for a specific project, but in any project, current or future. This article looks at how static analysis tools can change the way you manage software quality."
We develop Java-based vertical products, and we have found that PMD and JLINT, when integrated with an appropriate development process, can be highly effective in preventing serious bugs. That said, both PMD and JLINT incorporate "religious" issues, and it is important to identify what those issues are and steer clear of them, lest the good rules get lost among the rules that are non-essential from the project's perspective.
http://buddytrace.com/
As the lead guy on a "competing" static analysis framework, PMD, I can say that FindBugs is definitely a great piece of work. It catches all sorts of complicated concurrency problems, does forward/backward data flow analysis, and more. It's pretty sweet. Dr. Pugh, who runs the project at the University of Maryland, did a JavaPosse interview that has more good information on the project and where it's going.
Of course, if you really want to do source code analysis (vs bytecode analysis, which is what FindBugs does), then go for PMD, and [plug] get the book! [/plug]
The Army reading list
We develop feminine-based vertical products, and we have found that PMS and JOINT, when integrated with an appropriate development process, can be highly effective in preventing serious flows. That said, both PMS and JOINT incorporate "religious" issues, and it is important to identify what those issues are and steer clear of them, lest the good rules get lost among the rules that are non-essential from the project's perspective.
While there may be various reasons why one would rather go with Java, if fairly high-quality software is needed, OCaml and Haskell may be just the solution. This is especially so when formally verified software would be excessively expensive.
Even if static typing does limit flexibility somewhat, and does put more responsibility on the developer to ensure that their programs type correctly, doing so often leads directly to fewer problems. Plus you get the benefit of typing problems being discovered at compile time, rather than at runtime (where a user will see it, rather than a developer).
Because they grew out of various branches of mathematics, the type systems of languages like Haskell and OCaml are far superior to Java's. It's almost pointless to bother with Java static analysis tools when using a language with proper static typing takes care of all that at compile time.
"By creating a detector for a known bug pattern, we can search for that bug pattern not only in the current code base for a specific project, but in any project, current or future."
Does it require a 1.21 gigawatt lightning bolt to power the future search feature?
It's called "lint".
Didn't Java become legacy software sometime in the late 90's? Who in their right mind would even start a new Java project nowadays?
The detector code is three times the size of the buggy code it detects. This is only an example, but what would it look like in real-life applications, when the bug detector itself is hundreds or thousands of lines of bug-prone code?
I wouldn't trust it. The people who write bad code will write bad bug-detection code... especially if they are outsourced coders who do not care whether it works or not.
You should run whatever LINT-like tools you can find. Developers should agree as a group on which warnings are spurious and which are legitimate, and adjust any lint policy configurations to suit.
You can also find far more than simple bugs: you can decide on best practices and consistency standards that should be adhered to as well. These can vary in importance, but they really help keep a codebase clean and searchable. For a trivial example, if coding in C, decide as a group whether to use *p = '\0' or *p = 0 when writing into a char string. For a more involved example, regularly scan the codebase for regular expressions like (>)\s*(8|16|24) to find possible Intel/PPC endianness issues lurking where you don't expect them.
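A minimal Java sketch of such a scan, assuming the source has already been read into a list of lines (EndianScan and scan are hypothetical names, not part of any tool). It uses the pattern quoted above unchanged, so expect false positives: an ordinary comparison such as count > 16 matches as well. Treat the hits as leads to review, not confirmed bugs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndianScan {
    // '>' followed by optional whitespace and 8, 16 or 24, e.g. "x >> 8" or "v >>> 24".
    // Shifts by these amounts often indicate hand-rolled byte swapping.
    private static final Pattern SHIFT = Pattern.compile("(>)\\s*(8|16|24)");

    /** Returns "line N: text" for every line containing a match. */
    public static List<String> scan(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = SHIFT.matcher(lines.get(i));
            if (m.find()) {
                hits.add("line " + (i + 1) + ": " + lines.get(i).trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "int hi = (word >> 8) & 0xff;",
                "int sum = a + b;",
                "if (count > 16) flush();", // false positive: a plain comparison
                "int top = v >>> 24;");
        scan(source).forEach(System.out::println);
    }
}
```

Hooking something like this into a nightly build, and agreeing as a group on which hits to suppress, is the same policy question as with any lint warning.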
The adage goes, if you find you're doing something more than once, see if you can automate it, so you can pay more attention to the things which can't be automated. This goes for coding and debugging too.
One would think that IBM staff, of all people, would be familiar with the ATM or the Halting Problem. I think the bugs that are really important to find are those that are not feasible to find with automated tools, and the bugs this article is talking about are the simple ones.
Also, wouldn't this 'static bug detection' be unnecessary if Java were a strongly typed language? Casting is of course a powerful idea, but it is probably responsible for more of the non-business-related bugs in code than anything else. That, and null pointers, of course (a strange name for an exception in a language that uses no pointers).
In any case, I would rather see people do something than nothing, so I guess bug detectors are better than no bug detectors, but in reality I would rather have the developers write good unit tests.
You can't handle the truth.
Not having used any static analysis tools, but having worked on several java projects, I question how useful these tools are. In my experience, most bugs that could be detected by static analysis are usually caught relatively quickly anyway. The trickiest (and potentially most damaging) ones are usually non-general enough to slip past a general-purpose tool. Am I mistaken?
And compile at the highest warning level. Code is not done until it compiles cleanly.
And always remember - "size_t" is "size_t" - NOT "unsigned int". You are NOT smarter than the defined types that are STANDARD and IMPLEMENTATION-DEFINED. Think so? Try compiling your code as a 64-bit application.
... until Sun releases a new JRE and all your old applications stop working altogether when users install it. Unmaintained applications die outright, or require constantly uninstalling and reinstalling various JREs to run them alongside new ones. That's the biggest bug of all in Java, and it makes any bug tracking useless and programming in Java pointless.
C/C++ applications tend to work for decades and can be written to be far more reliably cross-platform.
Java sucks bad, face it.
...is Sun's Jackpot, headed up by Tom Ball. What's neat about Jackpot is that it does problem fixing, too, using a domain specific language. From the interview:
$object.show() => $object.setVisible(true)
    :: $object instanceof java.awt.Component;
Feeding that DSL snippet to Jackpot will transform all Component.show() calls to Component.setVisible(true). Very, very cool stuff. Of course, you don't always want to make the transformation, but in the cases where you do, Jackpot looks like a great solution.
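For contrast, here is a hypothetical, purely textual approximation of that rule in plain Java (NaiveShowRewriter is an illustrative name, not a Jackpot API). It shows why Jackpot's type guard matters: a regex has no type information, so it cannot check "$object instanceof java.awt.Component" and would also rewrite show() on unrelated classes.

```java
import java.util.regex.Pattern;

public class NaiveShowRewriter {
    // A simple identifier (standing in for $object) followed by ".show()".
    // Unlike Jackpot, nothing here checks that the receiver is a Component.
    private static final Pattern SHOW_CALL =
            Pattern.compile("\\b([A-Za-z_][A-Za-z0-9_]*)\\.show\\(\\)");

    public static String rewrite(String source) {
        // "$1" in the replacement is the captured receiver name.
        return SHOW_CALL.matcher(source).replaceAll("$1.setVisible(true)");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("frame.show(); dialog.show();"));
        // frame.setVisible(true); dialog.setVisible(true);
    }
}
```

The output of something like this is only a list of suggestions to review by hand; the guard clause is what lets a type-aware tool apply the transformation safely.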
I think FindBugs does this. I've started using it, and it found lots of bugs in my code. As a result, I have learned a few things about Java just by using it and fixing my bugs.
Only 'flamers' flame!
Does slashdot hate my posts?
until Sun releases a new JRE and all your old applications do not work at all anymore when users install the new JRE.
That hasn't been a problem so far; even Java 1.1 applications still work just fine today on Java 1.5.
That is because unlike other languages, Java has taken a lot of care to keep things working through revisions. Libraries going into disuse are deprecated, not removed - so you have a long time while a library or method call still exists before going away.
Even the design of Java's Generics system was made so that older code would be able to work in harmony with it.
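Generics are implemented by erasure, which is what lets pre-Java 5 code using raw types compile and run alongside generic code. A minimal sketch of that interoperation and its cost (legacyNames and RawTypeInterop are hypothetical names): the raw list accepts anything, the assignment to List<String> compiles with only an unchecked warning, and a bad element surfaces later as a ClassCastException at the cast the compiler inserts.

```java
import java.util.ArrayList;
import java.util.List;

public class RawTypeInterop {

    // Written in pre-Java 5 style: a raw List, so nothing stops a non-String
    // from being added.
    @SuppressWarnings({"rawtypes", "unchecked"})
    public static List legacyNames() {
        List names = new ArrayList();
        names.add("alice");
        names.add(Integer.valueOf(42)); // accepted: a raw List holds any Object
        return names;
    }

    // Generic code consuming the legacy result. The raw-to-generic assignment
    // compiles with only an unchecked warning (suppressed here).
    public static int firstNonStringIndex() {
        @SuppressWarnings("unchecked")
        List<String> names = legacyNames();
        for (int i = 0; i < names.size(); i++) {
            try {
                // Erasure: get() returns Object, and the compiler inserts a
                // cast to String here, which is where the failure appears.
                String s = names.get(i);
            } catch (ClassCastException e) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstNonStringIndex()); // 1
    }
}
```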
When you're thinking of language revisions breaking code, you must be thinking of C#... it's easy to get confused, since it's a direct clone of Java.
C/C++ applications tend to work for decades and can be written to be far more reliably cross-platform.
They tend to work for decades because they are still running on the same 386 box they were originally installed on (a testament to Linux). Now, if you try to move that complex C/C++ app to a more recent platform, say Fedora 5, all of a sudden your glib isn't quite what the program was expecting.
C/C++ is great for cross-platform compatibility along with performance, but you should not pretend that it doesn't take work to maintain and to keep things in sync with libraries and system calls.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
There seem to be nice ones for Java and C. But does anyone know what is out there for C++, apart from Coverity's commercial service?
If this bug detection is as good as they say, it should be part of the compiler. If it is not good enough to be part of the compiler, I wouldn't bother with it.
I've used Lint4j, PMD, Checkstyle and, indeed, FindBugs. These tools are very useful. The biggest problem is finding the time to fix the issues. It's tempting to skip the minor issues, but this is where you need to be strong.
I'd recommend that serious Java developers integrate the above-mentioned tools into their nightly builds and treat the identified issues as real bugs.
Jilles
IntelliJ IDEA (http://www.jetbrains.com/), a Java IDE, has had "custom bug detectors" for a while. You can add your own via the plugin API, you can select which ones you want on or off, and they are part of the tool, as they should be. You get a GUI for fixing them that matches the one for compiler-reported syntax errors, etc. Makes me wonder whether the IntelliJ features came first or this open-source project did. Anyone know?
As an example of turning bug instances into bug patterns: I always read through the list of bugs fixed in each jdk1.6.0 build. In build 89, a bug was fixed in the serialization of ArrayBlockingQueue.
I wrote a FindBugs bug detector to look for similar cases: a class with transient fields but no readObject or readResolve method to restore them. I had to tune the detector a bit (for example, raising the priority if the field is set to a non-default value in the constructor). I'm still doing some tuning, but at the moment the new detector reports warnings in 47 jdk 1.6 b89 classes, 18 of which are confirmed to be bugs. This took me a total of 5 hours of work.
Bugs are listed below (these have been reported to Sun); this detector isn't in the current 1.0 release of FindBugs, but it is available in the latest CVS snapshot and will be in the next release.
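The real detector works on bytecode through the FindBugs API. As a rough, self-contained illustration of the same bug pattern, here is a reflection-based sketch (TransientFieldCheck and its sample classes are hypothetical). It cannot see how a field is initialized, so it has none of the priority tuning described above, and it only looks at methods the class itself declares, so an inherited readResolve is missed.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class TransientFieldCheck {

    /**
     * Returns "ClassName.field" for each non-static transient field declared by c,
     * provided c is Serializable and declares neither readObject nor readResolve.
     */
    public static List<String> suspiciousFields(Class<?> c) {
        List<String> result = new ArrayList<>();
        if (!Serializable.class.isAssignableFrom(c) || declaresRestoreHook(c)) {
            return result;
        }
        for (Field f : c.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isTransient(mods) && !Modifier.isStatic(mods)) {
                result.add(c.getName() + "." + f.getName());
            }
        }
        return result;
    }

    // True if c itself declares readObject(ObjectInputStream) or readResolve().
    private static boolean declaresRestoreHook(Class<?> c) {
        for (Method m : c.getDeclaredMethods()) {
            Class<?>[] params = m.getParameterTypes();
            if (m.getName().equals("readObject") && params.length == 1
                    && params[0] == ObjectInputStream.class) {
                return true;
            }
            if (m.getName().equals("readResolve") && params.length == 0) {
                return true;
            }
        }
        return false;
    }

    // Sample input: a cached hash whose -1 initializer is skipped on deserialization.
    public static class Broken implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient int cachedHash = -1;
    }

    // The same field, restored by a readObject hook.
    public static class Fixed implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient int cachedHash = -1;

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            cachedHash = -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(suspiciousFields(Broken.class)); // [TransientFieldCheck$Broken.cachedHash]
        System.out.println(suspiciousFields(Fixed.class));  // []
    }
}
```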
Bill Pugh
-----
java.security.Timestamp and java.security.CodeSigner:
they have a transient myhash field used to cache the hashCode that is
initialized to -1. If you serialize/deserialize one of these
and invoke hashCode on the result, you'll get an incorrect hashCode of 0.
javax.management.AttributeList
has a transient boolean field tainted. If you add something other than an Attribute
to an AttributeList, serialize/deserialize it, and then invoke asList(), you get back
a List that contains something that isn't an Attribute. If you call asList() on
the original AttributeList, you get an exception.
javax.management.relation.RoleList
javax.management.relation.RoleUnresolvedList
problems isomorphic to the above problem
sun.util.BuddhistCalendar
has a transient field yearOffset that is initialized in the constructor. If you
serialize/deserialize a BuddhistCalendar, you get back a broken BuddhistCalendar
that computes dates incorrectly (off by 543 years)
javax.swing.DefaultDesktopManager
has a transient field floatingItems that is initialized to an empty array of Rectangles, and
it sure looks like the code assumes that floatingItems is nonnull, so
if you serialize/deserialize it, it will be broken (of course, I can never be sure whether
anybody seriously intends for AWT/Swing objects to be serialized).
com.sun.rowset.CachedRowSetImpl
com.sun.rowset.FilteredRowSetImpl
com.sun.rowset.JdbcRowSetImpl
com.sun.rowset.JoinRowSetImpl
com.sun.rowset.WebRowSetImpl
com.sun.rowset.internal.CachedRowSetReader
com.sun.rowset.internal.CachedRowSetWriter
com.sun.rowset.internal.InsertRow
com.sun.rowset.internal.SyncResolverImpl
com.sun.rowset.internal.WebRowSetXmlReader
com.sun.rowset.internal.WebRowSetXmlWriter
com.sun.rowset.providers.RIOptimisticProvider
all initialize, in their constructors, transient fields pointing to resource bundles
for providing localized error messages, and the resource bundle will be null if
the object is serialized and deserialized.
javax.smartcardio.CommandAPDU
has 3 transient fields (nc, ne and dataOffset) that are computed by the call to parse in the constructor
from the apdu array. However, if the object is serialized/deserialized, the fields will have their
default values.
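The Timestamp/CodeSigner case is easy to reproduce outside the JDK. A minimal sketch with a hypothetical class using the same pattern (a transient hash cache with -1 meaning "not computed yet"), plus the same class with a readObject hook, which is one of the two methods the detector looks for. On deserialization, field initializers of a Serializable class do not run, so the broken copy reports a hash code of 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientRoundTrip {

    // Same pattern as the Timestamp/CodeSigner report.
    public static class BrokenKey implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String value;
        private transient int myhash = -1; // initializer skipped on deserialization

        public BrokenKey(String value) { this.value = value; }

        @Override public int hashCode() {
            if (myhash == -1) {
                myhash = value.hashCode();
            }
            return myhash;
        }

        @Override public boolean equals(Object o) {
            return o instanceof BrokenKey && ((BrokenKey) o).value.equals(value);
        }
    }

    // Same class plus a readObject hook that restores the sentinel.
    public static class FixedKey implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String value;
        private transient int myhash = -1;

        public FixedKey(String value) { this.value = value; }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject(); // restores the non-transient field 'value'
            myhash = -1;            // back to "not computed yet"
        }

        @Override public int hashCode() {
            if (myhash == -1) {
                myhash = value.hashCode();
            }
            return myhash;
        }

        @Override public boolean equals(Object o) {
            return o instanceof FixedKey && ((FixedKey) o).value.equals(value);
        }
    }

    // Serializes and deserializes obj through an in-memory byte array.
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new BrokenKey("abc").hashCode());            // 96354
        System.out.println(roundTrip(new BrokenKey("abc")).hashCode()); // 0
        System.out.println(roundTrip(new FixedKey("abc")).hashCode());  // 96354
    }
}
```

The same readObject approach covers the CommandAPDU case: call defaultReadObject and then recompute the derived transient fields (there, by re-running parse).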
This would be nice server-side. Wouldn't it be wonderful to have something server-side that would check new configurations, such as new WordPress setups or other software? It would be nice to have something test the LAMP stack and the database queries made during initial setup. It would be great to be able to test different versions of software along with the current database and server configurations. That might save some stupid, hard-to-figure-out errors that are easily spotted with a little Java-type script like this one.
Nice one! I'll be adding it to IntelliJ IDEA on the train to work this morning.