Do Static Source Code Analysis Tools Really Work?
jlunavtgrad writes "I recently attended an embedded engineering conference and was surprised at how many vendors were selling tools to analyze source code and scan for bugs, without ever running the code. These static software analysis tools claim they can catch NULL pointer dereferences, buffer overflow vulnerabilities, race conditions and memory leaks. Ive heard of Lint and its limitations, but it seems that this newer generation of tools could change the face of software development. Or, could this be just another trend? Has anyone in the Slashdot community used similar tools on their code? What kind of changes did the tools bring about in your testing cycle? And most importantly, did the results justify the expense?"
They're not perfect, and won't catch everything, but they do work. Combined with unit testing, you can get a very low bug rate. Many of these (for Java, at least) are open source, so the expense in negligible.
Yes, some static analysis tools really work. FindBugs works well for Java. Fortify has had good success finding security vulnerabilities. These tools take static checking just a step beyond what's offered by a compiler, but in practice that's very useful.
I found even tools like lint or splint can catch interesting bugs but not nearly as many as the time I enabled GCC's type checking for varargs on a software project a company I work for was developing.
I forgot to answer your other question.
Since we've had the tool for a while and have fixed most of the bugs it has found, we are required to run static analysis on new code for the latest release now (i.e. we should not be dropping any new code that has any error in it found via static analysis).
Just like code reviews, unit testing, etc., it has proved useful and was added to the software development process.
Terrorist, bomb, al Qaeda, nuclear, yellowcake, kill, assassinate. Carnivore is dead... long live Echelon.
The Astrée static analyser (based on abstract interpretation) proved the absence of run-time errors in the primary control software of the Airbus A380.
Add me to the Yes column
We use them (PMD and FindBugs) for eliminating code that is perfectly valid, yet has bitten us in the past. Two Java examples are unsynchronized access to a static DateFormat object and using the commons IOUtils.copy() instead of IOUtils.copyLarge().
Most tools are easy to add to your build cycle and repay that effort after the first violation
Sigh. That bug wasn't from fixing the use of uninitialized memory, it was from being overzealous and "fixing" a second (valid, not flagged as bad by Valgrind) use of the the same function somewhere near the first use.
I think the actual details weren't very widely reported anyway. Apparently two statements were removed; one read from uninitialised memory, but the other was completely valid. Since the second one was responsible for most of the randomness, removing it reduced the keyspace to the point where it can be brute forced.
Is this what you were talking about?
http://it.slashdot.org/article.pl?sid=08/01/11/1818241
- doug
IIRC, they donated the results of their analysis, not the actual tool itself. If somebody *did* actually donate a free license to use their software to those projects, I missed it and would love a link to the details. :)
[b.belong('us') for b in bases if b.owner() == 'you']
We have had presentations from both Coverity and Klocwork at my workplace. I'm not entirely fond of them, but they're wayyyyy better than 'lint'. :) I much prefer using "Purify" whenever possible, since run-time analysis tends to produce fewer false-positives.
My comments would be:
(1) Klockwork & Coverity tend to produce a lot of "false positives". And by a lot, I mean, *A LOT*. For every 10000 "critical" bugs reported by the tool, only a handful may be really worth investigating. So you may spend a fair bit of time simply weeding through what is useful and what isn't.
(2) They're expensive. Coverity costs $50k for every 500k lines of code per year... We have a LOT more code than this. For the price, we could hire a couple of guys to run all of our tools through Purify *and* fix the bugs they found. Klocwork is cheaper; $4k per seat, minimum number of seats.
(3) They're slow. It takes several days running non-stop on our codebase to produce the static analysis databases. For big projects, you'll need to set aside a beefy machine to be a dedicated server. With big projects, there will be lots of bug information, so the clients tend to get bogged down, too.
In short: It all depends on how "mission critical" your code is; is it important, to you, to find that *one* line of code that could compromise your system? Or is your software project a bit more tolerant? (e.g., If you're writing nuclear reactor software, it's probably worthwhile to you to run this code. If you're writing a video game, where you can frequently release patches to the customer, it's probably not worth your while.)
Yes, only getpid(). Idiotic.
I've used the above two tools ... the IntelliJ IDEA IDE for Java development, and the Visual Studio plug-in Resharper for C# development ... and can't imagine living without them.
Of course, they provide a heck of a lot more than just static code analysis, but the ability to see all syntax errors in real time, and all logic errors (like potential null-references, dead code, unnecessary 'else' statements, etc, etc) saves way too much time, and has, in my experience, resulted in much better, more solid code. When you add on all the intelligent refactoring, vastly improved code navigation, and customizable code-generation features of these utilities, it's a no-brainer.
I wouldn't program without them.
- Spryguy
There are three kinds of people in this world: those that can count and those that can't
I believe Gendarme might be of some use. Just don't invoke the Portability assemblies and I can't see why it'd fail.
"You can either have software quality or you can have pointer arithmetic, but you cannot have both at the same time."
It is only one source for the entropy pool and the SSL "fix" was a Debian maintainer running valgrind on OpenSSL, finding a piece of code where uninitialized memory was accessed, "fixed" it and a "similar piece" and accidently removed all entropy from the pool. The result of that is, that all ssh-keys and ssl-certs created on Debian in the last 20 months are to be considered broken. (Debian Wiki SSLkeys on the scope and what to do)
FindBugs is becoming increasingly widespread on Java projects, for example. I found that between it and JLint I could identify a substantial chunk of problems caused by inexperienced programmers, poor design, hastily written code, etc. JLint was particularly nice for potential deadlocks, while FindBugs was good for just about everything else.
For example:
At least in the Java world, I wish more people would use them. It would make my job so much easier.
My experience in the Python world is that pylint is less interesting than FindBugs: many of the more interesting bugs are hard problems in a dynamically typed language and so it has more "religious style issues" built in that are easier to test for. It still provides a great deal of useful output once configured correctly, and can help enforce a consistent coding standard.
Integrate Keynote and LaTeX
My experience has been that while in the hands of people who know what they're doing, they're a nice tool to have, well, beware managers using their output as metrics. And beware even more a consultant with such a tool that he doesn't even understand.
// ignore }}" blocks a thousand times over in each "finally" block, when you can write it once and just call the method in your finally block. This tool had a trouble understanding that it _is_ all right. Unless it saw the "connection.close()" right there, in the finally block, it didn't count.
The thing is, these tools produce
A) a lot of "false positives", code which is really OK and everyone understand why it's ok, but the tool will still complain, and
B) usually includes some metrics of dubious quality at best, to be taken only as a signal for a human to look at it and understand why it's ok or not ok.
E.g., ne such tool, which I had the misfortune of sitting through a salesman hype session of, seemed to be really little more than a glorified grep. It really just looked at the source text, not at what's happening. So for example if you got a database connection and a statement in a "try" block, it wanted to see the close statements in the "finally" block.
Well, applied to an actual project, there was a method which just closed the connection and the statements supplied as an array. Just because, you know, it's freaking stupid to copy-and-paste cute little "if (connection != null) { try { connection.close(); } catch (SQLException e) {
Other examples include more mundane stuff like the tools recommending that you synchronize or un-synchronize a getter, even when everyone understands why it's OK for it to be as it is.
E.g., a _stateless_ class as a singleton is just an (arguably premature and unneded) speed optimization, because some people think they're saving so much by a singleton instead of the couple of cycles it takes to do a new on a class with no members and no state. It doesn't really freaking matter if there's exactly one of it, or someone gets a copy of it. But invariably the tools will make an "OMG, unsynchronized singleton" fuss, because they don't look deep enough to see if there's actually some state that must be unique.
Etc.
Now taken as something that each developper understands, runs on his own when he needs it, and uses his judgment of each point, it's a damn good thing anyway.
Enter the clueless PHB with a metric and chart fetish, stage left. This guy doesn't understand what those things are, but might make it his personal duty to chart some progress by showing how much fewer warnings he's got from the team this week than last week. So useless man-hours are spent on useless morphing perfectly good code, into something that games the tool. For each 1 real bug found, there'll be 100 harmless warnings that he makes it his personal mission to get out of the code.
Enter the snake-oil vendor's salesman, stage right. This guy only cares about selling some extra copies to justify his salary. He'll hype to the boss exactly the possibility to generate such charts (out of mostly false positives) and manage by such charts. If the boss wasn't already in a mind to do that management anti-pattern, the salesman will try to teach him to. 'Cause that's usually the only advantage that his expensive tool has over those open source tools that you mention.
I'm not kidding. I actually tried to corner one into;
Me: "ok, but you said not everything it flags there is a bug, right?"
Him: "Yes, you need to actually look at them and see if they're bugs or not."
Me: "Then what sense does it make to generate charts based on wholesale counting entities which may, or may not be bugs?"
Him: "Well, you can use the charts to see, say, a trend that you have less of them over time, so the project is getting better."
Me: "But they may or may not be actual bugs. How do you know if this week's mix has more or less actual bugs than last weeks, regardless of wh
A polar bear is a cartesian bear after a coordinate transform.
* I really like Insure, but it is difficult to set up on a system composed of many shared libraries. However, there are some bugs that really need run-time analysis to catch.
Short version:
There are real bugs, with huge consequences, that can be detected with static analysis.
The tools are easy to find and worth the price, depending on the customer base you have.
In the end, that cannot detect "all" bugs that could arise in the code.
Worth it?
Only you can decide, but after a few sessions learning why tools flag suspect code, if you take those suggest to heart, you will be a better coder.
The linux kernel developers use a tool originally written by Linux Torvalds for static analysis - sparse.
http://www.kernel.org/pub/software/devel/sparse/
Sparse has some features targeted at kernel development - for instance spotting mixing up kernel and user space pointers and a system of code annotations.
I haven't used it but I do see on the kernel mailing list that it regularly finds bugs.
Every man for himself, all in favour say "I"
Static analysis tools are common in the open source world. The lint name is well known enough that many projects make theirs a pun on it, ala lintian. A few years ago a local root exploit in X was discovered by running these sorts of checks. But generally, static analysis tools require human review -- with large code bases they generate large numbers of false positives, especially the dumber ones. This leads to trouble for perfectionists, a common trait among software developers interested in bug fix analysis. For example, the recent massive Debian vulnerability was caused by an overzealous developer trying to fix static analysis flags. One of these flags was valid, one was not, and removing both removed nearly all entropy from the RNG.
In the more general sense, static analysis cannot find all bugs. There's a trivial proof: a program stuck in an infinite loop is a bug, but finding all such loops would solve the halting problem. Handling interrupts and the like also causes reasoning problems, as it's very hard, if not computationally intractable, to prove multi-threaded software is safe. So static analysis won't rid the embedded world of watchdog timers and other software failure recovery crap.
I Browse at +4 Flamebait
Open Source Sysadmin
True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
It's not "error free", it's _run-time error free_. Which according to the GP's link means that no undefined behaviour according to the C standard or user added asserts may happen.
So for example, the program won't ever divide by zero or overflow an integer while summing.