Do Static Source Code Analysis Tools Really Work?

Posted by CmdrTaco on Monday May 19, 2008 @04:19AM from the if-you're-stupid-they-do dept.

jlunavtgrad writes "I recently attended an embedded engineering conference and was surprised at how many vendors were selling tools to analyze source code and scan for bugs, without ever running the code. These static software analysis tools claim they can catch NULL pointer dereferences, buffer overflow vulnerabilities, race conditions and memory leaks. I've heard of Lint and its limitations, but it seems that this newer generation of tools could change the face of software development. Or, could this be just another trend? Has anyone in the Slashdot community used similar tools on their code? What kind of changes did the tools bring about in your testing cycle? And most importantly, did the results justify the expense?"

In Short, Yes by Nerdfest · 2008-05-19 04:24 · Score: 5, Informative

They're not perfect, and won't catch everything, but they do work. Combined with unit testing, you can get a very low bug rate. Many of these (for Java, at least) are open source, so the expense is negligible.
1. Re:In Short, Yes by Goaway · 2008-05-19 04:49 · Score: 5, Insightful
  
  You don't need to be perfect to be useful.
2. Re:In Short, Yes by FBSoftware · 2008-05-19 05:13 · Score: 5, Interesting
  
  Yes, I use the formal-methods-based SPARK tools (www.sparkada.com) for Ada software. In my experience, the Examiner (static analyzer) is always right (> 99.44% of the time) when it reports a problem or potential for runtime exception. Even without SPARK, the Ada language requires that the compiler itself accomplish quite a bit of static analysis. Using Ada, it's less likely you will need a third-party static analysis tool - just use a good compiler like GNAT.
3. Re:In Short, Yes by Entrope · 2008-05-19 05:24 · Score: 5, Informative
  
  My group at work recently bought one of these. They catch a lot of things that compilers don't -- for example, code like this:
  
  int array[4], count, ii;
  scanf("%d", &count);
  for (ii = 0; ii < count; ++ii) {
      scanf("%d", &array[ii]);
  }
  
  .. where invalid input causes arbitrarily bad behavior. They also tend to be better at inter-procedural analysis than compilers, so they can warn you that you're passing a short literal string to a function that will memcpy() from the region after that string. They do have a lot of false positives, but what escapes from compilers to be caught by static analysis tools tends to be dynamic behavior problems that are easy to overlook in testing. (If the problem were so obvious, the coder would have avoided it in the first place, right?)
4. Re:In Short, Yes by HalWasRight · 2008-05-19 05:54 · Score: 5, Informative
  
  valgrind, BoundsChecker, and I believe the others mentioned, are all run-time error checkers. These require a test case that exercises the bug. The static analysis tools the poster was asking about, like those from Coverity and Green Hills, don't need test cases. They work by analyzing the actual semantics of the source code. I've found bugs with tools like these in code that was hard enough to read that I had to write test cases to verify that the tool was right. And it was! The bug would have caused an array overflow write under the right conditions.
  
  --
  "This mission is too important to allow you to jeopardize it." -- HAL
5. Re:In Short, Yes by Anonymous Coward · 2008-05-19 06:45 · Score: 5, Insightful
  
  "The proper answer would be: No. A fully working static code analyzer would be like solving the Halting Problem, which has been proven to be impossible. Essentially you can just try to catch as many potential problems as you can, but you can never catch all."
  
  I hate it when the halting problem is trotted out as "proof" that formal verification is impossible. If you like to put intractable recursion in your code then you probably shouldn't be a programmer. (Maybe you could draft legislation instead.) In practice, you should be able to prove (at least informally) that your program halts when it's supposed to.
  
  The only real significance of the halting problem is to demonstrate that there can be some pretty absurd programs out there. It is not an indictment of static analyses. Nor is it an excuse to have less than total confidence in the correctness of your code.
6. Re:In Short, Yes by neokushan · 2008-05-19 11:38 · Score: 5, Funny
  
  I hope you realise I just spent a good 2mins googling around for an explanation of a for loop with 4 parts to it instead of the 3 I was used to seeing. I genuinely thought it was some special, relatively unknown and underused part of the C spec that I'd just not seen before.
  Then I realised it was just the HTML screwing up a less-than symbol. Then I felt a bit silly.
  Then I just had to tell someone....
  
  --
  +1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
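
For reference, here is a rough sketch of how the two problems Entrope describes above (the unchecked count in the array-reading loop, and the memcpy() that reads past a short literal) might be closed off. It is only an illustration: the names `read_values` and `copy_tag` are made up, and the input is taken from a string instead of stdin purely so the functions are easy to exercise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse "<count> v0 v1 ..." from text into out[],
 * never writing more than cap elements. Returns the number of values
 * stored, or -1 if the count is missing, negative, or larger than cap. */
int read_values(const char *text, int *out, int cap)
{
    int count = 0, used = 0, ii;

    assert(text != NULL && out != NULL && cap >= 0);

    /* %n records how many characters this call consumed. */
    if (sscanf(text, "%d%n", &count, &used) != 1)
        return -1;
    if (count < 0 || count > cap)   /* the check the original loop lacks */
        return -1;

    text += used;
    for (ii = 0; ii < count; ++ii) {
        if (sscanf(text, "%d%n", &out[ii], &used) != 1)
            return ii;              /* input ran out: report what was read */
        text += used;
    }
    return count;
}

/* Hypothetical helper: store src as a zero-padded tag of width n.
 * A bare memcpy(dst, src, n) would read past the end of a short
 * literal such as "ab"; this version never reads beyond its NUL. */
void copy_tag(char *dst, const char *src, size_t n)
{
    size_t len = strlen(src);

    if (len > n)
        len = n;
    memcpy(dst, src, len);
    memset(dst + len, 0, n - len);
}
```

With a four-element array, `read_values("2 10 20", a, 4)` stores two values, while a count of 9 is rejected before anything is written. Whether a given analyzer flags the unchecked originals depends on the tool; the point is only that the missing checks are small.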
Just like compiler warnings... by mdf356 · 2008-05-19 04:24 · Score: 5, Insightful

Here at IBM we have an internal tool from research that does static code analysis.

It has found some real bugs that are hard to generate a testcase for. It has also found a lot of things that aren't bugs, just like -Wall can. Since I work in the virtual memory manager, a lot more of our bugs can be found just by booting, compared to other domains, so we didn't get a lot of new bugs when we started using static analysis. But even one bug prevented can be worth multiple millions of dollars.

My experience is that, just like enabling compiler warnings, any way you have to find a bug before it gets to a customer is worth it.

--
Terrorist, bomb, al Qaeda, nuclear, yellowcake, kill, assassinate. Carnivore is dead... long live Echelon.
OSS usage by MetalliQaZ · 2008-05-19 04:25 · Score: 5, Insightful

If I remember correctly, one of these companies donated their tool to many open source projects, including Linux and the BSDs. I think it led to a wave of commits as 'bugs' were fixed. It seemed like a pretty good endorsement to me...

--
"Here Lies Philip J. Fry, named for his uncle, to carry on his spirit"
They do work by Anonymous Coward · 2008-05-19 04:29 · Score: 5, Interesting

Static analysis does catch a lot of bugs. Mind you, it's no silver bullet, and frankly it's better, given the choice, to target a language+environment that doesn't suffer problems like dangling pointers in the first place (null pointers, however, don't seem to be anything Java or C# are really interested in getting rid of).

Even lint is decent -- the trick is just using it in the first place. As for expense, if you have more than, oh, 3 developers, they pay for themselves by your first release. Besides, many good tools such as valgrind are free (valgrind isn't static, but it's still useful).
Yes, they work. by Anonymous Coward · 2008-05-19 04:32 · Score: 5, Insightful

You will probably be amazed at what you will catch with static analysis. No, it's not going to make your program 100% bug-free (or even close), but every time I see code die on an edge case that would've been caught with static analysis, it makes me want to kill a kitten (and I'm totally a "cat person" mind you).

Static analyzers will catch the stupid things - edge cases that fail to initialize a var, but then lead straight to de-referencing it; memory leaks on edge-case code paths, etc. that shouldn't happen but often do, and get in the way of finding real bugs in your program logic.
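
To make those two "stupid things" concrete, here is a small sketch in C. Every function name in it is invented for illustration. Each buggy version sits next to one possible fix; the buggy ones compile but should not be called on the edge case, and whether a particular tool reports them is an assumption, not something tested here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: fills *port only on success, returns 0 or -1. */
int parse_port(const char *s, int *port)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return -1;                  /* edge case: *port is left untouched */
    v = strtol(s, &end, 10);
    if (*end != '\0' || v < 1 || v > 65535)
        return -1;
    *port = (int)v;
    return 0;
}

/* Buggy: ignores the result, so bad input yields an uninitialized
 * value. Compilers often stay quiet here because &port escapes into
 * parse_port(). */
int port_or_default_buggy(const char *s, int fallback)
{
    int port;

    (void)fallback;
    parse_port(s, &port);
    return port;
}

/* Fixed: every path assigns before use. */
int port_or_default(const char *s, int fallback)
{
    int port;

    if (parse_port(s, &port) != 0)
        return fallback;
    return port;
}

/* Buggy: the early return for an empty string leaks buf. */
char *dup_nonempty_buggy(const char *s)
{
    char *buf = malloc(strlen(s) + 1);

    if (buf == NULL)
        return NULL;
    if (*s == '\0')
        return NULL;                /* leak: buf is never freed */
    strcpy(buf, s);
    return buf;
}

/* Fixed: handle the edge case before allocating anything. */
char *dup_nonempty(const char *s)
{
    char *buf;

    assert(s != NULL);
    if (*s == '\0')
        return NULL;
    buf = malloc(strlen(s) + 1);
    if (buf != NULL)
        strcpy(buf, s);
    return buf;
}
```

Neither buggy path is likely to show up in a happy-path test: the port bug needs an empty or malformed string, and the leak only costs a few bytes each time it is hit.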
Of course they can work by Idaho · 2008-05-19 04:33 · Score: 5, Interesting

Such tools work in a very similar way to what is already being done in many modern language compilers (such as javac). Basically, they implement semantic checks that verify whether the program makes sense, or is likely to work as intended in some respect. For example, they will check for likely security flaws, memory management/leaking or synchronisation issues (deadlock, access to shared data outside critical sections, etc.), or other kinds of checks that depend on whatever domain the tool is intended for.

It would probably be more useful if you could state which kind of problem you are trying to solve and which tools you are considering buying. That way, people who have experience with them could suggest which work best :)

--
Every expression is true, for a given value of 'true'
Re:Yes. by Anonymous Coward · 2008-05-19 04:37 · Score: 5, Informative

Sigh. That bug wasn't from fixing the use of uninitialized memory, it was from being overzealous and "fixing" a second (valid, not flagged as bad by Valgrind) use of the same function somewhere near the first use.
Re:Yes. by Anonymous Coward · 2008-05-19 04:38 · Score: 5, Informative

I think the actual details weren't very widely reported anyway. Apparently two statements were removed; one read from uninitialised memory, but the other was completely valid. Since the second one was responsible for most of the randomness, removing it reduced the keyspace to the point where it can be brute forced.
Coverity & Klocwork by Anonymous Coward · 2008-05-19 04:50 · Score: 5, Informative

We have had presentations from both Coverity and Klocwork at my workplace. I'm not entirely fond of them, but they're wayyyyy better than 'lint'. :) I much prefer using "Purify" whenever possible, since run-time analysis tends to produce fewer false-positives.

My comments would be:

(1) Klocwork & Coverity tend to produce a lot of "false positives". And by a lot, I mean, *A LOT*. For every 10000 "critical" bugs reported by the tool, only a handful may be really worth investigating. So you may spend a fair bit of time simply weeding through what is useful and what isn't.

(2) They're expensive. Coverity costs $50k for every 500k lines of code per year... We have a LOT more code than this. For the price, we could hire a couple of guys to run all of our tools through Purify *and* fix the bugs they found. Klocwork is cheaper; $4k per seat, minimum number of seats.

(3) They're slow. It takes several days running non-stop on our codebase to produce the static analysis databases. For big projects, you'll need to set aside a beefy machine to be a dedicated server. With big projects, there will be lots of bug information, so the clients tend to get bogged down, too.

In short: It all depends on how "mission critical" your code is; is it important, to you, to find that *one* line of code that could compromise your system? Or is your software project a bit more tolerant? (e.g., If you're writing nuclear reactor software, it's probably worthwhile to you to run this code. If you're writing a video game, where you can frequently release patches to the customer, it's probably not worth your while.)
In short, YMMV by Moraelin · 2008-05-19 05:05 · Score: 5, Informative

My experience has been that while in the hands of people who know what they're doing, they're a nice tool to have, well, beware managers using their output as metrics. And beware even more a consultant with such a tool that he doesn't even understand.

The thing is, these tools produce

A) a lot of "false positives", code which is really OK and everyone understands why it's ok, but the tool will still complain, and

B) usually some metrics of dubious quality at best, to be taken only as a signal for a human to look at it and understand why it's ok or not ok.

E.g., one such tool, which I had the misfortune of sitting through a salesman hype session of, seemed to be really little more than a glorified grep. It really just looked at the source text, not at what's happening. So for example if you got a database connection and a statement in a "try" block, it wanted to see the close statements in the "finally" block.

Well, applied to an actual project, there was a method which just closed the connection and the statements supplied as an array. Just because, you know, it's freaking stupid to copy-and-paste cute little "if (connection != null) { try { connection.close(); } catch (SQLException e) { // ignore }}" blocks a thousand times over in each "finally" block, when you can write it once and just call the method in your finally block. This tool had trouble understanding that it _is_ all right. Unless it saw the "connection.close()" right there, in the finally block, it didn't count.

Other examples include more mundane stuff like the tools recommending that you synchronize or un-synchronize a getter, even when everyone understands why it's OK for it to be as it is.

E.g., a _stateless_ class as a singleton is just an (arguably premature and unneeded) speed optimization, because some people think they're saving so much by a singleton instead of the couple of cycles it takes to do a new on a class with no members and no state. It doesn't really freaking matter if there's exactly one of it, or someone gets a copy of it. But invariably the tools will make an "OMG, unsynchronized singleton" fuss, because they don't look deep enough to see if there's actually some state that must be unique.

Etc.

Now taken as something that each developer understands, runs on his own when he needs it, and uses his judgment of each point, it's a damn good thing anyway.

Enter the clueless PHB with a metric and chart fetish, stage left. This guy doesn't understand what those things are, but might make it his personal duty to chart some progress by showing how many fewer warnings he's got from the team this week than last week. So useless man-hours are spent on uselessly morphing perfectly good code into something that games the tool. For each 1 real bug found, there'll be 100 harmless warnings that he makes it his personal mission to get out of the code.

Enter the snake-oil vendor's salesman, stage right. This guy only cares about selling some extra copies to justify his salary. He'll hype to the boss exactly the possibility to generate such charts (out of mostly false positives) and manage by such charts. If the boss wasn't already in a mind to do that management anti-pattern, the salesman will try to teach him to. 'Cause that's usually the only advantage that his expensive tool has over those open source tools that you mention.

I'm not kidding. I actually tried to corner one:

Me: "ok, but you said not everything it flags there is a bug, right?"

Him: "Yes, you need to actually look at them and see if they're bugs or not."

Me: "Then what sense does it make to generate charts based on wholesale counting entities which may, or may not be bugs?"

Him: "Well, you can use the charts to see, say, a trend that you have less of them over time, so the project is getting better."

Me: "But they may or may not be actual bugs. How do you know if this week's mix has more or less actual bugs than last week's, regardless of wh

--
A polar bear is a cartesian bear after a coordinate transform.
Coverity Prevent Rocks by Arakageeta · 2008-05-19 05:06 · Score: 5, Informative

My large C/C++ project (2,000,000+ SLOC) started using Coverity Prevent about a year ago. Its results have truly been invaluable. We simply have too much code for standard human code reviews or for detailed run-time coverage analysis (e.g., Insure* or valgrind). Prevent has caught many programming errors (some extremely obscure and/or subtle) and saved us a ton of money and time.
* I really like Insure, but it is difficult to set up on a system composed of many shared libraries. However, there are some bugs that really need run-time analysis to catch.
I second valgrind by jberryman · 2008-05-19 06:24 · Score: 5, Funny

There's also valgrind, for Linux users
It's great for finding all those elusive bits of code that might be accidentally seeding a pseudo-random number generator somewhere.