Static Code Analysis Tools?
rewt66 asks: "We are looking for a good static analysis tool for a fairly large (half a million lines) C/C++ project. What tools do you recommend? What do you recommend avoiding? What experience (good or bad) have you had with such tools?"
I found the static analyser in SGI's Prodev Workshop to be quite excellent, though that was a while ago and I am comparing it with nothing - I'm not sure how it stacks up against more recent offerings :
http://www.sgi.com/products/software/irix/tools/prodev.html#B
Looks like it's IRIX only though, so YMMV, to put it mildly.
Max.
That's great and all, but some things just take a lot of code. Refactoring into libraries only goes so far, you're still going to have a ton of code, it'll just be split up in libraries. That's useful, and it's good advice, but since the poster didn't ask about it, you could at least give him the benefit of the doubt and assume the project is already organized appropriately. Half a million lines isn't that big, certainly not big enough to automatically assume their codebase is organized badly.
I just started looking at LLVM, maybe it is good for what you want.
http://llvm.org/
If you are on Windows, you can use the native C++ static analysis that comes with the Windows SDK. Just add the /analyze switch when invoking the compiler (cl.exe).
It's the tool that is used by MS to test its own code, known internally as PreFast.
It helped me find many bugs in other people's code.
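For illustration, here is a minimal sketch of the kind of defect that analyzers of this type are generally designed to catch: the result of malloc used without a null check. The function names are hypothetical, and whether a given analyzer version reports this exact case depends on its settings. Assuming the file is saved as example.cpp, it could be checked with something like cl /analyze /c example.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper with an intentional defect: malloc can return
// nullptr, and the result is written to without being checked.
char* duplicate_unchecked(const char* text) {
    char* copy = static_cast<char*>(std::malloc(std::strlen(text) + 1));
    std::strcpy(copy, text);  // possible null dereference if malloc failed
    return copy;
}

// Same helper with the allocation result checked before use.
// The caller owns the returned buffer and releases it with std::free.
char* duplicate_checked(const char* text) {
    assert(text != nullptr);  // precondition: a valid C string
    const std::size_t size = std::strlen(text) + 1;
    char* copy = static_cast<char*>(std::malloc(size));
    if (copy == nullptr) {
        return nullptr;  // report allocation failure to the caller
    }
    std::memcpy(copy, text, size);
    return copy;
}
```

The point of the pair is that the first version usually works in testing, which is exactly why this class of bug tends to survive until a tool points at it.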
But FindBugs does not cover the C/C++ codebase...
C/C++ checkers:
http://www.coverity.com/ (commercial)
http://www.dwheeler.com/flawfinder/ (OSS)
India.
Work smarter, not harder.
I strongly suggest you look at coverity.
They have excellent checks as well as the best framework for creating custom tests that I have ever come across.
NOTE: I am not affiliated with coverity, just a very satisfied user.
LL
http://www.gimpel.com/html/lintinfo.htm/
I've never tried it for a code base as large as 500k. My guess is that I used it up to 15k. I was very pleased with it. I agreed with just about every warning it raised, and was able to easily suppress individual instances or whole classes of errors. I also found it somewhat easier to get started with compared to the big tools from Rational et al.
I think it's a bit pricey for an open-source coder like me, but it should be cheap enough for a company with a tools budget.
wc project.c
Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
Whatever you use, make sure you adjust the settings to only capture those problems that you think are critical. With 500k lines of code, unless your codebase is *extremely* solid, running a Lint tool will result in a LOT of action items. I've used SPLINT (a lint for secure programming - http://www.splint.org/) in a project with a codebase much smaller than 500k and it took weeks to finish addressing all the issues - sometimes these things can be more of a curse than a blessing.
I work on a C/C++ code base that is a lot bigger than 500k lines. I've worked with results produced by Klocwork and also with the output from Reasoning. Both of these services/packages will cost you money but both provide good insight into your code. The commercial packages generally produce more focused results with fewer false positives, so while they cost you money up front, your developers will spend less time weeding out the noise.
If paying money out for a commercial package isn't your thing, don't overlook the old standby lint or splint, an updated successor.
Also well worth investigating to see how your code is actually running are Valgrind and its associated tools. The Valgrind toolkit will give you a good idea where memory is being leaked, where variables and pointers are going off the rails. Valgrind hooks into a running program, so it's important to make sure that you test all the corners of the codebase if you go this route.
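To make the coverage point concrete, here is a small hypothetical sketch (the function names are made up for illustration) of a leak that a runtime tool can only see if the test run actually reaches the faulty branch. A run that only calls exercise_common_path() never executes the leaking path, so a command such as valgrind --leak-check=full ./a.out would be expected to report nothing for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical routine with an intentional leak on the rarely taken
// empty-input path: the scratch buffer is only freed on the normal path.
std::size_t byte_sum(const std::string& input) {
    char* scratch = new char[input.size() + 1];
    std::memcpy(scratch, input.c_str(), input.size() + 1);
    if (input.empty()) {
        return 0;  // BUG (intentional): scratch is never released here
    }
    std::size_t sum = 0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        sum += static_cast<unsigned char>(scratch[i]);
    }
    delete[] scratch;
    return sum;
}

// A test that only exercises the common, leak-free path. A runtime
// checker watching this test alone has no way to observe the leak.
void exercise_common_path() {
    assert(byte_sum("ab") == 195);  // 'a' (97) + 'b' (98)
}
```

A static analyzer, by contrast, examines both branches without needing an input that triggers them, which is why the two kinds of tool complement each other.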
Cheers,
Toby Haynes
Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
A coworker of mine who's quite a C/C++ jockey used it recently (this month), and said it's still very good.
There's nothing wrong with having lots of code in a project. A solution with 1000 libraries of 500 lines each is no better. Don't break stuff up just for the sake of not having a lot of code in a project. Break it up and refactor it if it NEEDS it for context/architecture/organization reasons.
I agree that much code is far longer than it needs to be, but I don't think it's fair to equate this with large projects.
IME, large projects (over a million lines, say) often get that way because they have been built around some sort of framework, and the boilerplate code pushes the line count up. When you get past a certain scale -- more than a handful of developers, or with the team split across multiple geographic locations, that sort of thing -- such frameworks can be very valuable in retaining a sane, structured overall design. Since most of it is typically generated rather than hand-crafted, it doesn't really impact on developer productivity; if anything, it helps it, by maintaining some kind of order in systems that are otherwise too large for any one individual to fully comprehend. (This assumes the framework is well designed and not itself wasteful and overcomplicated, of course.)
On the other hand, it is perfectly possible for a library that should take 1,000 lines in a couple of files to expand to 10,000 lines across five different files. This sort of thing can be a killer, with cluttered interfaces to modules, inefficient algorithms written in verbose style, and so on.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
One important thing to consider is the set of compilers, tools, target systems, and build environments you are using. If you are using MS-only products then you will most likely have very good support, because almost all source code analysis suites will simply import the build information and you will be off and running right away. If your environment is Unix or embedded systems then things may be more difficult because you will need to hook into the build process somehow. The scanner tools usually intercept the CC command from a "make" build and call their back end using their custom processing rather than the compiler proper. Different products do this in different ways, so be sure the product you choose knows how to deal with your specific build environment. In my case I walked into another party's environment every time and needed to simulate a build for a build environment that I had never seen before. Not one environment ever looked like the next, so the setup and configuration was always a big challenge, just to get started.
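As a rough sketch of what "intercepting the CC command" can mean, the hypothetical code below shows the core of a compiler wrapper: it is handed the arguments that make would have passed to the compiler, notes which ones are source files for the analysis back end, and rebuilds the command line for the real compiler. All names here are assumptions for illustration; real products also capture include paths, defines, and the working directory, and each does so in its own way.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of intercepting one compiler invocation (hypothetical type).
struct Intercepted {
    std::vector<std::string> sources;    // files the analyzer would scan
    std::vector<std::string> forwarded;  // command line for the real compiler
};

// True if s ends with the given suffix.
bool has_suffix(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Treat non-flag arguments with a C or C++ extension as source files.
bool is_source(const std::string& arg) {
    if (arg.empty() || arg[0] == '-') {
        return false;  // compiler flags are never source files
    }
    return has_suffix(arg, ".c") || has_suffix(arg, ".cc") ||
           has_suffix(arg, ".cpp") || has_suffix(arg, ".cxx");
}

// Record the source files and pass every argument through unchanged,
// so the normal build still produces its object files.
Intercepted intercept(const std::string& real_compiler,
                      const std::vector<std::string>& args) {
    assert(!real_compiler.empty());  // precondition: a compiler to forward to
    Intercepted result;
    result.forwarded.push_back(real_compiler);
    for (const std::string& arg : args) {
        if (is_source(arg)) {
            result.sources.push_back(arg);
        }
        result.forwarded.push_back(arg);
    }
    return result;
}
```

In a make-based build, a wrapper like this would typically be put in place by overriding the CC variable, which is why an unusual or hand-rolled build system makes the setup harder.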
Prexis is primarily a tool for life cycle scanning of source code for security issues. There are two ways to perform the code scanning, with either the main engine component which can schedule nightly scans and track progress over time or with the additional Prexis Pro utility, which is designed for quick assessments by the engineers on their own code without logging everything into the main database. The Pro tool worked best for my code assessments since I had no need for tracking changes over time, and it was a little easier to configure which counts for a lot in my situation.
PolySpace is a completely different tool with a different purpose from Prexis. PolySpace attempts to mathematically discover runtime flaws in the code while only using static analysis to do so. It does a great job on smaller projects, but because of the complexity and thoroughness of its analysis, it is somewhat slow. PolySpace needs to evaluate an entire application all at once in order to do a good analysis. If your .5 MSLOC of code is many separate programs/executables then you will be fine, but if you are talking about one huge monolithic application then you may have to evaluate it in chunks which just increases the false positives and forces the engineer to do more manual chasing of details to determine if the issue is really a problem or not. From what I have seen this product is in a class by itself.
BTW - keep your eyes on this site: http://samate.nist.gov/index.php/Main_Page
Many successful products are made up of around 500K lines of C++.
Most console computer games for example start at around 500K lines...
From their marketing blurb...
Understand, our flagship product, helps thousands of companies maintain impossibly large or complex amounts of source code. It parses source code for reverse engineering, automatic documentation, and calculating code metrics. We have versions for Ada 83, Ada 95, FORTRAN 77, FORTRAN 90, FORTRAN 95, Jovial, K&R C, ANSI C and C++, Delphi, and Java. Multi-million SLOC projects are common with our users.
Fortify is a static security scanner that covers C/C++ as well as Java, JSP, .NET, C#, XML, CFML, PL/SQL and T-SQL.
And part == project?
Is SAP a single project, and are all those individual parts considered projects too? Perhaps a single .DLL is a project internally.
You seem to be missing the point that there is no clear definition or scale for a project, at least not in the world outside of yours where every single compiled module seems to be a "project".
In real-life, a project may be anything from rebuilding an entire set of applications to fixing a typo in a batch file.