Static Code Analysis Tools?

Ideas by tomstdenis · 2007-03-30 00:35 · Score: -1, Offtopic

1. If you have 500k lines in a single project, consider re-factoring it into separate libraries that you can divide and conquer. Also, if you have 500k lines of code, consider cleaning it up, re-factoring it, etc. Fewer lines of code is more impressive than more.

2. Google for David Wagner and David Molnar, they seem to be up on that sort of work.

--
Someday, I'll have a real sig.

Re:Ideas by Anonymous Coward · 2007-03-30 00:51 · Score: 4, Insightful

1. If you have 500k lines in a single project, consider re-factoring it into separate libraries that you can divide and conquer. Also, if you have 500k lines of code, consider cleaning it up, re-factoring it, etc. Fewer lines of code is more impressive than more.

That's great and all, but some things just take a lot of code. Refactoring into libraries only goes so far, you're still going to have a ton of code, it'll just be split up in libraries. That's useful, and it's good advice, but since the poster didn't ask about it, you could at least give him the benefit of the doubt and assume the project is already organized appropriately. Half a million lines isn't that big, certainly not big enough to automatically assume their codebase is organized badly.
Re:Ideas by Anonymous Coward · 2007-03-30 01:01 · Score: 0

If the original poster were using "project" in the context of "Visual Studio C/C++ .dsp files", you're correct. That would be insane. But there's nothing in the original post to indicate that he was using project in that specific sense. I'd be more inclined to believe that he meant "project" in the more general sense:

1. something that is contemplated, devised, or planned; plan; scheme.
2. a large or major undertaking, esp. one involving considerable money, personnel, and equipment.
Re:Ideas by tomstdenis · 2007-03-30 01:02 · Score: 1

Yes, some things take a lot of code, but more often than not the excess is the result of new coders contributing to a project without really grasping the big picture. So they re-invent the wheel or add far more than a simple task needs.

For example, I worked on DB2 for a while. I routinely saw 3000 line files that implement such complicated things as hash lists. Then there was another 2000 line file that performs modular reduction in a dozen different ways because they didn't want to use a hash to sort their data into buckets, etc... Not saying DB2 is shite (cuz I never really used it I can't say anyways), but if DB2 were written properly and with an eye towards code size, it'd be probably 1/4th the size if not smaller.

If people bragged about the fewest lines of code with the most functionality, maybe we'd not be buying gigs of ram to run an OS ...

To me, when I hear that someone worked on a project with 10M lines of code or whatever, I'm rarely impressed. Not only because most likely they were a small player in a huge project, but that chances are the 10M line program is 10x larger than it needs to be.

Tom

--
Someday, I'll have a real sig.
Re:Ideas by tomstdenis · 2007-03-30 01:06 · Score: 1

I assumed that he meant lines of C and/or C++ code.

Look at something like my LibTomCrypt. It covers a wide range of cryptographic algorithms, it's only ~48K lines of code, quite a bit of which are tables for the ciphers/hashes. There are also plenty of comments, etc. Of actual code there is probably only ~30K or so.

And in that 30K I do symmetric ciphers, hashes, prngs, MACs, RSA (with PKCS #1), ECC (DSA/DH), DSA (DSS) and a decent subset of ASN.1.

Would it be more impressive if I did all that in 100K lines?

--
Someday, I'll have a real sig.
Re:Ideas by mwvdlee · 2007-03-30 01:08 · Score: 1

Would you consider the entirety of Windows a single "project"?

How DO you define a project?

Perhaps it's already split into numerous sub-projects with even more sub-sub-projects.

I've seen a project where large quantities of source code were automatically recompiled with a new compiler. That single project easily had several million lines of code ;)

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:Ideas by tomstdenis · 2007-03-30 01:15 · Score: 1

Windows is not a single "project". It's comprised of dozens of applications, hundreds of libraries (DLLs), and hundreds of drivers.

Would you consider a Fedora Core installation a single project? No, it's the amalgamation of hundreds of independent OSS projects.

No one DLL or application should be 500k lines of code. If it is, it's either a lot of tables, or shitty code that finds new and inventive ways of doing things you don't need.

Tom

--
Someday, I'll have a real sig.
Re:Ideas by ThePhilips · 2007-03-30 01:36 · Score: 1

The key here is that once some piece of (relatively) independent code is in a library, you can make a test suite for it.
After any change committed to the library, run the local tests and see whether it still works.
The approach does miracles for the reusability and maintainability of code.
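
A minimal sketch of that workflow in C, assuming a made-up one-function library: `count_set_bits` and `run_library_tests` are invented names for illustration, not code from any project mentioned in this thread.

```c
#include <assert.h>

/* Hypothetical library function: count the bits set in the low 32 bits. */
unsigned count_set_bits(unsigned long word)
{
    unsigned count = 0;
    word &= 0xFFFFFFFFul;
    while (word != 0) {
        word &= word - 1;   /* clears the lowest set bit */
        ++count;
    }
    return count;
}

/* Local test suite: run it after every change to the library.
   A failed check aborts with the file and line of the broken case. */
void run_library_tests(void)
{
    assert(count_set_bits(0x0ul) == 0);
    assert(count_set_bits(0x1ul) == 1);
    assert(count_set_bits(0xFFul) == 8);
    assert(count_set_bits(0x80000001ul) == 2);
    assert(count_set_bits(0xFFFFFFFFul) == 32);
}
```

Because the function has no dependencies outside its own file, the tests can run in isolation, which is the point of splitting it out.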

--
All hope abandon ye who enter here.
Re:Ideas by Anonymous Coward · 2007-03-30 01:58 · Score: 0

But this is a bad example, because crypto algorithms are often deliberately designed to have small code sizes, the core algorithms for most symmetric ciphers and hashes are pretty trivial (the math is fantastic, the implementation is mechanical and generally a few hundred lines for each algorithm or mode). No, it really wouldn't be impressive if it took you 100K lines.

Write me a general purpose audio decoder supporting mp3, flac, vorbis, wav, and apple aac in under 30K lines and I'll be impressed. I bet any one of those encodings would take more than 30K lines to implement (well, not .wav, but the others would) even if you don't count comments or whitespace.
Re:Ideas by tomstdenis · 2007-03-30 02:05 · Score: 1

why would you put mp3/flac/vorbis/etc in the same project? Why not just link them in like you're supposed to? As for mp3 codecs [and probably vorbis] most of that is unrolled DCT like transforms and tables.

That's I think part of the problem, people think they have to have all of the source in one build to make a project.

A hello world program execution is the result of a kernel, shell, standard C library, etc... none of which you count as lines of code in the program.

Tom

--
Someday, I'll have a real sig.
Re:Ideas by j00r0m4nc3r · 2007-03-30 02:20 · Score: 2, Insightful

There's nothing wrong with having lots of code in a project. A solution with 1000 libraries of 500 lines each is no better. Don't break stuff up just for the sake of not having a lot of code in a project. Break it up and refactor it if it NEEDS it for context/architecture/organization reasons.
Re:Ideas by Anonymous+Brave+Guy · 2007-03-30 02:29 · Score: 2, Insightful

I agree that much code is far longer than it needs to be, but I don't think it's fair to equate this with large projects.

IME, large projects (over a million lines, say) often get that way because they have been built around some sort of framework, and the boilerplate code pushes the line count up. When you get past a certain scale -- more than a handful of developers, or with the team split across multiple geographic locations, that sort of thing -- such frameworks can be very valuable in retaining a sane, structured overall design. Since most of it is typically generated rather than hand-crafted, it doesn't really impact on developer productivity; if anything, it helps it, by maintaining some kind of order in systems that are otherwise too large for any one individual to fully comprehend. (This assumes the framework is well designed and not itself wasteful and overcomplicated, of course.)

On the other hand, it is perfectly possible for a library that should take 1,000 lines in a couple of files to expand to 10,000 lines across five different files. This sort of thing can be a killer, with cluttered interfaces to modules, inefficient algorithms written in verbose style, and so on.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:Ideas by tomstdenis · 2007-03-30 02:30 · Score: 1

Chances are very good that if you have >100K lines of code, and they're not all tables, or just plain wasted white space, that you have functionality that can be broken off and re-used through a library. Do you even know what 500K lines of code is? That's a ridiculous amount of code.

If you look at things like the kernel or GCC they're already split up into mini libraries inside the host project. So yeah, all of GCC may be several million lines of code (I don't know the exact numbers) but it's not just one project.

By "project" I mean a work task not a .dsp file. As in, there isn't one person who actively works on the *entire* GCC code base on a routine basis. Most people focus on a specific part of it. So if your part of the project is a separate block of 10K lines of code, you don't claim you're actually working on 20M lines of code.

So if this guy is actively working on a 500K line project, as in, he's actively developing parts of the entire 500K of code, chances are he needs to refactor the code and look at the design documents again. Most huge projects start off with the requisite support and grow into a final application.

For example, suppose you were writing a winamp clone from scratch. You'd start with the mp3 decoder, test that out, once it's working, package it up, then start on the output plugin, once they're working, package that up, then on the gui, etc... And if you actually look at Winamp, it's a central exe with a bunch of DLLs that do the grunt work. I seriously doubt Justin would on a daily basis be working on code from every aspect of the project. Likely, things like the MP3 DLL sat untouched for months on end [if not longer].

The point is, you'd test/verify the portions of the code as they're written. you wouldn't be looking at the entire mess of 500K lines all at once. That's just unmanageable from a verification point of view.

Tom

--
Someday, I'll have a real sig.
Re:Ideas by tomstdenis · 2007-03-30 02:35 · Score: 1

Part of IBM's problem is turnaround. Many of the developers are new to DB2 and fresh out of uni. The hash template I saw was a prime example of "I found this in a textbook somewhere." It was completely overkill since it's only used to hash arrays of bytes (why a template?) and the Montgomery reduction used to perform the bucketing is not needed since the hash is invoked only upon startup/shutdown.
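
For scale, a byte-array hash with plain modulo bucketing fits in a couple of dozen lines of C. This is only a sketch and not the DB2 code: FNV-1a is swapped in because it is short, and `hash_bytes` and `bucket_for` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over a byte array (32-bit variant). */
uint32_t hash_bytes(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;        /* FNV offset basis */
    size_t i;
    for (i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619u;              /* FNV prime */
    }
    return h;
}

/* An ordinary % is enough to pick a bucket when the hash is not on a hot path. */
size_t bucket_for(const unsigned char *data, size_t len, size_t nbuckets)
{
    assert(nbuckets > 0);
    return (size_t)(hash_bytes(data, len) % nbuckets);
}
```

No template is needed when the only key type is an array of bytes, and no special reduction is needed when the function runs a handful of times at startup and shutdown.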

Whoever wrote that code obviously failed "problem statement" 101. Worse yet, the code had bugs in it and wasn't being maintained. I don't mean to pick on IBM, I'm sure this happens everywhere else. And while most of the folk there are smart, and experienced, the code I saw didn't reflect a growing concern over code size.

We can see this in OSS as well. For example, OpenOffice. Not only do they include their own copies of shared objects, but it's a mix of java, python, perl, C and C++. All in one application. OpenOffice for all its virtues is a SHITTY PROGRAM that no sane proper experienced developer would have come up with.

--
Someday, I'll have a real sig.
Re:Ideas by Anonymous Coward · 2007-03-30 02:40 · Score: 0

If you look at things like the kernel or GCC they're already split up into mini libraries inside the host project. So yeah, all of GCC may be several million lines of code (I don't know the exact numbers) but it's not just one project.
GCC (and binutils) is a very, very bad example of a "clean" codebase. Plenty of thousand-lines-long functions and files go into the range of tens of thousands of lines. Not to mention that the whole thing is an unnavigable rats-nest with heaps of things you just have to know to make it work.
Re:Ideas by tomstdenis · 2007-03-30 02:42 · Score: 1

I didn't say it's good, I said it's being refactored. It may or may not get better but it's a start. GCC at least has some comments in the code. Which is more than I can say for most other OSS.

Anyways, point being, you shouldn't have 500K lines in any single part of a project. It makes testing and verification impossible

--
Someday, I'll have a real sig.
Re:Ideas by Anonymous+Brave+Guy · 2007-03-30 03:07 · Score: 1

No one DLL or application should be 500k lines of code. If it is, it's either a lot of tables, or shitty code that finds new and inventive ways of doing things you don't need.

That's a very bold statement! Is there some reason for adopting that particular magic number?

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:Ideas by tomstdenis · 2007-03-30 03:15 · Score: 1

500K came from the summary.

Frankly, I'd be disappointed if any one part was larger than 100K lines of code.

Tom

--
Someday, I'll have a real sig.
Re:Ideas by tjwhaynes · 2007-03-30 08:43 · Score: 1

Part of IBM's problem is turnaround. Many of the developers are new to DB2 and fresh out of uni. The hash template I saw was a prime example of "I found this in a textbook somewhere." It was completely overkill since it's only used to hash arrays of bytes (why a template?) and the Montgomery reduction used to perform the bucketing is not needed since the hash is invoked only upon startup/shutdown.
I have to stop you there. Turnaround on DB2 developers, at least in my area, is almost zero. Most of the developers around me have 5 or more years of experience, and some have been with the project for 20-plus years.
Now, we do hire a fair number of IIP students each year for 16-month sessions, so maybe you were surrounded by students.
In my experience, DB2 concentrates on functionality, stability and performance. Code-size is tackled when it impacts one of those areas and is otherwise unimportant.
Cheers,
Toby Haynes

--
Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
Re:Ideas by Anonymous Coward · 2007-03-30 14:32 · Score: 1, Insightful

Who said anything about putting all of the source into one build? The OP said he had 500Ksloc of code that were used to build his project. He never said it was all in one module! It seems to me you (and just about everyone else) have made a ridiculous assumption about his codebase and just run with it.

I've seen projects in the 1.5Msloc range, but they were broken down into 1500+ different modules to make them manageable. It was all homegrown because there were no free or commercial alternatives to any of the pieces. But when it came to enforcing code quality measurements and reviews we applied the same standards to all homegrown components. From the program manager's perspective we were applying tools to a "1.4 Msloc project". It doesn't mean we kept all of the source in the same .c file, or even in the same directory, or even controlled by a single Makefile. It is much more likely that the OP is in a similar situation.

I'm curious, do you think the Linux kernel and/or any of the BSDs are badly managed projects just because they have a lot of source? Or do you not consider the Linux kernel a single project even though you can build it with a single "make" command?

As for the codecs, both libflac and libvorbis are in the 40-50Ksloc range, not counting testing code. If I want to write an audio app unencumbered by the GPL I'd need *something* to replace that code. How I organize the build doesn't change the fact that there will be an extra 80-100Ksloc of code that goes into my app that will need to be vetted and maintained. For all practical purposes the code size of the project just went up, even though the modules are separate. In an ideal world those modules could be reused by other projects, but in reality there are many cases where we modularize the code for maintainability but never see any demand for reuse. Crypto and A/V codecs probably end up with high reusability, the software inside an F22 probably does not.
Re:Ideas by jgrahn · 2007-03-31 06:07 · Score: 1

OpenOffice for all its virtues is a SHITTY PROGRAM that no sane proper experienced developer would have come up with.

I don't doubt OO is shitty -- I wouldn't poke it with a stick. But one important thing to realize is that smart people end up writing shitty programs all the time.
For example, I once tested an API that was obviously designed and written by utter morons. Yet each time I had to talk to one of the programmers, or their manager, I was pleasantly surprised. They were smart, committed, had the right knowledge, and often happily admitted that the result sucked.
The key in these cases is: how are the projects run, by whom, and why?

FindBugs by Bodhidharma · 2007-03-30 00:41 · Score: 0, Offtopic

For Java programming, I use FindBugs. I mostly use it through an Eclipse plugin.

--
A dyslexic man walks into a bra.

Re:FindBugs by Anonymous Coward · 2007-03-30 01:00 · Score: 2, Informative

But FindBugs does not cover the C/C++ codebase...

C/C++ checkers:
http://www.coverity.com/ (commercial)
http://www.dwheeler.com/flawfinder/ (OSS)
Re:FindBugs by EvanED · 2007-03-30 06:04 · Score: 1

One more in the same genre as coverity: http://www.grammatech.com/products/codesonar/overview.html

Disclaimer: I have never used this tool and actually know relatively little about it. However, my current research uses other software the same company makes (CodeSurfer) and is very much tied to this company, and I have an internship with them this summer. The company was started by my adviser and his adviser, employs a couple former advisees of my adviser, etc.

Cast ? by Anonymous Coward · 2007-03-30 00:43 · Score: 0

http://www.castsoftware.com/

Prodev Workshop by dwater · 2007-03-30 00:43 · Score: 3, Informative

I found the static analyser in SGI's Prodev Workshop to be quite excellent, though that was a while ago and I am comparing it with nothing - I'm not sure how it stacks up against more recent offerings :

http://www.sgi.com/products/software/irix/tools/prodev.html#B

Looks like it's IRIX only though, so YMMV, to put it mildly.

--
Max.

LLVM by klahnako · 2007-03-30 00:53 · Score: 2, Informative

I just started looking at LLVM, maybe it is good for what you want.

http://llvm.org/

PreFast by Yakust · 2007-03-30 00:58 · Score: 2, Informative

If you are on Windows, you can use the native C++ static analysis that comes with the Windows SDK.
Just add the /analyze switch when invoking the compiler (cl.exe)
It's the tool that is used by MS to test its own code, known internally as PreFast.
It helped me find many bugs in other people's code.

Re:PreFast by cookd · 2007-03-31 10:26 · Score: 1

/analyze is pretty good. If you're using one of the more expensive editions of Visual Studio, support for /analyze is built into the IDE and very convenient.

With the latest versions of the Windows SDK, /analyze becomes much more powerful. /analyze has built-in models for the behavior of some CRT-defined functions, but all other functions are black boxes. The newest CRT and Windows SDK headers (as well as any .h files generated by a recent version of MIDL) have all been annotated with "SAL" annotations that tell PREfast or /analyze how to model their behavior. For example, here is strncpy:

__RETURN_POLICY_DST char*
strncpy(
    __out_ecount(_Count) char * _Dest,
    __in_z const char * _Source,
    __in size_t _Count
);

SAL annotations are tags like "__in" that are #define'd to nothing for normal compilation, but are understood by /analyze and PREfast as indicating constraints on the parameters passed to a function and the function's return values. If one path through your code calls strncpy(dest, src, 45) when dest == NULL or dest == char[44], /analyze will flag an error.
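
A rough sketch of the idea in portable C. `MY_OUT_ECOUNT`, `MY_IN_Z` and `MY_IN` are hypothetical stand-ins for the real SAL macros, and `copy_label` and `label_length` are invented for the example; the stand-ins expand to nothing, just as the real tags do in a normal build.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for SAL tags: empty for the compiler, meaningful
   only to a checker that understands them. */
#define MY_OUT_ECOUNT(n)
#define MY_IN_Z
#define MY_IN

/* Copies at most count-1 characters and always terminates dest. */
char *copy_label(MY_OUT_ECOUNT(count) char *dest,
                 MY_IN_Z const char *src,
                 MY_IN size_t count)
{
    assert(dest != NULL && src != NULL && count > 0);
    strncpy(dest, src, count - 1);
    dest[count - 1] = '\0';
    return dest;
}

/* Returns the length of the label after it is stored in a 44-byte buffer. */
size_t label_length(const char *text)
{
    char buf[44];
    /* copy_label(buf, text, 45) would claim one more element than buf has;
       the element-count annotation is what lets a checker notice that. */
    copy_label(buf, text, sizeof buf);
    return strlen(buf);
}
```

Passing `sizeof buf` keeps the count and the buffer in agreement, so an annotation-aware checker would have nothing to report on this call.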

PREfast has an extensive plugin system that is missing from /analyze. In addition, /analyze is not configurable (it's either on or off, nothing in between). Finally, because it is not configurable, some classes of warnings that are often false positives have been disabled for /analyze. But /analyze is still incredibly useful.

--
Time flies like an arrow. Fruit flies like a banana.

Static analysis tool? by Moggyboy · 2007-03-30 01:03 · Score: 5, Funny

India.

--
Work smarter, not harder.

Re:Static analysis tool? by Fujisawa+Sensei · 2007-03-30 01:08 · Score: 2, Insightful

India

That may be part of the problem. Cheap junior programmers from India doing cut'n paste coding.

--
If someone is passing you on the right, you are an asshole for driving in the wrong lane.
Re:Static analysis tool? by Moggyboy · 2007-03-30 01:14 · Score: 1

Tell me about it. I spent two years of my life fixing a "Bangalore Special". I meant it as a joke, albeit a poor one.

--
Work smarter, not harder.
Re:Static analysis tool? by tfinniga · 2007-03-30 02:34 · Score: 4, Funny

doing cut'n paste coding
Seriously, that's a huge problem. All of a sudden your code stops working, and when you check it out, it's all missing.
"Sorry, I needed it somewhere else."

Copy and paste coding is much better.

--
Powered by Web3.5 RC 2

Coverity by LLuthor · 2007-03-30 01:06 · Score: 4, Informative

I strongly suggest you look at coverity.

They have excellent checks as well as the best framework for creating custom tests that I have ever come across.

NOTE: I am not affiliated with coverity, just a very satisfied user.

--
LL

Re:Coverity by Anonymous Coward · 2007-03-30 01:35 · Score: 0

Do you know what the license cost is?
Re:Coverity by 644bd346996 · 2007-03-30 02:10 · Score: 1

Almost certainly, since he has made use of the product. The fact that he recommends it implies that he considers it well worth the cost.
Re:Coverity by greenrom · 2007-03-30 03:31 · Score: 1

I too have used Coverity, but I wasn't as impressed with it, especially considering the price. It is better than lint, but it's not that much better. Expect to get a lot of false positives.

We used it once on a large set of code from a company we acquired. Since none of us were very familiar with the code, and the code had a lot of stability problems, the thought was that it might help us find some of the more elusive bugs and improve the stability of the software.

Coverity did find a lot of "problems". But most of those problems weren't bugs. It mainly found things like indexing into an array without explicitly checking the range of the index or variables that could be left uninitialized depending on the sequence of the code. About 90% of the time, we'd look at the code and quickly see what it was complaining about and also realize that what it was complaining about could never actually happen. That said, I don't think it's any worse than any other static code analysis tool I've ever used.
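
To make those two patterns concrete, here is a made-up C sketch (not output from Coverity or any other tool, and all names are invented). Both functions are correct, but the proof of correctness lives outside the line a checker is looking at.

```c
#include <stddef.h>

#define TABLE_SIZE 7
static const int hours_by_day[TABLE_SIZE] = { 0, 8, 8, 8, 8, 6, 0 };

/* No range check here: the read is safe only because every caller reduces
   the index modulo TABLE_SIZE first. A checker that examines this function
   on its own may report a possible out-of-bounds read. */
static int hours_for(size_t day_index)
{
    return hours_by_day[day_index];
}

/* Returns the hours of the first working day at or after 'start'. */
int first_working_hours(unsigned start)
{
    int hours;          /* assigned only inside the loop */
    int found = 0;
    unsigned offset;

    for (offset = 0; offset < TABLE_SIZE; ++offset) {
        int h = hours_for((start + offset) % TABLE_SIZE);
        if (h > 0) {
            hours = h;
            found = 1;
            break;
        }
    }
    /* 'hours' is read only when 'found' is set, but a tool that does not
       track the link between the two may call it possibly uninitialized. */
    return found ? hours : -1;
}
```

Adding an explicit bounds check and initializing `hours` would silence both reports without changing behavior, which is usually cheaper than arguing with the tool.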

The one tool I found that was actually very useful in tracking down bugs is Rational Purify. It's not a static code analysis tool, but depending on what you want to do, it might meet your needs better. It puts a ton of runtime checks in your executable. If you're able to generate test cases that exercise most paths in your code, it will let you home in on the real problems more quickly than a static code analysis tool. I've found it's great for those kinds of bugs where something corrupts something in memory which then causes a failure much later. It does use a lot of CPU time, though, so if your application is very CPU intensive, it might be too slow to be useful.
Re:Coverity by tlhIngan · 2007-03-30 03:39 · Score: 1

If you're a business, there's also KlocWork which seems to work well enough. Bit pricey and can't be installed for home use, but enterprise use is quite nice (hint: competitor to Coverity). I heard they may offer F/OSS scanning as well - one of the nice things is that you can disable a warning on a block of code once it's been verified as a false positive so a subsequent scan won't bring it up again.
Re:Coverity by johnnliu · 2007-03-30 10:01 · Score: 1

While not disputing coverity's features, I feel you should discuss other tools you've used in comparison to coverity and describe why you had the conclusion of "the best framework for creating custom tests that I have ever come across".

FlexeLint / PC-lint by DoofusOfDeath · 2007-03-30 01:07 · Score: 4, Informative

http://www.gimpel.com/html/lintinfo.htm/

I've never tried it for a code base as large as 500k. My guess is that I used it up to 15k. I was very pleased with it. I agreed with just about every warning it raised, and was able to easily suppress individual instances or whole classes of errors. I also found it somewhat easier to get started with compared to the big tools from Rational et al.
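
For anyone who hasn't seen it, suppression is done with lint comments in the source. The sketch below is from memory of the PC-lint manual, so check the option spellings against your version; message 534 is the "ignoring return value" warning, and `format_id` and `copy_id` are invented functions.

```c
#include <stdio.h>
#include <string.h>

/* Block-level suppression: message 534 is switched off between -save and
   -restore, then the previous settings come back. */
/*lint -save -e534 */
int format_id(char *out, size_t out_size, int id)
{
    if (out == NULL || out_size == 0)
        return -1;
    snprintf(out, out_size, "ID-%04d", id);
    return 0;
}
/*lint -restore */

/* Single-instance suppression: !e534 applies to that one line only. */
size_t copy_id(char *dest, const char *src, size_t dest_size)
{
    if (dest_size == 0)
        return 0;
    strncpy(dest, src, dest_size - 1); //lint !e534
    dest[dest_size - 1] = '\0';
    return strlen(dest);
}
```

To switch off a whole class of message everywhere, the same `-e534` option goes in the project's options file instead of in a source comment.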

I think it's a bit pricey for an open-source coder like me, but it should be cheap enough for a company with a tools budget.

Re:FlexeLint / PC-lint by demo · 2007-03-30 01:20 · Score: 1

Yeah, I'd recommend this too.

--
---
Re:FlexeLint / PC-lint by McGregorMortis · 2007-03-30 01:39 · Score: 1

Pricey? Well, it's not free, but it's almost free compared to Coverity or high-end tools like that. And it really does some very clever checks. You get a lot of bang for your static analysis buck.

I've been using PC-Lint for over 10 years now. I think it's made me a better programmer.

I love PC-Lint, but I really do wish its handling of C++ was better. It was really rough at first, generating all kinds of false errors on even the most harmless-looking template code. It's better now, but it still has a lot of trouble with the Boost libraries. Boost pushes C++ to the uttermost limits of what is legal, and PC-Lint chokes on a great deal of it.

I love PC-Lint, and I love Boost, and it breaks my heart that they can't get along better.
Re:FlexeLint / PC-lint by JDisk · 2007-03-30 02:12 · Score: 2, Informative

I have to agree with this recommendation (Gimpel lint).

A few points, though:

- It is purely text-based, so if you are looking for a shiny GUI-based tool (easier to sell to the PHB), you are out of luck.

- depending on the quality of your code, running it for the first time can result in a huge (make that HUGE) amount of warnings. You might want to start small and only turn on more and more options later. Initially, you will have to invest quite a bit of time to get your code "lint-clean". In the long run, this is well worth it.
Re:FlexeLint / PC-lint by Anonymous+Brave+Guy · 2007-03-30 02:22 · Score: 1

I'd agree with the recommendation, and FWIW I work on a project with over 1,000,000 lines of C++ code.

I also agree with the warnings from others about Lint being a bit verbose until you shut off a few stylistic things you might not care about, which fortunately is easy to do.

I also also agree with the caveat about false positives with non-trivial C++ code: sometimes it just plain misunderstands and gives incorrect warning/error messages. It's been improving steadily in recent versions, though, and the version we use is a little out of date now so I imagine the latest version is unlikely to cause much irritation here unless you're really pushing the frontiers with template metaprogramming and the like.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:FlexeLint / PC-lint by SWestrup · 2007-03-31 15:09 · Score: 1

I have to agree. I fell in love with Gimpel lint years ago and now I always suggest it as a useful tool whenever I enter a new coding shop.

I only wish the Linux version was as cheap as the Windows one, so I could afford to buy a copy.
Re:FlexeLint / PC-lint by GolfBoy · 2007-03-31 16:34 · Score: 1

Ditto this. Used it on ~850,000 lines of code. Takes some doing to get it configured to flag what you want, and not what you want to ignore. But a great tool. Customer support was fantastic. Reported a bug on ATL template analysis and it was fixed within 2 weeks.

All the statistics I ever use by Bromskloss · 2007-03-30 01:31 · Score: 3, Funny

wc project.c

--
Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities

For finding duplicated code... by tcopeland · 2007-03-30 01:34 · Score: 0

...which, in a 500K LOC program, there may be a bit of, try the copy/paste detector, CPD. There's a chapter on CPD in my PMD book, too...

--
The Army reading list

Splint by DaveCar · 2007-03-30 01:35 · Score: 1

http://www.splint.org/

END OF LINE

Careful what you wish for by pdovy · 2007-03-30 01:41 · Score: 2, Informative

Whatever you use, make sure you adjust the settings to only capture those problems that you think are critical. With 500k lines of code, unless your codebase is *extremely* solid running a Lint tool will result in a LOT of action items. I've used SPLINT (a lint for secure programming - http://www.splint.org/) in a project with a codebase much smaller than 500k and it took weeks to finish addressing all the issues - sometimes these things can be more of a curse than a blessing.

C and C++ Static Analysis tools by tjwhaynes · 2007-03-30 01:47 · Score: 2, Informative

I work on a C/C++ code base that is a lot bigger than 500k lines. I've worked with results produced by Klocwork and also with the output from Reasoning. Both of these services/packages will cost you money but both provide good insight into your code. The commercial packages generally produce more focused results with fewer false positives, so while they cost you money up front, your developers will spend less time weeding out the noise.

If paying money out for a commercial package isn't your thing, don't overlook the old standby lint or splint, an updated successor.

Also well worth investigating, to see how your code is actually running, are Valgrind and its associated tools. The Valgrind toolkit will give you a good idea where memory is being leaked and where variables and pointers are going off the rails. Valgrind hooks into a running program, so it's important to make sure that you test all the corners of the codebase if you go this route.
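
As a small illustration of why path coverage matters, here is a sketch with an error path that is easy to get wrong. `upper_copy` is an invented function, and the case conversion assumes an ASCII-style character set.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated upper-case copy of 'text', or NULL if the text
   contains a digit or memory runs out. The caller frees the result. */
char *upper_copy(const char *text)
{
    size_t len = strlen(text);
    char *copy = malloc(len + 1);
    size_t i;

    if (copy == NULL)
        return NULL;

    for (i = 0; i < len; ++i) {
        char c = text[i];
        if (c >= '0' && c <= '9') {
            free(copy);     /* dropping this line leaks 'copy' on this path */
            return NULL;
        }
        copy[i] = (c >= 'a' && c <= 'z') ? (char)(c - 'a' + 'A') : c;
    }
    copy[len] = '\0';
    return copy;
}
```

If that `free` were missing, a run such as `valgrind --leak-check=full ./your_program` would report the lost block only when a test actually feeds the function a string with a digit in it; a test suite that never takes the error path would come back clean.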

Cheers,
Toby Haynes

--
Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.

Re:C and C++ Static Analysis tools by DoofusOfDeath · 2007-03-30 02:25 · Score: 1

Valgrind hooks into a running program, so it's important to make sure that you test all the corners of the codebase if you go this route.

One minor clarification: valgrind can't attach to an already-running program the way a debugger can. Valgrind is actually an x86 emulator, so you have to ask valgrind to execute your program from the very beginning.
Re:C and C++ Static Analysis tools by SkunkPussy · 2007-03-30 05:47 · Score: 1

Are there any tools that perform a similar function to valgrind but on windows?

--
SURELY NOT!!!!!
Re:C and C++ Static Analysis tools by jgrahn · 2007-03-31 05:47 · Score: 1

Are there any tools that perform a similar function to valgrind but on windows?

If valgrind isn't available on Windows (I wouldn't know, or care), there's always the classic, Rational Purify. It's probably expensive.

Purify by Khelder · 2007-03-30 02:02 · Score: 2, Informative

I'm happy to say I haven't used C/C++ heavily for quite a while now, but when I did, Purify was really, really useful for finding problems.

A coworker of mine who's quite a C/C++ jockey used it recently (this month), and said it's still very good.

Re:Purify by Anonymous+Brave+Guy · 2007-03-30 03:12 · Score: 1

Our experience was just the opposite. We recently gave up using Purify entirely, because it wasn't finding anything that tools like Valgrind didn't find more reliably and much faster. YMMV.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Re:Purify by Tamerlan · 2007-03-30 04:10 · Score: 1

OP asked specifically about static analysis tools whereas both Purify and Valgrind are dynamic analysis tools.

--
my sstream of consciousness

For what? by _iris · 2007-03-30 02:21 · Score: 1

Why are you analyzing your code? What are you looking for? Performance optimizations? Security flaws? Bugs in general?

Re:For what? by rewt66 · 2007-03-30 10:10 · Score: 1

Bugs in general.

What software analysis tool? That all depends... by hAckz0r · 2007-03-30 02:33 · Score: 3, Informative

There are many software tools out there for static analysis, but they differ in what they do and who they target as their customer. The big names in my mind are Coverity, Fortify, Prexis, and PolySpace. I only have personal experience with Prexis and PolySpace so I will just speak to those.

One important thing to consider is the set of compilers, tools, target system, and build environments you are using. If you are using MS-only products then you will most likely have very good support, because almost all source code analysis suites will simply import the build information and you will be off and running right away. If your environment is Unix or embedded systems then things may be more difficult because you will need to hook into the build process somehow. The scanner tools usually intercept the CC command from a "make" build and call their back end using their custom processing rather than the compiler proper. Different products do this in different ways so be sure the product you choose knows how to deal with your specific build environment. In my case I walked into another party's environment and needed to simulate a build for a new build environment that I had never seen before, every time. Not one environment ever looked like the next, so the setup and configuration was always a big challenge, just to get started.
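
As a rough illustration of that interception step (not how any particular product does it), a wrapper can be installed under the compiler's name early in PATH, record each invocation for the analyzer, and then hand off to the real compiler. The log path, the `REAL_CC` path, and the `WRAPPER_MAIN` guard below are all assumptions made for this sketch.

```c
#include <stdio.h>
#include <unistd.h>

/* Write one compiler command line to `log`, space-separated.
 * Returns 0 on success, -1 on a write error. */
int log_invocation(FILE *log, int argc, char *const argv[])
{
    for (int i = 0; i < argc; i++) {
        if (fprintf(log, "%s%s", i > 0 ? " " : "", argv[i]) < 0)
            return -1;
    }
    return fputc('\n', log) == EOF ? -1 : 0;
}

#ifdef WRAPPER_MAIN
#define REAL_CC  "/usr/bin/gcc"             /* assumed location of the real compiler */
#define LOG_PATH "/tmp/cc-invocations.log"  /* assumed log file */

int main(int argc, char *argv[])
{
    FILE *log = fopen(LOG_PATH, "a");
    if (log != NULL) {
        log_invocation(log, argc, argv);
        fclose(log);
    }
    argv[0] = REAL_CC;
    execv(REAL_CC, argv);   /* only returns if the exec failed */
    perror("execv");
    return 127;
}
#endif
```

Built with `-DWRAPPER_MAIN`, this logs the command line and then runs the real compiler unchanged, so "make" still produces its normal output while the log captures flags and include paths.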

Prexis is primarily a tool for life cycle scanning of source code for security issues. There are two ways to perform the code scanning, with either the main engine component which can schedule nightly scans and track progress over time or with the additional Prexis Pro utility, which is designed for quick assessments by the engineers on their own code without logging everything into the main database. The Pro tool worked best for my code assessments since I had no need for tracking changes over time, and it was a little easier to configure which counts for a lot in my situation.

PolySpace is a completely different tool with a different purpose from Prexis. PolySpace attempts to mathematically discover runtime flaws in the code while only using static analysis to do so. It does a great job on smaller projects, but because of the complexity and thoroughness of its analysis, it is somewhat slow. PolySpace needs to evaluate an entire application all at once in order to do a good analysis. If your .5 MSLOC of code is many separate programs/executables then you will be fine, but if you are talking about one huge monolithic application then you may have to evaluate it in chunks which just increases the false positives and forces the engineer to do more manual chasing of details to determine if the issue is really a problem or not. From what I have seen this product is in a class by itself.

BTW - keep your eyes on this site: http://samate.nist.gov/index.php/Main_Page

CodeSonar + other commercial tools by mmcdouga · 2007-03-30 02:57 · Score: 1

I work on a commercial static analysis tool called CodeSonar. It costs money, but we do offer free trials.

Our major competitors in this space are Coverity and Klocwork.

All three tools can (to some extent) infer how a program will behave at run-time, so they find more subtle bugs than tools that just look for suspicious patterns in your code.

SPLINT is the answer for C. by Z00L00K · 2007-03-30 03:03 · Score: 1

See www.splint.org for a really good tool for static code checking when it comes to C.

I have used it a few times, and I have noticed that in some cases the version from CVS is better than the released version (but as always, your mileage may vary).
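
For anyone who hasn't seen splint, here is a small sketch of how it is typically used: you annotate interfaces with stylized comments and run `splint file.c`. The `/*@null@*/` annotation is splint's own; the function names are invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* The annotation tells splint the result may be NULL, so a caller that
 * dereferences it without checking should draw a warning. */
/*@null@*/ const char *find_ext(const char *filename)
{
    const char *dot = strrchr(filename, '.');
    if (dot == NULL || dot[1] == '\0')
        return NULL;
    return dot + 1;
}

/* Checked caller: the NULL test is what makes the strlen call safe. */
size_t ext_length(const char *filename)
{
    const char *ext = find_ext(filename);
    if (ext == NULL)
        return 0;
    return strlen(ext);
}
```

Because the annotations are ordinary comments, the file still compiles with any C compiler.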

For C++ it's a lot harder, but the programming rules for C++ and the compilers are a bit stricter too, so you may be helped there.

To make things worse (or better, depending on how you see it :-) ) you can always take a look at PurifyPlus from IBM. It contains three components: Purify, which checks at runtime for memory leaks and illegal memory access; Quantify, which checks for performance bottlenecks; and PureCoverage, which checks that all parts of your code have actually been executed during your tests.

C++ is also a lot harder to do static checking on due to the fact that it contains inheritance and still allows a lot of features from C so an object can be passed around in a perfectly legal manner and still be hiding from the syntax checker. Openings for really strange bugs if someone decides to do "smart programming".

--
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.

Management Side by Bender0x7D1 · 2007-03-30 03:47 · Score: 1

Regardless of what tool you select, you will have to decide what rules you want to apply and what you are trying to get out of using the tool. If management doesn't understand the purpose of the tools, they may make inappropriate decisions on how to use them. As an example, I worked on a large project (hundreds of developers), and management decided that we needed to use a static analysis tool and that code had to be "clean" before it could be checked in. It was phased in, so we had a month to eliminate errors, and another month to eliminate warnings from our entire code base, but management wanted to use the entire ruleset and not allow the comments in the code that told the tool to ignore certain rules for certain lines.

Fortunately, we had a committee that was responsible for testing the tool and integrating it into our software engineering process, with a few people that management really listened to. After a few meetings, the committee was allowed to determine the ruleset that would be used and updated the rules for code inspections so the "ignore" comments were allowed, but had to be included as part of an inspection.

If we hadn't had a strong committee that got management to relax the rules, life would have been a living hell trying to alter the code to make the tool happy. If 4 people can agree that the best thing to do is break a rule, you should trust them. If you can't trust them, then you shouldn't have them working for you. Remember, tools are dumb and don't understand the "why" behind the code. Yes, tools will find a lot of things that should be fixed, but they aren't always right.
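
The tool isn't named here, so the following is only a generic illustration of such an "ignore" comment: the traditional lint-style `/* FALLTHROUGH */` marker, which tells a checker (and a code inspector) that a missing `break` is intentional. Recent GCC versions also recognize this comment for `-Wimplicit-fallthrough`. The function itself is made up.

```c
/* Number of setup steps to run for a given level. Each level
 * deliberately includes the steps of the levels below it. */
int steps_for_level(int level)
{
    int steps = 0;
    switch (level) {
    case 3:
        steps++;
        /* FALLTHROUGH */
    case 2:
        steps++;
        /* FALLTHROUGH */
    case 1:
        steps++;
        break;
    default:
        break;
    }
    return steps;
}
```

Requiring each such comment to be reviewed in an inspection keeps the exception visible without forcing the code to be contorted to satisfy the rule.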

--
Reading code is like reading the dictionary - you have to read half of it before you can go back and understand it.

It All Depends on Context by raftpeople · 2007-03-30 03:48 · Score: 1

Would you consider SAP an application? Or any other ERP system? I would.

Re:It All Depends on Context by tomstdenis · 2007-03-30 04:49 · Score: 1

No *one part* of SAP should be 100K+ lines long.

What would it be doing that it can't refactor the code into manageable and verifiable libraries?

Tom

--
Someday, I'll have a real sig.
Re:It All Depends on Context by mwvdlee · 2007-04-01 19:18 · Score: 2, Insightful

And part == project?

Is SAP a single project, and are all those individual parts considered projects too? Perhaps a single .DLL is a project internally.

You seem to be missing the point that there is no clear definition or scale for a project, at least not in the world outside of yours where every single compiled module seems to be a "project".

In real-life, a project may be anything from rebuilding an entire set of applications to fixing a typo in a batch file.

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?

Klocwork static analysis suite by Gwyn+Fisher · 2007-03-30 03:54 · Score: 1

Shameless commercial plug here... I'm the CTO of Klocwork (www.klocwork.com), a vendor of source code analysis tools. We provide security vulnerability and implementation defect checking for C, C++ and Java. In addition, as others on this thread have stated, you're going to want to look at refactoring, architectural analysis, rule tuning, metrics, trends, all the usual stuff and all of which we supply as part of our enterprise suite of products. Check your supplier list carefully as all of the companies in this space offer different subsets of the whole. There's a decent page on Wikipedia on static analysis that mentions the prevalent tools in this space, including our major competitors. Last point: be careful to try before you buy (whether "buy" involves money or not), as all tools are not created equal.

Headway Software by Anonymous Coward · 2007-03-30 03:56 · Score: 0

I used their tools for a large project (probably 100k) that had spiraled out of control and needed some major restructuring. You use their compiler to build your code, and it gathers lots and lots of information. We used it to analyze all the connections between the various modules/files, but it will also give you many different metrics. We also used their GUI to restructure the existing code base visually, and to see and manage all the interactions. Very useful, and a nice, friendly small company. http://www.headwaysoftware.com/index.php

My Static Analysis by mkcmkc · 2007-03-30 04:04 · Score: 0, Troll

If your project has 500K lines of C/C++, it will almost certainly fail.

--
"Not an actor, but he plays one on TV."

Re:My Static Analysis by idries · 2007-03-30 04:33 · Score: 2, Informative

Many successful products are made up of around 500K lines of C++.

Most console computer games for example start at around 500K lines...
Re:My Static Analysis by mkcmkc · 2007-03-30 05:56 · Score: 0, Troll

Many successful products are made up of around 500K lines of C++.
That may be, but it doesn't really contradict my comment. The question is: What proportion of 500K+ projects fail?

--
"Not an actor, but he plays one on TV."
Re:My Static Analysis by Anonymous Coward · 2007-03-31 03:50 · Score: 0

If you get to 500000 lines of code, chances are the app will be too expensive to replace. It won't be fun to use or work on, though.

Klocwork by Tamerlan · 2007-03-30 04:07 · Score: 1

I am an employee of Klocwork.

If you are researching this for your enterprise I suggest you evaluate Klocwork (and its competitors: Coverity, Grammatech, Parasoft, there are others). We handle large-scale C/C++ projects, our own codebase is much larger than yours and we run Klocwork in-house to track defects in our own code on a daily basis and on developer desktops for subprojects. In fact we have successfully handled mammoth projects as big as 10M lines of code and beyond (but frankly, it is getting rather tricky at that point).

We do have a product for individual developers and small shops, but for now it is Java only.

--
my sstream of consciousness

HPUX Tool - C Advise by Anonymous Coward · 2007-03-30 04:34 · Score: 0

You can use the free tool : C Advise on HPUX to run static analysis of C and/or C++ code. It's pretty good. I think you have to be DSPP member to download it, but registration is free.

Understand for C++ by jvaigl · 2007-03-30 04:51 · Score: 2, Interesting

I'm working on a project that's evolved over several years and there's been high turnover among the developers. We use a product called Understand for C++. It has a lot of great reverse engineering, metric generation, and source browsing features that make it pretty useful.

From their marketing blurb...

Understand, our flagship product, helps thousands of companies maintain impossibly large or complex amounts of source code. It parses source code for reverse engineering, automatic documentation, and calculating code metrics. We have versions for Ada 83, Ada 95, FORTRAN 77, FORTRAN 90, FORTRAN 95, Jovial, K&R C, ANSI C and C++, Delphi, and Java. Multi-million SLOC projects are common with our users.

Re:Understand for C++ by nonsequitor · 2007-03-30 06:33 · Score: 1

I recently used this on my last job for max stack depth analysis. A good tool I must say, fairly cheap as well for a corporate budget. The user interface is a bit rough, not very pretty, but I hear it's improved a lot over the years. I think I would insist on getting a WebEx demo or something since I doubt I even touched 2% of the program's features.
Re:Understand for C++ by Amertune · 2007-03-30 10:01 · Score: 1

I've used it to go into old code and figure it out so that I could know where to make the changes I needed to make and know what else would be affected by those changes. The interface is getting a lot better.

c++test by lordmage · 2007-03-30 05:44 · Score: 1

from parasoft corporation. Statically tests functions for 50 cases.

I prefer the Insure++ product myself. It really helps in finding bugs.

IT IS NOT FREE.

--
I can program myself out of a Hello World Contest!!

Depends on platform and kind of analysis by c0d3h4x0r · 2007-03-30 06:23 · Score: 1

You didn't mention what platform you're building on, and you also didn't mention exactly what kind of analysis you want to perform.

If you're on Windows, the latest Visual Studio C/C++ compilers include a pretty good (but basic) code analysis tool built in. Just use the /analyze flag to cl.exe.

--
Moderator hint: a comment is neither "Flamebait" nor "Troll" if it is true.

Fortify Software has a static tool by robby_r · 2007-03-30 08:00 · Score: 2, Informative

Fortify is a static security scanner that covers C/C++ as well as Java, JSP, .NET, C#, XML, CFML, PL/SQL and T-SQL.

OK, I should have given more detail... by rewt66 · 2007-03-30 10:23 · Score: 1

We're on an embedded system with several CPUs.

One of the CPUs is running Linux. This code we compile with gcc on a Linux box.

Another CPU is running ThreadX. We cross-compile this on Windows using the Green Hills compiler.

A couple other CPUs run Nucleus OS. These are also cross-compiled on Windows using the Green Hills compiler.

We have gotten evaluations of Klocwork and Coverity (and I've probably said enough here for them to figure out who we are). And they do good stuff, too. But I'm trying to look around to see what else is out there, since Coverity especially is pretty pricey, and Klocwork didn't give us a long enough demo for us to really evaluate how well their tool found the kind of issues we are looking for.

As someone (CTO of Klocwork?) said earlier, try before you buy...

BEAM, Coverity, splint and other types of tools by Anonymous Coward · 2007-03-30 14:05 · Score: 0

I don't think it's publicly downloadable, but IBM has a tool called BEAM. http://www.research.ibm.com/da/beam.html

The results are okay from BEAM. Maybe you can submit a comment (see bottom of the page) to request use of the tool.

I've tried to use splint with mixed results. The default warnings were rather verbose, and most of them were unimportant.

Another open source project ran Coverity over our source code and sent us a summary of the results. The noise to signal ratio seemed better for BEAM and Coverity than splint. Though it's possible that the person running Coverity over our code turned off some warnings by default.

If you're interested in runtime tools in addition to static analysis tools, IBM Rational Purify (commercial) and valgrind (free) work fairly well. Each has its own issues. Both occasionally give false positives. The valgrind tool comes with many Linux distributions, but it's not really there for Windows.

Of course, those tools don't work effectively unless you have a good test suite. IBM Rational PureCoverage helps you to discover where code coverage is needed by your test suite. Our open source project aims for 100% API coverage and >85% overall line coverage for each release. While it's difficult to get some error conditions to be exercised and tested with the test suites, it's well worth it in the long run. It's satisfying to fix a bug, add a test for the bug, and see that your fix didn't break any of the other tests. You might also be able to use the gcc profile option to get similar functionality to PureCoverage, but it won't generate a summary graph to tell you if you're meeting your code coverage goals.
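
For the gcc route, here is a minimal sketch, assuming a gcc installation that ships gcov and using made-up file names. Compile with `gcc --coverage -c port.c`, link with `gcc --coverage port.o test_main.o -o port_test`, run the tests, then run `gcov port.c` to get per-line execution counts. The function below is an invented example of error paths that are easy to leave unexercised.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a TCP port number. Returns the port, or -1 on any error.
 * The two error branches are the kind of lines a coverage report shows
 * as never executed unless a test deliberately feeds in bad input. */
int parse_port(const char *text)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return -1;              /* not a number, or trailing junk */
    if (errno == ERANGE || value < 1 || value > 65535)
        return -1;              /* out of range */
    return (int)value;
}
```

A test suite that only ever passes "80" would leave both `return -1` lines unexecuted in the gcov output, which is exactly the gap a coverage tool is meant to expose.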

The static analysis tools are good to test for things that you didn't think about testing, or didn't have time to test. None of these tools solve all your software stability problems, but they greatly improve the stability. Turning on compiler warnings for your application will find some minor issues, but the static analysis tools make it easier to find bugs that only appear if you analyze both the called function and the function caller.
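
A made-up example of that caller/callee situation: the callee can legitimately return zero, and the caller divides by the result. Each function looks reasonable in isolation, so the risk is only visible to a tool that follows the return value across the call.

```c
#include <stddef.h>

/* Callee: counts the non-negative samples. Can legitimately return 0. */
size_t count_valid(const int *samples, size_t n)
{
    size_t valid = 0;
    for (size_t i = 0; i < n; i++)
        if (samples[i] >= 0)
            valid++;
    return valid;
}

/* Caller: the division is only safe because of the explicit zero check.
 * Without it, the divide-by-zero shows up only when both functions are
 * analyzed together. */
long mean_valid(const int *samples, size_t n)
{
    size_t valid = count_valid(samples, n);
    long sum = 0;

    if (valid == 0)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (samples[i] >= 0)
            sum += samples[i];
    return sum / (long)valid;
}
```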

(Full disclosure: I work on open source software at IBM)

sparse by normalperson · 2007-03-30 14:14 · Score: 1

Linux kernel developers use this:

http://www.kernel.org/pub/software/devel/sparse/

The compiler by jgrahn · 2007-03-31 05:41 · Score: 1

"C/C++". That's like saying "Fortran/LISP". Which language is it? Both?

If you are like all projects I have seen, you haven't turned on the relevant compiler switches for ANSI/ISO compliance and full warnings. Do that first.

Second, get yourselves a few more compilers. If you use gcc, fetch the latest version. It doesn't matter if you can run the compiled code.

Third, write type safe code. If it's really C++ code you are writing, disable C-style casts and see how much of those monstrosities (from a C++ point of view) you use.

After you are done, look for linters (aka static analysis tools). One weak spot of C and C++ compilers is that they work on the translation unit level; they don't see the complete source code at once.
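
As a small illustration of the first step, with gcc that usually means something like `gcc -std=c99 -pedantic -Wall -Wextra` (and, for the third point, `-Wold-style-cast` with g++). The function below is a made-up example of what the extra warnings catch.

```c
#include <stddef.h>

/* Sum the first n elements of values.
 *
 * A loop written as
 *     for (int i = 0; i < n; i++)
 * compares a signed int with an unsigned size_t; gcc reports that under
 * -Wsign-compare, which -Wextra turns on for C. Using size_t for the
 * index keeps the operand types consistent. */
long sum_values(const int *values, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}
```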

Re:What software analysis tool? That all depends.. by jgrahn · 2007-03-31 05:51 · Score: 1

PolySpace attempts to mathematically discover runtime flaws in the code while only using static analysis to do so. It does a great job on smaller projects, but because of the complexity and thoroughness of its analysis, it is somewhat slow.

Last time I heard, Polyspace didn't do C++ -- just C and some random toy language (Java or Ada?). Cool but extremely expensive.

C++test from Parasoft by carl244541 · 2007-03-31 13:46 · Score: 1

C++test from Parasoft has static analysis and automatic unit test generation. But you should always try before you buy.

IncludeManager by 0xbeefcake · 2007-03-31 14:29 · Score: 1

Some related C++ analysis tools for Visual Studio may also be of interest to you, IncludeManager and StyleManager: https://secure.profactor.co.uk/products.php

87 comments