Tools For Understanding Code?
ewhac writes "Having just recently taken a new job, I find myself confronted with an enormous pile of existing, unfamiliar code written for a (somewhat) unfamiliar platform — and an implicit expectation that I'll grok it all Real Soon Now. Simply firing up an editor and reading through it has proven unequal to the task. I'm familiar with cscope, but it doesn't really seem to analyze program structure; it's just a very fancy 'grep' package with a rudimentary understanding of C syntax. A new-ish tool called ncc looks promising, as it appears to be based on an actual C/C++ parser, but the UI is clunky, and there doesn't appear to be any facility for integrating/communicating with an editor. What sorts of tools do you use for effectively analyzing and understanding a large code base?"
Tests are indeed a very good way to understand a code base. For nearly all of last year I worked on a code base that nobody understood completely, although I had someone to ask about the general code structure. Writing tests helped me understand what some parts of the code actually do. And where I needed to change things, I could assure myself that I hadn't broken anything.
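To make the idea concrete, here is a minimal sketch of such "characterization" tests in Python. The function legacy_checksum is a made-up stand-in for whatever undocumented routine you are staring at; it is not from any real code base. The expected values were obtained by running the code and writing down what it returned, which is the whole point: the tests record what the code does today, not what it ought to do.

```python
def legacy_checksum(data: bytes) -> int:
    """Hypothetical stand-in for an undocumented routine in the code base."""
    total = 0
    for i, byte in enumerate(data):
        total = (total + byte * (i + 1)) & 0xFFFF
    return total


# Each test pins down one thing learned by running the code.

def test_empty_input_gives_zero():
    assert legacy_checksum(b"") == 0


def test_bytes_are_weighted_by_position():
    # 1*1 + 2*2 + 3*3 = 14
    assert legacy_checksum(b"\x01\x02\x03") == 14


def test_byte_order_matters():
    # Swapping two bytes changes the result, so this is not a plain sum.
    assert legacy_checksum(b"\x01\x02") != legacy_checksum(b"\x02\x01")


def test_result_stays_within_16_bits():
    assert legacy_checksum(b"\xff" * 1000) < 0x10000
```

A runner such as pytest will pick up the test_ functions as written. Once they pass against the untouched code, any change that breaks one of them is a change in behavior you need to look at.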
Another great tool is valgrind+KCachegrind; it gives you really nice call trees. VTune can do something similar as well, but IMHO the output is not as good as in KCachegrind. The only problem, of course, is that valgrind makes your program very slow and it is, AFAIK, not available on MS Windows. VTune, OTOH, runs the program at normal speed, but its call tree output is ugly, at least on Linux.
If these two options are not for you, then you might add trace output to each function. IMO this is better than using a debugger, especially in C++ with Boost and STL, where a lot of stepping goes through inline functions. With proper logging levels you can get very useful output that shows what's going on. It helps you understand the code, and it also helps if you hit a bug.
I had a very wise undergrad EE prof who said on the first day of design class that we needn't worry about the many "complicated" things that we would have to design during the course, because we had already completed all of our circuit analysis courses. He said it's much harder to figure out the details of someone else's design than to design it yourself. The same applies here in software. I've been there, working with others' undocumented code, and quite frankly I rarely left the project with more respect for the programmer. Here I'll just say what I learned from those experiences, as useless as it might be.
If the coding style used is appropriate, you stand some chance. Lines of code don't matter much when the behavior is so complex that you cannot list the states and events that trigger execution and state changes, let alone keep track of them in your head long enough to understand their context.
I once had a similar problem with some legacy OS9 C code that performed a simple communication task and updated a monitor. With no documentation from the writers, I was to "simply add some new data to be collected and display it." The problem with these 3000 LOC was that they were written as a state machine with no modularization, which made them next to impossible to follow in a debugger. What I wanted to do was run a performance analyzer along with the code, but I was told that was "out of budget". This would have told me at least which parts of the code were being executed frequently, and I could have started to associate the external events with the code processing them.
On very large applications like AT&T's RNS (residential account management for BellSouth), which exceed a million lines of C code, the only thing that made the application workable for new features was the fact that it was created in a CMM III product environment, and thus was well documented in design, development, testing, feature changes, bug fixes, etc. Even with all of this, the number of processes and related data stores still showed a lot of bleed-over and function duplication (there was no simple way to determine whether a function already existed that did what you needed, and it was even harder to determine whether it depended on state data and was thus unusable in certain other states). Attempts by us (contract coders mainly) to get the company to allow us to build a function-finding tool/database to eliminate this problem fell on mostly deaf ears.
Because of this we had to depend on the longer-lived of the system architects to get an idea of where functionality existed. There were many times, though, when no one knew, and weeks had to be spent reverse engineering communication structures, working out what the heck undocumented stretches of code did, rewriting the documentation correctly, and only then starting to implement the feature or correct the problem that had "been there for years." Management did not like the time taken to repair poor coding, as this was not included as one of our trackable metrics and therefore not in our feature/bug budget (since it was not considered to be either).
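The function-finding tool mentioned above never got built, so what follows is only a guess at its simplest possible form: a Python sketch that scans C source text for function definitions and records where each name is defined. The regular expression is a crude heuristic of my own, not a parser; it misses macros, K&R-style definitions and anything unusual, and it says nothing about state dependence. The file names in the usage note are made up.

```python
import re
from collections import defaultdict

# Crude heuristic: at the start of a line, a return type, a name,
# a parameter list and then an opening brace. Not a real C parser.
FUNC_DEF = re.compile(
    r"^[A-Za-z_][\w\s\*]*?\b([A-Za-z_]\w*)\s*\([^;{}()]*\)\s*\{",
    re.MULTILINE,
)
# Control keywords that can look like a function name to the pattern above.
KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}


def index_functions(sources):
    """Map function name -> list of (filename, line) for each definition.

    `sources` maps a file name to the full text of that file.
    """
    index = defaultdict(list)
    for filename, text in sources.items():
        for match in FUNC_DEF.finditer(text):
            name = match.group(1)
            if name in KEYWORDS:
                continue
            # Line number of the function name, counting from 1.
            line = text.count("\n", 0, match.start(1)) + 1
            index[name].append((filename, line))
    return dict(index)


def duplicates(index):
    """Names defined in more than one place: candidates for duplicated logic."""
    return {name: places for name, places in index.items() if len(places) > 1}
```

Feeding it something like {"comm.c": open("comm.c").read(), "ui.c": open("ui.c").read()} gives a dictionary you can search by name, and duplicates() lists names defined more than once. A tool with an actual C parser behind it would do far better; this only shows how little it takes to get a first index.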
RNS sounds bad but it was a breeze compared to that tightly optimized state machine code without documentation. So, my recommendations are:
1) If it is stream-of-thought code (kind of like Faulkner's The Sound and the Fury), not modularized and not documented, tell your manager that it will most likely have to be redesigned before it can be fully understood. That means doing an essential model of its processing and data stores, use cases, objects and events, or whatever rigorous methodology you prefer, and then using that to rewrite it. If management doesn't want to do that, then you do not work for a company interested in maintainable code, but for one that wants a cheap fix. I would leave as soon as you get from them what they took from you in suckering you into the place.
2) If it is structured and/or developed in a "self-documenting-language" like Ada, Modula, Eiffel, etc. that forces structure (or at least makes it easier to write structured rather than unstructured), finish documenting it properly a
Be as you would have the world become.