Slashdot Mirror


Tools For Understanding Code?

ewhac writes "Having just recently taken a new job, I find myself confronted with an enormous pile of existing, unfamiliar code written for a (somewhat) unfamiliar platform — and an implicit expectation that I'll grok it all Real Soon Now. Simply firing up an editor and reading through it has proven unequal to the task. I'm familiar with cscope, but it doesn't really seem to analyze program structure; it's just a very fancy 'grep' package with a rudimentary understanding of C syntax. A new-ish tool called ncc looks promising, as it appears to be based on an actual C/C++ parser, but the UI is clunky, and there doesn't appear to be any facility for integrating/communicating with an editor. What sorts of tools do you use for effectively analyzing and understanding a large code base?"

9 of 383 comments (clear)

  1. Stepping Through by blaster151 · · Score: 5, Insightful

    I've always found that stepping through the debugger at runtime is a decent way to start making sense of a large code base. Easier, anyway, than trying to read static code printouts. Just set a breakpoint at a point of interest, fire up the application, and use it as a starting point. You get a sense for program flow and it's a great way to generate questions--lots of them. (What does class SuchAndSuch do? It looks like the application is handling remoting in such-and-such a fashion; is that right?) You can also choose one aspect of the architecture and selectively ignore or step over other aspects, building up your understanding one aspect at a time. In my case, with Visual Studio as a development environment, I can hover the mouse cursor over variable names to see their current values. In the case of variables of a certain type, like datasets or XML structures, I can use realtime visualizers to browse the contents and get a much better feel for what's going on.

    If there's no one at your company that can help answer your questions and bring you up to speed, I feel for you - your employers ought to know enough to give you some extra margin. It can be very hard to take over a large code base without some human-to-human handover time.

    Also, is it an object-oriented system? I assume that it's not, based on your post, but you don't say either way. If it is, the important aspects of program flow often live in the interactions between classes and objects and the business logic is decentralized. OO is great, but it can be harder to reverse-engineer business logic because it's distributed among various classes. A debugger that lets you step through running code is almost essential in this case.

    1. Re:Stepping Through by orclevegam · · Score: 5, Insightful

      Much as I would love to agree with you, unfortunately the world isn't always so accommodating. Sometimes you have to suck it up and stay with a job till you can find something better, and most employers won't let you toss anything out, let alone a major chunk of their code base. Doesn't matter if it's utter crap, they paid for it, and as far as their concerned turd polishing is better then starting from scratch even if starting from scratch would be a hell of a lot cheaper. Can't expect MBAs to understand the difference between good code and bad code, to them it's all just code, and as far as their concerned, the more the better. It's the old idiotic idea that more lines of code means a better product, therefor anything that reduces lines of code must be a bad thing.

      --
      Curiosity was framed, Ignorance killed the cat.
    2. Re:Stepping Through by plover · · Score: 5, Informative
      (Warning: you asked!)

      Well, the learning curve is certainly important in the real world, although I expect a professional to know his or her tools before they arrive on the job. But there are a metric crapload of things I like better about Visual Studio that make it a much more effective debugger than gdb, in my opinion. (Note that I am not a big gdb user, so I may be cutting it a bit short in the feature set here. My apologies in advance if I do so.)

      Things I've found I prefer include many tool windows simultaneously showing the states of registers, memory, the call stack, an object or seven (expanded to show a few properties), and automatic resolution of virtually every symbol and name, including the operating system (although you have to download the symbol files for your OS version from Microsoft.) And you still have full navigation through the source.

      Simply hovering the mouse over a symbol will bring up a tool-tip to display the contents. If you highlight an entire expression such as pFoo->pBar->Blah.count+7 and hover, the tooltip will display the calculated result.

      You can set a temporary breakpoint by setting the cursor on a line of code and clicking "run to cursor." You can run, single step, run to the current cursor, or run till function return. That last one is great for re-entering a function multiple times to test different conditions.

      The variables window contains the current call stack as a dropdown list -- changing the stack lets you see the newly-local variables. Watch windows can display data as hex or decimal, just right click and select. Watch entries can even be used as calculators (enter a literal value, such as 0xf0 + 12, and it will display the results.)

      In the watch windows, you can also call arbitrary functions (good for testing without driving your code to that point) or other functions in your memory space, such as the C runtime memory checkers. If you're trying to track an errant pointer, create a debug build, start running and break, type _CrtCheckMemory() into a watch window, and every time the watch window is refreshed, it will check all your fenceposts. You might get lucky and spot your corruption as it happens. The /GZ compiler option will perform a similar task at the function level, but this would let you do it at a line level.

      There are also dozens of possible formats it can display your watch variables in -- suffix a pointer with ,s and it'll display the contents as an ASCII string. Only see one byte because of Unicode? Suffix the pointer with ,su and you'll see the unicode string. A ,wm suffix displays window messages by name. ,hr suffix displays HRESULTs by name.

      The memory windows will highlight in another color any data that's changed since the last time it was refreshed, whether it be a single step or a previous breakpoint. You can have memory displayed as bytes, shorts, or longs. And with the newer visual studios, you can have multiple memory windows, so you can keep track of two, three or four arrays simultaneously. You simply drag and drop them wherever they're convenient, then step through the code and watch for colored variables indicating change.

      Again, all these windows are automatically updated every time the debugger drops from the program to your control. I've got two 17" monitors, and I can fill them both. The problem with debugging is that sometimes you are really starting blind, and the faster you can get more information, the less time you waste debugging.

      There's a cute "magic trick" I like to show people with the memory window and the disassembly window. Let's say you've had a crash, and attached the debugger to the running program. You're looking at a corrupt stack in the call stack window -- just one line of garbage data. What to do? Where did it break? Enter @ESP in the memory window. Change the view to 'long' and it displays the memory as 8-digit numbers. If y

      --
      John
  2. Understand C++ by SparkleMotion88 · · Score: 5, Informative

    Sorry I don't have an open source tool for you, but I've used Understand for C++ in the past and it was pretty helpful. To me, the most useful piece of information for understanding a large codebase is a browseable call graph. I'm sure there are simpler tools out there that generate a call graph, but this is the only one I've used with C++.

  3. You must have inherited my old project by theophilosophilus · · Score: 5, Funny

    Sorry about that.

    --
    Why have 1 person driving a backhoe when you could employ 20 with shovels?
  4. What I do by laughing_badger · · Score: 5, Informative
    SourceNavigator : A good visualisation package http://sourcenav.sourceforge.net/

    ETrace : Run-time tracing http://freshmeat.net/projects/etrace/

    This book is worth a read http://www.spinellis.gr/codereading/

    Draw some static graphs of functions of interest using CodeViz http://freshmeat.net/projects/codeviz/

    Write lots of notes, preferably on paper with a pen rather than electronically.

    --
    Help children born unable to swallow - www.tofs.org.uk
  5. Re:Doxygen, and Extracting Software Architectures by Mr.Bananas · · Score: 5, Informative

    I use Doxygen for C code, and it is really helpful. One of its most useful features is that it generates caller and callee graphs for all functions. You can also browse the code itself in the generated HTML pages, and the function calls are turned into links to the implementation. Data structures and file includes are also pictorially graphed for easy browsing.

    If the system you need to understand has a really big undocumented architecture, then this presentation might be useful to you (there is a research paper, but it's not free yet). In it, the authors present a systematic method of extracting the underlying architecture of the Linux kernel.

  6. Umm.. documentation? by Anonymous Coward · · Score: 5, Insightful

    Seriously folks, having spent large chunks of my working life having to decipher the mess of those who came before me I cannot stress enough the importance of clear comments, variable/function names, and consistent and readable syntax. AND WRITE F@#$%ing HUMAN READABLE DOCUMENTS DESCRIBING FUNCTIONAL REQUIREMENTS, ALGORITHMS USED, LESSONS LEARNED, ETC.
    Calling all your variables "pook" or the like may be very cute, but does not help me figure out what the heck the function is supposed to do or why I would ever want to call it. Yes it's a pain. Yes we're all under time deadlines and want to get it working first and go back and document it later. And yes, it WILL bite you in the ass (ever heard of karma? your own memory can go and then you have to decipher your OWN code!).

    That said, if you have inherited a code base from someone who ignored the above, go through and generate the documentation yourself. Write flow charts and software diagrams showing what gets called where and why. Derive the equations and algorithms used in each piece and figure out why the constant values are what they are. Finally, start at the main function or reset vector (I do a lot of microcontroller development) and trace the execution path.

  7. Absolute tosh ! by golodh · · Score: 5, Insightful
    An interesting post, even if it's absolute tosh. No-one in his right mind tackles a new code-base of any size or complexity with nothing but a printout. Not if he's expected to understand how it works and/or maintain it in a responsible way.

    In fact, it nicely highlights the difference between "software engineers" and "code monkeys". Code monkeys just dive in; they never pause to think. In fact ... they tend to avoid thinking. It's not their strong point. After all ... they're paid to code, right? Not to think. Software engineers on the other hand, look before they leap and spot the places where they need to pay attention first. And they're systematic about it.

    In fact, a software engineer will happily spend a day or two putting the right tools in place, *including* a full backup and a proper version management system for when he's going to have to touch anything.

    The first thing you want to know about a new code base (after you find out what it's supposed to be doing) is its structure. Tools like Doxygen (see previous posts) show you that structure *far* quicker and *far* more reliably than any amount of dumb code-browsing can. And besides ... once you do it, you've got that documentation stashed away securely instead of milling around incoherently in your head (you'll have completely forgotten most of what you read by next month) or on disorganised pieces of note paper.

    The second thing is to figure out if it calls any "large" functionalities like subroutine libraries or even stand-alone programs like databases, let alone if it makes operating system calls. The call-tree will give you an excellent view, and the linker files can complete the picture. You wouldn't be the first maintenance programmer who found out after months that his application critically depends on some other application he wasn't told about.

    The third thing is to see where your code does dirty things. Let the compiler help you. Just compile your application with warnings on and have a look at what the compiler comes up with. You might be surprised (and horrified). Then compile with the settings used by your predecessor and check that your executable is bit-for-bit identical to what's running (you wouldn't be the first sucker who's given a slightly-off code base).

    If performance is at all important, then running the whole thing for a night on a standard case under a good profiler will also tells you lots of important things. Starting with where your code spends its time, where it allocated memory and how much, and where the heavily-used bits of code are. All neatly written down in the profiler logs.

    Finally, run your application with a tool to detect memory management errors the first chance you get. Useful tools are Valgrind (in a Linux environment), Purify (expensive, but probably worth it) under Windows, and sundry proprietary utilities under Unix. Just about 90% of the errors made in C programs come from memory management problems, and half of them don't show up except through memory leakage and overwritten variables (or stacks .. or buffers .. or whatever). You'll need all the help you can get here, and as far as these errors are concerned, dumb code browsing is useless. Just keep your head when looking at reports from such tools ... they can throw up false positives. Ask around on a forum with specific questions if you're allowed, or ask your supervisor. After all ... you showed due dilligence.

    When you know all that (if you have the tools in place, all of this can be done within 1 day + 1 overnight run + 1 hour reading the profiler output), go ahead and trace through the code in a debugger. You'll be in a *far* better position to judge what you should be reading.