Slashdot Mirror


Ask Slashdot: What Tools To Clean Up a Large C/C++ Project?

An anonymous reader writes I find myself in the uncomfortable position of having to clean up a relatively large C/C++ project. We are talking ~200 files, 11MB of source code, 220K lines of code. A superficial glance shows that there are a lot of functions that seem to be doing the same things, a lot of 'unused' stuff, and a lot of inconsistency between what is declared in .h files and what is implemented in the corresponding .cpp files. Are there any tools that will help me catalog this mess and make it easier for me to locate/erase unused things, clean up .h files, and find functions with similar names?

12 of 233 comments

  1. Static analysis tools... by underqualified · · Score: 5, Informative

    If your company is willing to pay for it, you can get something like Coverity. On the free (as in beer) side there are Cppcheck and clang.

  2. You call that large? by Anonymous Coward · · Score: 5, Insightful

    Seriously, that's mid-sized at best.

    1. Re:You call that large? by Oxdeadface · · Score: 5, Insightful

      What compels comments like this? The first AC posts absolutely nothing of value, just wants to let everyone know that they disagree with a minor point that's completely irrelevant to the OP's question. Thanks for the insight, champ. The followup, probably the same person, goes on to ramble like an old fart telling a useless anecdote about his kid that's barely even related to the topic at hand. At what point did either of these seem like a good idea? Neither of these comments addresses the question being asked or even attempts to be useful at all. No one cares what you consider a large program and absolutely no one gives a shit about you or your fucking crotch fruit. These comments are just some sad cunt's way of claiming, "I'm more experienced and better than you." Fuck right off.

  3. clang static code analysis by Anonymous Coward · · Score: 5, Informative

    scan-build and scan-view from clang++ will show you what is being used and what isn't as far as static code analysis goes.

  4. Document first by gbjbaanb · · Score: 5, Insightful

    So, figure out the layers or logical components between each module and then you will be able to chew smaller chunks.

    Then, doxygen the whole lot, making sure to use dot to create the graphs for callers and callees. This will let you see the interaction points so you can see what impact a change in one method will have (i.e., which callers you have to check).

    Some people will say "write unit tests" but frankly, it never works with a legacy code base: to unit test effectively you have to write your code differently from how you'd normally do it, and you don't have that luxury here. Instead, develop a good integration test suite that tests the functionality of the whole thing, so you can rerun it to make sure your changes still work. It's not as instant as unit testing (but more effective), so you'll have to invest in a build system that regularly builds the code, runs the (automated) integration tests, and tells you the results. Commit changes reasonably regularly so you can isolate the ones that end up breaking the system.
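    A minimal sketch of the "record, then rerun" idea in miniature, assuming a hypothetical legacy routine `legacy_format_amount` and a handful of outputs captured from the untouched build. None of these names or values come from the post; a real suite would drive the whole program rather than one function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an existing legacy routine whose behaviour
// should be pinned down before any cleanup starts.
std::string legacy_format_amount(long cents) {
    std::string sign = cents < 0 ? "-" : "";
    long abs_cents = cents < 0 ? -cents : cents;
    std::string frac = std::to_string(abs_cents % 100);
    if (frac.size() < 2) frac = "0" + frac;
    return sign + std::to_string(abs_cents / 100) + "." + frac;
}

struct RecordedCase {
    long input;
    std::string expected;
};

// Outputs captured from the build as it was inherited
// (illustrative values only, not from the original thread).
const std::vector<RecordedCase>& recorded_cases() {
    static const std::vector<RecordedCase> cases = {
        {0, "0.00"}, {5, "0.05"}, {1234, "12.34"}, {-250, "-2.50"},
    };
    return cases;
}

// Counts the cases whose current output differs from the recording.
int count_regressions() {
    int failures = 0;
    for (const auto& c : recorded_cases()) {
        if (legacy_format_amount(c.input) != c.expected) ++failures;
    }
    return failures;
}

// Intended to be called by the automated build after every commit;
// aborts if behaviour has drifted from the recorded baseline.
void run_regression_check() {
    assert(count_regressions() == 0);
}
```

    The recorded table is the baseline: it is never edited to match new output unless the behaviour change is intentional.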

    The rest of the task is simply hard work: running through how it works and understanding it. There are no shortcuts to working hard, sorry.

    1. Re:Document first by bmajik · · Score: 5, Insightful

      This.

      One of my first professional programming projects was to take a look at the custom C++ billing software our company had purchased from a contract programmer.

      I had a long unix and programming background, and was back for a summer job after doing 1 semester of C++ in college.

      My boss told me, since I was the only one who had C++ experience, to start documenting the system.

      At the time, we were using IRIX, and so I was using the SGI compiler and tools suite, which were, I believe, licensed from EDG. The point is that there was a very nice call graph visualizer. This was helpful for understanding things at a superficial level.

      However, what was even better was just running the program a bunch of times on test data and seeing what it did while under the debugger.

      While my summer began with the task of documenting the system, as I learned things I'd report them to my boss.

      By the end of the summer, I had re-written some fundamental parts of the system; I'd moved some of the processing outside, and I pre-processed and pre-sorted the data.

      The overall execution time went from many hours to about 45 minutes to calculate monthly bills. The key innovation was replacing the inner loop of the charge tabulation -- which was 2 or 3 levels of nested linked list traversal.

      Instead, I used the standard unix sort tools to pre-sort the data files before being loaded into the system, and I changed the system to use a data structure that supported a binary search.
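
      A rough sketch of that kind of change, under stated assumptions: the `Charge` record and both function names are hypothetical, and the data is assumed to arrive already sorted by account (the post did this with the unix sort tools before loading).

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

struct Charge {          // hypothetical record layout
    int account_id;
    long cents;
};

// Shape of the original inner loop: walk the whole linked list for
// every lookup, so each query costs O(n).
long total_for_account_slow(const std::list<Charge>& charges, int account_id) {
    long total = 0;
    for (const auto& c : charges) {
        if (c.account_id == account_id) total += c.cents;
    }
    return total;
}

// Ordering used both for pre-sorting and for the binary search.
bool by_account(const Charge& a, const Charge& b) {
    return a.account_id < b.account_id;
}

// Shape of the replacement: input is pre-sorted by account, so the
// matching run is found by binary search in O(log n).
long total_for_account_fast(const std::vector<Charge>& sorted, int account_id) {
    assert(std::is_sorted(sorted.begin(), sorted.end(), by_account));
    auto range = std::equal_range(sorted.begin(), sorted.end(),
                                  Charge{account_id, 0}, by_account);
    long total = 0;
    for (auto it = range.first; it != range.second; ++it) total += it->cents;
    return total;
}
```

      Both functions return the same totals; only the cost per lookup changes, which matters once the lookup sits inside nested loops over production-sized data.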

      The majority of the code got left alone. By understanding the code under a debugger, and realizing that how it worked on production data was much different than how it performed on the test data it was originally delivered with, I was able to make a critical set of changes that had a huge impact.

      In general, I spend as much time as I can not writing code, but instead, understanding how the existing system works. For a current project, I've spent the last two weeks playing with somebody else's code, and now I've expanded it so that it can also operate on my data sets, and I've probably changed fewer than 100 lines across about 5 different projects.

      --
      My opinions are my own, and do not necessarily represent those of my employer.
  5. Eclipse, Xcode or any IDE by guruevi · · Score: 5, Insightful

    Any decent IDE has the capability of pointing at least towards unused blocks of code and will generate a tree of function calls. I've worked with Eclipse and Xcode both of which have these capabilities. Even GCC (or another C compiler) can warn you about chunks of unused code or missing/bad header files. You can also rename functions across the entire codebase if necessary.

    If your code has warnings or errors, continue fixing until the warnings are gone. As far as functions that do similar things but are named differently, that is a bit harder because 'looks like they are doing the same thing' doesn't always mean they ARE doing the same thing (if they have the exact same code, you could perhaps solve with statistical analysis or simply a text finder).

    Make sure that if you replace a function, the replacement has the same behavior in all cases. Even mediocre developers have learned that reusing existing code is a "good thing", so different functions that do "the same thing" often have edge cases (often undocumented) where they behave differently (especially in C/C++, e.g. differences in signedness, memory mapping method, characters, etc.).
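
    To illustrate the signedness case, here is a small sketch with two hypothetical near-duplicates (the names are made up for this example). They agree whenever the first argument is at least as large as the second and silently diverge otherwise.

```cpp
#include <cassert>

// Variant 1: signed arithmetic, so a smaller first argument yields a
// negative result.
int gap_signed(int a, int b) { return a - b; }

// Variant 2: looks like the same function, but unsigned subtraction
// wraps around instead of going negative.
unsigned gap_unsigned(unsigned a, unsigned b) { return a - b; }

// One possible way to merge them: make the hidden precondition explicit,
// so callers that relied on negative results are found before they break.
unsigned gap_checked(unsigned a, unsigned b) {
    assert(a >= b);
    return a - b;
}
```

    With arguments (5, 2) both variants give 3. With (2, 5) the signed one gives -3, while the unsigned one wraps to a very large positive number, so swapping one for the other changes behavior for exactly the inputs nobody documented.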

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  6. If you don't know what it does, don't touch it. by BlueKitties · · Score: 5, Interesting

    Seriously, you never know when some previous programmer made a "duplicate" function to do something bizarre, like force a particular initialization order of static-class-member variables between translation units. Sometimes deleting pointless code can do... terrible things. Just be careful, test your changes, etc.
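
    One common shape this takes is the construct-on-first-use idiom, sketched below in a single file. All names are hypothetical; in a real code base the registrar objects would live in different .cpp files, where the relative initialization order of plain globals is unspecified.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Looks like a needless wrapper around a global vector. Its real job:
// the function-local static is constructed on the first call, so it
// exists no matter which translation unit's globals run first.
std::vector<std::string>& names() {
    static std::vector<std::string> instance;
    return instance;
}

// Registers itself during static initialization. Going through names()
// rather than touching a global directly is what keeps this safe.
struct Registrar {
    explicit Registrar(const std::string& n) {
        assert(!n.empty());
        names().push_back(n);
    }
};

// Imagine each of these defined in a different .cpp file.
Registrar billing_registrar("billing");
Registrar report_registrar("report");

std::size_t registered_count() { return names().size(); }
```

    Replacing `names()` with a plain global "because it does the same thing" would compile fine, and might even work until link order or the build system changes.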

    --
    "Sorrow is better than laughter, for by sadness of face the heart is made glad." [Ecclesiastes 7:3]
  7. Re:Unit tests by gstoddart · · Score: 5, Interesting

    I've maintained several legacy code bases over the years.

    And I will flat out tell you that unit tests have VERY limited utility in terms of understanding a mess of code you inherited. At least, in the beginning.

    Sure, you can start with a couple of basic premises, and you can convince yourself those basic premises still work.

    But the initial grokking of your code, understanding all places where a function may be used, understanding all of the tricky bits and gotchas, trying to understand why there are 9 functions which look like they do the same thing? That takes some time and effort, and quite possibly some tools.

    Unit tests are great for starting to build up a few things, and move towards better stuff ... but in a system which has several hundred (or several thousand) functions and interactions, resulting in really large numbers of code paths ... having a few unit tests describing the stuff you understand doesn't mean all of the stuff you don't understand wasn't broken, simply because you don't know what you don't know.

    So it is important to understand your new unit tests on legacy code are, at best, a VERY incomplete view of your code. That will improve over time, but you could potentially need to write a few thousand of them to be sure you're not breaking anything in the big picture.

    "If you do things wholesale, then you are likely to break something in an unmanageable way. Oh and make sure things are version controlled ;)"

    Oh, yes .... This .. for the love of god, this.

    You should learn how to tag branches and the like in your version control so you can identify a baseline of "before I ever touched anything" and then be able to cleanly build everything which predates you, as well as building your "after refactoring this part".

    Branching/tags/whatever your version control calls it -- that doesn't take up much space, so use them often, and consistently. Let the tool do the heavy lifting of keeping track of what you've changed.

    You do NOT want to find yourself unable to build it as it existed, or identify all of the diffs between what you started with and what you have.

    --
    Lost at C:>. Found at C.
  8. Answer: read slashdot for long enough by plcurechax · · Score: 5, Interesting

    See: Working Effectively with Legacy Code book review (2008) for a book of that title by Michael Feathers (PDF article) on that very topic.

    There is even a summary of key points at Programmers @ StackExchange. Hundreds if not thousands of programmers' blogs address this very topic.

    You're welcome. Now get back to work.

  9. DXR, the code indexer by Grincho · · Score: 5, Interesting

    Wow, what an easy pitch. :-) At Mozilla, we've put together a tool called DXR ( https://github.com/mozilla/dxr... ). It indexes your code and lets you do text and regex searches. But if you can get your project to build under clang, you can really have some fun, with queries that find...

    * Calls of a function (great for dead code removal)
    * Uses of a type
    * Overrides of a method
    * Uses and definitions of macros
    * etc., etc., etc. There are something like 24 different structural queries you can do.

    Because all of this is informed by the internal data structures of the clang compiler, it's nigh on 100% accurate (aside from more dynamic behaviors like sticking function pointers in a table and passing them around). You can also explore a hyperlinked version of the source, bouncing from #include to #include and drilling into methods.
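
    The function-pointer caveat can be sketched as follows. This is a made-up example, not DXR or Mozilla code: neither handler is ever called by name, so a search for direct calls may come back empty even though both functions are live.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical handlers. Nothing in the program calls them by name.
int handle_open(int arg)  { return arg + 1; }
int handle_close(int arg) { return arg - 1; }

struct Command {
    const char* name;
    int (*handler)(int);
};

// The only references to the handlers: their addresses in this table.
const Command kCommands[] = {
    {"open",  handle_open},
    {"close", handle_close},
};

// Picks a handler by string at run time and calls it through the pointer.
// Returns -1 for unknown commands (a convention invented for this sketch).
int dispatch(const char* name, int arg) {
    assert(name != nullptr);
    for (const Command& c : kCommands) {
        if (std::strcmp(c.name, name) == 0) return c.handler(arg);
    }
    return -1;
}
```

    Before deleting a function with no apparent callers, it is worth also searching for places where its address is taken.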

    Here's how to set it up: https://dxr.readthedocs.org/en...
    Here's our production instance you can play with: https://dxr.mozilla.org/mozill...

    If you run into trouble, pop into #static on irc.mozilla.org, and we'll be happy to help you.

  10. Re:But are you lacking experience and the brain fo by I'm+New+Around+Here · · Score: 5, Funny

    Hey, MC Hammer built my house for me.

    Unfortunately, I'm not allowed to touch it.

    --
    If you think I voted for Trump because of this post, you're wrong. I voted for Dr. Jill Stein of the Green Party. Again.