Ask Slashdot: What Tools To Clean Up a Large C/C++ Project?
An anonymous reader writes: "I find myself in the uncomfortable position of having to clean up a relatively large C/C++ project. We are talking ~200 files, 11MB of source code, 220K lines of code. A superficial glance shows that there are a lot of functions that seem to be doing the same things, a lot of 'unused' stuff, and a lot of inconsistency between what is declared in .h files and what is implemented in the corresponding .cpp files. Are there any tools that will help me catalog this mess and make it easier for me to locate/erase unused things, clean up .h files, and find functions with similar names?"
Seriously, you never know when some previous programmer made a "duplicate" function to do something bizarre, like force a particular initialization order of static-class-member variables between translation units. Sometimes deleting pointless code can do... terrible things. Just be careful, test your changes, etc.
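To make that concrete, here is a minimal sketch of one way such a "pointless" function can matter. All names (registry, register_plugin, alpha_registered, beta_registered) are hypothetical, and everything is squeezed into one file for illustration; in a real project the pieces would sit in separate .cpp files, where the order of static initialization between files is not specified.

```cpp
#include <string>
#include <vector>

// Hypothetical accessor. It looks like a needless wrapper around a global
// vector, but the function-local static is constructed on first call, so it
// always exists before anything is pushed into it.
std::vector<std::string>& registry() {
    static std::vector<std::string> names;
    return names;
}

// Hypothetical helper that other files call from their static initializers.
bool register_plugin(const std::string& name) {
    registry().push_back(name);
    return true;
}

// Imagine each of these living in a different translation unit. They run
// before main(), in an order the language does not fix across files.
bool alpha_registered = register_plugin("alpha");
bool beta_registered = register_plugin("beta");
```

If someone "simplifies" registry() into a plain global std::vector, a registration in another file may run before that vector has been constructed, which is undefined behavior that can appear or vanish with link order.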
I've maintained several legacy code bases over the years.
And I will flat out tell you that unit tests have VERY limited utility in terms of understanding a mess of code you inherited. At least, in the beginning.
Sure, you can start with a couple of basic premises, and you can convince yourself those basic premises still work.
But the initial grokking of your code, understanding all places where a function may be used, understanding all of the tricky bits and gotchas, trying to understand why there are 9 functions which look like they do the same thing? That takes some time and effort, and quite possibly some tools.
Unit tests are great for starting to build up a few things and moving towards better stuff ... but in a system which has several hundred (or several thousand) functions and interactions, resulting in really large numbers of code paths ... having a few unit tests describing the stuff you understand doesn't mean you haven't broken the stuff you don't understand, simply because you don't know what you don't know.
So it is important to understand your new unit tests on legacy code are, at best, a VERY incomplete view of your code. That will improve over time, but you could potentially need to write a few thousand of them to be sure you're not breaking anything in the big picture.
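For what it's worth, one common way to start is with tests that just pin down what the code does today, right or wrong (often called characterization tests). A rough sketch, where legacy_scale is a made-up stand-in for some inherited function and the expected values are simply what it currently returns:

```cpp
#include <cassert>

// Hypothetical inherited function whose intent nobody remembers.
int legacy_scale(int value) {
    if (value < 0) return 0;   // quirk: negative inputs clamp to zero
    return (value * 3) / 2;    // integer division truncates
}

// Records current behavior, quirks included. A failure here after a
// refactoring means behavior changed, not necessarily that it got worse.
void characterize_legacy_scale() {
    assert(legacy_scale(-5) == 0);
    assert(legacy_scale(0) == 0);
    assert(legacy_scale(1) == 1);    // 3 / 2 truncates to 1
    assert(legacy_scale(3) == 4);    // 9 / 2 truncates to 4
    assert(legacy_scale(10) == 15);
}
```

This only covers the inputs you thought to try, which is exactly the limitation described above.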
Oh, yes .... This .. for the love of god, this.
You should learn how to tag branches and the like in your version control so you can identify a baseline of "before I ever touched anything" and then be able to cleanly build everything which predates you, as well as building your "after refactoring this part".
Branching/tags/whatever your version control calls it -- that doesn't take up much space, so use them often, and consistently. Let the tool do the heavy lifting of keeping track of what you've changed.
You do NOT want to find yourself unable to build it as it existed, or identify all of the diffs between what you started with and what you have.
See the 2008 book review of Working Effectively with Legacy Code, the book by Michael Feathers (there is also a PDF article of the same title), on that very topic.
There is even a summary of key points at Programmers @ StackExchange. Hundreds if not thousands of programmers' blogs address this very topic.
You're welcome. Now get back to work.
Wow, what an easy pitch. :-) At Mozilla, we've put together a tool called DXR ( https://github.com/mozilla/dxr... ). It indexes your code and lets you do text and regex searches. But if you can get your project to build under clang, you can really have some fun, with queries that find...
* Calls of a function (great for dead code removal)
* Uses of a type
* Overrides of a method
* Uses and definitions of macros
* etc., etc., etc. There are something like 24 different structural queries you can do.
Because all of this is informed by the internal data structures of the clang compiler, it's nigh on 100% accurate (aside from more dynamic behaviors like sticking function pointers in a table and passing them around). You can also explore a hyperlinked version of the source, bouncing from #include to #include and drilling into methods.
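As an illustration of that function-pointer caveat, here is a small sketch with made-up names (cmd_start, cmd_stop, cmd_reset, kCommands, dispatch). Nothing ever calls cmd_reset by name, so a search for direct calls can come up empty even though the function is reachable at run time through the table:

```cpp
#include <cstring>

// Hypothetical command handlers.
int cmd_start() { return 1; }
int cmd_stop()  { return 2; }
int cmd_reset() { return 3; }  // never called by name anywhere

struct Command {
    const char* name;
    int (*handler)();
};

// The only references to the handlers are these table entries.
const Command kCommands[] = {
    {"start", cmd_start},
    {"stop",  cmd_stop},
    {"reset", cmd_reset},
};

// Looks up a handler by name at run time; returns -1 if there is none.
int dispatch(const char* name) {
    for (const Command& c : kCommands) {
        if (std::strcmp(c.name, name) == 0) return c.handler();
    }
    return -1;
}
```

Before deleting a function that appears to have no callers, it is worth checking whether its address is taken anywhere.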
Here's how to set it up: https://dxr.readthedocs.org/en...
Here's our production instance you can play with: https://dxr.mozilla.org/mozill...
If you run into trouble, pop into #static on irc.mozilla.org, and we'll be happy to help you.