Ask Slashdot: What Tools To Clean Up a Large C/C++ Project?
An anonymous reader writes I find myself in the uncomfortable position of having to clean up a relatively large C/C++ project. We are talking ~200 files, 11MB of source code, 220K lines of code. A superficial glance shows that there are a lot of functions that seem to be doing the same things, a lot of 'unused' stuff, and a lot of inconsistency between what is declared in .h files and what is implemented in the corresponding .cpp files. Are there any tools that will help me catalog this mess and make it easier for me to locate/erase unused things, clean up .h files, and find functions with similar names?
Seriously, that's mid-sized at best.
So, figure out the layers or logical components between each module and then you will be able to chew smaller chunks.
Then, doxygen the whole lot, making sure to use dot to create the graphs for callers and callees. This will let you see the interaction points so you can see what impact a change in one method will have (ie which callers you have to check).
Some people will say "write unit tests" but frankly, it never works with a legacy code base, to effectively unit test you have to write your code differently to how you'd normally do it. You don't have that luxury here. So a good integration test suite should be developed to test the functionality of the whole thing, then you can repeat it to make sure your changes still work. Its not as instant as unit testing (but more effective) so you'll have to invest in a build system that regularly builds and runs the (automated) integration test and tells you the results - and commit changes reasonably regularly so you can isolate changes that end up breaking the system.
The rest of the task is simply hard work running through how it works and understanding it. There's no short-cuts to working hard, sorry.
Any decent IDE has the capability of pointing at least towards unused blocks of code and will generate a tree of function calls. I've worked with Eclipse and Xcode both of which have these capabilities. Even GCC (or another C compiler) can warn you about chunks of unused code or missing/bad header files. You can also rename functions across the entire codebase if necessary.
If your code has warnings or errors, continue fixing until the warnings are gone. As far as functions that do similar things but are named differently, that is a bit harder because 'looks like they are doing the same thing' doesn't always mean they ARE doing the same thing (if they have the exact same code, you could perhaps solve with statistical analysis or simply a text finder).
Make sure that if you replace a function that it has the same behavior in all cases. Even mediocre developers have learned that reuse existing code is a "good thing" and often different functions that do "the same thing" have edge cases (often undocumented) where it does behave differently (especially in C/C++ eg. difference in signedness, memory mapping method, characters etc)
Custom electronics and digital signage for your business: www.evcircuits.com
First off, 220k lines of source isn't that big.
You're not going to solve this with a big bang so get that idea out of your head. You're going to solve it gradually, and for a code base of that size it's going to take maybe a year of relatively slow improvement. Everyone on the team has to be on board, and every code review must include "What has been improved?" and "Did anything get worse? If so, that's not okay."
1) Pick your battles. The code you're not changing is code that doesn't need to be looked at. Address your pain points as they come up.
2) When you find a pain point while making a change, MAKE IT TESTABLE. Since you're in here making a usually simple fix, a single nominal test verifying that fix is fine. Testing anything else is a waste of time. Testable code will improve over time.
3) If you can't make code testable because of an intractable dependency graph, welcome to the hell of "Design Dead". The only way out of this scenario is a rewrite of that area.
4) Find your comfort level with regard to time boxing refactoring work. On my engagements, they just happen automatically, without explanation outside the team, nor apology to anyone. When estimating a piece of work, pad it with some extra time for cleanup. Only actually create work items for design dead areas. Your definition of done must include testable, tested and improved code.
5) Duplicate code in itself isn't evil, and inconsistencies are simply inevitable. If you find duplicate code, pick one and deprecate the rest. However, code that is tightly coupled to the deprecated code will need to be refactored and if the coupling traverses an extended dependency graph, you'll simply have to live with the duplication and just stop adding to it.