Slashdot Mirror


Ask Slashdot: What Tools To Clean Up a Large C/C++ Project?

An anonymous reader writes: I find myself in the uncomfortable position of having to clean up a relatively large C/C++ project. We are talking ~200 files, 11MB of source code, 220K lines of code. A superficial glance shows that there are a lot of functions that seem to be doing the same things, a lot of 'unused' stuff, and a lot of inconsistency between what is declared in .h files and what is implemented in the corresponding .cpp files. Are there any tools that will help me catalog this mess and make it easier for me to locate/erase unused things, clean up .h files, and find functions with similar names?

14 of 233 comments

  1. You call that large? by Anonymous Coward · · Score: 5, Insightful

    Seriously, that's mid-sized at best.

    1. Re:You call that large? by Oxdeadface · · Score: 5, Insightful

      What compels comments like this? The first AC posts absolutely nothing of value, just wants to let everyone know that they disagree with a minor point that's completely irrelevant to the OP's question. Thanks for the insight, champ. The followup, probably the same person, goes on to ramble like an old fart telling a useless anecdote about his kid that's barely even related to the topic at hand. At what point did either of these seem like a good idea? Neither of these comments addresses the question being asked or even attempts to be useful at all. No one cares what you consider a large program and absolutely no one gives a shit about you or your fucking crotch fruit. These comments are just some sad cunt's way of claiming, "I'm more experienced and better than you." Fuck right off.

  2. Document first by gbjbaanb · · Score: 5, Insightful

    So, figure out the layers or logical components and the boundaries between modules; then you will be able to chew off smaller chunks.

    Then, doxygen the whole lot, making sure to use dot (from Graphviz) to create the caller and callee graphs. This will let you see the interaction points, so you can judge what impact a change in one method will have (i.e., which callers you have to check).
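    Doxygen draws these graphs for you (with HAVE_DOT, CALL_GRAPH and CALLER_GRAPH enabled in the Doxyfile), but the "which callers do I have to check" question is just a walk over reversed call edges. A minimal sketch, assuming you have already extracted caller-to-callee edges somehow (the CallGraph type and all function names here are hypothetical):

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical representation: caller -> list of callees.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Returns every function that directly or indirectly calls `changed`.
std::set<std::string> impacted_callers(const CallGraph& graph,
                                       const std::string& changed) {
    // Build the reverse graph: callee -> callers.
    std::map<std::string, std::vector<std::string>> callers_of;
    for (const auto& [caller, callees] : graph)
        for (const auto& callee : callees)
            callers_of[callee].push_back(caller);

    // Breadth-first walk up the reversed edges.
    std::set<std::string> seen;
    std::queue<std::string> work;
    work.push(changed);
    while (!work.empty()) {
        std::string fn = work.front();
        work.pop();
        auto it = callers_of.find(fn);
        if (it == callers_of.end()) continue;
        for (const auto& caller : it->second)
            if (seen.insert(caller).second)  // first time we see this caller
                work.push(caller);
    }
    seen.erase(changed);  // recursion should not list the function itself
    return seen;
}
```

    For example, if main calls bill, and both bill and report call tabulate, then changing tabulate means re-checking bill, report and main.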

    Some people will say "write unit tests", but frankly that never works with a legacy code base: to unit test effectively you have to write your code differently from how you'd normally do it, and you don't have that luxury here. So develop a good integration test suite that tests the functionality of the whole thing; then you can rerun it to make sure your changes still work. It's not as instant as unit testing (but it's more effective), so you'll have to invest in a build system that regularly builds the code, runs the (automated) integration tests, and reports the results. Also commit changes reasonably often so you can isolate the ones that end up breaking the system.
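    One way to bootstrap such a suite, sketched here under assumptions: record the system's current outputs for a set of inputs once (often called a "golden master" or characterization test), then rerun after every change and report anything that moved. legacy_price stands in for whatever entry point your system really exposes; it and the recorded cases are made up for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the system under test.
long legacy_price(int units, bool member) {
    long total = units * 250L;        // cents
    if (member) total -= total / 10;  // 10% discount, truncated
    return total;
}

struct GoldenCase {
    int units;
    bool member;
    long expected;  // output recorded from the current build, right or wrong
};

// Returns a description of every case whose output changed.
std::vector<std::string> check_golden(const std::vector<GoldenCase>& cases) {
    std::vector<std::string> failures;
    for (const auto& c : cases) {
        long got = legacy_price(c.units, c.member);
        if (got != c.expected)
            failures.push_back("units=" + std::to_string(c.units) +
                               " member=" + (c.member ? "1" : "0") +
                               " expected=" + std::to_string(c.expected) +
                               " got=" + std::to_string(got));
    }
    return failures;
}
```

    In practice the expected values would be loaded from a file captured on realistic data, and the build server would run the check after every commit.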

    The rest of the task is simply the hard work of running through how it works and understanding it. There are no shortcuts to hard work, sorry.

    1. Re:Document first by bmajik · · Score: 5, Insightful

      This.

      One of my first professional programming projects was to take a look at the custom C++ billing software our company had purchased from a contract programmer.

      I had a long unix and programming background, and was back for a summer job after doing 1 semester of C++ in college.

      My boss told me, since I was the only one who had C++ experience, to start documenting the system.

      At the time, we were using IRIX, and so I was using the SGI compiler and tools suite, which were, I believe, licensed from EDG. The point is that there was a very nice call graph visualizer. This was helpful for understanding things at a superficial level.

      However, what was even better was just running the program a bunch of times on test data and seeing what it did while under the debugger.

      While my summer began with the task of documenting the system, as I learned things I'd report them to my boss.

      By the end of the summer, I had re-written some fundamental parts of the system; I'd moved some of the processing outside, and I pre-processed and pre-sorted the data.

      The overall execution time to calculate monthly bills went from many hours to about 45 minutes. The key innovation was replacing the inner loop of the charge tabulation, which was 2 or 3 levels of nested linked-list traversal.

      Instead, I used the standard unix sort tools to pre-sort the data files before being loaded into the system, and I changed the system to use a data structure that supported a binary search.
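      The sort-then-binary-search idea might look something like this in modern C++. The Charge record and its fields are hypothetical, and where the original pre-sorted the data files with Unix sort, this sketch sorts in memory with std::sort for brevity.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical charge record.
struct Charge {
    int account;
    long cents;
};

// Sort once by account so all charges for one account are contiguous.
void sort_charges(std::vector<Charge>& charges) {
    std::sort(charges.begin(), charges.end(),
              [](const Charge& a, const Charge& b) { return a.account < b.account; });
}

// O(log n) search for the start of an account's run, instead of walking lists.
long total_for_account(const std::vector<Charge>& sorted, int account) {
    auto by_account = [](const Charge& c, int acct) { return c.account < acct; };
    auto first = std::lower_bound(sorted.begin(), sorted.end(), account, by_account);
    long total = 0;
    for (auto it = first; it != sorted.end() && it->account == account; ++it)
        total += it->cents;
    return total;
}
```

      The sort is paid once per billing run; every lookup after that is logarithmic rather than a nested linear traversal.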

      The majority of the code got left alone. By understanding the code under a debugger, and realizing that how it worked on production data was much different than how it performed on the test data it was originally delivered with, I was able to make a critical set of changes that had a huge impact.

      In general, I spend as much time as I can not writing code but understanding how the existing system works. On a current project, I've spent the last two weeks playing with somebody else's code; I've now extended it so that it can also operate on my data sets, and I've probably changed fewer than 100 lines across about 5 different projects.

      --
      My opinions are my own, and do not necessarily represent those of my employer.
  3. Eclipse, Xcode or any IDE by guruevi · · Score: 5, Insightful

    Any decent IDE can at least point you toward unused blocks of code and generate a tree of function calls. I've worked with Eclipse and Xcode, and both have these capabilities. Even GCC (or another C compiler) can warn you about chunks of unused code or missing/bad header files. You can also rename functions across the entire codebase if necessary.
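    A small illustration of what the compiler alone can catch, with made-up names: in GCC, -Wall enables -Wunused-function and -Wunused-variable, and -Wextra adds -Wunused-parameter. Built with something like g++ -std=c++17 -Wall -Wextra -c file.cpp, the file below should draw a warning for the never-called static helper and one for the unread parameter. These warnings only see internal-linkage code; unused exported functions need the linker (for example compiling with -ffunction-sections and linking with GNU ld's --gc-sections plus --print-gc-sections) or a whole-program analysis tool.

```cpp
// file.cpp - hypothetical example
// Compile with: g++ -std=c++17 -Wall -Wextra -c file.cpp

// Internal linkage and never called: -Wunused-function should flag it.
static int old_checksum(const char* p) {
    int sum = 0;
    while (*p) sum += *p++;
    return sum;
}

// `flags` is never read: -Wunused-parameter (via -Wextra) should flag it.
int scale(int value, int flags) {
    return value * 2;
}
```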

    If your code has warnings or errors, keep fixing until the warnings are gone. Functions that do similar things but are named differently are harder, because 'looks like they are doing the same thing' doesn't always mean they ARE doing the same thing. (If they contain exactly the same code, you could perhaps find them with statistical analysis or simply a text search.)
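    For the exact-copy case, a crude text pass is often enough. A minimal sketch, assuming you have already extracted each function's body as a string (that extraction step is not shown and would need a parser or a tool such as ctags): strip all whitespace so formatting differences don't hide copies, then group names that share an identical normalized body. Renamed variables will defeat it, so it only finds copy-paste duplicates.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Remove all whitespace so indentation and line breaks don't hide duplicates.
std::string normalize(const std::string& body) {
    std::string out;
    for (unsigned char ch : body)
        if (!std::isspace(ch)) out.push_back(static_cast<char>(ch));
    return out;
}

// Input: function name -> body text (hypothetical, extracted elsewhere).
// Output: groups of two or more names whose normalized bodies are identical.
std::vector<std::vector<std::string>>
find_duplicates(const std::map<std::string, std::string>& bodies) {
    std::map<std::string, std::vector<std::string>> by_body;
    for (const auto& [name, body] : bodies)
        by_body[normalize(body)].push_back(name);

    std::vector<std::vector<std::string>> groups;
    for (const auto& entry : by_body)
        if (entry.second.size() > 1) groups.push_back(entry.second);
    return groups;
}
```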

    If you replace a function, make sure the replacement has the same behavior in all cases. Even mediocre developers have learned that reusing existing code is a "good thing", and different functions that do "the same thing" often have edge cases (often undocumented) where they behave differently, especially in C/C++ (e.g., differences in signedness, memory mapping method, characters, etc.).
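    A tiny, made-up illustration of the signedness trap: these two helpers read the same and agree for non-negative inputs, but the second compares an int against an unsigned, so a negative value is converted to a huge unsigned number first. Merging them blindly would silently change behavior for negative inputs. In C++, GCC's -Wall includes -Wsign-compare, which warns about exactly this comparison.

```cpp
// Two hypothetical "duplicate" helpers from different parts of a codebase.

// Signed comparison: -1 < 1 is true.
bool below_limit_a(int value, int limit) {
    return value < limit;
}

// Mixed signed/unsigned comparison: `value` is converted to unsigned,
// so -1 becomes UINT_MAX and the result is false.
bool below_limit_b(int value, unsigned limit) {
    return value < limit;
}
```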

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  4. Risky by Anonymous Coward · · Score: 2, Insightful

    This strikes me as a very risky undertaking. If there are a lot of functions/modules doing similar things, any attempt to combine many similar functions into one runs a huge risk of introducing bugs if you can't wrap your head around the entire program (which is unlikely imo). There is a huge time and budget risk in this endeavor.

  5. If it works, DO NOT FUCK WITH IT!!! by Anonymous Coward · · Score: 0, Insightful

    You admit you don't know what it's doing.

    But you want to "fix" it?

    HELLOOOO!!! Disaster awaits if you mess with code you don't understand.

    If it doesn't work, toss it.

    Either way, you're back to DO NOT FUCK WITH IT. At least not until you understand it. ALL of it.

    1. Re:If it works, DO NOT FUCK WITH IT!!! by OrangeTide · · Score: 3, Insightful

      Indeed! This is why writing a test for it, for ALL of it, would be a good start. While doing the test development you start to learn the deep details of the code without running the risk of creating new subtle bugs, and at the end of the exercise you also get the bonus of a useful test suite.

      --
      “Common sense is not so common.” — Voltaire
    2. Re:If it works, DO NOT FUCK WITH IT!!! by HornWumpus · · Score: 3, Insightful

      Not possible.

      Sometimes you have a mess that you don't want to fuck with, but you have to.

      Don't combine the duplicate functions into one. Decide which one is the 'good one' then have all the others call it and fix up the results to match the alternative versions. Do this one at a time and test it to death.
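      A sketch of the "one good version, the rest become thin wrappers" approach, with hypothetical names: parse_count is picked as the good one and reports failure explicitly, while the old legacy_count keeps its existing contract (0 on any bad input) by calling it and adapting the result. Existing callers see no change, and each wrapper can be tested against the behavior it replaced before moving on to the next.

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// The "good one": rejects empty input, trailing junk, and overflow.
std::optional<int> parse_count(std::string_view s) {
    int value = 0;
    auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), value);
    if (ec != std::errc() || ptr != s.data() + s.size())
        return std::nullopt;
    return value;
}

// Legacy entry point kept for existing callers: same old contract
// (0 on any failure), now implemented on top of parse_count.
int legacy_count(std::string_view s) {
    return parse_count(s).value_or(0);
}
```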

      A plan that has worked for me is to separate the code into two piles. The application, which remains a fucking mess, and a library which only gets clean code. Eventually all the good stuff is in the library and you can just replace the calling mess with a new version.

      More basically: If you touch it, it will be your mess until you leave for a new job. Think long and hard if you don't want to stick this one on someone else.

      Unless management knows and publicly acknowledges the scope of the problem, don't touch it. You will be held responsible for breaking it, but fixing it will be invisible. Don't be a hero. Falling on grenades isn't fun (unless you are talking about the fat girl).

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
  6. Stricter compilation also an option by Codeyman · · Score: 3, Insightful

    Along with Coverity, as one of the commenters suggested, you can compile the code with stricter options: in GCC, -Wall and -Wextra turn on warnings for unused variables, functions and parameters, and -Werror turns every warning into an error so nothing slips through. You would then need to go through each file manually and resolve all the issues. Tools like bcpp can help you make sure your complete code base follows a common coding standard. Apart from that, if the name of a function is not indicative of what it actually does, no tool is smart enough to help you with that; you'd need to do a lot of cleanup by hand.
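    For the .h/.cpp drift mentioned in the question, one GCC option worth trying is -Wmissing-declarations, which in C++ warns when a global function is defined without a previous declaration. In the hypothetical example below, the header promises long area(long, long), but the implementation file defines area(int, int). In C++ that silently creates a new overload instead of implementing the declared function, and the warning should point at it; you would only get a link error if someone actually called the long version.

```cpp
// --- geometry.h (hypothetical) ---
long area(long w, long h);  // what callers are told exists

// --- geometry.cpp (hypothetical) ---
// Signature drifted: this defines a *different* overload. Built with
// g++ -Wmissing-declarations, GCC should report that there is no previous
// declaration for area(int, int); the declared long version is never defined.
int area(int w, int h) {
    return w * h;
}
```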

  7. Re:But are you lacking experience and the brain fo by Immerman · · Score: 3, Insightful

    Who said anything about doing the job? They're asking for suggestions for automated code analysis that can highlight potential "problem" areas, code duplication, etc. It seems like a common enough situation that someone may have made a tool for it. Automated *repair* would be a far more challenging task, but just highlighting potential inconsistencies and redundancy "hot spots" is something that could be done with fairly high false-positive/negative rates and still be extremely useful when faced with cleaning up an atrocious codebase.
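    Flagging functions with similar names is one of the easier parts to automate, false positives and all. A minimal sketch, assuming you already have a list of function names (from ctags, the compiler's symbol table, or similar; not shown): compute the Levenshtein edit distance for every pair and report pairs under a small, arbitrary threshold.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Classic dynamic-programming edit distance (insert/delete/substitute = 1).
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);  // prev now holds row i
    }
    return prev[b.size()];
}

// All name pairs whose edit distance is <= max_dist (checks O(n^2) pairs).
std::vector<std::pair<std::string, std::string>>
similar_names(const std::vector<std::string>& names, std::size_t max_dist) {
    std::vector<std::pair<std::string, std::string>> out;
    for (std::size_t i = 0; i < names.size(); ++i)
        for (std::size_t j = i + 1; j < names.size(); ++j)
            if (edit_distance(names[i], names[j]) <= max_dist)
                out.emplace_back(names[i], names[j]);
    return out;
}
```

    Lower-casing names and dropping underscores before comparing, so that get_user_name and GetUserName collide, is an obvious refinement.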

    --
    --- Most topics have many sides worth arguing, allow me to take one opposite you.
  8. Re:rm by hcs_$reboot · · Score: 1, Insightful

    I did, but then my system asks
    rm: remove regular file 'abc.c' ?
    Not yours?

    --
    Slashdot, fix the reply notifications... You won't get away with it...
  9. Re:rm by Anonymous Coward · · Score: 2, Insightful

    Get your build process under control. Then figure out which code is dead. ...
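    Once the build is reproducible, a first cut at finding dead code is a reachability walk over the call graph from the real entry points. A sketch, assuming a caller-to-callees map extracted elsewhere (all names hypothetical). Anything never reached is a candidate, not a verdict: calls through function pointers, virtual dispatch, or callbacks from outside the binary won't appear as edges.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using CallGraph = std::map<std::string, std::vector<std::string>>;

// Functions mentioned in the graph but unreachable from any entry point.
std::set<std::string> dead_candidates(const CallGraph& graph,
                                      const std::vector<std::string>& entries) {
    // Depth-first walk from the entry points.
    std::set<std::string> reached;
    std::vector<std::string> stack(entries.begin(), entries.end());
    while (!stack.empty()) {
        std::string fn = stack.back();
        stack.pop_back();
        if (!reached.insert(fn).second) continue;  // already visited
        auto it = graph.find(fn);
        if (it != graph.end())
            for (const auto& callee : it->second) stack.push_back(callee);
    }

    // Everything defined or called somewhere but never reached.
    std::set<std::string> dead;
    for (const auto& [fn, callees] : graph) {
        if (!reached.count(fn)) dead.insert(fn);
        for (const auto& callee : callees)
            if (!reached.count(callee)) dead.insert(callee);
    }
    return dead;
}
```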

  10. I've done this far too many times by WinstonWolfIT · · Score: 4, Insightful

    First off, 220k lines of source isn't that big.

    You're not going to solve this with a big bang so get that idea out of your head. You're going to solve it gradually, and for a code base of that size it's going to take maybe a year of relatively slow improvement. Everyone on the team has to be on board, and every code review must include "What has been improved?" and "Did anything get worse? If so, that's not okay."

    1) Pick your battles. The code you're not changing is code that doesn't need to be looked at. Address your pain points as they come up.
    2) When you find a pain point while making a change, MAKE IT TESTABLE. Since you're usually in there to make a simple fix, a single nominal test verifying that fix is fine. Testing anything else is a waste of time. Testable code will improve over time.
    3) If you can't make code testable because of an intractable dependency graph, welcome to the hell of "Design Dead". The only way out of this scenario is a rewrite of that area.
    4) Find your comfort level with regard to time-boxing refactoring work. On my engagements, refactorings just happen as a matter of course, with no explanation outside the team and no apology to anyone. When estimating a piece of work, pad it with some extra time for cleanup. Only create actual work items for design-dead areas. Your definition of done must include testable, tested and improved code.
    5) Duplicate code in itself isn't evil, and inconsistencies are simply inevitable. If you find duplicate code, pick one and deprecate the rest. However, code that is tightly coupled to the deprecated code will need to be refactored and if the coupling traverses an extended dependency graph, you'll simply have to live with the duplication and just stop adding to it.
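    Points 2 and 5 can meet in one small change. In this hypothetical sketch, an old function read the system clock itself, which made it hard to test; the surviving version takes "today" as a parameter so a single nominal test can pin it, and the old entry point is kept as a wrapper marked with the standard C++14 [[deprecated]] attribute, so existing calls still compile but every use draws a warning (-Wdeprecated-declarations in GCC and Clang). The day-number representation, the 30-day rule and all names are invented, and the wrapper assumes POSIX time_t (seconds since the epoch).

```cpp
#include <ctime>
#include <functional>

// Hypothetical: dates are day numbers (days since the Unix epoch).
using Today = std::function<long()>;

// Testable version: "today" is injected, so a test can pin it.
bool is_overdue(long due_day, const Today& today) {
    const long grace_days = 30;  // invented business rule
    return today() > due_day + grace_days;
}

// Old duplicate that read the clock itself. Kept for existing callers,
// but [[deprecated]] makes the compiler warn on every use.
// Assumes POSIX time_t (seconds since the epoch).
[[deprecated("use is_overdue(due_day, today) instead")]]
inline bool is_overdue_now(long due_day) {
    return is_overdue(due_day, [] {
        return static_cast<long>(std::time(nullptr) / 86400);
    });
}
```

    In production the caller passes a lambda that reads the real clock; the test passes one that returns a fixed day.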