Slashdot Mirror


Ask Slashdot: What Tools To Clean Up a Large C/C++ Project?

An anonymous reader writes I find myself in the uncomfortable position of having to clean up a relatively large C/C++ project. We are talking ~200 files, 11MB of source code, 220K lines of code. A superficial glance shows that there are a lot of functions that seem to be doing the same things, a lot of 'unused' stuff, and a lot of inconsistency between what is declared in .h files and what is implemented in the corresponding .cpp files. Are there any tools that will help me catalog this mess and make it easier for me to locate/erase unused things, clean up .h files, and find functions with similar names?

233 comments

  1. rm by Anonymous Coward · · Score: 4, Funny

    Who about "rm"?

    1. Re:rm by hcs_$reboot · · Score: 1

      The question is "rm what?"

      --
      Slashdot, fix the reply notifications... You won't get away with it...
    2. Re:rm by LifesABeach · · Score: 1, Funny

      My thoughts went to that energetic group that desire the H1B. Dangle one in front of the crowd and make your request.

    3. Re:rm by infernalC · · Score: 3, Funny

      -fr of course.

    4. Re:rm by Anonymous Coward · · Score: 0

      "sudo rm -rf /" is what you are looking for.

    5. Re:rm by davester666 · · Score: 1

      / ...make sure to execute it on the source control server first...

      --
      Sleep your way to a whiter smile...date a dentist!
    6. Re:rm by ArmoredDragon · · Score: 1

      I'd prefer -rf. Why? Because recursive and THEN force is better than just forcing it to be recursive.

    7. Re:rm by fahrbot-bot · · Score: 4, Funny

      Who about "rm"?

      Ah yes. Every *nix programmer has, hopefully only once, experienced the joy of the following:

      % rm * .o
      .o: No such file or directory

      --
      It must have been something you assimilated. . . .
    8. Re:rm by Z00L00K · · Score: 1

      No, "-rf *"

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    9. Re:rm by geantvert · · Score: 0

      Pleae, stop bashing the frenchs.

    10. Re:rm by geantvert · · Score: 1, Offtopic

      Ok! But your object files were properly removed so the command was successful.

    11. Re:rm by hcs_$reboot · · Score: 1, Insightful

      I did, but then my system asks
      rm: remove regular file 'abc.c' ?
      Not yours?

      --
      Slashdot, fix the reply notifications... You won't get away with it...
    12. Re:rm by nedlohs · · Score: 1, Offtopic

      There was always one student who when tarring up their code for submission for an assignment would leave off the tar file name:

      tar -cf *

      I hope that first file wasn't important :)

      Of course then they'd see there was no tar file when they tried to submit and would then run:

      tar -cf assignment.tar *

      Not noticing that that first file isn't what they thought it was anymore.

    13. Re:rm by Anonymous Coward · · Score: 0

      Thanks, buddy! I happen to also have a codebase that needs a clean-up run. Let me

      NO CARRIER

    14. Re:rm by Anonymous Coward · · Score: 2, Insightful

      Get your build process under control. Then figure out which code is dead. ...

    15. Re:rm by KiloByte · · Score: 2

      rm "-rf *"
      rm: invalid option -- ' '
      Try 'rm --help' for more information.

      Neither the asterisk nor the space are valid options.

      --
      The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    16. Re:rm by jeffmeden · · Score: 1, Funny

      you mean you don't have `alias rm="rm -rf"` in your bash.rc file? Pansy.

    17. Re:rm by plopez · · Score: 0

      REAL sysadmins don't make backups. They reinstall and reconfigure everything from scratch.

      --
      putting the 'B' in LGBTQ+
    18. Re:rm by qzzpjs · · Score: 1

      I thought that was to make it go Real Fast...

    19. Re:rm by Anonymous Coward · · Score: 0

      Pleae, stop bashing the frenchs.

      Who said anything about mustard?

    20. Re:rm by kthreadd · · Score: 1

      No that's what dd does.

    21. Re:rm by PincushionMan · · Score: 0

      Agreed. Period, as in rm -rf ., works great, though, because it'll delete the current directory, including '..', the parent directory. It repeats recursively just like you asked.
      A friend of mine tried to wipe out all the dot files and dot directories in his home directory as root by typing rm -rf .* He got a little more deleted than he bargained for in the process, thankfully for him he was at the console saying, "Hmm, this is taking an awfully long time...".

      Personally, I prefer rm -R ., even though it is a FreeBSD-ism.

    22. Re: rm by Anonymous Coward · · Score: 0

      % rm * ~

    23. Re:rm by Anonymous Coward · · Score: 0

      +1

      Aliases ftw

    24. Re:rm by Anonymous Coward · · Score: 0

      Solution; First create a temporary directory. Move the stuff in. Recursively delete that directory.

      This is the safest way to delete stuff. It also has the upside of letting you review what is going to be deleted. It is the precursor of the 'recycle bin'. And it is also superior to the 'recycle bin' because you can have as many as you like on any file system.

    25. Re:rm by Z00L00K · · Score: 0

      The quotes weren't supposed to be typed...

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    26. Re:rm by cheesybagel · · Score: 0

      I have done rm -rf / opt as root once. even better.

    27. Re:rm by Anonymous Coward · · Score: 0

      But it's sub-par mustard, doncha know.

    28. Re:rm by fahrbot-bot · · Score: 0

      you mean you don't have `alias rm="rm -rf"` in your bash.rc file? Pansy.

      I use tcsh - lemming.

      --
      It must have been something you assimilated. . . .
    29. Re:rm by Anonymous Coward · · Score: 0

      Nope, never done that. I always 'make clean', which is far harder to get wrong.

      Besides, surely you've committed everything to version control?

    30. Re:rm by andremerzky400 · · Score: 0

      story time :) I once created a dir in /etc/, by accident, with a control character. The dir was '^.' I stupidly ran 'rm -rf ^.' on that thing. Took quite a while to recover the system (we managed to recover w/o reboot! Yay!)

    31. Re:rm by Anonymous Coward · · Score: 0

      Ah, no. I had a lot of edited files with a tilde appended. Do not ever mistype "rm * ~"

    32. Re: rm by Anonymous Coward · · Score: 0

      Liar! Nobody he's fish. Even the author of that shell uses bash.

    33. Re: rm by Anonymous Coward · · Score: 0

      Haha, kindle autocorrect make laugh me

    34. Re:rm by Anonymous Coward · · Score: 0

      Why, rms of course!

    35. Re:rm by _merlin · · Score: 1

      Fuck no! You should never alias in -i like that, because you'll learn to depend on it and then fuck yourself over when you're in a shell without those aliases. Those aliases are downright dangerous.

    36. Re:rm by Anonymous Coward · · Score: 0

      Slashdot needs a idiot / parse troll moderation.
      Burning jesus on a pogo stick, Are you retarded?
      You are a fucking pinhead.

    37. Re:rm by TechyImmigrant · · Score: 1

      Who about "rm"?

      No. He should re-implement it in Haskell. Then no one will be able to recover it.

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
    38. Re:rm by ctrl-alt-canc · · Score: 1

      Real BOFD use 'alias ls="rm -fr"'

    39. Re:rm by Anonymous Coward · · Score: 0

      And make sure, you don't forget the hidden files:
      # rm * .* -rf

    40. Re:rm by Anonymous Coward · · Score: 0

      In what year did that happen to your friend? No modern implementation would allow something like that. And by "modern", I'm very lax, you could probably go back at least 20 years ago and that would still be true.

    41. Re:rm by americanpossum · · Score: 1

      I'm glad to say that I've never experienced this particular joy, but I do enjoy "make clean" much better than anything with "rm *" in it. 8-)

    42. Re: rm by Anonymous Coward · · Score: 0

      No need to have -rf specified on the command line, just touch a file named -rf and let the shell glob pass it to rm for you.

    43. Re:rm by countach · · Score: 1

      -rf /

  2. Fire by buserror · · Score: 1

    Or laser from orbit!

    1. Re:Fire by Anonymous Coward · · Score: 0

      What happened to the nuke?

    2. Re:Fire by kwiecmmm · · Score: 2

      Nuke it from orbit. It's the only way to be sure.

  3. Static analysis tools... by underqualified · · Score: 5, Informative

    If your company is willing to pay for it, you can get something like Coverity. On the free (as in beer) side there is CppCheck and clang.

    1. Re:Static analysis tools... by Z00L00K · · Score: 1

      Programmers combined with food and beverage.

      Just declare which standard you want to reach first.

      But sometimes it's way easier to analyze how the current system works and then write a new one. Just figure out which parts actually contain useful stuff and use that as a template.

      --
      If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
    2. Re:Static analysis tools... by tjb6 · · Score: 2

      Coverity will certainly tell you a lot of things that are broken, but probably won't help you decide how to fix them.
      Brain power is probably the best approach to this one, although some automated detection of unused code and paths won't hurt.
      Any number of other static analysers will do the same job.

    3. Re:Static analysis tools... by MadKeithV · · Score: 2

      If your company is willing to pay for it, you can get something like Coverity. On the free (as in beer) side there is CppCheck and clang.

      Coverity is expensive, slow, and failed to successfully compile any of our large real-world projects ("large" here meaning tens of thousands of files). This was with their own consultants / sales people on-site to babysit the process. They couldn't do it and couldn't figure out why it wasn't working. From their explanations it also seemed that on a properly large code-base you'd have to spend a long time tweaking the output to rid yourself of spurious messages/warnings.

      From experience, the best way to clean up a large project is to not actually mess it up in the first place. If you think it's messed up anyway, then the first thing you need to do is to think long and hard about cost/benefit. A large, specialized company like Coverity can't actually make a specialized compiler/linker that works for all possible correct C++ programs. C++ is notoriously hard to handle "automatically" unless you follow certain strict rules, and if the project is already that messy it's unlikely that any automatic tool is going to make that much of it.
      Where do you want to go with this project? Is it actually working fine now? Does it need changes? Can you afford these changes? If the answers to all of these are "yes" then I think your best bet is good old elbow grease. Start adding unit tests for the code, and then manually start cleaning up the code. Just the act of adding unit tests will teach you a lot about the dependencies of the code.

  4. But are you lacking experience and the brain for i by Anonymous Coward · · Score: 0

    Because these two parts are crucial. Just like there are no "sufficiently smart compilers", there are no "sufficiently smart optimizers/cleaners" that can do the job for you. You'll have to roll your sleeves up for this one.

  5. CLion by Anonymous Coward · · Score: 3, Interesting

    https://www.jetbrains.com/clion/

  6. You call that large? by Anonymous Coward · · Score: 5, Insightful

    Seriously, that's mid-sized at best.

    1. Re:You call that large? by Anonymous Coward · · Score: 1

      That's the one. Mod this up. I'm sure I'll get negative mod points for telling it like it is, but I get the impression you're just seeing something out of your comfort range for the first time; don't worry, you'll get over it in time and laugh when you look back. In the meantime, what you need to do is redefine your concept of a large project and adjust your skill and comfort level accordingly. I just taught my 8 year old son Logo; he has figured out basic logic control but has yet to produce anything significant. I still get a kick out of his shock and awe as he declares my 150 line programs are epic.

    2. Re:You call that large? by Oxdeadface · · Score: 5, Insightful

      What compels comments like this? The first AC posts absolutely nothing of value, just wants to let everyone know that they disagree with a minor point that's completely irrelevant to the OP's question. Thanks for the insight, champ. The followup, probably the same person, goes on to ramble like an old fart telling a useless anecdote about his kid that's barely even related to the topic at hand. At what point did either of these seem like a good idea? Neither of these comments address the question being asked or even attempt to be useful at all. No one cares what you consider a large program and absolutely no one gives a shit about you or your fucking crotch fruit. These comments are just some sad cunt's way of claiming, "I'm more experienced and better than you." Fuck right off.

    3. Re:You call that large? by Anonymous Coward · · Score: 1

      I suggest studying the code for a long while before doing anything to it... especially anything automated. Ask your manager for some time to just study the code before making changes. A superficial glance will always give you the impression of chaos. You will probably find that familiarizing yourself with the code will make it seem less disorganized, and you will understand why things are the way they are. If you're not comfortable with code-bases this size or larger (believe me much larger code-bases are common) then you might not be in a good position to judge the high-level organization of the code. It often happens that stupid-looking things are there for good reason (hopefully commented, but...) and trying to fix them will just be a wild goose chase while you rediscover issues the previous developers encountered, and then rediscover their solutions and end up putting the code back the way it was.

      While you're studying the code, improve the comments/documentation, especially anything that is confusing (if you find it confusing someone else probably will too) and write tests if there aren't any.

      When you do start making changes, check them in in small chunks. Don't make massive changes to half the source files and check them in with "code clean up" as the check-in comments. Each change should make sense on its own. Have each change reviewed by another developer on your team. Put a short comment in each commit. Have another developer synch occasionally and verify you haven't broken anything. Ideally anyone can synch at any time and get a working build (or at least no less working than before). Some changes require you to edit a lot of files, e.g. moving files, renaming headers, renaming in general... Make those changes separately and coordinate when you commit them so you don't screw other people working on the code base.

      This is a good way to go about it even if you're the only one working on the code base. When you inevitably make a change with unintended consequences, it'll be much easier to find and fix if you have made incremental changes.

      Good luck.

    4. Re:You call that large? by Anonymous Coward · · Score: 0

      Pretty much. Since we're talking legacy code, it most likely doesn't use any of the new C++11/14 features that make the language closer in abstraction level to something like Java, sometimes even stepping on the heels of Python and the like. It's probably an old amalgamation of "C with classes" and maaaybe some templates. In 220K lines, the thing cannot really do *that* much.

    5. Re:You call that large? by Anonymous Coward · · Score: 0

      Midsized for a bloated and verbose language.

      That is morbidly obese for a large project written in a clean, concise language.

  7. clang static code analysis by Anonymous Coward · · Score: 5, Informative

    scan-build and scan-view from clang++ will show you what is being used and what isn't as far as static code analysis goes.

  8. Easy by Anonymous Coward · · Score: 2, Funny

    cd Large_Cplusplus_project

    sudo rm -r *

    sudo apt-get install java

    1. Re:Easy by Anonymous Coward · · Score: 1

      This will triple his codebase with long method names alone.

    2. Re:Easy by blue9steel · · Score: 2

      Step 4, call sysadmin and complain about slowness of execution of new Java app running on hardware that was sized for C++ code.

    3. Re:Easy by Anonymous Coward · · Score: 0

      That's the ticket... replace code with a ticking zero day bomb. Might as well add some Flash and some word documents in there as well if we are going to be completely useless (all served by php).

    4. Re:Easy by Anonymous Coward · · Score: 1

      Too bad java is far more verbose than the nastiest C++ and C++ is a bloated pig.

    5. Re:Easy by Anonymous Coward · · Score: 0

      WTF?

      Java sucks but the security issues are browser-based and 99.9999% of java code is server side.

      Dumbass

    6. Re:Easy by Anonymous Coward · · Score: 0

      echo "Great now we have a VM f****** things up and no memory management, plus a new third-party runtime requirement on every system. If that wasn't enough s/he also DELETED THE DAMN ORIGINAL CODE FROM THE DEVELOPMENT SERVER so now we have to fetch the off-site backup. Thanks $BLEEPING_IDIOT !" | mail -s "Fire this idiot" development-head && sudo usermod --lock --shell /usr/bin/false $BLEEPING_IDIOT_USRNAME

  9. Document first by gbjbaanb · · Score: 5, Insightful

    So, figure out the layers or logical components between each module and then you will be able to chew smaller chunks.

    Then, doxygen the whole lot, making sure to use dot to create the graphs for callers and callees. This will let you see the interaction points so you can see what impact a change in one method will have (ie which callers you have to check).

    Some people will say "write unit tests" but frankly, it never works with a legacy code base: to unit test effectively you have to write your code differently from how you'd normally do it, and you don't have that luxury here. So a good integration test suite should be developed to test the functionality of the whole thing; then you can rerun it to make sure your changes still work. It's not as instant as unit testing (but more effective), so you'll have to invest in a build system that regularly builds and runs the (automated) integration tests and tells you the results, and commit changes reasonably regularly so you can isolate changes that end up breaking the system.
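
    A minimal sketch of such a whole-program check, assuming the legacy code can be driven through a single entry point that reads a stream and writes a stream. `run_legacy_pipeline`, its input format and the recorded "golden" output are all hypothetical stand-ins, not anything from a real project:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for the legacy entry point: reads one number per line
// from `in` and writes a small report to `out`.
void run_legacy_pipeline(std::istream& in, std::ostream& out) {
    std::string line;
    int total = 0;
    while (std::getline(in, line)) {
        if (line.empty()) continue;          // blank lines are ignored
        total += std::stoi(line);
        out << "item " << line << '\n';
    }
    out << "total " << total << '\n';
}

// Runs the whole pipeline on a fixed input and returns everything it printed.
std::string capture_output(const std::string& input) {
    std::istringstream in(input);
    std::ostringstream out;
    run_legacy_pipeline(in, out);
    return out.str();
}

// Compares today's output with output recorded before any cleanup started.
void check_against_golden() {
    const std::string input = "3\n4\n\n5\n";
    const std::string golden = "item 3\nitem 4\nitem 5\ntotal 12\n";
    assert(capture_output(input) == golden);
}
```

    The point of the design is that the check knows nothing about the internals, so it keeps working while functions are merged or deleted underneath it.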

    The rest of the task is simply hard work running through how it works and understanding it. There's no short-cuts to working hard, sorry.

    1. Re:Document first by laughingskeptic · · Score: 3, Informative

      This will find the static interaction points, but will miss the dynamic interaction points. He also has to watch for callbacks and methods present to satisfy oddball templates in C++, methods that will be invoked as a result of casts, etc.
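
      To illustrate, here is a small sketch of two such dynamic interaction points. All names are made up for the example; the assumption is only that a tool draws its graph from direct, by-name calls:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// No call site invokes on_invoice() by name; it is only stored in a table.
int on_invoice(int amount) { return amount * 2; }

// Handlers are looked up at run time, so dispatch() never names its target.
std::map<std::string, std::function<int(int)>>& handlers() {
    static std::map<std::string, std::function<int(int)>> table = {
        {"invoice", on_invoice},
    };
    return table;
}

// Returns -1 for unknown events, otherwise whatever the handler returns.
int dispatch(const std::string& event, int value) {
    auto it = handlers().find(event);
    return it == handlers().end() ? -1 : it->second(value);
}

// A conversion operator runs because of a cast, not an explicit method call.
struct Cents {
    int value;
    explicit operator double() const { return value / 100.0; }
};

double to_dollars(const Cents& c) { return static_cast<double>(c); }

void demo_dynamic_calls() {
    assert(dispatch("invoice", 21) == 42);   // reaches on_invoice indirectly
    assert(to_dollars(Cents{250}) == 2.5);   // reaches operator double()
}
```

      Deleting `on_invoice` or the conversion operator because "nothing calls it" would break behaviour that only shows up at run time or at the cast.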

    2. Re:Document first by Anonymous Coward · · Score: 0

      Having not waded into a quagmire so large (220k lines of code) would your method take more or less time than just rewriting the whole thing while using known good components from the original code?

      Just curious. If there is so much inconsistency and redundancy, would a partial rewrite including known good bits be faster?

    3. Re:Document first by Anonymous Coward · · Score: 1

      Yes, it's difficult to write unit tests for old code.

      What does that tell me? It says don't touch the old code! If you can't test it properly, you're going to break something when you refactor it.

      Better to leave it be. When you add a feature or fix bug, _then_ refactor all the things you touch and add regression tests. Over time you'll slowly clean up the code base. More importantly, it's only during those moments when you'll [briefly] understand the relevant program behavior and actually be in a position to know whether to merge code or keep it separate, refactor some logic, etc.

      But going into a large code base and refactoring "just because" is seriously stupid.

    4. Re:Document first by bmajik · · Score: 5, Insightful

      This.

      One of my first professional programming projects was to take a look at the custom C++ billing software our company had purchased from a contract programmer.

      I had a long unix and programming background, and was back for a summer job after doing 1 semester of C++ in college.

      My boss told me, since I was the only one who had C++ experience, to start documenting the system.

      At the time, we were using IRIX, and so I was using the SGI compiler and tools suite, which were, I believe, licensed from EDG. The point is that there was a very nice call graph visualizer. This was helpful for understanding things at a superficial level.

      However, what was even better was just running the program a bunch of times on test data and seeing what it did while under the debugger.

      While my summer began with the task of documenting the system, as I learned things I'd report them to my boss.

      By the end of the summer, I had re-written some fundamental parts of the system; I'd moved some of the processing outside, and I pre-processed and pre-sorted the data.

      The overall execution time went from many hours to about 45 minutes to calculate monthly bills. The key innovation was replacing the inner loop of the charge tabulation -- which was 2 or 3 levels of nested linked list traversal.

      Instead, I used the standard unix sort tools to pre-sort the data files before being loaded into the system, and I changed the system to use a data structure that supported a binary search.
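
      The general shape of that change can be sketched as follows. This is not the original billing code; the `Rate` record and its fields are hypothetical, and the sort is done in-process here rather than with the unix sort tools:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record type; the real billing data layout is not known.
struct Rate {
    std::string account;
    int cents_per_unit;
};

// Sort once up front so that every later lookup can use binary search.
void sort_by_account(std::vector<Rate>& rates) {
    std::sort(rates.begin(), rates.end(),
              [](const Rate& a, const Rate& b) { return a.account < b.account; });
}

// O(log n) lookup on the pre-sorted vector; returns nullptr when absent.
const Rate* find_rate(const std::vector<Rate>& sorted, const std::string& account) {
    auto it = std::lower_bound(
        sorted.begin(), sorted.end(), account,
        [](const Rate& r, const std::string& key) { return r.account < key; });
    if (it == sorted.end() || it->account != account) return nullptr;
    return &*it;
}

void demo_lookup() {
    std::vector<Rate> rates = {{"zed", 3}, {"amy", 5}, {"bob", 7}};
    sort_by_account(rates);
    const Rate* hit = find_rate(rates, "bob");
    assert(hit != nullptr && hit->cents_per_unit == 7);
}
```

      Each lookup inside the inner loop then costs a handful of comparisons instead of a full list walk, which is where nested traversals tend to lose their time.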

      The majority of the code got left alone. By understanding the code under a debugger, and realizing that how it worked on production data was much different than how it performed on the test data it was originally delivered with, I was able to make a critical set of changes that had a huge impact.

      In general, I spend as much time as I can not writing code, but instead, understanding how the existing system works. For a current project, I've spent the last two weeks playing with somebody else's code, and now I've expanded it so that it can also operate on my data sets, and I've probably changed fewer than 100 lines across about 5 different projects.

      --
      My opinions are my own, and do not necessarily represent those of my employer.
    5. Re:Document first by Anonymous Coward · · Score: 0

      When you add a feature or fix bug, _then_ refactor all the things you touch and add regression tests. Over time you'll slowly clean up the code base. More importantly, it's only during those moments when you'll [briefly] understand the relevant program behavior and actually be in a position to know whether to merge code or keep it separate, refactor some logic, etc.

      This is exactly what we've been doing with the MySQL codebase.

      Fun fact: For the MySQL 5.6 tree, that's about 750MB, 18,000+ files, or about 16 million LOC, choose your flavour.

    6. Re:Document first by boristhespider · · Score: 3, Interesting

      No.

      It would also be stamped on by management and any competent product owner, unless it was absolutely dripping in tests before he embarked on anything of the sort. If the code is producing the desired numbers but is simply a total and utter mess, no-one is going to thank him for declaring he's going to rebuild it from scratch, and the only way it would be sanctioned at all is if he could absolutely guarantee the same numbers before and after (to within rounding and ordering error). Given the state of the codebase he's talking about, those tests would have to be end-to-end tests since as others have noted writing unit tests for legacy code is in general a thankless and time-consuming task. (Then again, so is attempting to build end-to-end tests that satisfy every useful codepath.)

      I genuinely have no idea how large the codebase at my company is; at a guess I'd wager we're in the many millions of lines of code (certainly enough to render Intellisense an utterly useless, chugging, unusable piece of shit) and quite possibly more. Some of it is really quite good code with thorough unit test coverage -- that tends to be the more recent stuff. The rest is covered, in principle if not in reality, by a large number of end-to-end tests that at the very least exercise some extremely fragile pieces of code quite effectively. Even with this, rampant refactoring is discouraged, let alone rampant rewriting. It soaks up developer time we can't afford to spend, and the danger of hitting a bug that isn't covered by our end-to-end tests (or, even more infuriatingly, fixing a bug that clients have grown to trust the results of) is pretty high. Unless there's a very good reason to out and out rewrite, it's to be very much discouraged. Careful refactoring, once every self-contained block of work rerunning all the unit tests and as many of the end-to-end tests as is practical, is about the only way to proceed.

    7. Re:Document first by TechyImmigrant · · Score: 2

      If I was asked to 'fix' or 'clean up' a codebase, I'd refuse.

      1) 'fixed' or 'cleaned up' is not well defined.
      2) Once you've changed it to your definition of 'fixed' it's going to be gibberish to the next guy.
      3) You don't fix code, you own code. Your management should be asking you to own the code so you can nurture it and improve it. Fixing things is one aspect of improving code.

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
    8. Re:Document first by Anonymous Coward · · Score: 0

      So you actually understood the code you were changing? If only more people did that. I had a few colleagues who did not bother reading the code they extended/changed. Instead they shoveled the new crap in and when code crashed, they added a static variable to flag "exceptional" path and continued until the code sort of worked. Eventually the codebase is littered with static variables which continue to keep state from previous runs, which fires back each and every time something changes.
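
      A tiny sketch of that failure mode, with hypothetical function names: the static flag remembers an "exceptional" path, so the same input gives a different answer on a later call. The second version makes the state explicit so the caller decides how long it lives:

```cpp
#include <cassert>

// Anti-pattern: a static flag silently carries state from one call to the next.
int parse_with_hidden_state(int input) {
    static bool saw_error = false;   // survives between calls
    if (input < 0) saw_error = true;
    return saw_error ? -1 : input * 2;
}

// Same logic with the state made explicit and owned by the caller.
struct ParseState {
    bool saw_error = false;
};

int parse_with_explicit_state(ParseState& state, int input) {
    if (input < 0) state.saw_error = true;
    return state.saw_error ? -1 : input * 2;
}

void demo_hidden_state() {
    assert(parse_with_hidden_state(2) == 4);    // first run looks fine
    assert(parse_with_hidden_state(-1) == -1);  // error path sets the flag
    assert(parse_with_hidden_state(2) == -1);   // same input as before, new result
}
```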

    9. Re:Document first by Joey+Vegetables · · Score: 1

      I'll agree on points 1 and 3. However, I would not consider code to be "clean" until it is readily understandable by any average programmer with a basic level of domain knowledge.

    10. Re:Document first by TechyImmigrant · · Score: 1

      I'll agree on points 1 and 3. However, I would not consider code to be "clean" until it is readily understandable by any average programmer with a basic level of domain knowledge.

      That's a lofty goal, but good to aim for.

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
    11. Re:Document first by romons · · Score: 1

      I spent a year 'modularizing' a big chunk of the cisco router source base. It was driven by the CTO, who wanted more modularity and code ownership, dammit! It was a terrible idea, caused lots of bugs, and made the code harder to understand and maintain. I did win a prize for the effort, though.
      In my opinion, code cleanup on legacy code is rarely going to pay for itself. Even rewrites from scratch usually fail miserably. Legacy software is the way it is for a reason. If you screw with it, you are increasing the system's entropy, which is almost always bad.

      --
      Go to Heaven for the climate, Hell for the company -- Mark Twain
  10. Eclipse, Xcode or any IDE by guruevi · · Score: 5, Insightful

    Any decent IDE has the capability of pointing at least towards unused blocks of code and will generate a tree of function calls. I've worked with Eclipse and Xcode both of which have these capabilities. Even GCC (or another C compiler) can warn you about chunks of unused code or missing/bad header files. You can also rename functions across the entire codebase if necessary.

    If your code has warnings or errors, continue fixing until the warnings are gone. As far as functions that do similar things but are named differently, that is a bit harder because 'looks like they are doing the same thing' doesn't always mean they ARE doing the same thing (if they have the exact same code, you could perhaps solve with statistical analysis or simply a text finder).
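
    For the "exact same code" case, a crude text-based finder might look like the sketch below. It assumes you already have each function's body as a string (extracting bodies from real C++ needs a proper parser, which is not shown), and every name here is hypothetical:

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Drop all whitespace so formatting differences do not hide identical code.
// Crude on purpose: it can also merge tokens, so treat matches as candidates.
std::string normalize(const std::string& body) {
    std::string out;
    for (unsigned char c : body)
        if (!std::isspace(c)) out.push_back(static_cast<char>(c));
    return out;
}

// Groups function names whose normalized bodies are identical.
// `functions` maps a function name to its body text.
std::vector<std::vector<std::string>> find_exact_duplicates(
        const std::map<std::string, std::string>& functions) {
    std::map<std::string, std::vector<std::string>> by_body;
    for (const auto& entry : functions)
        by_body[normalize(entry.second)].push_back(entry.first);

    std::vector<std::vector<std::string>> groups;
    for (const auto& entry : by_body)
        if (entry.second.size() > 1) groups.push_back(entry.second);
    return groups;
}

void demo_duplicates() {
    std::map<std::string, std::string> fns = {
        {"add", "return a + b;"}, {"sum", "return a+b ;"}, {"mul", "return a * b;"}};
    assert(find_exact_duplicates(fns).size() == 1);
}
```

    Anything it reports still needs a human look before merging, for the reasons in the next paragraph.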

    Make sure that if you replace a function, the replacement has the same behavior in all cases. Even mediocre developers have learned that reusing existing code is a "good thing", and often different functions that do "the same thing" have edge cases (often undocumented) where they behave differently (especially in C/C++, e.g. differences in signedness, memory mapping method, characters etc.)
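
    A small sketch of the signedness trap, using two made-up checksum functions that look interchangeable and agree on plain ASCII but diverge on bytes of 0x80 and above (assuming the usual two's complement platforms):

```cpp
#include <cassert>
#include <string>

// Sums bytes as signed values: bytes >= 0x80 count as negative numbers.
int checksum_signed(const std::string& data) {
    int sum = 0;
    for (signed char c : data) sum += c;
    return sum;
}

// Sums bytes as unsigned values: every byte counts as 0..255.
int checksum_unsigned(const std::string& data) {
    int sum = 0;
    for (unsigned char c : data) sum += c;
    return sum;
}

void demo_signedness() {
    // Identical on ASCII input, so casual testing suggests they are duplicates.
    assert(checksum_signed("abc") == checksum_unsigned("abc"));
    // Different as soon as a high byte appears.
    assert(checksum_signed(std::string("\xff")) != checksum_unsigned(std::string("\xff")));
}
```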

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
    1. Re:Eclipse, Xcode or any IDE by iplayfast · · Score: 1

      I've been dealing with a large project that covers and I've found that QtCreator is excellent. It's fast (which means it's better than Eclipse). With gcc it will point out unused variables. And it has refactoring, which makes the job much easier. Doxygen is also good for getting a layout of the whole program.

    2. Re:Eclipse, Xcode or any IDE by Noughmad · · Score: 1

      I haven't used the others much, but here I must recommend KDevelop for its code browsing capabilities. I have worked on several big C++ projects (mostly small changes, not full-on refactoring), and it really helps you get into the code quickly. It doesn't have much in the way of refactoring tools that I would know of, but it's _great_ for looking at code.

      --
      PlusFive Slashdot reader for Android. Can post comments.
  11. Risky by Anonymous Coward · · Score: 2, Insightful

    This strikes me as a very risky undertaking. If there are a lot of functions/modules doing similar things, any attempt to combine many similar functions into one runs a huge risk of introducing bugs if you can't wrap your head around the entire program (which is unlikely imo). There is a huge time and budget risk in this endeavor.

  12. If you don't know what it does, don't touch it. by BlueKitties · · Score: 5, Interesting

    Seriously, you never know when some previous programmer made a "duplicate" function to do something bizarre, like force a particular initialization order of static-class-member variables between translation units. Sometimes deleting pointless code can do... terrible things. Just be careful, test your changes, etc.
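
    One common idiom of that kind is construct-on-first-use. The sketch below is only an illustration with invented names (it is not known which trick any particular codebase uses), and it lives in one file, whereas the real problem appears when the static objects sit in different .cpp files:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Looks like a needless accessor around a global, but the function-local static
// is constructed on first call, so it exists no matter whose statics run first.
std::vector<std::string>& registry() {
    static std::vector<std::string> names;
    return names;
}

// Registers itself during static initialization by going through registry().
struct AutoRegister {
    explicit AutoRegister(const std::string& name) { registry().push_back(name); }
};

// In a real project these would be in separate translation units, where the
// relative initialization order is unspecified.
static AutoRegister reg_parser("parser");
static AutoRegister reg_writer("writer");

bool is_registered(const std::string& name) {
    for (const auto& n : registry())
        if (n == name) return true;
    return false;
}

void demo_registry() {
    assert(is_registered("parser"));
    assert(is_registered("writer"));
}
```

    "Simplifying" `registry()` into a plain global vector would compile fine and might even work, until link order changes.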

    --
    "Sorrow is better than laughter, for by sadness of face the heart is made glad." [Ecclesiastes 7:3]
    1. Re:If you don't know what it does, don't touch it. by Hognoxious · · Score: 1

      Seriously, you never know when some previous programmer made a "duplicate" function to do something bizarre

      Oh, come on! If somebody did that there'd be a comment, right at the top, explaining it.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  13. Unit tests by Midnight+Thunder · · Score: 4, Interesting

    While I dislike writing unit tests, I have to admit they are useful in protecting your butt when something breaks, since the test should catch it first. Of course you need to decide whether in a particular scenario they add value or just make your manager happy.

    In a case like yours, you can make code modifications and hope nothing breaks, or build unit tests and ensure that you don't break any of them when refactoring. Initially, rather than just ripping out the seemingly duplicate methods, rip out/tweak their implementation and have them point to what seems like the right method to provide the common functionality. If your unit tests show breakage, then you know that you missed something.
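
    Roughly what that could look like, as a sketch only. The three function names are made up for illustration; the point is that the old names and signatures stay, only their bodies forward to the chosen implementation, and the checks are ones you would have written against the old behaviour first:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The implementation chosen as the "right" one (hypothetical example).
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Former near-duplicates keep their names and signatures so no caller changes,
// but their bodies now simply forward to trim().
std::string StripSpaces(const std::string& s) { return trim(s); }
std::string str_clean(const std::string& s) { return trim(s); }

// Checks that pin down the behaviour callers relied on before the redirect.
void test_legacy_trim_functions() {
    assert(StripSpaces("  abc  ") == "abc");
    assert(str_clean("\tabc\n") == "abc");
    assert(StripSpaces("   ").empty());
    assert(str_clean("a b") == "a b");
}
```

    If one of the old functions turns out to have had a quirk (say, it left trailing newlines alone), the test fails right away and you know that function was not really a duplicate.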

    If you do things wholesale, then you are likely to break something in an unmanageable way. Oh and make sure things are version controlled ;)

    --
    Jumpstart the tartan drive.
    1. Re:Unit tests by gstoddart · · Score: 5, Interesting

      I've maintained several legacy code bases over the years.

      And I will flat out tell you that unit tests have VERY limited utility in terms of understanding a mess of code you inherited. At least, in the beginning.

      Sure, you can start with a couple of basic premises, and you can convince yourself those basic premises still work.

      But the initial grokking of your code, understanding all places where a function may be used, understanding all of the tricky bits and gotchas, trying to understand why there are 9 functions which look like they do the same thing? That takes some time and effort, and quite possibly some tools.

      Unit tests are great for starting to build up a few things, and move towards better stuff ... but in a system which has several hundred (or several thousand) functions and interactions, resulting in really large numbers of code paths ... having a few unit tests describing the stuff you understand doesn't mean all of the stuff you don't understand wasn't broken, simply because you don't know what you don't know.

      So it is important to understand your new unit tests on legacy code are, at best, a VERY incomplete view of your code. That will improve over time, but you could potentially need to write a few thousand of them to be sure you're not breaking anything in the big picture.

      If you do things wholesale, then you are likely to break something in an unmanageable way. Oh and make sure things are version controlled ;)

      Oh, yes .... This .. for the love of god, this.

      You should learn how to tag branches and the like in your version control so you can identify a baseline of "before I ever touched anything" and then be able to cleanly build everything which predates you, as well as building your "after refactoring this part".

      Branching/tags/whatever your version control calls it -- that doesn't take up much space, so use them often, and consistently. Let the tool do the heavy lifting of keeping track of what you've changed.

      You do NOT want to find yourself unable to build it as it existed, or identify all of the diffs between what you started with and what you have.

      --
      Lost at C:>. Found at C.
    2. Re:Unit tests by eulernet · · Score: 0

      unit tests have VERY limited utility in terms of understanding a mess of code you inherited

      Totally agree with that !

      In fact, most legacy code cannot be unit-tested, since the code has never been designed to be tested.
      Adding unit tests requires that the routines are cleanly cut.
      Since it's rarely the case, refactoring code could be extremely difficult.

      Writing tests for new parts of code is good practice, especially if you have to maintain your code in the future, but it's useless if the code has already been running for a long time.

      I have a way to attack legacy projects: I try to simplify/optimize the code in order to own it.
      Perhaps in your case, you should try to split the routines into smaller source files.

    3. Re:Unit tests by gstoddart · · Score: 2

      I agree about some code being unit-test-proof. I've definitely encountered some.

      For the original poster ... start with backups, so you 100% isolate yourself from your own stupidity ... and I'm not calling you stupid, I'm saying everyone who has ever done this has had that "oh, crap, did I just do that?" moment. Plan for it now so you don't have to try to deal with it later.

      Then spend a lot of time simply going through the code. Use something like FreeMind or a giant whiteboard to map out the high level stuff. Take paper notes. Lots of them. Spend a lot of time reading it, getting familiar with it, and developing a mental understanding of it.

      Understand the hierarchy, the modules, and the high level stuff. Pick a few modules and delve into them. Dissect them to the point you can start to understand how the pieces fit together, and at least have a roadmap. You should be able to draw a diagram which broadly describes the chunks of functionality in your sleep.

      If you are trying to make code changes on day one, you're doing it wrong. If your boss expects you to be doing code changes on day one, he's an idiot who doesn't understand what you're being asked to do.

      I would say that easily the first few weeks (if not more depending on the code) should be spent doing nothing more than reading and trying to understand. And then doing it some more. Be prepared to walk through with a debugger just to confirm what you think is true -- surprisingly, it often isn't when dealing with someone else's code.

      Think of this as being as much archaeology as a technical exercise ... you are sifting through layers of code, likely built up over the course of years, and which has a very good chance of having its own unique nature and strangeness.

      First, grasshopper, seek understanding. Then, accept that your understanding is incomplete. Then seek more understanding. =)

      It's like trying to understand alien technology ... you could put an eye out if you aren't fully sure you have learned what it really does. ;-)

      --
      Lost at C:>. Found at C.
    4. Re:Unit tests by Forgefather · · Score: 1

      "In fact, most legacy code cannot be unit-tested, since the code has never been designed to be tested."

      We are running into this issue right now where I work. We have two different systems we use to determine pricing and one of them is closing on 30 years old. The code has several access points, which means unit tests have to be done in several different formats in order to properly assess the changes, making automated testing a nightmare.

      In our other system we don't have the same problem and have a program that allows us to pull data straight from prod to test the changes with a bombardment of real data before ever releasing our code into the testing environment. Needless to say this environment is far more stable.

      If I had a recommendation for the poster it would be to establish a similar automated testing tool that would allow you to compare the results of large amounts of production data after each change is introduced. That gives you a much higher chance of catching fringe cases instead of piling up a stockpile of bugs to be discovered at a later date.
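
      A bare-bones sketch of that kind of replay check, under heavy assumptions: the pricing rule, the struct and the function names are all invented here, and real recorded cases would be loaded from a file or database rather than built in code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One recorded production case: inputs plus the total the old code produced.
struct RecordedCase {
    int quantity;
    long unit_price_cents;
    long expected_total_cents;
};

// Hypothetical refactored pricing routine under test: 10% off at 100+ units.
long price_cents(int quantity, long unit_price_cents) {
    assert(quantity >= 0);
    const long total = quantity * unit_price_cents;
    return quantity >= 100 ? total - total / 10 : total;
}

// Replays every recorded case and returns the indices that no longer match.
std::vector<std::size_t> find_mismatches(const std::vector<RecordedCase>& cases) {
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < cases.size(); ++i) {
        const RecordedCase& c = cases[i];
        if (price_cents(c.quantity, c.unit_price_cents) != c.expected_total_cents) {
            bad.push_back(i);
        }
    }
    return bad;
}
```

      Returning the mismatching indices rather than stopping at the first failure means a single run shows how many recorded cases a change affects.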

      --
      "There are lies, there are damn lies, and there are statistics"
  14. graphviz by Anonymous Coward · · Score: 3, Informative

    graphviz can visualize the inter-functional and inter-file dependencies.

    It's free and built into the functionality of doxygen.

    I'd recommend recommenting all the functions using doxygen - because to clean up a large project you need to know it.

  15. Lots of time, lots of money by Anonymous Coward · · Score: 0

    There's no one-click "solution" that's going to rewrite your code for you. I'm assuming the program design isn't well-documented other than maybe an occasional inline comment here and there, so you basically can't do much until you know exactly what the program is supposed to do and how it's currently doing it. Then you can start to identify unused functions. Until then, it's best not to make any changes just because it *looks* like something is unused.

    It's also good to be able to compare a massive cleanup project with rebuilding from scratch. If the codebase is truly as fucked up as you think it is, rebuilding from scratch could be a viable alternative. Especially if the cleanup looks like it would take 6 months to a year, plus extensive testing to make sure stuff didn't break.

  16. Looks like a reverse engineering project by prefec2 · · Score: 4, Interesting

    Modularize the software. There are a lot of tools which can help you to analyze static dependencies in the code, which can help you to identify components. You could also use a run-time analysis tool, for example Kieker, which was initially for Java but has an extension for C/C++.

    1. Re:Looks like a reverse engineering project by vikingpower · · Score: 1

      Finally. Finally someone popped up who pronounced the word "component". Mod parent up into fucking heaven. Finafuckingly.

      --
      Religous speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
  17. Your brain and the compiler. by Anonymous Coward · · Score: 0

    Use your brain while exploring the code.

    When you find code that you think is irrelevant, remove it.

    Try compiling.

    If the compilation fails, see what's actually using the code you thought was unused. Remove it, too, if you think it is unused.

    Try compiling.

    Repeat as much as is necessary.

    When everything is compiling, make sure the software still works. Make sure you didn't remove any code that is dynamically loaded, too.

    Commit your changes to your source control system.

    Repeat as often as is necessary.

  18. it is not a large codebase, you can fix things up by Anonymous Coward · · Score: 0

    Hi,
    200 K lines is not a very large codebase, you can fix it up with emacs or just some grepping (or check the GNU idutils, ctags, etags, etc but not really needed for 200K lines), and some good regex. Auto-open all files matching the grep pattern with your editor, then apply the appropriate regular expression to all open files/buffers, or just check them one by one (it's better, and faster than you'd think, it's still a size you can manage by hand).

    Good luck!

  19. Document first by Anonymous Coward · · Score: 1

    Doxygen was my first thought as well.

  20. simple by Tablizer · · Score: 0

    Python :-)

  21. If it works, DO NOT FUCK WITH IT!!! by Anonymous Coward · · Score: 0, Insightful

    You admit you don't know what it's doing.

    But you want to "fix" it?

    HELLOOOO!!! Disaster awaits if you mess with code you don't understand.

    If it doesn't work, toss it.

    Either way, you're back to DO NOT FUCK WITH IT. At least not until you understand it. ALL of it.

    1. Re:If it works, DO NOT FUCK WITH IT!!! by OrangeTide · · Score: 3, Insightful

      Indeed! This is why writing a test for it, for ALL of it, would be a good start. Not only does one start to learn the deep details of the code when they are doing test development, without running the risk of creating new subtle bugs, at the end of the test writing exercise they also get the bonus of having a useful test suite.

      --
      “Common sense is not so common.” — Voltaire
    2. Re:If it works, DO NOT FUCK WITH IT!!! by HornWumpus · · Score: 3, Insightful

      Not possible.

      Sometimes you have a mess that you don't want to fuck with, but you have to.

      Don't combine the duplicate functions into one. Decide which one is the 'good one' then have all the others call it and fix up the results to match the alternative versions. Do this one at a time and test it to death.
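
      A small sketch of the "call the good one and fix up the results" idea. The lookup functions and their return conventions are hypothetical, chosen only to show a wrapper translating between conventions:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The version picked as the 'good one': 0-based index, -1 when absent.
int find_index(const std::vector<std::string>& items, const std::string& key) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (items[i] == key) return static_cast<int>(i);
    }
    return -1;
}

// Legacy variant: its callers expect a 1-based position and 0 for "not found".
// It now calls find_index() and converts the result back to the old convention.
int LookupPos(const std::vector<std::string>& items, const std::string& key) {
    const int idx = find_index(items, key);
    return idx < 0 ? 0 : idx + 1;
}

// Another legacy variant: its callers only ever used the result as a boolean.
bool HasItem(const std::vector<std::string>& items, const std::string& key) {
    return find_index(items, key) >= 0;
}

// Quick self-check that each wrapper still honours its old contract.
void check_lookup_wrappers() {
    const std::vector<std::string> items = {"a", "b", "c"};
    assert(find_index(items, "b") == 1);
    assert(LookupPos(items, "b") == 2);
    assert(LookupPos(items, "z") == 0);
    assert(HasItem(items, "c") && !HasItem(items, "z"));
}
```

      Callers of LookupPos and HasItem are untouched, so each wrapper can be converted and tested on its own before moving to the next.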

      A plan that has worked for me is to separate the code into two piles. The application, which remains a fucking mess, and a library which only gets clean code. Eventually all the good stuff is in the library and you can just replace the calling mess with a new version.

      More basically: If you touch it, it will be your mess until you leave for a new job. Think long and hard if you don't want to stick this one on someone else.

      Unless management knows and publicly acknowledges the scope of the problem, don't touch it. You will be held responsible for breaking it, but fixing it will be invisible. Don't be a hero. Falling on grenades isn't fun (unless you are talking about the fat girl).

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
    3. Re:If it works, DO NOT FUCK WITH IT!!! by NewWorldDan · · Score: 1

      plan that has worked for me is to separate the code into two piles. The application, which remains a fucking mess, and a library which only gets clean code

      No, I've tried that. I have an ecosystem of 23 applications that make my project work. There are now 6 separate libraries of various generations that need to be maintained. Ugh. I just don't have the staff to clean everything up while getting everything done that needs to get done.

    4. Re:If it works, DO NOT FUCK WITH IT!!! by HornWumpus · · Score: 1

      You have six generations of code still live?

      Your problem is more basic. Nobody ever finishes anything. No code is ever retired.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
    5. Re:If it works, DO NOT FUCK WITH IT!!! by NewWorldDan · · Score: 1

      Oh, it gets worse. I had a programmer that created some circular dependencies between the libraries. It was literally impossible to compile everything from scratch for a while. "But I can compile it just fine" That's because you have the current copies of everything on your machine. I deleted them and told him not to commit anything else until everything compiled cleanly.

    6. Re:If it works, DO NOT FUCK WITH IT!!! by wisnoskij · · Score: 1

      I will not pretend to have 10% the experience you likely have, but I would almost suggest leaving the existing program for reference and just start writing your own. Take what is programmed well from the original, and reference the original constantly.

      --
      Troll is not a replacement for I disagree.
    7. Re:If it works, DO NOT FUCK WITH IT!!! by Anonymous Coward · · Score: 0

      You've got a job in consulting? I don't understand something so let's rewrite it from scratch and make all the same mistakes.

    8. Re:If it works, DO NOT FUCK WITH IT!!! by HornWumpus · · Score: 1

      Sorry, but the basic problem is management.

      Until actually finishing things is a management priority you will get the same results. Your staff should get better jobs.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
  22. A magnetized needle & a steady hand by Anonymous Coward · · Score: 0

    Or butterflies...

  23. easy solution by Anonymous Coward · · Score: 1

    Find a bug and call your team irresponsible // fork it to libre[your product name] // Upload the source to OpenBSD repo.

    1. Re:easy solution by Anonymous Coward · · Score: 0

      hah hah, i had a good chuckle. (i use openbsd daily btw, great OS).

  24. Does Lint Exist anymore by Anonymous Coward · · Score: 0

    I used it a long time ago and it was excellent

    1. Re:Does Lint Exist anymore by OrangeTide · · Score: 4, Informative

      Compiler warnings have mostly caught up with the capabilities of Lint. There are some things Lint still does, but there are lots of things it warns about that have, as far as I know, never been the cause of a real bug. Getting a project to be 100% warning free with gcc -Wall is possible, and usually possible with -Wextra (maybe not so much with g++). The warnings usually are valuable, and I've personally seen bugs that could have been caught with gcc's warnings. Other compilers have other warnings and personalities, but I think it's worthwhile to investigate using warnings to check out a project with any compiler.

      --
      “Common sense is not so common.” — Voltaire
    2. Re:Does Lint Exist anymore by Megane · · Score: 1

      Also you may have to turn on optimization for your compiler to report certain warnings. I know this has happened to me before with gcc. Do your first -O2 build since a few weeks ago, and you will probably see some warnings. I've even had warnings that only showed up after I tried a 64-bit build. Also, learn to use asserts, it's all about the belt-and-suspenders stuff.
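
      On the asserts and the 64-bit point, a minimal sketch (the record format and function name are assumptions made up for the example). static_assert checks an assumption at compile time; assert checks one at run time in builds without NDEBUG:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Compile-time check: the build fails if this assumption about type sizes is wrong.
static_assert(sizeof(std::int32_t) == 4, "record format assumes 32-bit length fields");

// Converts a buffer size to the 32-bit length field of a hypothetical record header.
std::int32_t to_length_field(std::size_t size) {
    // On a 64-bit build size_t is wider than int32_t, so a silent narrowing
    // here could truncate; the assert makes that assumption explicit.
    assert(size <= static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max()));
    return static_cast<std::int32_t>(size);
}
```

      The explicit cast also documents that the narrowing is intentional, which matters once conversion warnings are turned on.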

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    3. Re:Does Lint Exist anymore by Carewolf · · Score: 1

      Compiler warnings have mostly caught up with the capabilities of Lint. There are some things Lint still does, but there are lots of things it warns about that have, as far as I know, never been the cause of a real bug. Getting a project to be 100% warning free with gcc -Wall is possible, and usually possible with -Wextra (maybe not so much with g++). The warnings usually are valuable, and I've personally seen bugs that could have been caught with gcc's warnings. Other compilers have other warnings and personalities, but I think it's worthwhile to investigate using warnings to check out a project with any compiler.

      To make it easy and fast use -Wall -Werror. That way you don't have to skim the log, but can just run make and come back when it breaks, and keep going until it compiles. Remember to remove -Werror later though otherwise compiler updates can bite you.

    4. Re:Does Lint Exist anymore by OrangeTide · · Score: 1

      I like to build with -Wall -Wextra and then do all my edits in a big batch instead of kicking off a potentially slow build. Usually the nasty bits that are troublesome to fix are when a warning is between two different components or libraries that share a header. The worst is when they are in different code repositories, making it difficult to commit one atomic change that fixes both projects.

      I leave -Werror in forever because I want people who mess around with tools to have to manually remove the flags and start filing bugs on all the new warnings. (I'm not going to expect the person updating the tools to be the one to fix all the new warnings).

      But this is all really policy choices and optimization of the process. I think we agree that compiler warnings are a useful tool.

      --
      “Common sense is not so common.” — Voltaire
  25. Git then doxygen by Ultra64 · · Score: 3, Informative

    You didn't mention a version control system, so assuming you aren't using one:

    Turn it into a git repository so you can easily back out of changes.

    Then run doxygen and start reading through the documentation.

    1. Re:Git then doxygen by JustNiz · · Score: 1

      If you run doxygen on an existing codebase that was developed without doxygen support already built-in, all you get is a giant list of classes and member names, and empty spaces where any descriptions would go.
      This has little to no value in trying to understand existing architecture or functionality.

    2. Re:Git then doxygen by vikingpower · · Score: 2

      DOXYGEN ?? You MacOS punks ! Now get off my lawn, before I hose you with my emacs-generated documentation !

      --
      Religous speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
  26. Refactoring strategy: flatten, then factor out by Anonymous Coward · · Score: 1

    I've successfully used this pattern:

    When I run into some badly designed code where areas of responsibility were blurred or utterly gone, just do this:

      - Flatten the whole thing into a single function (or as few as possible).
      - Restructure the result, removing redundancy wherever possbil.
      - Factor out into smaller, more logical units afterwards.

    Make sure that the whole thing works at every step along the way. (In other words, use functionally invariant modifications of the source.)

    1. Re:Refactoring strategy: flatten, then factor out by Anonymous Coward · · Score: 0

      removing redundancy wherever possbil.

      Uh, possible would like to have a word with you regarding this "redundancy" business.

    2. Re:Refactoring strategy: flatten, then factor out by Anonymous Coward · · Score: 0

      Dude. *Flattening* 220k lines of code in a *single* function? Seriously? Do you even know what you are talking about?

  27. Re: If you don't know what it does, don't touch it by Anonymous Coward · · Score: 1

    That's exactly the kind of crappy code to get rid of. It's a hidden risk that should be exposed and eliminated. More often than not that kind of stupidity is due to some hotshot dickbag "brogrammer" trying to show off and strut his stuff. It is otherwise completely unnecessary. Programmers like that do stupid stuff that requires "heroic" workarounds to be used, and then portray themselves as "heroes" when they implement these unnecessary hacks that they forced in the first place. To hell with them and their awful code.

  28. Still small ... by Anonymous Coward · · Score: 0

    220K is still not that large. Trouble comes if it is so large that you cannot create projects w/ cross references in the IDE of your choice ...

  29. Man Hours by dragonk · · Score: 2

    To be quite frank, what you need are man hours. There are many tools out there that can help you find corners or edges to start working on, but you can do the same with a coin toss; no tool will significantly reduce the amount of man hours that will have to be spent fixing, re-factoring and re-organizing. Take a good loooooong look, devise a simple strategy and then jump in somewhere. From personal experience, add lots of assertions as you go.

  30. sorry by Anonymous Coward · · Score: 0

    There's over 1000 lines of code (on average) in each source file? I'm sorry. That sounds like a mess.

    1. Re:sorry by ray-auch · · Score: 1

      Really ? I don't think you've seen really messy legacy code then.
      Try >10k LOC per file, 13k lines in a _view_, that's a mess - a "do not touch" mess.
      Oh, and 200k LOC is a small project, really.

    2. Re:sorry by HornWumpus · · Score: 1

      Not as bad a mess as a project with 20 lines of code per file.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
    3. Re:sorry by Anonymous Coward · · Score: 0

      Agree here.

      But I would call 200k LOC medium sized. Large is 1M+ LOC, small is under 100k LOC.

  31. Few ideas by postmortem · · Score: 4, Informative

    1. Modern IDE with good gcc parser: Eclipse, Netbeans, 3rd party paid ones. Not Visual Studio. You want it to build call hierarchy tree for you, so that you can find methods that are unused. It will require some manual steps
    1a. if you have $, Understand for C/C++ is proprietary tool that will map a hierarchy of your code.
    2. perform structural coverage analysis of code in live action, will help map the dead code. gcov is free if you can use it.

    1. Re:Few ideas by vikingpower · · Score: 1

      Call hierarchy is the word. I use that all the time with NetBeans, the built-in function for that is really awesome. Yields a lot of insight.

      --
      Religous speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
    2. Re:Few ideas by m3741 · · Score: 1

      If you're trying to wrap your head around a code base, I think Understand from Sci-Tools is an excellent choice. For a company, it's cheap and you'd get your money's worth.

  32. Look at the compiler warnings.. by toonces33 · · Score: 1

    And crank up the warning level to help you find inconsistencies between headers and declarations. In fact, you might need to start by cleaning up header files.

    Doxygen can help you find truly dead code.

    Cloned code is a pain to deal with - I don't know how you fix that. I guess it depends on how much of it there is..

  33. Hire me by ezakimak · · Score: 0, Offtopic

    I'm available to help.

    1. Re:Hire me by Anonymous Coward · · Score: 0

      Aren't resources tools? Including human resources?

  34. Visual Studio by Anonymous Coward · · Score: 1

    I'm sure I'll be down voted into oblivion for saying this on slashdot, but visual studio is actually a fantastic IDE. It will give you clear visibility on where a function or variable is referenced anywhere in the code which makes it very easy to remove duplicate functions and other legacy nasty-ness like thousands of lines of functions that are never called.

    It doesn't do anything that you can't achieve with a combination of open source tools but it's a hell of a lot easier to use IMO.

  35. Stricter compilation also an option by Codeyman · · Score: 3, Insightful

    Along with Coverity, as one of the commenters suggested, you can compile the code with stricter compilation options (like -Wall -Werror in gcc, which will error out on warnings such as unused variables/functions). You would then need to go through each of these files manually and resolve all the issues. Tools like bcpp can help you make sure your complete code base follows a common coding standard. Apart from that, if the name of the function is not indicative of what the function actually does, there are no tools smart enough to help you with that. You'd need to do a lot of cleanup by hand.

  36. systemd by fredan · · Score: 0, Flamebait

    they are going to release this new feature later this year.

    1. Re:systemd by marcello_dl · · Score: 0

      Bad advice. He is probably trying to clean up systemd itself.

      --
      ---- MISSING MISCELLANEOUS DATA SEGMENT --- [sigdash] trolololol
    2. Re:systemd by blue9steel · · Score: 1

      Too late, it's grown several orders of magnitude bigger in the time it took you to make this post.

  37. My code! by Anonymous Coward · · Score: 0

    Thus, you are looking at my code!!!

  38. Before you do anything by OrangeTide · · Score: 3, Interesting

    You need to write a test suite to confirm what works and what does not work.

    Once you have tests, you can start running coverage tools (like gcov or Coverity).
    If your tests are not covering parts, you need more tests or need to consider removing that part of the code.

    When tests are complete, then you can think about how to clean it up (refactor, rewrite, organize or whatever word the cool programmers are using nowadays). You can use your compiler warnings as a lint, and start to work through the spammy build logs to eliminate all the warnings. A good goal is to have zero warnings and after that build with -Werror, which will cause builds to fail if any new warnings are introduced. (If you have 3rd parties or customers that build these libraries, you might not want to do that.)

    Another option that becomes available after writing proper tests, is that you can make the decision to discard the entire project and start over from scratch. This is good if the requirements have changed dramatically over the years and a lot of messy hacks exist to support obsolete requirements. I must warn you though, usually rewriting is a waste of time. Time that is better spent understanding and fixing the existing code, after all source code is just a text file, you know how to edit a text file right?

    --
    “Common sense is not so common.” — Voltaire
    1. Re:Before you do anything by Anonymous Coward · · Score: 0

      And you're assuming the developer in question has a year or two to work on this, assuming no other projects and no updates to the baseline.

    2. Re:Before you do anything by Anonymous Coward · · Score: 0

      This. This a million times.

      I'm in charge of a much larger C library (~430KLOC) that is about ten years older than I am (I'm in my 30's). I went about "modernizing" it and failed miserably at almost every turn. Too many assumptions were unknown to me. Too many side effects. Hell, I just didn't know my way around the code (and how could I with its size?).

      So I went back and wrote an extremely comprehensive test suite that uses unit style tests, larger integration tests, fuzzing, and pseudo-random-combination based inputs. This test suite consists of nearly 100KLOC, driven by coverage tools, yet only covers about 45% of the codebase. The rest are relics that are unused in practice (maybe) or simply not reachable without very specific error conditions. Do I delete these unused sections? No. Do I modify them? No. Why? I don't have tests to prove their usefulness or lack thereof.
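
      For the pseudo-random part, a tiny sketch of the idea: feed the same generated inputs to the old routine and its candidate replacement and count disagreements. The checksum functions are stand-ins invented for the example, and the fixed seed is there so a failing run can be reproduced:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Stand-in for a legacy routine (hypothetical): simple additive checksum.
std::uint32_t legacy_checksum(const std::vector<std::uint8_t>& data) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) sum += data[i];
    return sum;
}

// Candidate replacement that is supposed to behave identically.
std::uint32_t new_checksum(const std::vector<std::uint8_t>& data) {
    return std::accumulate(data.begin(), data.end(), std::uint32_t{0});
}

// Returns the number of random inputs on which the two versions disagree.
int count_disagreements(unsigned seed, int rounds) {
    assert(rounds >= 0);
    std::mt19937 rng(seed);  // fixed seed keeps failures reproducible
    std::uniform_int_distribution<int> length(0, 64);
    std::uniform_int_distribution<int> byte(0, 255);
    int disagreements = 0;
    for (int r = 0; r < rounds; ++r) {
        std::vector<std::uint8_t> data(static_cast<std::size_t>(length(rng)));
        for (std::uint8_t& b : data) b = static_cast<std::uint8_t>(byte(rng));
        if (legacy_checksum(data) != new_checksum(data)) ++disagreements;
    }
    return disagreements;
}
```

      This only shows that the two versions agree on the inputs that were generated; it says nothing about code paths the generator never reaches, which is where the coverage tools come in.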

      Even today I cannot make large changes in the code without introducing untoward side effects. However, the big difference is today I know immediately when I've broken the library.

      TL;DR: put it in git/svn, write a comprehensive test suite, live by continuous integration. Anything less will be a nightmare.

    3. Re:Before you do anything by gstoddart · · Score: 2

      You need to write a test suite to confirm what works and what does not work.

      No, before you do anything you need to spend some time understanding what it does and sifting through the code for a LOT of hours. You need to understand the layout, the coding style, start to identify the bits which look like duplicates but which might not be.

      You need to be prepared to document the hell out of it, and be able to walk someone else through it -- if only as an exercise of "this is what I think I see, do you think you see the same thing?"

      Your initial stuff should be entirely in your brain, on your whiteboard, in your paper notes, or in your electronic notes. There's no substitute for spending time ferreting around in the code.

      If you start writing a test suite before you do anything ... you probably don't have enough understanding of the code to write the test suite in the first place.

      And then you'll spend your time trying to make the program fit your test suite.

      Another option that becomes available after writing proper tests, is that you can make the decision to discard the entire project and start over from scratch.

      No, if that's even an option, you need to review, understand, and document it first. If you go off half cocked writing a test suite only to decide you are going to scrap the whole thing ... you've wasted your time writing the test suite.

      Legacy code doesn't always play well with the idealized assumptions of "write a test suite". In fact, I'd say that's the last thing you want to be doing.

      If your management thinks this is a magic process where you dive in on day 1 ... run like hell, because they have no understanding of what you are really doing and what it will take.

      --
      Lost at C:>. Found at C.
    4. Re:Before you do anything by OrangeTide · · Score: 1

      If you're not given time to do something correctly, and by correctly I mean use engineering practices, you can refuse to do it until a compromise is arranged (my preference) or you can do it half-assed. There is the right way, and there is the way you end up doing things because you ran out of time. And you have to do your best to make the difference between the two as small as you can. You will pay for taking short cuts, sometimes the cost of a short cut works out to be worthwhile, usually it does not. And if you don't do the due diligence of investigating your options, then you can't even predict if your short cuts are a net benefit.

      Also, it doesn't usually take two years to write a useful test suite for most projects, maybe if it was mission critical.

      --
      “Common sense is not so common.” — Voltaire
    5. Re:Before you do anything by OrangeTide · · Score: 1

      Any estimate on how long it took for you to get your tests up to a point where you could comfortably make reasonably small changes to the code base?

      --
      “Common sense is not so common.” — Voltaire
    6. Re:Before you do anything by OrangeTide · · Score: 1

      No, before you do anything you need to spend some time understanding what it does and sifting through the code for a LOT of hours. You need to understand the layout, the coding style, start to identify the bits which look like duplicates but which might not be.

      I don't agree that any of this is immediately important. It's putting the cart before the horse. You can review it when you are ready to make modifications. The existing software, documentation (we hope) and maybe header files are a good start. It's more important to understand how to use something before you try to understand the internal details, but really these two things tend to happen in parallel, in fits and starts, when we try to grasp the complexities of a large project.

      No, if that's even an option, you need to review, understand, and document it first. If you go off half cocked writing a test suite only to decide you are going to scrap the whole thing ... you've wasted your time writing the test suite.

      The new code must pass the old test suite. Documentation is desirable, but not necessary to make forward progress.

      Legacy code doesn't always play well with the idealized assumptions of "write a test suite". In fact, I'd say that's the last thing you want to be doing.

      I admit I simplified to be able to fit what I had to say on the subject in a short post. But I have been writing and using test suites on old and new code for decades. I do have real software in production that has been unchanged for years, and I have inherited code from others who did not document, comment or even note the original design requirements. So do not dismiss my comments so easily.

      --
      “Common sense is not so common.” — Voltaire
    7. Re:Before you do anything by Anonymous Coward · · Score: 0

      If you're not given time to do something correctly, and by correctly I mean use engineering practices, you can refuse to do it until a compromise is arranged (my preference) or you can do it half-assed. There is the right way, and there is the way you end up doing things because you ran out of time. And you have to do your best to make the difference between the two as small as you can. You will pay for taking short cuts, sometimes the cost of a short cut works out to be worthwhile, usually it does not. And if you don't do the due diligence of investigating your options, then you can't even predict if your short cuts are a net benefit.

      Depends on what business you're in. Sometimes they accept that you can't do full due diligence up front on a project; you have to take a steaming pile of shit that nobody knows jack shit about and rewrite that crap in a month or two for whatever reason. Then you take the hit for time on the tail end of the project rather than up front. You might even get lucky if you're going for core functionality, because half to two-thirds of the system may no longer be needed. But nobody else knows enough about what's going on to be able to give you that information up front.

      Also, it doesn't usually take two years to write a useful test suite for most projects, maybe if it was mission critical.

      This is rather dependent on the complexity of the application, the availability of knowledgeable SMEs and actual skill/experience. But full coverage test code could well amount to twice the amount of executable code in the application. And 200 files, each with 1K lines in them, probably aren't well written. And with roughly 237 work days per year, how many files do you think he can write test cases for per day if that's all he's doing? He might be able to get code coverage in a year, or even half a year, or it might take him closer to 2 years. Considering he's asking how to do this, I'm guessing the level of experience isn't that high either.

  39. Perl ! by Anonymous Coward · · Score: 0

    You have text !

    Whacking text into the shape is what Perl was built for !

    Seriously, you need the opinion of an experienced application designer.

    Someone hands-on, who looks at the code & documentation and asks questions (rinse & repeat this cycle until clarity).

    This is a forensic job.

    And should be PAID for by the managers that had their customers & people build this !

  40. One word: Intern by Anonymous Coward · · Score: 0

    "Here ya go, kid...some real-world experience."

  41. indent by Anonymous Coward · · Score: 0

    Try using indent, works great!

  42. Re:But are you lacking experience and the brain fo by Immerman · · Score: 3, Insightful

    Who said anything about doing the job? They're asking for suggestions for automated code analysis that can highlight potential "problem" areas/code duplication/etc. Seems like a common enough situation that someone may have made a tool for it. Automated *repair* would be a far more challenging task, but just highlighting potential inconsistencies and redundancy "hot spots" is something that could be done with fairly high false-positive/negative rates and still be extremely useful when faced with cleaning up an atrocious codebase.

    --
    --- Most topics have many sides worth arguing, allow me to take one opposite you.
  43. Answer: read slashdot for long enough by plcurechax · · Score: 5, Interesting

    See: Working Effectively with Legacy Code book review (2008) for a book of that title by Michael Feathers (PDF article) on that very topic.

    There is even a summary of key points at Programmers @ StackExchange. Hundreds if not thousands of programmers' blogs address this very topic.

    You're welcome. Now get back to work.

  44. wait by pele · · Score: 1

    Your glance IS superficial. It takes at least 2 to 6 months to get a (basic) grasp of any project, and figuring out what needs to be removed in that period is a waste of time. You will find you need to re-implement most of the "crap" you removed in the first place. So patience is your friend. Look, learn, study, and then, once you know pretty much all the code paths, decide what can and cannot be refactored, if anything. Good luck!

  45. Source Navigator NG by Anonymous Coward · · Score: 0

    http://sourcenav.sourceforge.net/
    to look around and get a feeling.

    Editor is not that good (but OK), but an external editor can be used.
    Call graph etc. all is doable. FOR FREE (beer and freedom).

  46. DXR, the code indexer by Grincho · · Score: 5, Interesting

    Wow, what an easy pitch. :-) At Mozilla, we've put together a tool called DXR ( https://github.com/mozilla/dxr... ). It indexes your code and lets you do text and regex searches. But if you can get your project to build under clang, you can really have some fun, with queries that find...

    * Calls of a function (great for dead code removal)
    * Uses of a type
    * Overrides of a method
    * Uses and definitions of macros
    * etc., etc., etc. There are something like 24 different structural queries you can do.

    Because all of this is informed by the internal data structures of the clang compiler, it's nigh on 100% accurate (aside from more dynamic behaviors like sticking function pointers in a table and passing them around). You can also explore a hyperlinked version of the source, bouncing from #include to #include and drilling into methods.

    Here's how to set it up: https://dxr.readthedocs.org/en...
    Here's our production instance you can play with: https://dxr.mozilla.org/mozill...

    If you run into trouble, pop into #static on irc.mozilla.org, and we'll be happy to help you.

  47. Understand for C++ by iso-cop · · Score: 1

    A non-free but worth-it tool for making sense of code, from https://scitools.com/. I don't work for the company. There is a 15-day free trial. It costs $1K-$2K but if the code is important and you are going to live with it a long time it is worth it.

  48. Dependency graph by Anonymous Coward · · Score: 0

    This can help you produce a dependency graph and visualize your rat's nest. 200 files isn't that big. My project has 1500. Coverity can identify unnecessary include files, leaks, bugs and so forth. Read Lakos to learn a lot about dependency control and physical design. There is a big difference between cleaning up the code to placate Coverity and producing a good testable design.

  49. What I did... (Over here! Look at me!) by Anonymous Coward · · Score: 0

    A lot of the coders who contribute to my project are monkeys who don't care about making a polished final project for the end user. I do. What I've done is to first compile everything with g++ (or mingw if you don't have a Linux machine available), which enforces the standard, as vc++ has "language extensions" that allow a lot of crappy code to compile.

    Secondly, run doxygen on your code base with graphviz installed. This will "kind of" document the source code, and it will generate call graphs for classes. You'll be able to look at a class and see if anyone actually calls it. Remove any that don't get called by anyone, then some of the things they call will be uncalled, and repeat.

    There are a few code beautifiers that can help make it look like one person wrote it instead of dozens. Stack Overflow has lots of suggestions on that front. I couldn't find one that truly served my needs, and since I had to add copious doxygen-style comments to generate usable documentation for the end user (not necessary just to get basic code documentation and call graphs) anyway, I just did it by hand...still doing it actually.

  50. I've done this far too many times by WinstonWolfIT · · Score: 4, Insightful

    First off, 220k lines of source isn't that big.

    You're not going to solve this with a big bang so get that idea out of your head. You're going to solve it gradually, and for a code base of that size it's going to take maybe a year of relatively slow improvement. Everyone on the team has to be on board, and every code review must include "What has been improved?" and "Did anything get worse? If so, that's not okay."

    1) Pick your battles. The code you're not changing is code that doesn't need to be looked at. Address your pain points as they come up.
    2) When you find a pain point while making a change, MAKE IT TESTABLE. Since you're in here making a usually simple fix, a single nominal test verifying that fix is fine. Testing anything else is a waste of time. Testable code will improve over time.
    3) If you can't make code testable because of an intractable dependency graph, welcome to the hell of "Design Dead". The only way out of this scenario is a rewrite of that area.
    4) Find your comfort level with regard to time boxing refactoring work. On my engagements, they just happen automatically, without explanation outside the team, nor apology to anyone. When estimating a piece of work, pad it with some extra time for cleanup. Only actually create work items for design dead areas. Your definition of done must include testable, tested and improved code.
    5) Duplicate code in itself isn't evil, and inconsistencies are simply inevitable. If you find duplicate code, pick one and deprecate the rest. However, code that is tightly coupled to the deprecated code will need to be refactored and if the coupling traverses an extended dependency graph, you'll simply have to live with the duplication and just stop adding to it.

    1. Re:I've done this far too many times by kschendel · · Score: 1

      In addition to those excellent suggestions, remember that grep is your friend. Nifty code indexers are all well and good, and might even be all you need if *everything* is c/c++/headers. I find that the larger the code base, the less likely that that's true. Write yourself some grep wrappers if the relevant files are spread around in some awkward manner.

    2. Re:I've done this far too many times by Anonymous Coward · · Score: 0

      Is there a tool that will make a dependency graph for something an order of magnitude bigger?

    3. Re:I've done this far too many times by Anonymous Coward · · Score: 0

      Oops, make that 2 orders of magnitude bigger

  51. Re:But are you lacking experience and the brain fo by ShanghaiBill · · Score: 1

    there are no "sufficiently smart optimizers/cleaners" that can do the job for you.

    There are also no hammers that can build a house for you. But a hammer is still useful if you are building a house.

  52. Am I the only one who thinks this sounds like FUN? by Anonymous Coward · · Score: 0

    Boy it sucks to be on the job market again for the first time in years and see a slashdot article like this....

    I do these types of things for fun in my spare time. I love optimizing and cleaning up nasty codebases.

    Seems like you likely want someone like me who does this at an above average level due to passion.

    I'd doubt you are local but I could telecommute and would be happy to begin immediately. I have extensive C++11 experience and optimization is my specialty.

    Link to a job posting and I'll send a resume....

  53. Few suggestions by Anonymous Coward · · Score: 3, Informative

    -1-
    Install "OpenGrok" ( https://github.com/OpenGrok/OpenGrok ) and index your code.
    OpenGrok is the best source-code browsing option out there.
    Use OpenGrok to extensively read and understand your code base.
    Examples:
    Which files in the linux kernel call 'printk':
          http://lingrok.org/search?q=printk&defs=&refs=&path=fs%2F&hist=&project=linux-next
    Where is 'printk' defined?
          http://lingrok.org/search?q=&defs=printk&refs=&path=&hist=&project=linux-next

    -2-
    Use Clang's static code analyzer, 'scan-build' : http://clang-analyzer.llvm.org/scan-build.html .
    Depending on how good/bad the code is, there could be many false positives.
    but it will give you a sense of what's going on, and what to focus on.

    -3-
    Enable all possible compilation warnings (either in GCC or CLANG).
    The more the better. Use "-Werror" to ensure you don't ignore them.
    Do it iteratively if needed by enabling more warnings, fixing what breaks, and repeat.
    A good list is here:
        http://git.savannah.gnu.org/cgit/gnulib.git/tree/m4/manywarnings.m4#n103

    Especially eliminate unused code and variables.

    -4-
    Analyze the McCabe Complexity ( http://en.wikipedia.org/wiki/Cyclomatic_complexity ) of your code, using pmccabe ( https://people.debian.org/~bame/pmccabe/pmccabe.1 ).
    Focus on functions with too-high score, and re-factor them.

    -5-
    Add automated tests to your program, and combine it with code coverage (lcov/gcov).
    In addition to the general good advice of 'try to increase coverage', focus specifically on code sections
    which are critical but not covered at all - write tests specifically for them.
    Having some tests is better than having no tests at all.

    -6-
    Decide on a code style (e.g. Linux kernel style, GNU style, any other style) and build shell commands to check it (e.g. a combination of grep/awk etc.).
    New committed code should adhere to the style. Use git hooks to enforce it.
    Existing code should be (slowly) refactored to the new style.
    Which style is a matter of personal preference, but having a consistent style across all code really helps.

    Ideally, it should be something as easy as 'make syntax-check' in GNU Coreutils.

    -7-
    With all of the above, integrate the tests into an automated system (e.g. autotools or cmake or just makefiles) that will allow you to run and re-run and re-run these checks easily.
    If it takes 10 shell commands to do static analysis - you'll be too lazy/busy/whatever to do it more than once.
    It should be as easy as 'make static-scan' or 'make coverage'.
    Investing in writing a good makefile is worth the effort.

    Good luck.
      - gordon

  54. All I needed by Anonymous Coward · · Score: 0

    I've done this on occasion, turn a "C/C++" project that really was just a lot of C with some nonsensical use of C++ features, into something that needed a lot less in terms of macros, compiled to a smaller binary, and actually used the features C++ brings to the table to good effect. All I needed was basically my usual development environment of editor and compiler (nvi in one session, shell prompt in another, screen to tie them both together), though an SCM (even CVS did spiffily) helped a bunch to the point of being indispensable. Another thing that helped quite a bit was something to cross-reference the source (I used ctags, works together with nvi). The rest is elbow grease.

    Personally I like consistency, so I expect ".h" files to actually work as includes in ".c" files (so for ".cpp" use ".hpp", easy does it), and this is a reasonable time to do a formatting pass, which also is a good excuse to take a look at the code without touching its content, just the formatting (to 80 cols, uniform indentation and brace styles, etc.). Afterward you can do passes doing simple changes, like deduping copy/paste code, abstracting out functions, reshuffling code so same-purpose code sits in the same files, and so on. Then cook up vehicles to do more sweeping replacements with. At all times, keep the code working; if you make a boo-boo back the fsck out until it works again, then try again.

  55. 200 files...... by OneSmartFellow · · Score: 1

    ....Is NOT anything like a large project.
    It's almost small.

    1. Re:200 files...... by Ksevio · · Score: 1

      But it's large enough that you'd want to do some automated stuff to it first, not manually read over the whole thing.

  56. Use Warning Level 4 (W4) by danknight48 · · Score: 2

    You should be running at Warning Level 4 when coding. It's good practice and helps prevent the issue you have now.
    It will give you a crap load of warnings (which are all worth fixing if you have the time), but, it will highlight any unused variables and/or functions.

    in Visual Studio 2008-2013:
    - Project > Properties
    - Configuration Properties > C/C++ > General
    - Change "Warning Level(W3)" to W4

  57. Bware of 'cleanups' by plopez · · Score: 4, Interesting

    Anecdote from the mists of time:

    There was this C program which had been around a while and had undergone some evolution and maintenance. The decision was made to 'clean it up'. There was a data structure, an array I think, which was unused in a subroutine; let's call it subroutine A. So it was removed. On the next test runs of the application, the program suddenly started core dumping. After some agonizing debugging the crash was traced to another subroutine; let's call it subroutine B.

    There had been an array in subroutine B which a loop had run over the end of. But subroutine A had loaded just prior to B and allocated memory for the unused data structure. This had provided enough space to absorb the out-of-bounds writes from subroutine B, but once it was removed subroutine B began overwriting subroutine A, causing the crashes.

    It was good that the crashes were easily reproducible, or it could have been one of those intermittent things that drive people insane. An automated tool may not catch things like that since they may not show up until run time. It is C/C++ we are talking about now, isn't it?

    --
    putting the 'B' in LGBTQ+
    1. Re:Bware of 'cleanups' by rrohbeck · · Score: 3, Interesting

      I still have some superfluous debugging code in a project that literally does nothing in the production version but without it the code crashes randomly after a week or so; a classic Heisenbug. It's clearly data trashed by a wild pointer but I could never find who did it since it's a large multithreaded program that depends on hardware behavior. Neither valgrind nor Coverity were of any help. It's too big to be rewritten so we just have to live with it.

  58. Caffeine by Anonymous Coward · · Score: 0

    You're going to need it.

  59. Re: If you don't know what it does, don't touch it by Anonymous Coward · · Score: 1

    That's exactly the kind of crappy code to get rid of. It's a hidden risk that should be exposed and eliminated. More often than not that kind of stupidity is due to some hotshot dickbag "brogrammer" trying to show off and strut his stuff. It is otherwise completely unnecessary. Programmers like that do stupid stuff that requires "heroic" workarounds to be used, and then portray themselves as "heroes" when they implement these unnecessary hacks that they forced in the first place. To hell with them and their awful code.

    Wrong.

    Changing someone else's code that isn't broken because it doesn't meet your stylistic tastes is what "some hotshot dickbag "brogrammer" trying to show off and strut his stuff" does.

  60. Nuke it from orbit. by gestalt_n_pepper · · Score: 1

    It's the only way to be sure....

    Seriously though. C++ is one of the most powerful, complete commercial languages.... with a code interface and syntax designed by Satan. You couldn't have *designed* a coding system that would better encourage missteps, fuck-ups, obfuscation and a plethora of errors.

    It's a product of 90s math nerds whose machismo came from knowing more and better than regular folks. It was never designed to get work done efficiently; it was designed to feed the egos of C++ programmers.

    Better to take a relatively sane language like C# and make it scalable to the point where it can do everything C++ can do with a more restricted syntax and structure that ensures consistency and readability.

    --
    Please do not read this sig. Thank you.
    1. Re:Nuke it from orbit. by Anonymous Coward · · Score: 0

      Incorrect. Professional programmers have no problem writing solid code in C++. The guys who take their craft seriously know what they're doing and have been doing it for years to produce large scale C++based systems.

      Non-professional "programmers" that cut-n-paste bubble sort code from SO, then pray before they hit the build icon in Visual Studio...well then, your comment is spot-on. As a matter of fact, it sounds like you may be one of them.

      Sorry, it had to be said. Cheers.

  61. Re:But are you lacking experience and the brain fo by I'm+New+Around+Here · · Score: 5, Funny

    Hey, MC Hammer built my house for me.

    Unfortunately, I'm not allowed to touch it.

    --
    If you think I voted for Trump because of this post, you're wrong. I voted for Dr. Jill Stein of the Green Party. Again.
  62. Comment removed by account_deleted · · Score: 4, Interesting

    Comment removed based on user account deletion

  63. Renaming functions and variables with good names by jcdr · · Score: 1

    It's not a tool trick, but in some projects I have found it valuable to rename functions and variables so that they say what they really do. It's not rare that a name was a poor choice or that its meaning changed as the project evolved. From my point of view, it's a kind of documentation.

  64. What worked for me by Anonymous Coward · · Score: 0

    No easy answers, but I've had to work with a project that was likely very similar to this. I don't know what the code looks like in your case, but in mine it was utterly appalling. Not only did functions seem to repeat the same things over and over, but they all had minor variations on the same name, e.g. DoIt_1, DoIt_2..., DoIt_n. Worse, they all used the same (very terse) variable names. Some ran on for 600-700 lines, often with multiple logical lines of code in the same line in the editor. The few, spartan comments in the code had obviously been copy/pasted freely because most said the same thing (and clearly had nothing to do with the code). The code 'logic' (if it could be called that) ran all over the place, to the point where you begin to doubt your sanity (DoIt_1 calls DoIt_50 calls DoIt_2 calls DoIt_4, etc etc).

    Things that helped get some semblance of sanity:
    1) Doxygen - it can give you an overview of call trees, classes etc. This may reveal whole files that are completely unused (it did in my case), in which case get rid of them.
    2) As you learn more, begin to add long, meaningful names to functions and variables - it can make a world of difference
    3) Document everything you learn about the code in comments. Trust me, you will forget what you learned otherwise with so many parts that look alike. It also helps you keep track of code you've reviewed
    4) As your understanding of the code improves, start refactoring. Try to rationalise the number of functions and exactly what they do. In my case, that sometimes meant breaking huge functions into smaller ones with a clear purpose.
    5) Begin to modularise the functions (if possible) to give better structure to the project. This might help you see where the real duplication lies.

    Clearly this is a long, hard slog. Good luck!

  65. Large? by msobkow · · Score: 1

    That's one person's project for a year to write that volume of code.

    --
    I do not fail; I succeed at finding out what does not work.
    1. Re:Large? by iamacat · · Score: 1

      Maybe that volume of crappy code! I would rather that person wrote low tens of thousands of lines which were good.

    2. Re:Large? by TranquilVoid · · Score: 1

      You must be joking (I half suspect you are), that's 1000 lines of code per day. The mythical man month figure is 10 lines. Of course it depends on the language and the domain area, and whether you're hacking or following a depressive production line like Agile, but the larger a project becomes the more time you spend on the inter-relationships to keep it well-architected, and the fewer lines you can add.

    3. Re:Large? by TechyImmigrant · · Score: 1

      You must be joking (I half suspect you are), that's 1000 lines of code per day. The mythical man month figure is 10 lines.

      Really the line rate should be negative. If it's a mess, a cleaned up codebase will be smaller.

      --
      I should use this sig to advertise my book ISBN-13 : 978-1501515132.
  66. SciTools Understand by Anonymous Coward · · Score: 0

    SciTools has an Understand series that is designed to do what you are asking. I personally found it to be very useful in a very large project I was maintaining.

  67. Not large. Medium. by cheesybagel · · Score: 1

    Small, for most people, is something with tens of kLOC or less, medium projects have hundreds of kLOC and large projects have millions of LOC.

    A large project would be something like the Linux kernel which has around 16 million LOC.

    I would advise using doxygen to have a global view of the codebase, some kind of lint like g++ -Wall, and a good editor preferably with refactoring support as tools. Plus static code analyzers and valgrind.

    First thing you should do is backups. Save the old codebase source repository somewhere safe. If the code is stored in an old repository like CVS, SVN, or worse no repository at all, you should migrate it to something better like Git. Then you start working by removing dead code, indenting, do static and run-time code analysis to find bugs, then merge duplicate code. Start with the more mechanical parts first. Once you get that working and bug free you basically have the new version 1.0.

    Then you can start analyzing the codebase in order to understand it and refactor the code for real. That will be version 2.0. You will be proud of it because it has your touch on it but will probably be crap and worse than 1.0 was.

    Then you work on version 3.0 which downscopes the feature creep and bad problem analysis mistakes you made in 2.0 and is finally better than 1.0.

    1. Re:Not large. Medium. by Marginal+Coward · · Score: 1

      Then you start working by removing dead code, indenting, do static and run-time code analysis to find bugs, then merge duplicate code. Start with the more mechanical parts first. Once you get that working and bug free you basically have the new version 1.0.

      I didn't see anything about tests? How will you know it is "working and bug free" without them?

      In my experience, one of the most dangerous things you can do is to change working code in a mechanical way that should be safe. Whenever I do that, I always use something to make sure the code is unchanged. If the change is strictly cosmetic, something like Beyond Compare can be used for that. Otherwise, you need module tests that fully characterize the existing code via functional testing with complete condition/decision coverage. Or, maybe there's some tool that can be used to compare code at the level of its abstract syntax tree or whatever to ensure that its functionality is unchanged. (Does anybody know of something like that? - a Clang tool, maybe?)

      Of course, this is taking a very conservative approach. But doing "safe" changes manually on a large (or even medium) code base without some sort of automated safety net is a sure way to introduce difficult-to-find bugs. Caveat emptor.

    2. Re:Not large. Medium. by cheesybagel · · Score: 1

      You assume the software is bug free to begin with.

      Of course you need to do minimal testing. As for complete test coverage good luck doing that when you don't even know what the code is supposed to do in the first place. Unit testing is fine when you are starting a project from scratch but not on something like this.

      Theoretically it should be possible to test if two pieces of code are functionally equivalent but I know of no tool which does this without annotating the code all over by hand first. I have experience with theorem proving and compiler design so I can do it automatically to a large degree just by looking at the code.

      It is more important to make rolling back errors easy by using a revision control system than assuming you can write bug free code to begin with anyway.

    3. Re:Not large. Medium. by Marginal+Coward · · Score: 1

      You assume the software is bug free to begin with.

      No, I assume that making ostensibly non-functional changes to functioning code is more likely to introduce bugs than to accidentally remove them. The primary goal at that stage is to not make it worse.

      Anyway, with all that experience you have with theorem proving and compiler design, etc, I can see why you don't bother with any automatic aids to assure that changes don't accidentally make things worse.

      In my own case, I often do what I call "code algebra", which are small refactorings that are intended to leave the functioning of the code unchanged. Unfortunately, try as I might, I sometimes make mistakes along the way. That's why I seek mechanical help where available - and why I so greatly admire those of you who don't need any.

      I really gotta look into that theorem proving and compiler designing stuff - maybe that's the piece of the puzzle that I've been missing all these years.

  68. Re:One word: Intern by Anonymous Coward · · Score: 0

    What makes you so certain that it's not the intern assigned to the task who is posting the question?

  69. Re:But are you lacking experience and the brain fo by Minwee · · Score: 1

    Please, Slashdot, don't hurt him.

  70. Re: If you don't know what it does, don't touch it by Anonymous Coward · · Score: 0

    ...said the hotshot dickbag brogrammer trying to show off and strut his stuff.

  71. rm -rf by tigersha · · Score: 1

    Easy!

    --
    The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
  72. The header files do match implementation! by tibit · · Score: 1

    a lot of inconsistency between what is declared in .h files and what is implemented in the corresponding .cpp files

    That's impossible unless you're talking about comments in the header files, or the implementation (.cpp) files don't include their own headers. Generally speaking, every .cpp file must include its header in the first non-comment line of the file.

    Good:

    // foo.cpp, Copyright (c) 2105 Dynabone LLC
    #include "foo.h"
    #include <cmath>
    ...

    Bad:

    // foo.cpp, Copyright (c) 2105 Dynabone LLC
    #include <cmath>
    ...
    #include "foo.h"
    ...

    --
    A successful API design takes a mixture of software design and pedagogy.
  73. Re: If you don't know what it does, don't touch it by Anonymous Coward · · Score: 0

    Wrong.

    Changing someone else's code that isn't broken because it doesn't meet your stylistic tastes is what "some hotshot dickbag "brogrammer" trying to show off and strut his stuff" does.

    Agreed. It's a slippery slope that ends with systemd incorporating a web server at one end and a boot loader at the other.

  74. Tricks I learned doing much the same thing by Anonymous Coward · · Score: 0

    Okay so I had the chance to clean up a 400kloc Java project some years back. And a 1kloc C++ project more recently. The process is roughly the same.

    1) Read the documentation and/or find the previous coder and politely ask them what were they thinking.

    2) Source control. You really need to use it so you can feel free to break things and roll things back when you can't quite get to the bottom of why, or when you find you can't quite make all the necessary changes in a reasonable way.

    You have an average of 1000 lines per file. That is fairly high in my experience. Not always but over a large project... It is a warning sign.

    3) Find the files that are statistically large. Set break point on the functions/methods in them and profile for them.

    4) .h files are good & bad. Good they give you an idea what to expect in the .cpp and if it isn't in the .h there had better be a reason it isn't. Bad because people misuse .h's and stick macros and other oddities especially of premature optimization in them.

    5) Documentation comments. Whether it is doxygen or javadoc the mind numbing dull task of putting in documentation will a) help you gain familiarity with the code b) let your brain start to internalize the code c) make it obvious when code is simple data encapsulation and when it is spaghetti...

    6) Change compiler if you can. If the project requires javac 1.3 go to javac 1.6. GCC switch to clang or vice versa. Compilers keep getting not so much smarter as better at reporting the dumb things you (or the previous coder) are trying to do. So just trying to compile with the wrong compiler will give you a hit list of files. Files that compile cleanly with a different compiler are probably not the problem.

  75. Lattix seems to offer great tools for such a task by Anonymous Coward · · Score: 0

    A buddy of mine works for Lattix building code parsers and the Lattix tool suite seems to be a good bet for understanding and mapping code complexity and relationships as well as modularity, quality, etc. http://www.lattix.com

  76. rm? by Anonymous Coward · · Score: 0

    find / -iname "*.cpp" -delete

  77. unit/system tests are your friend by Anonymous Coward · · Score: 0

    but learn to do them right. If it's painful to write the test then you're probably writing the wrong kind of test. Not everything needs testing. Also be aware some code may rely on errors in other code.

  78. Does Lint Exist anymore by chopper749 · · Score: 1
  79. Elbow Grease by Anonymous Coward · · Score: 0

    Go through the source, find all the functions that do the same thing, comment them out. Write a replacement function you're happy with then repeatedly compile the project, replacing each broken reference with your new function until it compiles successfully.

    Now do this for all other functions.

    It's not the quickest method but after this project there's always the possibility you'll not find work and starve to death, so enjoy it while it lasts.

  80. redundant? or maybe not? by Anonymous Coward · · Score: 0

    http://www.joelonsoftware.com/articles/fog0000000069.html

  81. Could it be a threading issue like a deadlock? by Paul+Fernhout · · Score: 4, Interesting

    Debugging code that prints or logs may act to synchronize access to some data structure. Sometimes that can prevent a deadlock or illegal pointer access as a side effect:
    http://stackoverflow.com/quest...
    http://en.wikipedia.org/wiki/D...

    So yes, complex programs can act in strange ways from seemingly minor changes.

    I spent a couple years helping maintain a large complex multi-threaded app (which included message passing between the apps, for another layer of fun) which supported 24X7 operations where a minute's downtime could cost millions of dollars in some situations, and it was not easy. The code base was easily 10X to 100X of what the poster of the story is tasked with maintaining. Versions of the code had been in production for over fifteen years. Much of the code had been ported from C++ & Tcl to Java (although C++/Tcl systems remained), but the threading model was somewhat different between the two, and the port had not taken account of all the differences. It would have been nice to be able to rewrite some key parts of the system to make them more maintainable, but there was never enough time for that in a big way -- and realistically, bigger rewrites likely introduce new issues. Still, eventually we got most of the worst deadlocks and memory leaks and similar such things fixed and the system got to the point where people stopped even remembering off-hand the last time a core part of the system needed to be rebooted (previously a fairly frequent event). But each deadlock could involve days, weeks, or even months of study and discussion, adding log statements, writing tests, lab tests, analyzing quite a few multi-gigabyte log files (and writing tools to help with that including visualizing internal message flow), and so on. And, same as you mention, hardware and OS issues could interact with it all, making some things hard to duplicate under virtual machines for developers. One thing is that to the end user, a system that is more stable may not look that different than one that is less so -- there are no new features, so it is not obvious what is being paid for.

    Although obviously if the program you support core dumps from a bad address or stack overflow, rather than just freezes up, it is probably something else. Still, even then, a bad pointer address can sometimes come from one thread freeing a data structure when another thread is still using it. The original C++ in the above mentioned project generally was highly reliable, but it still had some odd issues too. In one rare case, memory was freed in an unexpected way under certain conditions by other code running in the same thread but in code nested way deep with essentially recursive calls processing complex messages. I finally also traced part of that to what looked like maybe a bug in a supporting third-party library (a RogueWave data structure). Because that C++ code had been in production for years, and we were loath to change it at the risk of introducing new issues, we mostly "fixed" that issue by making changes elsewhere in the system to prevent that component from getting the pattern of data that it had trouble handling. But we would not have known exactly what to change elsewhere without a lot of analysis.

    Sadly, just as we got it mostly working well, the new shiny thing of a mostly COTS system that did something similar came along to replace much of it (at a much bigger expense than maintaining the old, but granted with some nice new features).

    As I saw someone else comment recently about a "stable" OS, the end user generally cares more about how much work a system lets them get done, not how "stable" it is. A reboot can be acceptable, depending on the situation and the alternatives, even if not desirable. Erlang code is probably the master at that approach of rebooting code when it fails. :-) Here

    --
    A 21st century issue: the irony of technologies of abundance in the hands of those still thinking in terms of scarcity.
  82. unifdef by Anonymous Coward · · Score: 0

    Use unifdef to remove code that isn't used anymore.

  83. Re: If you don't know what it does, don't touch it by HiThere · · Score: 1

    Well, no. That *is* code you want to either get rid of or *THOROUGHLY* document. But until you understand it you'd better not touch it in any non-reversible manner, and test each change so that reversing remains trivial.

    --

    I think we've pushed this "anyone can grow up to be president" thing too far.
  84. Re: Am I the only one who thinks this sounds like by Anonymous Coward · · Score: 0

    No, I like fixing projects like this too. Unfortunately, I never seem to get the opportunity these days. Companies either prefer I cope with really broken tools in perpetuity or are full steam ahead "developing" a new clusterfuck (ERPs).

  85. refactor by vasilevich · · Score: 0

    Refactor, like what I did with 200,000 lines of horrible PHP code. And rewrite some parts, too, if they are architecturally unsound.

  86. You will have to go both ways. Top down and bottom by Anonymous Coward · · Score: 0

    there isn't much difference between mid size and large size projects once you figure out how to move forward. It's going to take you a long while anyway.
    On one hand, document what you expect this application does for you.
    On the other hand, locate the redundant code. There are several ways to do this, and you may have to apply a handful of them; you can't build a house with just a hammer. A static code analyzer is one, if you can afford the money. The other dumb but straightforward way is to have a log message written at the start of each function. You don't need to do it for all of them. Start with the functions closest to main. I bet you will be able to eliminate a large chunk of useless code. You will have to observe for quite a while, depending on the nature of the application. E.g., for a finance application, it's likely to take you at least a quarter of a year. Then, divide and conquer. Repeat the same at a deeper level of the code.
    You will have to analyze and refactor the *useful* code at the end.
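
    The "log message at the start of each function" idea can be sketched roughly as below. This is only an illustration, not anyone's production tool: TRACE_CALL, call_counts, never_called and the two parse_record functions are names made up for the example, and the counter is not thread-safe, so a multi-threaded program would need a lock around it.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical helper (invented for this sketch): counts how often each
// instrumented function has been entered. Not thread-safe.
inline std::map<std::string, unsigned long>& call_counts() {
    static std::map<std::string, unsigned long> counts;
    return counts;
}

// Put TRACE_CALL() on the first line of any function you suspect is dead.
// __func__ is standard C++11 and expands to the enclosing function's name.
#define TRACE_CALL() (++call_counts()[__func__])

// Stand-ins for two legacy functions that appear to do the same thing.
int parse_record(int raw)     { TRACE_CALL(); return raw * 2; }
int parse_record_old(int raw) { TRACE_CALL(); return raw + raw; }

// Writes one "name count" line per function entered at least once.
void dump_call_counts(std::FILE* out) {
    for (const auto& entry : call_counts())
        std::fprintf(out, "%s %lu\n", entry.first.c_str(), entry.second);
}

// True if the named function has not been entered so far in this run.
bool never_called(const std::string& name) {
    return call_counts().find(name) == call_counts().end();
}
```

    Call dump_call_counts(stderr) at shutdown and collect the output over a long enough observation period. A function that never shows up is only a candidate for removal, not proof that it is dead.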

  87. I love projects like this by Anonymous Coward · · Score: 0

    Hire me, I love cleaning up code.

    you need to automate it a bit, mass editing, the way I do it is old school, emacs, grep, perl, sed, awk, etc. There are a lot of little tricks you can use, but it's important to start with something that works first, then you keep making sure you don't break it, let the compiler and linker do a lot of work. Identify something you are going to target, change the name of it, compile and start editing to eliminate errors. As you edit, you'll develop a sense of what's rote/repetitive, shell out and mass change that, compile again and catch the cases you missed. It goes a lot faster than you imagine it will, and you learn a lot about the code.

    do not go too long without a working version, if you are buried in hair and can't build and you have to go home, don't be afraid to roll back edits and start fresh.
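
    A gentler variant of the rename-and-recompile trick, assuming a C++14 (or later) compiler, is to mark the duplicate with the standard [[deprecated]] attribute. The build keeps working, and the compiler lists every remaining call site as a warning. This swaps in a different technique from the one described above, and clamp_percent and ClampPct are made-up names for the sketch.

```cpp
#include <cassert>

// The one implementation you want everybody to end up calling.
int clamp_percent(int value) {
    if (value < 0) return 0;
    if (value > 100) return 100;
    return value;
}

// Hypothetical duplicate found elsewhere in the code base. It now forwards
// to the survivor, and each remaining caller triggers a compiler warning
// that names the replacement.
[[deprecated("use clamp_percent() instead")]]
int ClampPct(int v) { return clamp_percent(v); }
```

    Once the warning list is empty, delete ClampPct. If you would rather have hard errors, as in the original suggestion, just rename or remove the old function instead.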

  88. That mushy spongy thing in your head by gatkinso · · Score: 1

    That is about the only thing that will really help.

    --
    I am very small, utmostly microscopic.
  89. we do! by Anonymous Coward · · Score: 0

    we even named the process DevOps for legitimacy!

  90. There are probably other tools, maybe even better tools but it is what I know. I'd say try adding the whole thing to a C++ Visual Studio project. You can then set things on to give you build errors for all unreferenced junk. Find all references etc. Other IDEs probably can do it too but at least entry level VS is free and I know it will do it so ... Only issue you might have is if it is a *nix app or whatever perhaps you'd get a lot of false errors because it won't conform to VC++. But I'm guessing they're close enough to get the bulk of the work done.

  91. Two ways by Anonymous Coward · · Score: 0

    There are two ways. One is to purchase a tool like we did for Y2K code analysis of 10M lines of code - cost about $150,000. The other is a smart and experienced C++ programmer with a good foundation in object oriented analysis. They would get a tool like Sparx Enterprise Architect ($200USD), reverse engineer the code into full UML diagrams so they understand all the relationships between the various classes, its behavior at a high level, and then start looking at the code where things seem "wonky". Such a person would cost as an employee about $100-120K per year. As a consultant, you are looking at at least 6 months effort at about $150-200 per hour. You can pay less, but you will get what you pay for. FWIW, my consulting rate for this sort of work is $200USD / hour, but then right now I am full-time employed at about $150K + benefits.

  92. Go slow by iamacat · · Score: 1

    Cleanup for the sake of cleanup projects never work. Current code performs some function and nobody can keep enthusiasm reading bad code for months just to have it perform same function in the end.

    Instead, you can gradually raise code quality by setting a high bar for new changes. For example, have each change reviewed by a couple of developers other than the author who are known for good style. If a new utility method is added, ensure that the code was searched for existing similar facilities. When legacy mess has to be used, it should be wrapped into a clean interface. And so on.
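
    As a rough illustration of wrapping the legacy mess in a clean interface: in the sketch below, LEGACY_get_name is an invented stand-in for some old C-style routine (status code plus caller-supplied buffer), and UserDirectory is a hypothetical wrapper that new code would call instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// --- Stand-in for the legacy mess (hypothetical, invented for this sketch) ---
// Returns 0 on success, -1 on failure; writes into a caller-supplied buffer.
int LEGACY_get_name(int id, char* buf, int buflen) {
    if (id != 42 || buf == nullptr || buflen < 6) return -1;
    std::snprintf(buf, static_cast<std::size_t>(buflen), "%s", "alice");
    return 0;
}

// --- Clean interface that new code is allowed to call ---
class UserDirectory {
public:
    // Hides the buffer handling and turns the -1 convention into an exception.
    std::string name_of(int id) const {
        char buf[64] = {0};
        if (LEGACY_get_name(id, buf, static_cast<int>(sizeof buf)) != 0)
            throw std::runtime_error("unknown user id " + std::to_string(id));
        return std::string(buf);
    }
};
```

    Reviewers can then reject any new change that calls LEGACY_get_name directly, and the old routine can be replaced later behind the wrapper without touching its callers.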

  93. There's a lot - you need a plan by plover · · Score: 1

    I'm assuming you're here because this code is critical to your business, it works well enough today, and it can't be easily replaced. You need to keep it working as you go, but you desperately need to modernize it. There's a lot you can do to set yourself up for success, and it's not just tools.

    First, get it building in the most current environment available. Is it Visual Studio? Port it to VS2013. Is it Eclipse? Get it into 4.4. Is it not even in an IDE? Get it into one - they're a great timesaver. Pick a refactoring tool, too, something that will help automate common refactoring activities like "extract method." You're going to do that a lot.

    Next, get it checked into your source control system, and building on your team's build server. This would also be a good time to revisit the packaging of the deliverables. If you don't already have a task and bug management system like Jira, Mylyn, TFS, Bugzilla, or whatever, get one that integrates into your workflow and your IDE. You have a lot of work to do, and you don't want to waste it filling out Excel spreadsheets. You really need your tools to be as unobtrusive as possible.

    There is no sense starting with sub-optimal tools, or fighting a crappy build or development environment. Your time is best spent on coding, and is wasted on everything else.

    Now that you're almost ready to get working, build a small suite of automated integration tests before moving on to addressing the architecture. They'll be ugly tests, but you need to know the code is still working as you begin making changes. Make sure the build machine can launch your tests and tell you when they fail.

    Now you can dig into the code base. Identify the underlying architecture. Is it event based? Does it closely model MVC? MVVM? Once you clearly define the architecture, break the solution into individually compilable libraries that represent the layers (controller, business logic, data accessors, etc.) Move the existing modules to the most appropriate library project. (Some won't fit cleanly, so you'll end up splitting those into parts later.) For now, make sure it builds and the tests run successfully.

    Pick one of the layers to work on first, perhaps the UI, perhaps the data access layer. Get it compiling clean, with no warnings, and turn on the compiler switch to enforce "treat warnings as errors." Run a static code analysis tool (Coverity, Klocwork, Fortify, /Analyze, lint, or anything, really) and fix whatever warnings it gives you.

    Tolerate no bugs. As you go through the code, when you find a bug, fix it then and there. Your QA staff will no doubt be finding plenty of bugs on their own, but you need to keep the project as clean as you can.

    Next, start refactoring the chosen layer into appropriate subdivisions, such as a controller, business layer interface, etc. You'll want to do a bunch of other housekeeping work here: get rid of globals and singletons, push stray business logic down into the business layer, pull stray UI interactions from the business layer up to the UI layer, etc. This would be a good time to introduce some automated unit tests to the logic you extract and move around. Unit tests force you to make the code testable; things like dependencies on databases, services, files, etc., cause problems with tests, so you start treating them with dependency injection. The primary outcome is that by making your code testable, you make it modular and readable. Plus, you get a few more tests under your belt.
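
    A minimal sketch of the dependency injection step, assuming the business logic currently talks to a database directly. OrderStore, InvoiceCalculator and FakeOrderStore are hypothetical names for the example, not anything from a real code base.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical seam: the business logic only ever sees this interface.
class OrderStore {
public:
    virtual ~OrderStore() = default;
    virtual std::vector<double> amounts_for(const std::string& customer) = 0;
};

// Business logic with its dependency injected through the constructor,
// instead of reaching for a global or singleton database handle.
class InvoiceCalculator {
public:
    explicit InvoiceCalculator(OrderStore& store) : store_(store) {}

    double total_for(const std::string& customer) {
        double total = 0.0;
        for (double amount : store_.amounts_for(customer)) total += amount;
        return total;
    }

private:
    OrderStore& store_;
};

// Test double: returns canned data, so no database is needed in a unit test.
class FakeOrderStore : public OrderStore {
public:
    std::vector<double> amounts_for(const std::string& customer) override {
        if (customer == "acme") return {10.0, 2.5};
        return {};
    }
};
```

    The production build would pass in a real OrderStore implementation backed by the database, while the unit tests construct InvoiceCalculator with FakeOrderStore.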

    Run a complexity metric across the layer, and look for the highest complexity modules. Start chipping them down. Again, look to adding some unit tests to prove that the code you're isolating does what you claim, and that you're making your logic stateless.

    Decide on an exception handling strategy, and make your exception handling consistent. Pick the one appropriate to your app and technology: SEH, try/catch, C-style return codes, whatever, just apply it consistently as you go. Sim

    --
    John
  94. Re:But are you lacking experience and the brain fo by Anonymous Coward · · Score: 0

    Did your contractor go bankrupt for the insurance claims by the workers hurting themselves and crying, after which it was the hammer time?

  95. Re:But are you lacking experience and the brain fo by Anonymous Coward · · Score: 0

    Break it down.

  96. Software archeology ... use paper and pencil. by Ihlosi · · Score: 1

    Use paper and pencil, and regularly assess your progress so you can state in meetings that you've analyzed another 13.4% of the source code. It's practiced job security.

  97. IDE + Document by Anonymous Coward · · Score: 0

    Get a good C IDE; CLion from JetBrains comes to mind. And start documenting the code. Don't change anything, just document it. Then pick one part and break it out. Run any tests you have to make sure it works. Then start all over.. get another IDE.. na.. document, change, test run... until you're done :-)

  98. Valgrind by hooiberg · · Score: 1

    Valgrind is a useful tool to get a profile run, and build up a call tree. As such, you can find the functions that are never called and can be removed as such. Moreover, you can patch a few memory leaks in the bargain.

  99. Perform a backup first by Anonymous Coward · · Score: 0

    I am in the same position. Before touching any code, do a backup, put everything under source control with a tool like SVN, install an IDE such as Eclipse, find existing documentation, get advice from previous programmers if possible, divide your work and stay focused.

  100. SonarQube by pwp · · Score: 1

    I would suggest that you try SonarQube (http://www.sonarqube.org/). It is free and does a pretty decent job of finding the duplicate / unused / ... code in your project.

  101. FLAMETHROWER by Anonymous Coward · · Score: 0

    Because fire is cleansing.

    Incidentally, to all you guys saying "220K lines is not so much" I reply "a good programmer does not need 220K lines to accomplish any task".

    NASA sent men to the moon with fewer LOC....

  102. Unit tests for legacy code are a waste of time by Anonymous Coward · · Score: 0

    I'm sorry, but anyone suggesting he should start by writing unit tests for everything is being naive. To begin with, many legacy applications, especially those written in non-object-oriented languages like C/C++, aren't structured to support legitimate unit tests. The act of setting up the environment to actually be able to run a single unit test becomes a significant chore that can consume many days or weeks of time. After that you get to tackle actually writing the unit test for a method which has no clearly defined contract, and you are further hampered because you have no way of tracking down or even testing all of the non-obvious side-effects of poorly encapsulated code. You could easily spend 6 months trying to write unit tests for a small subsystem and end up with a test suite that guarantees absolutely nothing about the accuracy or completeness of the code.

    I absolutely agree with the suggestions from others that you want to work out class diagrams, sequence diagrams, and develop end-to-end functional tests for regression testing. It's critical to understand the complete ramifications of refactoring a particular class or some particular methods and functions. As you write new code to replace what is there, you can write unit tests where it makes sense. You should understand the complete contract for every method you write, so you can write the unit tests to guarantee the contract.

    Most importantly, take it slowly. Start by looking at the entire system, identify what you feel are the biggest offenders, and work to understand those areas better. And as another poster mentioned, absolutely come up with a definition for what it means to "clean up" the code. Why does the code need to be "cleaned up" in the first place? Understanding your end goal will help you prioritize which areas you spend the most time on, because I guarantee you won't be able to clean it all up unless you re-write the whole thing.

  103. Sure by Anonymous Coward · · Score: 0

    Ocaml: As fast or faster than C++ with much less LOC.

    or if performance isn't an issue go Ruby and be under 15,000 LOC.

    C++ is brain-damage

  104. Hard work by countach · · Score: 1

    Having cleaned up a lot of projects, I don't think a magic tool is the answer. You have to have management approval to fiddle and deal with the testing and new bug outcome. You should work hard to remove all warnings in the code, that often does wonders and exposes a lot of flaws. You need to understand the code and slowly, a step at a time, evolve the code until it's pretty and well structured. Don't rewrite from scratch. Evolve a step at a time. Fix one architectural issue, make sure it still works, and then keep going, keep refactoring.

  105. Consider mocking frameworks in some situations by Paul+Fernhout · · Score: 1

    While this is in general great practical advice (and no doubt hard won), I can quibble about your point #3 on complex dependency graphs requiring rewrites as the "only way out". Certainly this is more of an issue in C++ than something like Java where code can be more easily replaced at runtime. However, at least in Java, the idea of "mocking" can sometimes be useful to test code even with complex dependencies without (significant) initial rewriting.

    I used mocking with JMockit successfully in the large Java project previously mentioned. I tried other frameworks, but preferred that one. JMockit supported creating unit tests for code which was not originally designed to be testable and had complex interdependencies in how objects were constructed. However, JMockit did have a substantial learning curve, even aside from hours spent trying to come up with tests for domain-specific code. Eventually I created some supporting code to make the mocking easier for our project, and then another developer improved even further on my work, making mocking our specific application much easier. So, at least in our situation, with a huge complex Java codebase in production, limited developer time, and limited tests initially, mocking was a big win IMHO that let us start to get a handle on everything without having to rewrite a lot of code at first.

    That said, in general, code is easier to maintain and understand when it does not have complex dependencies. "Dependency Injection" is a good idea in a lot of cases -- although it can have its own downsides in making object construction code harder to follow:
    http://en.wikipedia.org/wiki/D...

    So, while I'm quibbling about "only way forward" because of the possibility of mocking, I'm not saying rewriting in such situations is necessarily a bad idea or even quicker than mocking sometimes -- especially as mocking can introduce its own issues.

    With JMockit, one such unexpected issue was that mocking an object created mocks up the entire class hierarchy (causing issues when you wanted to mock one class but test a sibling class). This was a subtle issue that took a while to understand, and I did not see documented explicitly anywhere (at least in introductory material) although I think there was a bug/feature request about it somewhere.

    Another JMockit issue was that mocks were instantiated and removed in relation to threading somehow and there could be issues with mocks remaining in place when previous unit tests had not completely finished running all their threads. This could sometimes lead to unit tests failing occasionally due to thread timing issues and the mocking, when a class that was mocked in one test or with certain "expectations" was then accessed by another unit test which mocked different objects or had different "expectations". Sometimes this (unfortunately) happened embarrassingly on other developer's machines with different OS or hardware or on our Hudson/Jenkins build server just by the force of numbers of times the tests were run. Usually I could get around these cases either by adding delays at the end of the unit test to let all the threads complete or, better, by having improved mocks or other code that ensured the threads were finished before the test ended.

    That said, even with both of these issues, both frustrating to understand and then work around, mocking was still a big win for the project IMHO.

    I have not used any C++ mocking frameworks so I don't know how well they work or what their limits are. However, for suggestions about some such frameworks see this StackOverflow discussion:
    http://stackoverflow.com/quest...

    The top rated answer there is about "Google Mock" but there are other choices.
    https://code.google.com/p/goog...

    I do not see the word "mock" used so far in this Slashdot d

    --
    A 21st century issue: the irony of technologies of abundance in the hands of those still thinking in terms of scarcity.
    1. Re:Consider mocking frameworks in some situations by WinstonWolfIT · · Score: 1

      MAKE IT TESTABLE implies mocks as the first course of action, but even in Java this isn't always possible, and in C++ it's significantly harder.

    2. Re:Consider mocking frameworks in some situations by WinstonWolfIT · · Score: 1

      And to expand on this, in a design-dead module, you will too often find indirect dependencies on side effects, as well as "Hail Mary" calls via hooks into completely unrelated modules that can be devilish to decouple. You also have to consider how widely used a module is. 100+ usages of a design dead module will blow a simple time-boxed DI approach out of the water, resulting as I said in having to schedule it as a formal work item. Decoupling is easy, except for the times it's not.

  106. Citations... by Anonymous Coward · · Score: 0

    -1, eh mods?

    The world is full of self-confident rock-star developers who turn their noses up at professional software engineering practices.

    As for the other thing, here are the good old citations: Amazon 1, Amazon 2, Amazon 3.

  107. Emacs ETAGS by Anonymous Coward · · Score: 0

    Late to the party, but I have to chime in: Emacs ETAGS & CTAGS is just awesome for exploring code. find-tag-other-window.