Ask Slashdot: What Tools To Clean Up a Large C/C++ Project?

rm by Anonymous Coward · 2015-02-05 06:12 · Score: 4, Funny

How about "rm"?

Re:rm by infernalC · 2015-02-05 06:21 · Score: 3, Funny

-fr of course.
Re:rm by fahrbot-bot · 2015-02-05 06:37 · Score: 4, Funny

How about "rm"?
Ah yes. Every *nix programmer has, hopefully only once, experienced the joy of the following:

% rm * .o
.o: No such file or directory

--
It must have been something you assimilated. . . .
Re:rm by Anonymous Coward · 2015-02-05 07:22 · Score: 2, Insightful

Get your build process under control. Then figure out which code is dead. ...
Re:rm by KiloByte · 2015-02-05 07:33 · Score: 2

rm "-rf *"
rm: invalid option -- ' '
Try 'rm --help' for more information.
Neither the asterisk nor the space are valid options.

--
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.

Static analysis tools... by underqualified · 2015-02-05 06:13 · Score: 5, Informative

If your company is willing to pay for it, you can get something like Coverity. On the free (as in beer) side there are CppCheck and clang.

Re:Static analysis tools... by tjb6 · 2015-02-05 10:26 · Score: 2

Coverity will certainly tell you a lot of things that are broken, but probably won't help you decide how to fix them.
Brain power is probably the best approach to this one, although some automated detection of unused code and paths won't hurt.
Any number of other static analysers will do the same job.
Re:Static analysis tools... by MadKeithV · 2015-02-05 21:36 · Score: 2

If your company is willing to pay for it, you can get something like Coverity. On the free (as in beer) side there are CppCheck and clang.
Coverity is expensive, slow, and failed to successfully compile any of our large real-world projects ("large" here meaning tens of thousands of files). This was with their own consultants / sales people on-site to babysit the process. They couldn't do it and couldn't figure out why it wasn't working. From their explanations it also seemed that on a properly large code-base you'd have to spend a long time tweaking the output to rid yourself of spurious messages/warnings.

From experience, the best way to clean up a large project is to not actually mess it up in the first place. If you think it's messed up anyway, then the first thing you need to do is to think long and hard about cost/benefit. A large, specialized company like Coverity can't actually make a specialized compiler/linker that works for all possible correct C++ programs. C++ is notoriously hard to handle "automatically" unless you follow certain strict rules, and if the project is already that messy it's unlikely that any automatic tool is going to make that much of it.
Where do you want to go with this project? Is it actually working fine now? Does it need changes? Can you afford these changes? If the answers to all of these are "yes" then I think your best bet is good old elbow grease. Start adding unit tests for the code, and then manually start cleaning up the code. Just the act of adding unit tests will teach you a lot about the dependencies of the code.

CLion by Anonymous Coward · 2015-02-05 06:14 · Score: 3, Interesting

https://www.jetbrains.com/clion/

You call that large? by Anonymous Coward · 2015-02-05 06:14 · Score: 5, Insightful

Seriously, that's mid-sized at best.

Re:You call that large? by Oxdeadface · 2015-02-05 09:10 · Score: 5, Insightful

What compels comments like this? The first AC posts absolutely nothing of value, just wants to let everyone know that they disagree with a minor point that's completely irrelevant to the OP's question. Thanks for the insight, champ. The followup, probably the same person, goes on to ramble like an old fart telling a useless anecdote about his kid that's barely even related to the topic at hand. At what point did either of these seem like a good idea? Neither of these comments address the question being asked or even attempt to be useful at all. No one cares what you consider a large program and absolutely no one gives a shit about you or your fucking crotch fruit. These comments are just some sad cunt's way of claiming, "I'm more experienced and better than you." Fuck right off.

clang static code analysis by Anonymous Coward · 2015-02-05 06:15 · Score: 5, Informative

scan-build and scan-view from clang++ will show you what is being used and what isn't as far as static code analysis goes.

Easy by Anonymous Coward · 2015-02-05 06:16 · Score: 2, Funny

cd Large_Cplusplus_project

sudo rm -r *

sudo apt-get install java

Re:Easy by blue9steel · 2015-02-05 09:02 · Score: 2

Step 4, call sysadmin and complain about slowness of execution of new Java app running on hardware that was sized for C++ code.

Document first by gbjbaanb · 2015-02-05 06:16 · Score: 5, Insightful

So, figure out the layers or logical components between each module and then you will be able to chew smaller chunks.

Then, doxygen the whole lot, making sure to use dot to create the graphs for callers and callees. This will let you see the interaction points so you can see what impact a change in one method will have (ie which callers you have to check).
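For anyone who has not used it, here is a rough sketch of what a Doxygen-style comment block looks like on a C++ function. The function itself (`total_cents`) is made up for illustration. The caller/callee graphs mentioned above are usually switched on in the Doxyfile with `HAVE_DOT = YES`, `CALL_GRAPH = YES` and `CALLER_GRAPH = YES`, assuming Graphviz is installed.

```cpp
#include <vector>

/**
 * @brief Sum the line items of one bill.
 *
 * Hypothetical example function; the name and types are illustrative only.
 *
 * @param line_items  Amounts in cents, one entry per line item.
 * @return The total in cents.
 */
long total_cents(const std::vector<long>& line_items) {
    long total = 0;
    for (long item : line_items) {
        total += item;
    }
    return total;
}
```

Even on an undocumented code base, Doxygen can be told to extract everything (`EXTRACT_ALL = YES`), so the graphs are available before any comments have been written.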

Some people will say "write unit tests" but frankly, it never works with a legacy code base: to effectively unit test you have to write your code differently to how you'd normally do it. You don't have that luxury here. So a good integration test suite should be developed to test the functionality of the whole thing, then you can repeat it to make sure your changes still work. It's not as instant as unit testing (but more effective) so you'll have to invest in a build system that regularly builds and runs the (automated) integration test and tells you the results - and commit changes reasonably regularly so you can isolate changes that end up breaking the system.

The rest of the task is simply hard work running through how it works and understanding it. There are no shortcuts to working hard, sorry.

Re:Document first by laughingskeptic · 2015-02-05 06:40 · Score: 3, Informative

This will find the static interaction points, but will miss the dynamic interaction points. He also has to watch for callbacks and methods present to satisfy oddball templates in C++, methods that will be invoked as a result of casts, etc.
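A small sketch of the kind of dynamic interaction point meant here, with made-up names throughout (`handle_add`, `handle_sub`, `dispatch`): the handlers are never called by name, only through a pointer looked up at run time, so a static call graph will typically show `dispatch` calling "something" rather than the handlers themselves.

```cpp
#include <map>
#include <string>

// Hypothetical handlers: nothing in the program calls these by name.
int handle_add(int a, int b) { return a + b; }
int handle_sub(int a, int b) { return a - b; }

using Handler = int (*)(int, int);

// The only references to the handlers are these table entries.
const std::map<std::string, Handler>& handler_table() {
    static const std::map<std::string, Handler> table = {
        {"add", &handle_add},
        {"sub", &handle_sub},
    };
    return table;
}

// The call target is chosen from data at run time.
// Returns 0 for an unknown operation (an arbitrary choice for this sketch).
int dispatch(const std::string& op, int a, int b) {
    const auto it = handler_table().find(op);
    return it == handler_table().end() ? 0 : it->second(a, b);
}
```

Deleting `handle_sub` because "nothing calls it" would still compile once the table entry is removed, and would only fail when some input finally asks for "sub".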
Re:Document first by bmajik · 2015-02-05 09:12 · Score: 5, Insightful

This.
One of my first professional programming projects was to take a look at the custom C++ billing software our company had purchased from a contract programmer.
I had a long unix and programming background, and was back for a summer job after doing 1 semester of C++ in college.
My boss told me, since I was the only one who had C++ experience, to start documenting the system.
At the time, we were using IRIX, and so I was using the SGI compiler and tools suite, which were, I believe, licensed from EDG. The point is that there was a very nice call graph visualizer. This was helpful for understanding things at a superficial level.
However, what was even better was just running the program a bunch of times on test data and seeing what it did while under the debugger.
While my summer began with the task of documenting the system, as I learned things I'd report them to my boss.
By the end of the summer, I had re-written some fundamental parts of the system; I'd moved some of the processing outside, and I pre-processed and pre-sorted the data.
The overall execution time went from many hours to about 45 minutes to calculate monthly bills. The key innovation was replacing the inner loop of the charge tabulation -- which was 2 or 3 levels of nested linked list traversal.
Instead, I used the standard unix sort tools to pre-sort the data files before being loaded into the system, and I changed the system to use a data structure that supported a binary search.
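To make the idea concrete, here is a minimal sketch of a binary-search lookup over pre-sorted records. It is not the original billing code; the `Rate` struct, its fields and `find_rate` are invented for the example, and it assumes the input has already been sorted by `account_id` (for instance by an external sort step before loading).

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record type; the real system's data layout is unknown.
struct Rate {
    int account_id;
    long cents;
};

// Precondition: rates is sorted by account_id in ascending order.
// Returns a pointer to the matching record, or nullptr if there is none.
// Each lookup costs O(log n) comparisons instead of a full list walk.
const Rate* find_rate(const std::vector<Rate>& rates, int account_id) {
    const auto it = std::lower_bound(
        rates.begin(), rates.end(), account_id,
        [](const Rate& r, int id) { return r.account_id < id; });
    if (it != rates.end() && it->account_id == account_id) {
        return &*it;
    }
    return nullptr;
}
```

With nested list traversal, every charge scans the whole list; with sorted data each lookup only touches a handful of elements, which is where the hours-to-minutes kind of improvement tends to come from.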
The majority of the code got left alone. By understanding the code under a debugger, and realizing that how it worked on production data was much different than how it performed on the test data it was originally delivered with, I was able to make a critical set of changes that had a huge impact.
In general, I spend as much time as I can not writing code, but instead, understanding how the existing system works. For a current project, I've spent the last two weeks playing with somebody else's code, and now I've expanded it so that it can also operate on my data sets, and I've probably changed fewer than 100 lines across about 5 different projects.

--
My opinions are my own, and do not necessarily represent those of my employer.
Re:Document first by boristhespider · 2015-02-05 11:40 · Score: 3, Interesting

No.
It would also be stamped on by management and any competent product owner, unless it was absolutely dripping in tests before he embarked on anything of the sort. If the code is producing the desired numbers but is simply a total and utter mess, no-one is going to thank him for declaring he's going to rebuild it from scratch, and the only way it would be sanctioned at all is if he could absolutely guarantee the same numbers before and after (to within rounding and ordering error). Given the state of the codebase he's talking about, those tests would have to be end-to-end tests since as others have noted writing unit tests for legacy code is in general a thankless and time-consuming task. (Then again, so is attempting to build end-to-end tests that satisfy every useful codepath.)
I genuinely have no idea how large the codebase at my company is; at a guess I'd wager we're in the many millions of lines of code (certainly enough to render Intellisense an utterly useless, chugging, unusable piece of shit) and quite possibly more. Some of it is really quite good code with thorough unit test coverage -- that tends to be the more recent stuff. The rest is covered, in principle if not in reality, by a large number of end-to-end tests that at the very least exercise some extremely fragile pieces of code quite effectively. Even with this, rampant refactoring is discouraged, let alone rampant rewriting. It soaks up developer time we can't afford to spend, and the danger of hitting a bug that isn't covered by our end-to-end tests (or, even more infuriatingly, fixing a bug that clients have grown to trust the results of) is pretty high. Unless there's a very good reason to out and out rewrite, it's to be very much discouraged. Careful refactoring, once every self-contained block of work rerunning all the unit tests and as many of the end-to-end tests as is practical, is about the only way to proceed.
Re:Document first by TechyImmigrant · 2015-02-05 18:50 · Score: 2

If I was asked to 'fix' or 'clean up' a codebase, I'd refuse.
1) 'fixed' or 'cleaned up' is not well defined.
2) Once you've changed it to your definition of 'fixed' it's going to be gibberish to the next guy.
3) You don't fix code, you own code. Your management should be asking you to own the code so you can nurture it and improve it. Fixing things is one aspect of improving code.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.

Eclipse, Xcode or any IDE by guruevi · 2015-02-05 06:19 · Score: 5, Insightful

Any decent IDE has the capability of pointing at least towards unused blocks of code and will generate a tree of function calls. I've worked with Eclipse and Xcode both of which have these capabilities. Even GCC (or another C compiler) can warn you about chunks of unused code or missing/bad header files. You can also rename functions across the entire codebase if necessary.

If your code has warnings or errors, continue fixing until the warnings are gone. As far as functions that do similar things but are named differently, that is a bit harder because 'looks like they are doing the same thing' doesn't always mean they ARE doing the same thing (if they have the exact same code, you could perhaps solve with statistical analysis or simply a text finder).

Make sure that if you replace a function, the replacement has the same behavior in all cases. Even mediocre developers have learned that reusing existing code is a "good thing", and often different functions that do "the same thing" have edge cases (often undocumented) where they behave differently (especially in C/C++, e.g. differences in signedness, memory mapping method, characters, etc.).
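A tiny illustration of the signedness case, using two invented helpers that look like obvious duplicates. They agree on every non-negative input and disagree as soon as a negative value shows up, because the value is converted to a very large unsigned number before the comparison.

```cpp
// Two hypothetical helpers with "the same" body but different parameter types.
bool below_limit_signed(int value, int limit) {
    return value < limit;
}

bool below_limit_unsigned(unsigned value, unsigned limit) {
    return value < limit;
}

// Shows the edge case: -1 is below 10 as an int, but converted to unsigned
// it becomes the largest unsigned value and is no longer below 10.
bool helpers_disagree_on_minus_one() {
    const int v = -1;
    return below_limit_signed(v, 10) !=
           below_limit_unsigned(static_cast<unsigned>(v), 10u);
}
```

So merging the two is only safe after checking what the callers can actually pass in.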

--
Custom electronics and digital signage for your business: www.evcircuits.com

Risky by Anonymous Coward · 2015-02-05 06:21 · Score: 2, Insightful

This strikes me as a very risky undertaking. If there are a lot of functions/modules doing similar things, any attempt to combine many similar functions into one runs a huge risk of introducing bugs if you can't wrap your head around the entire program (which is unlikely imo). There is a huge time and budget risk in this endeavor.

If you don't know what it does, don't touch it. by BlueKitties · 2015-02-05 06:22 · Score: 5, Interesting

Seriously, you never know when some previous programmer made a "duplicate" function to do something bizarre, like force a particular initialization order of static-class-member variables between translation units. Sometimes deleting pointless code can do... terrible things. Just be careful, test your changes, etc.
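For readers who have not been bitten by this: the order in which static objects in different .cpp files get initialized is not specified, so code that depends on it is fragile. The sketch below is not the trick described above but a commonly used alternative, a function-local static ("construct on first use"), shown in a single file with invented names (`registry_name`, `Plugin`, `g_plugin`).

```cpp
#include <string>

// Hypothetical registry value. In the fragile variant this would be a
// namespace-scope object in one .cpp file, read by the constructor of a
// static object in another .cpp file.
std::string& registry_name() {
    // Function-local static: initialized the first time control passes
    // through here, so it is ready no matter which file's statics run first.
    static std::string name = "default";
    return name;
}

struct Plugin {
    std::string seen;
    Plugin() : seen(registry_name()) {}  // safe to run during static init
};

// A static object whose constructor depends on the registry.
Plugin g_plugin;
```

If a strange-looking duplicate function turns out to exist only to force such an ordering, replacing the dependency with an accessor like this is one way to make the function genuinely removable.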

--
"Sorrow is better than laughter, for by sadness of face the heart is made glad." [Ecclesiastes 7:3]

Unit tests by Midnight+Thunder · 2015-02-05 06:22 · Score: 4, Interesting

While I dislike writing unit tests, I have to admit they are useful in protecting your butt when something breaks, since the test should catch it first. Of course you need to decide whether in a particular scenario they add value or just make your manager happy.

In a case like yours, you can make code modifications and hope nothing breaks or build unit tests and ensure that you don't break any of them when refactoring. Initially rather than just ripping out the seemingly duplicate methods, rip out/tweak their implementation and have them point to what seems like the right method to provide the common functionality. If your unit tests show breakage, then you know that you missed something.
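A minimal sketch of that workflow, with invented function names: `trim_spaces` stands in for the method you decide to keep, `strip` is the seeming duplicate that now forwards to it, and the old body is kept around as `strip_legacy` only so the two can be compared on sample inputs.

```cpp
#include <string>

// Hypothetical "keeper" implementation.
std::string trim_spaces(const std::string& s) {
    const auto first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Old body of the seeming duplicate, kept only for comparison.
std::string strip_legacy(const std::string& s) {
    std::string out = s;
    while (!out.empty() && out.back() == ' ') out.pop_back();
    while (!out.empty() && out.front() == ' ') out.erase(out.begin());
    return out;
}

// The duplicate now simply points at the keeper.
std::string strip(const std::string& s) { return trim_spaces(s); }

// Returns true if the forwarded version matches the old behaviour
// on every sample input.
bool strip_matches_legacy() {
    const char* samples[] = {"", " ", "a", "  a b  ", "a  ", "\t a"};
    for (const char* s : samples) {
        if (strip(s) != strip_legacy(s)) return false;
    }
    return true;
}
```

The sample list is the weak point: it only proves agreement on the inputs you thought of, so it is worth feeding it values captured from real runs.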

If you do things wholesale, then you are likely to break something in an unmanageable way. Oh and make sure things are version controlled ;)

--
Jumpstart the tartan drive.

Re:Unit tests by gstoddart · 2015-02-05 06:52 · Score: 5, Interesting

I've maintained several legacy code bases over the years.
And I will flat out tell you that unit tests have VERY limited utility in terms of understanding a mess of code you inherited. At least, in the beginning.
Sure, you can start with a couple of basic premises, and you can convince yourself those basic premises still work.
But the initial grokking of your code, understanding all places where a function may be used, understanding all of the tricky bits and gotchas, trying to understand why there are 9 functions which look like they do the same thing? That takes some time and effort, and quite possibly some tools.
Unit tests are great for starting to build up a few things, and move towards better stuff ... but in a system which has several hundred (or several thousand) functions and interactions, resulting in really large numbers of code paths ... having a few unit tests describing the stuff you understand doesn't mean all of the stuff you don't understand wasn't broken, simply because you don't know what you don't know.
So it is important to understand your new unit tests on legacy code are, at best, a VERY incomplete view of your code. That will improve over time, but you could potentially need to write a few thousand of them to be sure you're not breaking anything in the big picture.

If you do things wholesale, then you are likely to break something in an unmanageable way. Oh and make sure things are version controlled ;)
Oh, yes .... This .. for the love of god, this.
You should learn how to tag branches and the like in your version control so you can identify a baseline of "before I ever touched anything" and then be able to cleanly build everything which predates you, as well as building your "after refactoring this part".
Branching/tags/whatever your version control calls it -- that doesn't take up much space, so use them often, and consistently. Let the tool do the heavy lifting of keeping track of what you've changed.
You do NOT want to find yourself unable to build it as it existed, or identify all of the diffs between what you started with and what you have.

--
Lost at C:>. Found at C.
Re:Unit tests by gstoddart · 2015-02-05 07:40 · Score: 2

I agree about some code being unit-test-proof. I've definitely encountered some.
For the original poster ... start with backups, so you 100% isolate yourself from your own stupidity ... and I'm not calling you stupid, I'm saying everyone who has ever done this has had that "oh, crap, did I just do that?" moment. Plan for it now so you don't have to try to deal with it later.
Then spend a lot of time simply going through the code. Use something like FreeMind or a giant whiteboard to map out the high level stuff. Take paper notes. Lots of them. Spend a lot of time reading it, getting familiar with it, and developing a mental understanding of it.
Understand the hierarchy, the modules, and the high level stuff. Pick a few modules and delve into them. Dissect them to the point you can start to understand how the pieces fit together, and at least have a roadmap. You should be able to draw a diagram which broadly describes the chunks of functionality in your sleep.
If you are trying to make code changes on day one, you're doing it wrong. If your boss expects you to be doing code changes on day one, he's an idiot who doesn't understand what you're being asked to do.
I would say that easily the first few weeks (if not more depending on the code) should be spent doing nothing more than reading and trying to understand. And then doing it some more. Be prepared to walk through with a debugger just to confirm what you think is true -- surprisingly, it often isn't when dealing with someone else's code.
Think of this as being as much archaeology as a technical exercise ... you are sifting through layers of code, likely built up over the course of years, and which has a very good chance of having its own unique nature and strangeness.
First, grasshopper, seek understanding. Then, accept that your understanding is incomplete. Then seek more understanding. =)
It's like trying to understand alien technology ... you could put an eye out if you aren't fully sure you have learned what it really does. ;-)

--
Lost at C:>. Found at C.

graphviz by Anonymous Coward · 2015-02-05 06:24 · Score: 3, Informative

graphviz can visualize the inter-functional and inter-file dependencies.

It's free and built into the functionality of doxygen.

I'd recommend recommenting all the functions using doxygen - because to clean up a large project you need to know it.

Looks like a reverse engineering project by prefec2 · 2015-02-05 06:26 · Score: 4, Interesting

Modularize the software. There are a lot of tools which can help you to analyze static dependencies in the code which can help you to identify components. You could also use a run-time analysis tool, for example Kieker, which was initially for Java but has an extension for C/C++.

Git then doxygen by Ultra64 · 2015-02-05 06:34 · Score: 3, Informative

You didn't mention a version control system, so assuming you aren't using one:

Turn it into a git repository so you can easily back out of changes.

Then run doxygen and start reading through the documentation.

Re:Git then doxygen by vikingpower · 2015-02-05 07:51 · Score: 2

DOXYGEN ?? You MacOS punks ! Now get off my lawn, before I hose you with my emacs-generated documentation !

--
Religous speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace

Man Hours by dragonk · 2015-02-05 06:40 · Score: 2

To be quite frank, what you need are man hours. There are many tools out there that can help you find corners or edges to start working on, but you can do the same with a coin toss; no tool will significantly reduce the amount of man hours that will have to be spent fixing, refactoring and reorganizing. Take a good loooooong look, devise a simple strategy and then jump in somewhere. From personal experience, add lots of assertions as you go.
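As a rough sketch of what "assertions as you go" can look like, here is an invented helper (`average`) where the assumptions you believe the surrounding code makes are written down as `assert` calls, so a debug build stops at the first place reality disagrees.

```cpp
#include <cassert>
#include <vector>

// Hypothetical legacy helper: average of a sample buffer.
double average(const std::vector<double>& samples) {
    // Record what the callers appear to guarantee.
    assert(!samples.empty() && "callers are believed never to pass an empty buffer");

    double sum = 0.0;
    for (double s : samples) {
        sum += s;
    }
    const double result = sum / static_cast<double>(samples.size());

    // Sanity check on the way out: NaN is the only value not equal to itself.
    assert(result == result);
    return result;
}
```

Note that `assert` compiles to nothing when `NDEBUG` is defined, so these checks cost nothing in a release build but also protect nothing there.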

Few ideas by postmortem · 2015-02-05 06:42 · Score: 4, Informative

1. Modern IDE with good gcc parser: Eclipse, Netbeans, 3rd party paid ones. Not Visual Studio. You want it to build call hierarchy tree for you, so that you can find methods that are unused. It will require some manual steps
1a. if you have $, Understand for C/C++ is proprietary tool that will map a hierarchy of your code.
2. perform structural coverage analysis of code in live action, will help map the dead code. gcov is free if you can use it.

Stricter compilation also an option by Codeyman · 2015-02-05 06:45 · Score: 3, Insightful

Along with Coverity, as one of the commenters suggested, you can compile the code with stricter compilation options (like -Wall -Werror in gcc, which will error out if variables/functions are not used, etc.). You would then need to go through each of these files manually and resolve all the issues. Tools like bcpp can help you make sure your complete code base follows a common coding standard. Apart from that, if the name of the function is not indicative of what the function actually does, there are no tools smart enough to help you with that. You'd need to do a lot of cleanup by hand.

Before you do anything by OrangeTide · 2015-02-05 06:53 · Score: 3, Interesting

You need to write a test suite to confirm what works and what does not work.

Once you have tests, you can start running coverage tools (like gcov or Coverity).
If your tests are not covering parts, you need more tests or need to consider removing that part of the code.

When tests are complete, then you can think about how to clean it up (refactor, rewrite, organize or whatever word the cool programmers are using nowadays). You can use your compiler warnings as a lint. And start to work through the spammy build logs to eliminate all the warnings. A good goal is to have zero warnings and after that build with -Werror which will cause builds to fail if any new warnings are introduced. (if you have 3rd parties or customers that build these libraries, you might not want to do that)

Another option that becomes available after writing proper tests, is that you can make the decision to discard the entire project and start over from scratch. This is good if the requirements have changed dramatically over the years and a lot of messy hacks exist to support obsolete requirements. I must warn you though, usually rewriting is a waste of time. Time that is better spent understanding and fixing the existing code, after all source code is just a text file, you know how to edit a text file right?

--
“Common sense is not so common.” — Voltaire

Re:Before you do anything by gstoddart · 2015-02-05 08:17 · Score: 2

You need to write a test suite to confirm what works and what does not work.
No, before you do anything you need to spend some time understanding what it does and sifting through the code for a LOT of hours. You need to understand the layout, the coding style, start to identify the bits which look like duplicates but which might not be.
You need to be prepared to document the hell out of it, and be able to walk someone else through it -- if only as an exercise of "this is what I think I see, do you think you see the same thing?"
Your initial stuff should be entirely in your brain, on your whiteboard, in your paper notes, or in your electronic notes. There's no substitute for spending time ferreting around in the code.
If you start writing a test suite before you do anything ... you probably don't have enough understanding of the code to write the test suite in the first place.
And then you'll spend your time trying to make the program fit your test suite.

Another option that becomes available after writing proper tests, is that you can make the decision to discard the entire project and start over from scratch.

No, if that's even an option, you need to review, understand, and document it first. If you go off half cocked writing a test suite only to decide you are going to scrap the whole thing ... you've wasted your time writing the test suite.
Legacy code doesn't always play well with the idealized assumptions of "write a test suite". In fact, I'd say that's the last thing you want to be doing.
If your management thinks this is a magic process where you dive in on day 1 ... run like hell, because they have no understanding of what you are really doing and what it will take.

--
Lost at C:>. Found at C.

Re:Does Lint Exist anymore by OrangeTide · 2015-02-05 06:58 · Score: 4, Informative

Compiler warnings have mostly caught up with the capabilities of Lint. There are some things Lint still does, but there are lots of things it warns about that have, as far as I know, never been the cause of a real bug. Getting a project to be 100% warning free with gcc -Wall is possible, and usually possible with -Wextra (maybe not so much with g++). The warnings usually are valuable, and I've personally seen bugs that could have been caught with gcc's warnings. Other compilers have other warnings and personalities, but I think it's worthwhile to investigate using warnings to check out a project with any compiler.
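As an example of the sort of thing the warnings catch, here is a made-up function (`count_negative`) in its cleaned-up form, with the original warning-prone loop left in a comment. With g++, `-Wall` enables `-Wsign-compare` for C++, which flags comparing an `int` index against the unsigned result of `size()`.

```cpp
#include <cstddef>
#include <vector>

// Build with something like: g++ -Wall -Wextra -c file.cpp
// Once the log is clean, adding -Werror keeps new warnings out.
int count_negative(const std::vector<int>& values) {
    int count = 0;
    // Before the cleanup the loop read:
    //     for (int i = 0; i < values.size(); ++i)
    // which mixes signed and unsigned in the comparison.
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] < 0) {
            ++count;
        }
    }
    return count;
}
```

Most such fixes are mechanical, but each one is still a code change, so it is worth doing them in small, separately committed batches.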

--
“Common sense is not so common.” — Voltaire

Re:If it works, DO NOT FUCK WITH IT!!! by OrangeTide · 2015-02-05 07:00 · Score: 3, Insightful

Indeed! This is why writing a test for it, for ALL of it, would be a good start. Not only does one start to learn the deep details of the code when they are doing test development, without running the risk of creating new subtle bugs, at the end of the test writing exercise they also get the bonus of having a useful test suite.

--
“Common sense is not so common.” — Voltaire

Re:But are you lacking experience and the brain fo by Immerman · 2015-02-05 07:15 · Score: 3, Insightful

Who said anything about doing the job? They're asking for suggestions for automated code analysis that can highlight potential "problem" areas/code duplication/etc. Seems like a common enough situation that someone may have made a tool for it. Automated *repair* would be a far more challenging task, but just highlighting potential inconsistencies and redundancy "hot spots" is something that could be done with fairly high false-positive/negative rates and still be extremely useful when faced with cleaning up an atrocious codebase.

--
--- Most topics have many sides worth arguing, allow me to take one opposite you.

Re:Fire by kwiecmmm · 2015-02-05 07:17 · Score: 2

Nuke it from orbit. It's the only way to be sure.

Answer: read slashdot for long enough by plcurechax · 2015-02-05 07:18 · Score: 5, Interesting

See: Working Effectively with Legacy Code book review (2008) for a book of that title by Michael Feathers (PDF article) on that very topic.

There is even a summary of key points at Programmers @ StackExchange. Hundreds if not thousands of programmer's blogs address this very topic.

You're welcome. Now get back to work.

Re:If it works, DO NOT FUCK WITH IT!!! by HornWumpus · 2015-02-05 07:22 · Score: 3, Insightful

Not possible.

Sometimes you have a mess that you don't want to fuck with, but you have to.

Don't combine the duplicate functions into one. Decide which one is the 'good one' then have all the others call it and fix up the results to match the alternative versions. Do this one at a time and test it to death.
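A sketch of what "have the others call it and fix up the results" can look like. Both functions are invented for the example: `find_index` plays the 'good one' (0-based, -1 for not found) and `lookup_position` is an alternative whose callers expect 1-based positions with 0 for not found.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// The "good one": index of the first match, or -1 if there is none.
int find_index(const std::vector<std::string>& items, const std::string& key) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (items[i] == key) return static_cast<int>(i);
    }
    return -1;
}

// Hypothetical alternative version with a different contract.
// It is now a thin wrapper: call the good one, then translate the
// result back into what the existing callers expect.
int lookup_position(const std::vector<std::string>& items, const std::string& key) {
    const int idx = find_index(items, key);
    return idx < 0 ? 0 : idx + 1;
}
```

The callers of `lookup_position` do not change at all, which is what makes it possible to do this one function at a time and test each step.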

A plan that has worked for me is to separate the code into two piles. The application, which remains a fucking mess, and a library which only gets clean code. Eventually all the good stuff is in the library and you can just replace the calling mess with a new version.

More basically: If you touch it, it will be your mess until you leave for a new job. Think long and hard if you don't want to stick this one on someone else.

Unless management knows and publicly acknowledges the scope of the problem, don't touch it. You will be held responsible for breaking it, but fixing it will be invisible. Don't be a hero. Falling on grenades isn't fun (unless you are talking about the fat girl).

--
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'

DXR, the code indexer by Grincho · 2015-02-05 07:24 · Score: 5, Interesting

Wow, what an easy pitch. :-) At Mozilla, we've put together a tool called DXR ( https://github.com/mozilla/dxr... ). It indexes your code and lets you do text and regex searches. But if you can get your project to build under clang, you can really have some fun, with queries that find...

* Calls of a function (great for dead code removal)
* Uses of a type
* Overrides of a method
* Uses and definitions of macros
* etc., etc., etc. There are something like 24 different structural queries you can do.

Because all of this is informed by the internal data structures of the clang compiler, it's nigh on 100% accurate (aside from more dynamic behaviors like sticking function pointers in a table and passing them around). You can also explore a hyperlinked version of the source, bouncing from #include to #include and drilling into methods.

Here's how to set it up: https://dxr.readthedocs.org/en...
Here's our production instance you can play with: https://dxr.mozilla.org/mozill...

If you run into trouble, pop into #static on irc.mozilla.org, and we'll be happy to help you.

I've done this far too many times by WinstonWolfIT · 2015-02-05 07:35 · Score: 4, Insightful

First off, 220k lines of source isn't that big.

You're not going to solve this with a big bang so get that idea out of your head. You're going to solve it gradually, and for a code base of that size it's going to take maybe a year of relatively slow improvement. Everyone on the team has to be on board, and every code review must include "What has been improved?" and "Did anything get worse? If so, that's not okay."

1) Pick your battles. The code you're not changing is code that doesn't need to be looked at. Address your pain points as they come up.
2) When you find a pain point while making a change, MAKE IT TESTABLE. Since you're in here making a usually simple fix, a single nominal test verifying that fix is fine. Testing anything else is a waste of time. Testable code will improve over time.
3) If you can't make code testable because of an intractable dependency graph, welcome to the hell of "Design Dead". The only way out of this scenario is a rewrite of that area.
4) Find your comfort level with regard to time boxing refactoring work. On my engagements, they just happen automatically, without explanation outside the team, nor apology to anyone. When estimating a piece of work, pad it with some extra time for cleanup. Only actually create work items for design dead areas. Your definition of done must include testable, tested and improved code.
5) Duplicate code in itself isn't evil, and inconsistencies are simply inevitable. If you find duplicate code, pick one and deprecate the rest. However, code that is tightly coupled to the deprecated code will need to be refactored and if the coupling traverses an extended dependency graph, you'll simply have to live with the duplication and just stop adding to it.

Few suggestions by Anonymous Coward · 2015-02-05 07:45 · Score: 3, Informative

-1-
Install "OpenGrok" ( https://github.com/OpenGrok/OpenGrok ) and index your code.
OpenGrok is the best source-code browsing option out there.
Use OpenGrok to extensively read and understand your code base.
Examples:
Which files in the linux kernel call 'printk':
http://lingrok.org/search?q=printk&defs=&refs=&path=fs%2F&hist=&project=linux-next
Where is 'printk' defined?
http://lingrok.org/search?q=&defs=printk&refs=&path=&hist=&project=linux-next

-2-
Use Clang's static code analyzer, 'scan-build' : http://clang-analyzer.llvm.org/scan-build.html .
Depending on how good/bad the code is, there could be many false positives,
but it will give you a sense of what's going on, and what to focus on.

-3-
Enable all possible compilation warnings (either in GCC or Clang).
The more the better. Use "-Werror" to ensure you don't ignore them.
Do it iteratively if needed: enable more warnings, fix what breaks, and repeat.
A good list is here:
http://git.savannah.gnu.org/cgit/gnulib.git/tree/m4/manywarnings.m4#n103

Especially eliminate unused code and variables.
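
As a small illustration of what the warnings buy you, here is a sketch of a function after cleanup. The function is made up; the comments describe the kind of thing GCC and Clang report under -Wall -Wextra, which -Werror then turns into build failures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "before" state of this function:
     - an unused local `int tmp;`          (-Wunused-variable)
     - `int i` compared against `len`      (-Wsign-compare)
   Both are gone in the version below. */
size_t count_nonzero(const int *values, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (values[i] != 0)
            n++;
    }
    return n;
}

/* Small usage check. */
void check_count_nonzero(void)
{
    const int data[] = { 0, 3, 0, 7 };
    assert(count_nonzero(data, sizeof data / sizeof data[0]) == 2);
}
```

Fixing one warning class at a time keeps each commit small and easy to review.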

-4-
Analyze the McCabe Complexity ( http://en.wikipedia.org/wiki/Cyclomatic_complexity ) of your code, using pmccabe ( https://people.debian.org/~bame/pmccabe/pmccabe.1 ).
Focus on functions with a too-high score, and refactor them.
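
One common way to bring a score down, sketched here with made-up names and values: replace a long if/else-if chain with a lookup table. Written as a chain, every new case adds another decision point; with a table, the number of branches stays the same no matter how many entries you add.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from file extension to a type code. */
struct ext_entry {
    const char *ext;
    int type;
};

static const struct ext_entry ext_table[] = {
    { "c", 1 }, { "h", 1 }, { "cpp", 2 }, { "hpp", 2 }, { "mk", 3 },
};

/* Returns the type code for an extension, or 0 if it is unknown. */
int type_for_extension(const char *ext)
{
    if (ext == NULL)
        return 0;
    for (size_t i = 0; i < sizeof ext_table / sizeof ext_table[0]; i++) {
        if (strcmp(ext, ext_table[i].ext) == 0)
            return ext_table[i].type;
    }
    return 0;
}

/* Small usage check. */
void check_type_for_extension(void)
{
    assert(type_for_extension("hpp") == 2);
    assert(type_for_extension("txt") == 0);
}
```

This only fits functions whose branches are really data in disguise; functions with genuinely different logic per branch are better split into smaller functions.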

-5-
Add automated tests to your program, and combine them with code coverage (lcov/gcov).
In addition to the general good advice of 'try to increase coverage', focus specifically on code sections
which are critical but not covered at all - write tests specifically for them.
Having some tests is better than having no tests at all.

-6-
Decide on a code style (e.g. Linux kernel style, GNU style, any other style) and build shell commands to test it (i.e. a combination of grep/awk etc.).
Newly committed code should adhere to the style. Use git hooks to enforce it.
Existing code should be (slowly) refactored to the new style.
Which style is a matter of personal preference, but having a consistent style across all code really helps.

Ideally, it should be something as easy as 'make syntax-check' in GNU Coreutils.

-7-
With all of the above, integrate the tests into an automated system (e.g. autotools or cmake or just makefiles) that will allow you to run and re-run and re-run these checks easily.
If it takes 10 shell commands to do static analysis - you'll be too lazy/busy/whatever to do it more than once.
It should be as easy as 'make static-scan' or 'make coverage'.
Investing in writing a good makefile is worth the effort.

Good luck.
- gordon

Use Warning Level 4 (W4) by danknight48 · 2015-02-05 07:49 · Score: 2

You should be running at Warning Level 4 when coding. It's good practice to prevent the issue you have now.
It will give you a crap load of warnings (which are all worth fixing if you have the time), but, it will highlight any unused variables and/or functions.

in Visual Studio 2008-2013:
- Project > Properties
- Configuration Properties > C/C++ > General
- Change "Warning Level" from Level3 (/W3) to Level4 (/W4)

Beware of 'cleanups' by plopez · 2015-02-05 07:53 · Score: 4, Interesting

Anecdote from the mists of time:

There was this C program which had been around a while and had undergone some evolution and maintenance. The decision was made to 'clean it up'. There was a data structure, an array I think, which was unused in a subroutine, let's call it subroutine A. So it was removed. On the next test runs of the application, the program suddenly started core dumping. After some agonizing debugging, the crash was discovered to come from another subroutine, let's call it subroutine B.

There had been an array in subroutine B which a loop had been running off the end of. But subroutine A was loaded just prior to B and had allocated memory for the unused data structure. That had provided enough space to absorb the out-of-bounds writes from subroutine B, but once it was removed, subroutine B began overwriting subroutine A, causing the crashes.

It was good that the crashes were easily reproducible, or it could have been one of those intermittent things that drive people insane. An automated tool may not catch things like that since it may not show up until run time. It is C/C++ we are talking about now, isn't it?
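
For readers who haven't been bitten by this, a minimal sketch of the kind of bug described. The names and the array size are invented; the original code is long gone. The buggy loop is only described in the comment, and the function shown is the corrected form.

```c
#include <assert.h>
#include <stddef.h>

#define SAMPLE_COUNT 8  /* hypothetical array size */

/* Buggy form (described only, not compiled here):
       for (i = 0; i <= SAMPLE_COUNT; i++) samples[i] = value;
   That writes one element past the end. Whether it crashes depends on what
   happens to sit next to the array in memory, which is how an unrelated,
   unused variable can hide the problem for years. */
void fill_samples(int samples[SAMPLE_COUNT], int value)
{
    for (size_t i = 0; i < SAMPLE_COUNT; i++)
        samples[i] = value;
}

/* Small check: a guard element placed after the array must stay untouched. */
void check_fill_samples(void)
{
    int buf[SAMPLE_COUNT + 1];
    buf[SAMPLE_COUNT] = -1;
    fill_samples(buf, 7);
    assert(buf[0] == 7 && buf[SAMPLE_COUNT - 1] == 7);
    assert(buf[SAMPLE_COUNT] == -1);
}
```

As an aside that is not part of the anecdote: run-time checkers such as AddressSanitizer (-fsanitize=address in GCC and Clang) are designed to flag this class of overrun when the faulty code actually executes.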

--
putting the 'B' in LGBTQ+

Re:Beware of 'cleanups' by rrohbeck · 2015-02-05 08:22 · Score: 3, Interesting

I still have some superfluous debugging code in a project that literally does nothing in the production version, but without it the code crashes randomly after a week or so; a classic Heisenbug. It's clearly data trashed by a wild pointer, but I could never find who did it since it's a large multithreaded program that depends on hardware behavior. Neither valgrind nor Coverity was of any help. It's too big to be rewritten, so we just have to live with it.

--
thegodmovie.com - watch it

Re:But are you lacking experience and the brain fo by I'm+New+Around+Here · 2015-02-05 08:13 · Score: 5, Funny

Hey, MC Hammer built my house for me.

Unfortunately, I'm not allowed to touch it.

--
If you think I voted for Trump because of this post, you're wrong. I voted for Dr. Jill Stein of the Green Party. Again.

Could it be a threading issue like a deadlock? by Paul+Fernhout · 2015-02-05 11:35 · Score: 4, Interesting

Debugging code that prints or logs may act to synchronize access to some data structure. Sometimes that can prevent a deadlock or illegal pointer access as a side effect:
http://stackoverflow.com/quest...
http://en.wikipedia.org/wiki/D...

So yes, complex programs can act in strange ways from seemingly minor changes.
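
A minimal sketch of the effect, using POSIX threads and made-up names: two threads update a shared counter. Without the explicit lock the increment is a data race. A log call inside the loop could appear to "fix" it, because POSIX stdio functions lock the stream they write to and so tend to serialize the threads. The honest fix is to take a lock of your own around the shared data, as below.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000  /* hypothetical workload size */

static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker locks explicitly around the shared update instead of relying
   on a side effect of logging for synchronization. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two workers to completion and returns the final count,
   or -1 if a thread could not be created. */
long run_two_workers(void)
{
    pthread_t a, b;
    shared_counter = 0;
    if (pthread_create(&a, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

With the lock in place, run_two_workers() should return 2 * ITERATIONS on every run; remove the lock and the result becomes timing-dependent, which is the kind of behavior that debugging code can mask.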

I spent a couple of years helping maintain a large complex multi-threaded app (which included message passing between the apps, for another layer of fun) which supported 24X7 operations where a minute's downtime could cost millions of dollars in some situations, and it was not easy. The code base was easily 10X to 100X the size of what the poster of the story is tasked with maintaining. Versions of the code had been in production for over fifteen years. Much of the code had been ported from C++ & Tcl to Java (although C++/Tcl systems remained), but the threading model was somewhat different between the two, and the port had not taken account of all the differences. It would have been nice to be able to rewrite some key parts of the system to make them more maintainable, but there was never enough time for that in a big way -- and realistically, bigger rewrites likely introduce new issues.

Still, eventually we got most of the worst deadlocks and memory leaks and similar such things fixed, and the system got to the point where people stopped even remembering off-hand the last time a core part of the system needed to be rebooted (previously a fairly frequent event). But each deadlock could involve days, weeks, or even months of study and discussion, adding log statements, writing tests, lab tests, analyzing quite a few multi-gigabyte log files (and writing tools to help with that, including visualizing internal message flow), and so on. And, same as you mention, hardware and OS issues could interact with it all, making some things hard to duplicate under virtual machines for developers.

One thing is that to the end user, a system that is more stable may not look that different from one that is less so -- there are no new features, so it is not obvious what is being paid for.

Although obviously if the program you support core dumps from a bad address or stack overflow, rather than just freezing up, it is probably something else. Still, even then, a bad pointer address can sometimes come from one thread freeing a data structure when another thread is still using it. The original C++ in the above-mentioned project generally was highly reliable, but it still had some odd issues too. In one rare case, memory was freed in an unexpected way under certain conditions by other code running in the same thread, but in code nested way deep with essentially recursive calls processing complex messages. I finally also traced part of that to what looked like maybe a bug in a supporting third-party library (a RogueWave data structure). Because that C++ code had been in production for years, and we were loath to change it at the risk of introducing new issues, we mostly "fixed" that issue by making changes elsewhere in the system to prevent that component from getting the pattern of data that it had trouble handling. But we would not have known exactly what to change elsewhere without a lot of analysis.

Sadly, just as we got it mostly working well, the new shiny thing of a mostly COTS system that did something similar came along to replace much of it (at a much bigger expense than maintaining the old, but granted with some nice new features).

As I saw someone else comment recently about a "stable" OS, the end user generally cares more about how much work a system lets them get done, not how "stable" it is. A reboot can be acceptable, depending on the situation and the alternatives, even if not desirable. Erlang code is probably the master at that approach of rebooting code when it fails. :-)

--
A 21st century issue: the irony of technologies of abundance in the hands of those still thinking in terms of scarcity.
