The P.G. Wodehouse Method of Refactoring
covertbadger notes a developer's blog entry on a novel way of judging progress in refactoring code. "Software quality tools can never completely replace the gut instinct of a developer — you might have massive test coverage, but that won't help with subjective measures such as code smells. With Wodehouse-style refactoring, we can now easily keep track of which code we are happy with, and which code we remain deeply suspicious of."
It's only 30k lines of code. This is no problem.
First, take ownership. This is your project. Identify your resources and name the gates you must get through to succeed. If you have help, make sure your helpers understand that their changes must handle the corner cases or they are junk, then explicitly give them ownership of their pieces. Create a safe environment for testing changes, with forward and backward versioning.
Define success. So many projects skip this essential step. If you cannot identify the destination you cannot tell when you've won.
Skip the 50,000-foot view and proceed directly to "What does this do, and how can it be done better?" Believe it or not, flowcharts and Venn diagrams are not obsolete. Create tree views of function calls. Identify processes that should be moved into libraries. Create policies such as "maximum function call depth", "maximum process share", etc.
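One way to make a "maximum function call depth" policy checkable is to compute the depth from a call graph. This sketch assumes the graph has already been extracted into a hypothetical `CallGraph` map from each caller to its callees (how you obtain it, from a static-analysis tool or by hand, is left open); `max_call_depth` and `exceeds_depth_policy` are illustrative names, not an existing tool's API. Recursion has no finite depth, so the sketch reports it by throwing.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical representation: caller name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

namespace detail {
// Length of the longest call chain starting at fn, counting fn itself as 1.
// memo caches finished functions; on_stack holds the chain being explored,
// so meeting a function that is already on it means recursion.
int depth_from(const CallGraph& graph, const std::string& fn,
               std::map<std::string, int>& memo, std::set<std::string>& on_stack) {
    const auto hit = memo.find(fn);
    if (hit != memo.end()) return hit->second;
    if (!on_stack.insert(fn).second)
        throw std::runtime_error("recursive call chain through " + fn);

    int deepest_callee = 0;  // functions with no recorded callees are leaves
    const auto it = graph.find(fn);
    if (it != graph.end()) {
        for (const std::string& callee : it->second)
            deepest_callee = std::max(deepest_callee, depth_from(graph, callee, memo, on_stack));
    }
    on_stack.erase(fn);
    return memo[fn] = deepest_callee + 1;
}
}  // namespace detail

int max_call_depth(const CallGraph& graph, const std::string& entry) {
    std::map<std::string, int> memo;
    std::set<std::string> on_stack;
    return detail::depth_from(graph, entry, memo, on_stack);
}

// The "maximum function call depth" policy; the limit itself is a project choice.
bool exceeds_depth_policy(const CallGraph& graph, const std::string& entry, int max_depth) {
    assert(max_depth > 0);
    return max_call_depth(graph, entry) > max_depth;
}
```

For example, if `main` calls `run`, `run` calls `step`, and `step` calls `read_token`, the depth from `main` is 4, so a policy limit of 3 would flag it.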
If you're the lead, look at issues like memory allocation and process management. Do your profiler due diligence.
If you're the Lone Ranger on this, just absorb the whole thing and integrate it. Force-feed your brain huge quantities of what-ifs until it gives you the right answer in self-defense, and then have somebody else check the result.
30 days of development and 60 days of testing. Remember to give a nice presentation at the end and sell it!
Good luck.
Help stamp out iliturcy.
But the screen resolution of fanfold paper hanging on the wall cannot be beaten by the best modern monitors.
Sometimes just printing the stuff out, papering the floor with it and literally crawling over it yields answers that otherwise escape.
If the line width won't fit on the paper at a reasonable pitch, there's a clue right there.
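The line-width clue is easy to check before printing anything. This is a minimal sketch that lists the lines wider than a chosen column count; the 132-column default is an assumption modelled on wide fanfold printouts, and widths are counted in bytes with tabs expanded, so multi-byte UTF-8 text will be over-counted. `display_width` and `overlong_lines` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for quick checks
#include <string>
#include <vector>

// Printed width of one line: tabs advance to the next multiple of tab_width,
// every other byte counts as one column (so UTF-8 text is over-counted).
std::size_t display_width(const std::string& line, std::size_t tab_width = 8) {
    assert(tab_width > 0);
    std::size_t column = 0;
    for (char c : line)
        column = (c == '\t') ? (column / tab_width + 1) * tab_width : column + 1;
    return column;
}

// 1-based numbers of the lines that will not fit in max_columns.
// The 132-column default is an assumption (a typical wide fanfold printout).
std::vector<std::size_t> overlong_lines(std::istream& source,
                                        std::size_t max_columns = 132,
                                        std::size_t tab_width = 8) {
    std::vector<std::size_t> too_wide;
    std::string line;
    for (std::size_t number = 1; std::getline(source, line); ++number)
        if (display_width(line, tab_width) > max_columns) too_wide.push_back(number);
    return too_wide;
}
```

Pass an `std::ifstream` for a real file or an `std::istringstream` for a quick test; the result holds 1-based line numbers.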
Most of the books and documents I have read in the last 20 years focus on metrics, that is, statistical analysis of code. This ignores the Zen and the art of coding and debugging. While much of coding is science, part of it is feel. If it were only science, code generators would already have eliminated programmers.
Fight Spammers!
The article highlights a principle which we all know (either explicitly or implicitly): we are highly vision-oriented creatures; visual perception is (relatively) easy for us. A quick convincer: coloured and neatly indented code is easier to read than monochromatic unindented code, right? So perception of colour and position is faster than that of symbols and their relationships.
The method in the article plays right into this: by viewing the code greatly zoomed out, one can readily see the density of the code and get a visual "fingerprint" of each chunk. By tying a printout's position to how satisfied you are with the code on it, one can readily see which piece of code needs the most work.
Interesting additions: colouring each class and method by how much memory it allocates (or how many objects it constructs), or colouring functions according to their position in the call graph or their in-degree.
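The in-degree colouring idea can be sketched as follows: count the distinct callers of each function in a call graph, then map the count to a colour band. `CallGraph`, `in_degrees`, `colour_for`, and the threshold values are all hypothetical, chosen only for illustration; rendering the colours onto a printout is left out.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: caller -> callees (repeated calls are counted once).
using CallGraph = std::map<std::string, std::vector<std::string>>;

// In-degree = number of distinct callers of each function.
std::map<std::string, int> in_degrees(const CallGraph& graph) {
    std::map<std::string, std::set<std::string>> callers;
    for (const auto& entry : graph) {
        callers[entry.first];  // make sure functions nobody calls appear with 0
        for (const std::string& callee : entry.second) callers[callee].insert(entry.first);
    }
    std::map<std::string, int> result;
    for (const auto& entry : callers) result[entry.first] = static_cast<int>(entry.second.size());
    return result;
}

enum class Colour { Green, Yellow, Red };

// Illustrative thresholds only: widely shared functions get the "hot" colour,
// since a change there ripples out to every caller.
Colour colour_for(int in_degree, int yellow_from = 3, int red_from = 10) {
    assert(yellow_from <= red_from);
    if (in_degree >= red_from) return Colour::Red;
    if (in_degree >= yellow_from) return Colour::Yellow;
    return Colour::Green;
}
```

A function like a logging helper, called from everywhere, lands in the hot band; entry points such as `main` have in-degree 0.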
I was under the impression that "large projects" started somewhere around the million-lines-of-code mark, not at a mere 30K lines. But here is what I do, and none of it requires any special insight into the source code (note that I do this primarily for C++):
1. Ruthlessly delete lines. Get rid of ***anything*** that does not contribute to correct operation or understanding. That includes things like version history (that's why you have the damn tool; use it already (1)!), inane comments (but keep the stuff that actually helps with understanding), code that is commented out (if you really need it, it will be in the aforementioned version tool), code that is never called, and code that does nothing at all (such as empty constructors or destructors).
2. Decrease the scope of everything to be as tight as you possibly can. Make everything you can private, static, or whatever else your language offers to decrease scope. Declare variables in the innermost scope. Make them all const if possible.
3. Anything that belongs together should be in one file (even if that file becomes 5000 lines long). Anything that *doesn't* belong together should be split into separate files (but don't make a file for just a single function; instead, create a file of "leftovers").
4. Anything that has a non-descriptive name is to be renamed to what it really represents. No more "int x; // x is the number of blarglewhoppers"; just use "int NumBlargleWhoppers" instead.
5. Keep an eye open for duplicate code. Get rid of the duplicates.
6. Write down any special insights you gain as comments in the appropriate place. Anything you do NOT understand, write down as a comment too, and mark those comments with something you can grep for.
7. Any homegrown version of something that is available in the STL or Boost, to be replaced by its "official" alternative.
8. And that goes double for string operations! No more "char *" anywhere; it is the 21st century, use strings already! I'll make an exception for functions that allow "const char *" to be passed in, but only with the "const". If I find a "char *" without the "const", I *will* come to your office and bash your head against the wall. Repeatedly. Just so you know.
9. Any error handling through error return codes, probably to be replaced by exceptions, unless it turns the calling code into a wild mass of try/catch blocks.
10. Pointers, to be replaced by references where possible.
11. Negative logic and names, to be replaced by positive logic and names. Don't write "if (!NoPrinterAvailable()) {A();} else {B();}"; instead write "if (PrinterAvailable()) {A();} else {B();}".
12. Anything that looks like it was written by drunk lemurs or the French, to be deleted on principle and replaced by something sane.
So there you have it. In my experience, doing this will remove about half of the lines of code (more if there was a significant number of lemurs on the team), while gaining considerable clarity and, usually, performance.
(1) And honestly, I don't give a flying fuck which one of you messed up on the 29th of February 1823 or why you thought it was a good idea in the first place. I'm concerned with what the code will be doing in the future, not how it came to be in this sorry state. Chances are, whatever you thought at the time is long obsolete anyway. Get rid of the cruft. Get rid of anything that doesn't help; it just clutters the mind.
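A few of the checklist items above, applied to one small, made-up piece of code: a `const std::string&` instead of `char *` (8), a const reference instead of a pointer (10), an exception instead of an error return code (9), a standard algorithm instead of a homegrown loop (7), a descriptive name (4), and positive logic (11). `Item`, `CountMatchingItems`, `PrinterAvailable`, and `ChooseOutput` are hypothetical names invented for this sketch, and the "before" shape appears only in a comment.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

struct Item {
    std::string name;
};

// Before (hypothetical legacy shape):
//   int count(char *n, Item *it, int x, int *out);  /* returns -1 on error */
// After: descriptive name (4), const std::string& instead of char * (8),
// const reference instead of pointer + length (10), result returned directly.
int CountMatchingItems(const std::string& name, const std::vector<Item>& items) {
    // (9) Bad input is reported with an exception rather than a -1 return code.
    if (name.empty()) throw std::invalid_argument("CountMatchingItems: empty name");

    // (7) std::count_if replaces a hand-written counting loop; (2) the result
    // is const and lives in the narrowest scope that needs it.
    const auto matches = std::count_if(items.begin(), items.end(),
                                       [&name](const Item& item) { return item.name == name; });
    return static_cast<int>(matches);
}

// (11) Positive name and positive logic: no NoPrinterAvailable().
bool PrinterAvailable(const std::vector<std::string>& printers) {
    return !printers.empty();
}

std::string ChooseOutput(const std::vector<std::string>& printers) {
    if (PrinterAvailable(printers)) {
        return "print to " + printers.front();
    }
    return "save to file";
}
```

Whether exceptions are the right trade-off still depends on the calling code, as item 9 itself notes.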
There's a big difference between having code which just happens to somehow work, and having code which works because the code is clearly written and documented, where the person in charge of maintaining it actually understands what the code is doing.
Whether you rewrite from scratch or work with the legacy code, it's your job as the programmer to understand and document all of the tweaks, bug fixes, edge conditions, and obscure environments. If there aren't comments in the existing code to explain these things, then it's your job to understand why the code is doing what it is doing, and add the comments as needed. If the code isn't clear, it's your job to make it clear.
The author correctly points out that when you do a total rewrite, the undocumented special cases handled by the old code will make themselves felt. As these problems present themselves, it takes time to fix them. However, you also get the opportunity to understand the undocumented special cases and get them clearly coded and properly documented, which reduces maintenance costs over the long term. Your decision whether to maintain or to rewrite should take both of these factors into account.
Elegant as it may be, that version of quicksort is so slow that (IIRC) even the Haskell documentation advises against using it in "real" code.
Personally, I think the C++ way is even easier to read, and it has the benefit of being really fast:
std::sort(xs.begin(), xs.end());
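For comparison, here is that C++ one-liner wrapped into complete functions. This is a minimal sketch: `sorted_copy` and `sorted_descending` are illustrative names, and taking the vector by value simply leaves the caller's copy untouched. Since C++11, `std::sort` is specified to make O(N log N) comparisons, and it rearranges elements in place rather than building new lists at every partition step the way a list-based Haskell quicksort does.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Ascending order, using operator< on the elements.
std::vector<int> sorted_copy(std::vector<int> xs) {
    std::sort(xs.begin(), xs.end());
    assert(std::is_sorted(xs.begin(), xs.end()));  // postcondition, checked in debug builds
    return xs;
}

// Descending order: the same call, with std::greater<int> as the comparison.
std::vector<int> sorted_descending(std::vector<int> xs) {
    std::sort(xs.begin(), xs.end(), std::greater<int>());
    return xs;
}
```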
Maybe not
Agreed. And you have to change code anyway when you're moving functions that are defined elsewhere, so the code does change.
The key idea, though, is that you have an array of visual cues that tell you instantly that this code still needs to be refactored. These cues can often be removed in bulk, even automatically with scripts: indentation, for example, or use of deprecated functions, or certain types of comments. It's attractive to do these bulk cleanups because they give the overall code a healthier look. But they remove the cues. The real rule would be something more like "don't work on cosmetics".
I respectfully disagree. Consider a piece of code that has 8 levels of nesting. With a judicious use of short variable names, parentheses, and curly braces, it is possible to make it *look* not so bad in the original code. It might look like there's only a half dozen levels. With a pretty printer, nesting LOOKS exactly as nesting IS. At a *GLANCE* I can see where things are getting hairy.
That does NOT mean that it must be refactored. Only that it is an area that may be worthy of additional consideration.
On second thought, nothing says this is a black-or-white choice. If it works for you to print out code as is, great! My experience has been that a pretty printer can be helpful. YMMV.
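If you want the "where is it getting hairy" signal without reformatting the whole tree, a rough brace-depth scan can point at the deepest nesting. This is only a sketch for C-like code under stated limits: it skips comments and string/character literals but does not understand raw strings, digit separators, or macros, and `deepest_nesting` and `NestingReport` are hypothetical names. As with the pretty printer, a high number is a reason to look, not an automatic verdict.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct NestingReport {
    int max_depth = 0;     // deepest level of { } nesting seen
    std::size_t line = 0;  // 1-based line where that depth was first reached
};

// Rough brace-depth scan for C-like source. Skips // and /* */ comments and
// string/character literals so braces inside them are not counted. It does NOT
// understand raw string literals, digit separators (1'000), or macros.
NestingReport deepest_nesting(const std::string& source) {
    enum class State { Code, LineComment, BlockComment, String, Char };
    State state = State::Code;
    NestingReport report;
    int depth = 0;
    std::size_t line = 1;

    for (std::size_t i = 0; i < source.size(); ++i) {
        const char c = source[i];
        const char next = (i + 1 < source.size()) ? source[i + 1] : '\0';
        if (c == '\n') ++line;

        switch (state) {
        case State::Code:
            if (c == '/' && next == '/') { state = State::LineComment; ++i; }
            else if (c == '/' && next == '*') { state = State::BlockComment; ++i; }
            else if (c == '"') { state = State::String; }
            else if (c == '\'') { state = State::Char; }
            else if (c == '{') {
                ++depth;
                if (depth > report.max_depth) { report.max_depth = depth; report.line = line; }
            }
            else if (c == '}' && depth > 0) { --depth; }
            break;
        case State::LineComment:
            if (c == '\n') state = State::Code;
            break;
        case State::BlockComment:
            if (c == '*' && next == '/') { state = State::Code; ++i; }
            break;
        case State::String:
        case State::Char:
            if (c == '\\') {               // skip the escaped character
                if (next == '\n') ++line;  // backslash-newline continuation
                ++i;
            } else if ((state == State::String && c == '"') ||
                       (state == State::Char && c == '\'')) {
                state = State::Code;
            }
            break;
        }
    }
    return report;
}
```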
The Netscape browser, like the company that made it, indeed ended up a failure. And my pals who were there at the time tell me that the poor code quality was a major factor in the inability to get anywhere. How long did it take between the last decent release of Netscape and the 1.0 release of Firefox? Four years? Six? I guess it depends on what you consider the last decent release. No matter how you count it, though, there were years of thrashing trying to get something based on the old code out the door. Eventually, they just gave up.
Firefox is a big success now, but Netscape was a giant crater of a failure during a crucial period, leaving Microsoft effectively unopposed in their attempts to take over the browser market.
This happens to me pretty regularly if I write a section of code, wait three months, and then read it again.
It's happening less frequently though, so perhaps my skill level is leveling out.
Either that or I'm stagnating.
kaens.blogspot.com
Do you want to encourage people to inline their functions manually, and not divide things into small, cute trivial functions?
Is this a misguided attempt to increase efficiency?
Well the obvious problem with the analogy is that you're not finding needles in a haystack - you're looking for hay in a haystack.