The P.G. Wodehouse Method of Refactoring
covertbadger notes a developer's blog entry on a novel way of judging progress in refactoring code. "Software quality tools can never completely replace the gut instinct of a developer — you might have massive test coverage, but that won't help with subjective measures such as code smells. With Wodehouse-style refactoring, we can now easily keep track of which code we are happy with, and which code we remain deeply suspicious of."
It's only 30k lines of code. This is no problem.
First, take ownership. This is your project. Identify your resources, name the gates you must get through to succeed. If you have help make sure they understand their changes must hit the corner cases or it's junk, then give them ownership of their piece explicitly. Create a safe environment for testing changes, with forward and backward versioning.
Define success. So many projects skip this essential step. If you cannot identify the destination you cannot tell when you've won.
Skip the 50,000 foot view and proceed directly to "what does this do and how can it be done better"? Believe it or not flowcharts and Venn diagrams are not obsolete. Create tree views of function calls. Identify processes that should be libraried. Create policies like "maximum function call depth", "Maximum process share", etc.
If you're the lead, look at issues like memory allocation and process management. Do your profiler due diligence.
If you're the lone ranger on this just absorb the whole thing and integrate it. Force feed your brain huge quantities of what-ifs until it gives you the right answer in self defense - and then have somebody else check the result.
30 days development and 60 days testing. Remember to give a nice presentation at the end and sell it!
Good luck.
Help stamp out iliturcy.
Just burning up the comment threads on this one.
Don't blame me, I voted for Baltar.
Code that was written while drunk, high, or half-asleep I will be deeply suspicious of, and probably needs to be refactored immediately. Anything else probably needs refactoring as well, but less urgently.
Random and weird software I've written.
Nice article! I thought it was an interesting way to bring a qualitative feel back into software development. In a word of mathematics and code, we often lose sight of those qualitative things in favour of hard numbers. I think developers too often live in the analytical world like european Chess when they should be combining intuition with analysis like in Go / Weiqi.
"The value of a man resides in what he gives,
and not in what he is capable of receiving."
--Albert Einstein
I hate the e.e. cummings method of refactoring, which is to run all your code through a lower-case filter. Never seems to help very much.
John
...is a neat idea. Besides the mentioned practice of raising and lowering pieces of code that the developers are happy and dissatisified with, hanging code encourages peer review.
Perhaps not in-depth code review, but physically hanging code in your office might "scare" developers into adhering to their organization's standards for fear of their coworkers mockery of poor code.
It might be difficult to hide shitty code when anyone can walk by and look at what *you* think is good.
(At least it might take just as much effort to hide bad code as it does to make it good.)
But the screen resolution of fanfold paper hanging on the wall cannot be beaten by the best modern monitors.
Sometimes just printing the stuff out, papering the floor with it and literally crawling over it yields answers that otherwise escape.
If the line width won't fit on the paper at a reasonable pitch, there's a clue right there.
Help stamp out iliturcy.
I'm agreeing with you. 30k lines is 500 pages. That's roughly 8' high by 50' wide. Definitely doable.
Not about the scaring though -- just about it being useful. Anxiety isn't something I'd want to deliberately introduce to a working programmer. Most of the ones I've known had enough performance anxiety issues of their own without adding any.
Hanging the code makes some errors more visible. Not all errors are bugs. Some are structural. Structural fixes sometimes repair "pernicious" bugs.
Help stamp out iliturcy.
I have an even better idea: instead of printing the code on paper, maybe we could represent it by making corresponding holes in little cards. The cards you could hang in front of the window. As the classes get simpler, the holes can get bigger (because less total space is needed) and they get spread around more easily, so more and more light filters through. This way we can emulate the "sun rising on the project", "light at the end of the tunnel" feeling we all love so dearly.
Need a status update? Just look into the room - if you can see sunlight, the work is done!
I really like the concept, and it fits in with a bunch of techniques we've been using at work in line with the "Big Visible Charts" ideas. Things like this and Agile stories written on index cards and pinned to the wall do sound hokey. A number of people like Johanna Rothman http://www.pragprog.com/titles/jrpm however point out, that these techniques are a lot more inclusive and (as I've found) you get much more animated discussions than the pm/architect/team lead writing a document "for discussion."
If nothing else it's fun to watch management trying to cope with your walls being covered with sheets of paper, cards and string when they've paid all this money for MS Project and the Rational Suite.
My code is not ugly. It's battle-scarred
Most of the books and documents that I read in the last 20 years go towards metrics, statistical analysis of code. This ignores the Zen and art of coding and debugging. While much of coding is science, there is a part of it that is feel. If it is only science, then code generators would have already eliminated programmers.
Fight Spammers!
When refactoring dirty code, avoid doing minor cleanups on the other code. This way the places where you still need to work on stand out from the rest. In any case, as soon as minor cleanups go beyond layouting, it also means you're doing changes in code without test coverage. Even straightening out if/else clauses easily leads to errors.
The article highlights a principle which we all know (either explicitly or implicitly): we are highly vision-oriented creatures; visual perception is (relatively) easy for us. A quick convincer: coloured and neatly indented code is easier to read than monochromatic unindented code, right? So perception of colour and position is faster than that of symbols and their relationships.
The methods in the article plays right into this: by viewing the code zoomed out greatly, one can readily see the density of code, and get a visual "fingerprint" of each chunk. By coupling printout position to satisfaction with the printed code, one can readily see which piece of code needs the most work.
Interesting additions: adding colour to each class and method based on how memory they allocate (or how many objects they construct); or colouring functions relating to their position in the call graph, or their in-degree.
...takes a very long time on the product of two large prime codes.
I agree with assigning ownership, but I don't think it is enough. How do you aggregate the combined efforts of all the teams into a birds eye perspective and how do you know where the boundaries of the project lie? If the project is big enough and the requirements are complex enough, you really need an evolutionary method to control the cost of adjusting direction. The number of questions you propose will simply be too many to deal with. On the other hand, if it's all about producing a release (prototype, proof-of-concept, save-my-butt-development) and then move on you may have the benefit of never having to taste your own dog food and the investors will pick up the bill instead and draw strange conclusions about the lifetime of an invention.
Talk about decoupled classes..
What ho.
I was under the impression that "large projects" started somewhere around the million lines of code mark, not at a mere 30K lines. But here is what I do, and none of this require any special insights into the source code (note that I do this primarily for C++):
// x is the number of blarglewhoppers" - just use "int NumBlargleWhoppers" instead.
1. Ruthlessly delete lines. Get rid of ***anything*** that does not contribute to correct operation or understanding. Even including things like version history (that's why you have the damn tool, use it already (1)!), inane comments (but keep the stuff that actually helps with understanding), code that is commented out (if you really need it, it will be in the aforementioned version tool), code that is not called, and code that is not doing anything at all (such as empty constructors or destructors).
2. Decrease the scope of everything to be as tight as you possibly can. Make everything that you can private, static, or whatever else your language offers to decrease scope. Declare variables in the innermost scope. Make them all const if possible.
3. Anything that belongs together should be in one file (even if that files becomes 5000 lines long). Anything that *doesn't* belong together should be split into separate files (but don't make a file for just a single function - instead create a file with "leftovers").
4. Anything that has a non-descriptive name is to be renamed to what it really represents. No more "int x;
5. Keep an eye open for duplicate code. Get rid of the duplicates.
6. Any special insights gained, write them down as comments in the appropriate place. Anything you do NOT understand, also write them down as comments. Mark those with something you can grep for.
7. Any homegrown version of something that is available in STL or boost, to be replaced by its "official" alternative.
8. And that goes double for string operations! No more "char *" anywhere; it is the 21st century, use strings already! I'll make an exception for functions that allow "const char *" to be passed in, but only with the "const". If I find a "char *" without the "const", I *will* come to your office and bash your head against the wall. Repeatedly. Just so you know.
9. Any error handling through error return codes, probably to be replaced by exceptions, unless it turns the calling code into a wild mass of try/catch blocks.
10. Pointers, to be replaced by references where possible.
11. Negative logic and names, to be replaced by positive logic and names. Don't have "if (!NoPrinterAvailable()) {A();} else {B();}" - instead do "if (PrinterAvailable() {A();} else {B();}".
12. Anything that looks like it was written by drunk lemurs or the French, to be deleted on principle and replaced by something sane.
So there you have it. In my experience, doing this will remove about half of the lines of code (more if there was a significant number of lemurs on the team), at the gain of considerable clarity and usually performance.
(1) And honestly, I don't give a flying fuck which one of you messed up on the 29th of february 1823 or why you thought it was a good idea in the first place. I'm concerned with what the code will be doing in the future, not how it came to be in this sorry state. Chances are, whatever you thought at the time is long obsolete anyway. Get rid of the cruft. Get rid of anything that doesn't help - it just clutters the mind.
``The Code of the Woosters''
http://www.amazon.co.uk/Code-Woosters-Penguin-Modern-Classics/dp/014118597X/
What ho! Spiffing!
"Most of the ones I've known had enough performance anxiety issues of their own without adding any."
Wow! Lends new meaning to the term, performance review.
Nope, it's not. And it's a stupid practice, the way some people try to define it.
"What are you doing?"
"I'm refactoring code."
"Oh, you aren't doing anything."
quicksort [] = [] quicksort (x:xs) = quicksort less ++ [x] ++ quicksort greater where less = [ y | y = x ]
!sig
I just scanned the article and I see that the big idea is to reimplement SeeSoft but not as well. Awesome.
There's a big difference between having code which just happens to somehow work, and having code which works because the code is clearly written and documented, where the person in charge of maintaining it actually understands what the code is doing.
Whether you rewrite from scratch or work with the legacy code, it's your job as the programmer to understand and document all of the tweaks, bug fixes, edge conditions, and obscure environments. If there aren't comments in the existing code to explain these things, then it's your job to understand why the code is doing what it is doing, and add the comments as needed. If the code isn't clear, it's your job to make it clear.
The author correctly points out that when you do a total rewrite, then the undocumented special cases handled by the old code will make themselves felt. As these problems present themselves, it takes time to fix them. However, you also get the opportunity to understand the undocumented special cases and get them clearly coded and properly documented, which reduces maintenence costs over the long term. Your judgment whether to maintain or to rewrite should take both of these factors into consideration.
Brilliant! Absolutely brilliant! "Smell test?" Yah, right. But then I got to thinking, "Why are code formatting standards such a hot topic?" The computer doesn't care if indentation is expressed with 2 spaces, 3 spaces, or a tab. But, I do! Over time, I've learned how to see coding errors just from the slight aberrations in the LOOK of code. Couldn't tell you WHAT it was, at first, it just felt (or smelled) wrong. So call it what you will, but I could now see how "smell test" has some basis behind it. Then, I got to thinking of an age-old question:
How do you find a needle in a haystack?
The technique in the article accomplishes BOTH of these. I'd suggest running the code through a pretty printer to get consistent layout throughout the whole project. The more the semantics of the project can be represented by syntax, the more visible the troublesome code becomes.
I know the author means well, but it's really hard to take him seriously when he calls 31 kloc a "fairly large" application. Besides that, too much of the "article" (if a blog entry can be called an article) comes off as ego stroking.
I love the way the article uses the complete rewrite of Netscape as an example of why you shouldn't rewrite from scratch. Cause we all know how big a failure Firefox is
There's a big difference between having code which just happens to somehow work, and having code which works because the code is clearly written and documented, where the person in charge of maintaining it actually understands what the code is doing.
But the point is that when the code is the documentation, which is what you have when you have undocumented code, you're throwing out the documentation with the code if you start from scratch. Refactoring includes documenting the code you're rewriting. In fact I've found that comments added when refactoring do a lot more to explain why the code does what it does than the comments that were already there. I don't mean that the existing comments were wrong, or didn't match the code, but that the comments added by the person who did the refactoring describe the things that were hard to understand in the original code, and often explain why the battle scars were necessary.
I've found that happening with my code that other people have worked on, with other people's code that I have worked on, with other people's code that other people have worked on, and with my code that I have worked on (because, after all, "you three years ago" might as well be another person... if that's not the case for you, you better ask yourself if you've stopped learning).
The author correctly points out that when you do a total rewrite, then the undocumented special cases handled by the old code will make themselves felt. As these problems present themselves, it takes time to fix them.
Sometimes. And sometimes they don't... yet. Sometimes you're reinstalling a time bomb that you thought you'd already defused.
With either a rewrite or a refactoring, too, you need to understand and document the result. If by the end of a refactoring project you haven't documented the depth charges... then you haven't really finished the job yet.
Your judgment whether to maintain or to rewrite should take both of these factors into consideration.
Indeed. I love rewriting code, myself. Start over from scratch. Throw out the old and crufty. But I gotta keep telling myself to watch out for deceptive arguments about why I'm rewriting... and I think this is one of the ones I've tried on myself often enough that I just don't trust it any more.
Thought #1) (GP) There's no such things as a corner. It is a requirement - it may be that fewer people/fewer processes use it; but, it is still a section of the total solution that must be designed to overcome some problematic section. Otherwise, why is the code being written?
Thought #2) Corner cases only effect a small number of your user-base; therefore, code to satisfy 95%-99% of your customers. The underlying principle here is that the manager will wait for another release. This approach is usually taken when the project manager failed to account for something and says (and I quote), "We'll just re-design it after the first release." I have taken part in a few optimization competitions, and each time #1 has been a crucial part of the solution:
The usual approach is to optimize the 90-95% case, then bail on the remainder, but this will almost always be beaten by code which manages to turn everything into the "normal" case, with no if/else handling, no testing, no branching.
When I was beaten by David Stafford in Dr.Dobbs Game of Life challenge, I had lots of specialcase code to handle all the border cases, while David had managed to embed that information into his lookup tables and data structures. (He had also managed to make the working set so much smaller that it would mostly fit in the L1 cache.
When my Pentomino solver won another challenge, being twice as fast as #2, the crucial idea was to make the solver core as tiny as possible, with very little data movement and the minimum possible number of tests.
Terje
"almost all programming can be viewed as an exercise in caching"
Will the result of refactoring code using the PGW method be as funny as his books that the users of the code will laugh till they cry?
>> Code that was written while ... half-asleep has 25 minutes of video which are worth watching just to catch e.g. the beautiful French accent when the researcher Van Cauter remarks that Americans regard going without sleep as a "badge of honour". The idea that you might actually be more effective at linking together all those pieces of information while asleep than awake - if true, it's a paradigm shift from the jolted-caffeine-philosophy.
Sorry, the URL for the CBSnews videos "Science of Sleep" is http://www.cbsnews.com/stories/2008/03/14/60minutes/main3939721.shtml?source=mostpop_story
Of course, then I'd be in a quandry, choosing between "Funny" and "Informative"...
It's one thing to decide to rewrite rather than refactor a product that is losing market share because it is not performing as well as its competitors. (E.g. Netscape.) It's another thing to decide to rewrite (and redesign) rather than refactor a wildly successful and popular product because its continued development has become difficult. Just shy of eight years later, Perl 5 is still creaking along nicely, and Perl 6 (White Elephant Service Pack) is still under design as much as development.
Is Perl 5 so hard to refactor that a determined effort couldn't have made progress, or been completed twice over, in 8 years? Along the way, a lot of the cruft and inelegance in the language could have been removed, and more elegant features inserted.
It happens over and over again - developers, even experienced ones, can't see the impracticality of what they're getting into, and can't see that they're doing work that isn't needed.
Do you want to encourage people to inline their functions manually, and not divide things into small, cute trivial functions?
Is this a misguided attempt to increase efficiency?
So..
Do you think the Gnome HIG was written by a thought #1) project manager or a thought #2) project manager?
Wouldn't it be to leave it to Psmith?
Orignator of the Miserable Failure Googlebomb
I read somewhere (it may be rubbish) that we remember things differently when read on a screen, than when read in print.
...butts.
Things read on a screen tend to be more short term memory
Things read in print tend to be more long term memory.
I know I print out drawings for checking, because you just see more...
thx e
In the Complexity Map, a slightly similar approach, a treemap is used to visualize the code's namespace hierarchy in a 2d-landscape. Results from code metric tools are layed out in the treemap, either for individual metrics (e.g. cyclomatic complexity) or for aggregated metrics (anthing that influences team productivity; e.g. errors that are not logged). Due to the Prefuse-based seamless zooming, combined with drill down functionality, it's really easy to visualize and investigate hotspots in extremely large codebases.
The website contains some more background and a nice interactive demo. If you have the patience to wait for the applet to load, I'll guarantee you you'll like it.
Disclaimer: I am the author of this tool. The website mentions commercial interest, but to be honest: there's hardly any. I've found that the concept is just too difficult to sell over the web, so I'll probably open source it soon.
"I wish someone made these mindmapping programs and made them more accessable to programs and programming."
Free Mind and Compendium Software visualization
You're Terje Mathisen.
For most of us the brilliant flash of insight that allows us to fully integrate a problem is an epiphany - that rare and fleeting moment where our brilliance truly shines. It's neither frequent nor persistent enough to build a plan on which we must rely.
For you it's more like the rising and setting of the sun.
That said, yeah, that's what I meant by "grok it". If I had spelled it out that way though someone could justly accuse me of the hubris of believing that I'm you.
Thanks for the reminder. I had fallen into a bad habit.
Help stamp out iliturcy.
We have a single class that's 30k lines. =(
It's true I tell you, feller at work's next door neighbour read it in the paper.