Refactoring: Improving the Design of Existing Code
Overview
This book could very well do for refactoring what the "Gang of Four" book did for design patterns. In fact, with the number of contributing authors, this might well become known as the "Gang of Five" book. (They contributed content to chapters 3 and 12 through 15.)
Organization
Refactoring leaps in feet first with an extended example. I found this to be a surprisingly effective opener: it didn't overwhelm me, and left me hungry for more. The first chapter follows a sample program through several incremental refactorings, and the reader gets the idea via osmosis.
To illustrate the technique of refactoring, the first chapter presents the original code on the left page, and the resulting code on the right, with changes in bold. This presentation, coupled with explanatory text, makes it easy to see exactly what changed at each step. It's as if you're looking over the author's shoulder as he edits, compiles, and tests code in his development environment.
What is Refactoring?
Now that you've done a refactoring, you might be curious to know more about what refactoring is. The next few chapters provide the relevant background.
Refactoring is what the book's subtitle suggests: changing code in ways that preserve behaviour, but improve the way that behaviour is generated. This could be as trivial as renaming a method, or as tricky as separating domain and presentation classes.
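As a minimal sketch of what "preserve behaviour" means in practice, here is a made-up example (not taken from the book, and in Python rather than the book's Java) of pulling a loop out into a named helper. The function names are hypothetical; the point is only that the output is identical before and after.

```python
# Hypothetical example: the same behaviour before and after extracting a helper.

def statement_before(name, amounts_in_cents):
    """Original version: the summing loop and the formatting are mixed together."""
    total = 0
    for amount in amounts_in_cents:
        total += amount
    return f"{name} owes {total} cents"


def total_owed(amounts_in_cents):
    """Extracted helper: the calculation now has a name of its own."""
    return sum(amounts_in_cents)


def statement_after(name, amounts_in_cents):
    """Refactored version: same result, but the calculation can be reused and tested alone."""
    return f"{name} owes {total_owed(amounts_in_cents)} cents"
```

Calling both versions with the same arguments gives the same string, which is the whole contract of a refactoring.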
Why go through this trouble? In the end, the code is different but it acts the same; there has been no new functionality added. Why? You do this to place yourself in a better position to add new functionality to the software. If you don't, you eventually end up with spaghetti code that is unmaintainable and will not support new functionality at all.
I think anyone who has worked on real code can appreciate the need for refactoring. In fact, most good programmers already do it, although perhaps only on a subconscious level. What this book aims to do is to raise that ad-hoc activity to a higher level of applied technique. Just as there are principles and practices in GUI design (as opposed to merely throwing widgets together randomly), there are principles and practices in refactoring activity: this book catalogues them.
Catalogue
Sandwiched between introductory and summary chapters is the meat of the book: a catalogue of over seventy refactorings. This catalogue follows in the footsteps of the highly successful Design Patterns format: Pattern Name and Classification, Intent, Also Known As, Motivation, Applicability, Structure, Participants, Collaborations, Implementation, Sample Code, Known Uses, and Related Patterns. Since the individual refactorings are less complex than patterns, this catalogue uses the format: Name, Summary, Motivation, Mechanics, and Examples.
The idea is the same. The name and summary provide a definitive vocabulary and a reference-card example. The motivation explains the relevance of the refactoring. The mechanics cover the step-by-step details of how the refactoring is executed. Then a series of examples demonstrate the variations.
Applicability
I like the catalogue. Although some refactorings seem deceptively trivial, it is useful to have them laid out in step-by-step detail. You never know when you will make a mistake, and when you absolutely positively must fix a bug or add a feature by the next day, and need to refactor to do it, slow and steady wins the race.
Further, other refactorings are not so trivial and familiar, and it is certainly useful to have their traps and pitfalls exposed. Frequently, they rely on the smaller refactorings themselves.
I can see this book becoming well-used in a shop with plenty of production code.
Supplementary Material
The non-catalogue chapters are informative as well. I especially appreciate the metaphor of bad smells in the code: the "if it stinks, change it" philosophy is the perfect counter-point to the oft-cited "if it ain't broke, don't fix it" mentality.
The chapter on refactoring tools discusses the possibility of automating much of the mechanical work of refactoring. Although there is a Refactoring Browser for Smalltalk, I suspect that Java and C++ versions are a little ways off. I'd wager that, as with the UML, tool support will lag industry practice for some time.
Style
As always, the author's writing style is down-to-earth and easy to read. Martin tells you straight up what he's found useful and what he hasn't. He tells you where he's made mistakes, and where the risk is less pronounced.
I like the way he goes through an example, then goes through it again under different conditions, thereby revealing the many-splendoured variations. Frequently he continues examples that were left off from other refactorings.
Plenty of further reading is suggested; I always like that.
Flaws
The book has a Java focus, and that is the language used for the examples. There is some mention of Smalltalk and C++, but not much; far less than Design Patterns, for example. Still, the book is quite understandable to anyone with object-oriented development experience.
The book references design patterns; some refactorings even apply and manipulate patterns. However, I wish there were more direct references to the Design Patterns book. That would especially help those new to both refactorings and design patterns.
There are a few minor typos (nothing major), so check the author's web site for errata and try to get a recent printing if you can.
Recommendation
It's no secret that I think this is a book whose time has come. I'm hoping it will codify my approach to refactoring, to help me be more efficient in my development.
I recommend this book as both a practical catalogue, and as a general work on the theory and practice of refactoring. I think that the refactoring community will grow much as the patterns community before it, and that we will see more published on the subject.
Until then, this book is a good start.
Purchase this at Amazon.
TABLE OF CONTENTS
Foreword
Preface
1. Refactoring, a First Example
2. Principles in Refactoring
3. Bad Smells in Code
4. Building Tests
5. Toward a Catalog of Refactorings
6. Composing Methods
7. Moving Features Between Objects
8. Organizing Data
9. Simplifying Conditional Expressions
10. Making Method Calls Simpler
11. Dealing with Generalization
12. Big Refactorings
13. Refactoring, Reuse, and Reality
14. Refactoring Tools
15. Putting It All Together
References
List of Soundbites
Index
...This is an important new concept that needs to be looked at. Open source seems to work pretty well, but there is still a lot of overdevelopment being done. I know this debate has existed between desktop environments etc., but on smaller projects it seems a lot of effort could be put into existing projects, or into starting genuinely new ones, rather than writing another ftp client when there are already 30 of them.
Funny and I thought Perl == Paid employment recently located
I think a lot of people underestimate the importance of refactoring code. It's put to good use (as I can attest from experience) in the Extreme Programming software development methodology. (If you haven't heard of this, check it out. It seems kind of radical, but it works very well in practice if applied appropriately.)
I've owned this book for a couple months now and I feel it was definitely worth buying. The section on self-testing code was quite useful, even if it was short. My one complaint is that the book does not address how to apply refactoring in an environment that closely tracks SPRs. On a project where there is formal witness testing you usually try to keep SPRs small with limited impact. This is exactly the opposite of how refactoring works... i.e. redesign the whole thing if it is the Right Thing to Do. While self-testing code helps, having to pay for a complete formal regression test for each SPR would get expensive. Other than that, however, this is an excellent book. I would be very happy if it attracted a following as large as the Design Patterns book.
Injured software engineer wins against Mattel!
I haven't read either this book or the book on patterns, so I may be well off the mark. However, the key ideas behind both of them seem to be old chestnuts in software engineering.
For design patterns, read reusability. Some languages, e.g. Haskell, support a high degree of reusability in the way they work, and so one can implement the idea-in-itself once and for all, but in most languages you will find yourself reimplementing the same idea again and again.
Similarly with the current book, for refactoring read refinement. If we have programs M and N, and N terminates on all of the inputs M does, and with the same observables, then N refines M.
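A rough sketch of that definition, restricted to a finite set of sample inputs. A real refinement argument covers all inputs, so this is only a spot check; the function names are made up for illustration, and it assumes both programs terminate on the samples given.

```python
def refines_on(samples, m, n):
    """Return True if n gives the same observable result as m on every sample.

    This is a finite spot check, not a proof of refinement.
    """
    return all(m(x) == n(x) for x in samples)


def sum_to_loop(k):
    """M: sum the integers 0..k with an explicit loop."""
    total = 0
    for i in range(k + 1):
        total += i
    return total


def sum_to_formula(k):
    """N: the same observable result via the closed form k(k+1)/2."""
    return k * (k + 1) // 2
```

Here `refines_on(range(50), sum_to_loop, sum_to_formula)` holds, while swapping in a function that differs on any sample makes the check fail.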
Both of these ideas are important, and fraught with hazards in practice, so they are well deserving of a book length treatment. What irritates me is the contention that until now these ideas are ones we were only subconsciously aware of. Absurd: they are old ideas, and ones good software engineers are very conscious of.
The more I write code, the more I realize that it is like any other kind of writing. And after years of looking at garbage spaghetti code, I have come to the conclusion that the best way to raise the level of coding is for experienced and talented programmers to review the code of others, making revisions if necessary. Many programmers would no doubt scream in protest, and this too would be a good thing: in my experience the worst programmers are also the ones with the most vanity (especially in regards to their code).
In all my (admittedly slightly less than 5) years working as a programmer, the _worst_ systems I have worked on have been ones where a lot of architecture analysis had been done. Within a few years the architecture is either obsolete (and the extra design costs wasted, since a quick'n'dirty solution would have cost a fraction and easily lasted for that time span) or it gets patched by so many people to do originally unplanned things that it is no better than a quick'n'dirty original version (which, again, would have cost far less). KISS. I hold that motto so dear - keep it simple and make sure it works. Multiple layers of abstraction etc. in a project typically (unless it's some MASSIVE project) add overheads no less than a quick simple architecture would have done [imo - this is my experience - maybe I don't have enough experience, but so far in my career it is overcomplication of design rather than code that has been the big time waster for me when working on changing old code]
...can replace good, solid comments. Comment lines are soooo understated in schools, but are sooo comforting in the real world.
At University of Toronto, examples in first year had 2 lines of comments for each line of code. I try to stick with that ratio at work. (I said TRY.)
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
Too many comments can be as bad as too few, and trying to get the right mix is somewhat of an art that I still haven't quite mastered. But I think using them as reminders has come in very handy.
Ita erat quando hic adveni.
Excuse my ignorance but what is a SPR?
Another book on this subject that you all may find interesting is Anti-Patterns
You will not drink with us, but you would taste our steel? - Walter Matthau, The Pirates
> What irritates me is the contention that until now these ideas are ones we were only subconsciously aware of.
I think the main reasons for books like this and Design Patterns are to
a) provide a common vocabulary so everyone knows what we're talking about - for example I 'discovered' the Command pattern and had a load of classes derived from Command. But I talked about compounds rather than composites. I frequently use template methods but until I read Design Patterns I had no concise way of referring to them - I had to explain it every time. Refactoring is the same.
b) show you more than you already knew, or cast new insight on something you're already familiar with.
This second reason is probably why Design Patterns is so popular and why I think Refactoring will be too. You may already be using more than half of what the book describes, but the book(s) show you more than you knew. Because you're already in agreement with the authors (because some of the stuff is already familiar) the new stuff goes in easy.
Good software engineers may already be aware of this sort of thing; I know as I read that book review I see things I've done and things I've wanted to do but not had the time and things I'm perfectly well aware of.
But that's not the primary value of a book like this. If the ideas were brand new and untested, they would be less valuable to have written down. The thing is that there are at least N+1 ways, for any given value of N, to re-engineer or refine or redesign a piece of code, and ideally you want to consider as many as possible before choosing which one to do. A book listing lots of them gives you a massive boost because it reduces the chance that you might overlook the one strategy that could be the biggest win. Think of it as a checklist: you may know, if you look in the fridge, that you have no milk, and as you walk around the house you may see any one of forty things you need to buy and they're all obvious to you, but you still make a shopping list when you go out because otherwise there's a good chance you'll forget at least one of them.
In addition, writing these down might help turn bad software engineers or learning software engineers into good software engineers. I think learning software engineers, if they're going to be good ones, probably are already subconsciously aware of these ideas and benefit from having them brought up into the conscious level and carefully reviewed.
Long live the Egoless Programmer. If you're really good, why be afraid to show it in a code review with your peers? That's Engineering, folks.
..." should have?"
All of the Dilbert books.
Just to maintain your sanity.
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
I often write small assembly programs, where correctness, then time optimization are the critical design goals. I have to agree with your statement on over architecting small one-off systems, though careful design is always rewarded.
I'd also agree architecture is of critical importance in large multi-programmer projects, especially any that must be expanded and grow over time. Many open source projects qualify nicely of course.
But refactoring really rings a bell, even with small assembly programs. Typically I develop a simulation in C, test it, translate to assembly, verify equivalence of output, then refactor it until it's time optimal. Don't know if this book would suggest methods useful to me, but rewriting code while preserving its function is something I do a lot of.
WikiWikiWeb (http://c2.com) is one of the underappreciated jewels for programmers and anyone who wonders how you can turn into a 'good' programmer, or what a good programmer is anyway.
I bought this book as soon as I saw Martin Fowler's name on it, and it hasn't disappointed me. On some level, it's design patterns in practice, but it deals with the niggling details of loose code far more effectively than DP. I liked it.
The testing framework methodology really interests me, but I found I was spending more time writing the tests than actually writing the code. YMMV.
Foo.getBar()
or
Sun.addAnotherJavaFeature()
Maybe Perl or Python would be better examples :)
Ita erat quando hic adveni.
>"Code as if whoever maintains your code is a violent psychopath who knows where you live."
I HAVE A NEW SIG.
Actually, I try to include the business rules/concepts in my comments also.
eg
/* Since each box contains a max of 28 packages and each shelf has 4 rows of 8 boxes.*/
Important when the business rules change. And they will
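One way to keep a rule like that next to the code that depends on it is to name the numbers. This is only a sketch: the constant and function names are invented, and the figures are the ones from the comment above.

```python
# Business rule (from the comment above): a box holds at most 28 packages,
# and a shelf has 4 rows of 8 boxes. Names are illustrative only.
MAX_PACKAGES_PER_BOX = 28
ROWS_PER_SHELF = 4
BOXES_PER_ROW = 8


def shelf_capacity():
    """Maximum number of packages one full shelf can hold."""
    return ROWS_PER_SHELF * BOXES_PER_ROW * MAX_PACKAGES_PER_BOX


def boxes_needed(package_count):
    """Smallest number of boxes that can hold package_count packages (ceiling division)."""
    return -(-package_count // MAX_PACKAGES_PER_BOX)
```

When the rule changes, the edit is confined to the constants, and the comment explaining why sits right beside them.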
Also, your post is formatted enough to get the idea.
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
Sigh. Submitted too soon.
>/* Since each box contains a max of 28 packages and each shelf has 4 rows of 8 boxes.*/
Business rules that could change assumptions you made: there is now a new type of box; the rows now differ depending on which warehouse it is in; a box can now contain packages or containers or some of each.
Business rules suck. They get in the way of my beautiful code.
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
Same difference really... :)
If You think about it...
-- No, no -- Not that one!
Many times in reviewing other people's code I see things that look like a half-assed implementation or poor design. It isn't until I dig deeper that I realize why it was done that way. I mean, you could write a script that says "reads through file line by line and processes contents" but you need a person to say "Used positional delimitation within the '\' delimited fields to avoid multiple uses of strtok(,,)." It helps future users, and yourself.
Blar.
The most important thing about all of this is that software development goes in cycles. First you make it work, then you make it right, then you make it fast. Leaving out any of these steps is very bad.
Another very bad thing is when you have the whole system planned out in excruciating detail before you write line one of code. Inevitably, one of your assumptions will turn out to be totally unworkable, and if it's already set in stone, that will probably break everything else. Generally you have to sketch the broad strokes, fill in the major code, find out what works and what doesn't, throw away what you've done so far, and start for real. That's just the way it is, and if you don't plan to throw away your first try, you'll just end up being overbudget and late when you have to throw it away anyway.
----
We all take pink lemonade for granted.
There is no K5 cabal.
I am not the real rusty.
I feel that there are merits to more than one style of commenting, but the appropriate style is based on the language being used and to some degree, the type of project being coded. However, I feel that the code itself should always be as readable as possible. Why? Well, suppose you are working on a piece that has been revised many times by many different programmers. The comments may or may not give an accurate picture of what's going on with the program. However if it compiles, the code will always tell what's happening. That should be made as painless a process as possible for those reading it.
That aside, I would offer my opinions for the following languages:
Stack Assemblies: Comment anything tricky, and include a stack status comment on every line! This is necessary to make sure the stack never underflows or overflows due to not knowing what to expect in it after unconditional branching.
Other Assemblies: Commenting most lines is still likely a good idea. That way it's possible to see what's being moved and so forth.
Higher Level Languages: Inline comments are usually a waste. In fact they can reduce the clarity of the code. Assume that the reader of your code knows the language. Save inline comments for tricky algorithms, highly mathematical content, or maybe obscure functions in a large language like Perl or Ada. Block comments have a great deal of use though. I'd probably recommend a block comment to describe each function.
...just my 2 cents; it's saved me a heck of a lot of time.
I'm a gnu world man.
- Mike
The ideas of both patterns and refactoring are acknowledged to be old. The design patterns book specifically states that each pattern had to be used in at least two successful, major projects to be included in the catalogue. Similarly, the refactorings in this book have been tested and tried. The best programmers have been working this way for ages.
What is new, is the codification of these ideas into a more textbook form. This is a step along the way from art to science. You can open a textbook now, and discover the steps of a proven method to get from software point A to B, when to do so, and when not to.
This makes it more akin to engineering. These are proven recipes for engineering software, and now they are codified in books.
--
Marc A. Lepage
Software Developer
I'd really recommend reading Stroustrup's "The Design and Evolution of C++" to understand *why* C++ is the way it is. It is *not* entirely about C... it is about other issues, such as performance, static type checking, etc.
Then Stroustrup's "The C++ Programming Language (Third Edition)" to understand the language as it now is. If you can see past the support for casting and operator overloading, you will discover that Stroustrup has as much to say on large scale software design as Booch or Lakos.
There are whole chapters on expressing architecture in C++. Of course, you have to understand the syntax and semantics of the language to achieve that goal. But C++ is about more than obfuscation and C compatibility. Lurking in there is a language that combines the best of Simula, C, and other languages, into a successful language that supports large scale programming.
Don't forget: Stroustrup's background is heading a research centre for large scale programming at AT&T. He doesn't just work with toy problems.
--
Marc A. Lepage
Software Developer
I've never had it happen to me, personally, but I've read that on large or mission-critical systems, testing code can exceed the directly functional part by low integer factors.
/* This calls functions "function1" and "function2" and needs a FILE*
 * that points to the comma-delimited data that gets updated in function "blorf".
 */
struct mystruct
foo(int bar, int baz, FILE *fnord)
That's saved me a few hassles. ("Why is this not working right?" "Hold on, I tweaked foo... looks like function blorf needs an update too; I'll get right on it.") Does CVS do something like this automatically? I ask because I have never used CVS; it seems like overkill for the essentially one-person project I'm working on. Besides, the users always let me know semi-immediately whenever anything breaks.
Give a monkey a brain and he'll swear he's the center of the universe.
I agree that finding people qualified to act as editors would be difficult, but I also don't think that you completely understand my proposal. First, we have to be clear (as too many organizations are not) about the roles of managers versus programmers (engineers or whatever) in regards to technical decision making and administration. Most management positions in most organizations combine both responsibilities, with disastrous results as most experienced /.ers will tell you.
The job of a software editor need not, and probably should not, include administrative functions. In this case the job would still be focused on reading and writing code, and would not be that much different than that of a programmer. You could also bribe your better programmers to take on this responsibility by letting them contribute some (limited) amount of code (as well as with a financial incentive). This would let them keep their skills up and satisfy their urge to develop software.
Although I have no experience in the publishing industry, I do have the impression that editors edit multiple authors. So Alice would do much more than mark up Bob's code for him to rewrite. Note also that everything that Bob learns from Alice, from better coding techniques to flex/bison, makes him a better programmer and thereby boosts the productivity of the whole team. I think this might make Alice worth $X.
Here is a link that might help:
XProgramming.com
Code is garbage in garbage out.
Language is garbage in, non sequitur out.
Perhaps the book addresses this (I haven't read it). Anyone actually work anywhere where management signed on to refactoring?
Problems with high levels of abstraction crop up when the abstract model isn't a good fit for the task or process you're trying to write code for. Bad as coding in C++ may be, it's easier than trying to change the business model. (That said, I've seen more bad C++ code than perhaps any other language -- I've also seen some very good C++ code).
I used to do a lot of development in APL -- now there's a language with a lot of high level abstraction, but it's oriented in a particular direction that is not necessarily a good fit for some of the things I've seen it applied to (email!? business management!?).
It may well be that the applications you're working on could in fact be better developed in ML or Haskell (I'm not familiar with either of those), but in the commercial world that's only one consideration. Other considerations are: who supports the development tools, and how big is the available pool of talent to support what gets built. I've beaten my head against that wall, too (I was an early adopter of C++ back in the 'cfront 1.0' days because the OO approach was a much better fit for some of the applications we were developing -- this in a UNIX/C shop that had just barely finished migrating some of their developers from VMS/FORTRAN. C++ has gone downhill since then, in my opinion.)
-- Alastair
Luckily, where I work the emphasis is on rock-solid code. Management will accept the re-engineering of already 'working' code, but it's the same battle every time: convince me that the pay off is worth the extra time invested. I understand their point of view, but sometimes it gets tedious...
Blar.
I always imagine writing a good program to be similar to writing a good novel. Most writers will rewrite a given paragraph or chapter tens or hundreds of times until it's just perfect.
Programs are no different--whenever you write a function or module, consider it a draft, and don't worry about throwing it away and writing it again. Too often I see people spending hours and hours trying to get acceptable behaviour out of their fundamentally flawed "first draft", when it would have been much simpler and easier to just toss the code and rewrite it, now that the problem is better understood. That way, when you're done, you have an elegant, easy-to-understand, simple program, instead of an inscrutable mess that "seems to work okay" (as far as your testing shows, anyway!)
I don't care if it's 90,000 hectares. That lake was not my doing.
Or a compliment for that matter.
Chapter 13, "Refactoring, Reuse, and Reality", by William Opdyke, covers this.
It's a hard topic, especially since many of the best technical people are not the best politicians. How do we cope?
At one point, Fowler addresses the question of "What if your manager won't let you refactor?" His controversial advice is "Don't tell the manager you're refactoring." His justification is, you are a professional, you know what it takes to do your job, and if refactoring here and there is the right thing to do, just do it. When your development improves because the code improves, your manager won't complain.
--
Marc A. Lepage
Software Developer
I am the reviewer. I have trashed books in the past, when deserved. I gave a game programming book 5/10. This book is 9/10, deservedly so. Please don't question my reviewing integrity.
--
Marc A. Lepage
Software Developer
>comments are not necessarily.
It was a first year course and it was mostly to explain to students what was going on. But coming from high-school it was quite a bit of comments.
>why Waterloo produces better and more sought-after developers than UofT.
I know and have worked with quite a few Waterloo grads and I must admit they are _all_ very nice people and good programmers. Considering that the university is in the middle of nowhere.
The surprise isn't how often we make bad choices; the surprise is how seldom they defeat us.
Another good book on this subject is AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis (ISBN: 0471197130).
That information is great to have, but think about how it will be maintained. Chances are, it won't. You may maintain it religiously, but the next guy will change something and not think about how that changes the information.
Once the documentation is not reliable, people will stop reading it, and it will grow obsolete at an ever increasing rate.
So I'm doubtful about this mechanical approach.
But any documentation that is in the code itself is always 10 times better than the one that is on its own in a binder or web site somewhere. That stuff never gets either read or updated, and is just a pure waste of effort.
The WikiWikiWeb is probably *the* finest resource of information for professional object developers that I've ever found. The knowledge laid out there by its little community has completely changed the way I view (and do) software development -- and for the better.
__ Em
I've used Miranda and NIAL (Nested Interactive Array Language). Both are lazy evaluating, functional languages.
I have a background in computer science, and have studied dynamic programming, functional programming, etc.
You should check out C++'s standard valarray templated class. It is designed for optimum performance. Implementations typically use proxy objects for intermediate access and operations. This, in effect, means lazy evaluation. However, it's partly done at compile time, which effectively means performance.
--
Marc A. Lepage
Software Developer
Yes, my web page has a comprehensive catalogue of my software development library.
http://www.cgocable.net/~mlepage/library.html
Of course not all are "must haves" but I did buy them all.
--
Marc A. Lepage
Software Developer
Good comments are like newspaper headlines -- they give you a quick summary of a section of code without having to read the article (code) in detail.
Even though the article may be well-written, it still takes more time than the headline if all you want is the gist. Sometimes you are simply hunting for something and need a way to filter out the unlikely code "paths" faster.
Also, giving a "hint" before the actual code may make its purpose jump out faster because you were prepared with the general idea.
Good commenting is an art form (just like programming itself).
Table-ized A.I.
I've given away dozens of programming books, but here is a list of books still on the shelf. Used bookstores are better off without a computer section.
(Reality reasserts itself sooner or later.)
http://www.extremeprogramming.org is another site about Extreme Programming (XP).
After reading this review I went to The Bookpool (where Refactoring is available for $28; sorry, Amazon) and ordered it. I've now had it a few days, sampled a number of sections, and started seriously on reading from cover to cover.
Maybe SEGV's seen something I haven't, but I'm tempted to give it at least 9.5/10, and thinking about more.
Yes, as many posters above note, I too have been refactoring for much of my career, to save my sanity if for no other reason. But I called it ``cleaning up the code'', and often couldn't articulate to my peers or bosses why it was the right thing to do. I was abstracting the form of the code, changing it to make it easier to understand. Fowler has abstracted the form of the changes, to make them easier to recognize and execute correctly. This higher level of abstraction is what makes the book worthwhile.
In addition, he's labelled and codified abstractions I haven't thought of, but which will be useful now that they've been brought to my attention.
It's also nice that he's given ``guest authors'' chapters to themselves, so we get different views of the subject. Fowler's upfront about what he owes to others in developing the concepts; he says they should have written the book, but since he's the one to get around to it, he's at least roped them in for their expertise.
All in all, if you ever have to touch sub-standard code, get and apply this book. I would have killed for this at my last job.
I refuse to believe corporations are people until Texas executes one. -- desert rain on http://www.dailykos.com/user/