Study: Refactoring Doesn't Improve Code Quality
itwbennett writes: A team of researchers in Sri Lanka set out to test whether common refactoring techniques resulted in measurable improvements in software quality, both externally (e.g., Is the code more maintainable?) and internally (e.g., Number of lines of code). Here's the short version of their findings: Refactoring doesn't make code easier to analyze or change (PDF); it doesn't make code run faster; and it doesn't result in lower resource utilization. But it may make code more maintainable.
...is pretty important, and you should refactor when needed if only just for that. It'll spread all over rest of the code in many ways, in good ways.
the game
When you code, the most important thing to do is get your memory architecture built right, then methods just write themselves. Come back later and want to make a better method, you can use your old code as a partial refactor. It is an agile sort of run and gun approach and it works.
Refactoring for the sake of refactoring is often wasted time for the original author for there is ways of understanding code past just nice variable names and indentation. Sometimes even badly formatted code stands out and reminds you what it did to remind you of how it works. But when you code in a group, this doesn't hold true and a refactor can help.
God spoke to me
How any anyone say, or write, that refactoring doesn't make code easier to analyze or change, and then follow up with it can make it more maintainable? Also, who in the world ever though that refactoring would make code run faster?
We gave random medicines to groups of random people, and there was no statistical improvement in their health. Some people became healthier, but many people actually became ill.
Isn't the very *definition* of making code more "maintainable" that it makes the code "easier to analyze or change"?
Car repair does not make car faster, nor more comfortable.
Let's see... of the 5 main conclusions, 4 were statistically insignificant, and the 5th showed an improvement in maintainability. Thus the conclusion is don't do it. Sure, that makes sense.
The test case basically converted procedural/structural code (structs and test cases) to object oriented code (classes and polymorphism) for a small, 4,500 line project. What they basically added was extensibility at the expense of overhead and traded individual-line complexity with architectural complexity.
Yeah. The conclusions are nonsense piled on more nonsense. Plus it is plain bullshit. Imagine I only refactor by removing duplicated code across functions or different compilation units. Will the compiled code size become smaller? Yep. Will be easier to read (less LOC to read)? Yep. Will it be more maintainable? Of course you have less code to bother with.
My thoughts exactly. More maintainable code IS higher quality code, in my opinion.
Making code run faster has a completely different name, it's called optimization (and is frequently the root of all evil). And it often involves the exact opposite of things you do when refactoring. Eg, unrolling a loop to make it run faster is pretty much the exact opposite of refactoring for maintenance & readability.
"Mind, as manifested by the capacity to make choices, is to some extent present in every electron." -Freeman Dyson
I hope not.
Required reading for internet skeptics
No. Refactoring is when you take the awful, unmaintainable spaghetti code you produced when you were in a deadline crunch and convert it into something maintainable. The goal is to restructure the source code without changing the functionality at all.
There's no point in questioning authority if you aren't going to listen to the answers.
Sorry, but it has absolutely nothing to do with the real world. They're giving twenty people - ten in experiment group and ten in the control group 30 minutes to do a bit of analysis. And they measure minutes to apply a few changes, without any qualitative measure on how the code is growing. There's very little proof that the refactoring they did made any sense, the sample size is so low you'd never get reliable results and pretty much what you can conclude is that refactoring doesn't make hackjobs easier. Never thought so, that just involves finding the place something's happening and hack it. If it's a good idea, well... it works there and then.
Live today, because you never know what tomorrow brings
I wouldn't call that study publish worthy.
It certainly isn't statistically significant. 4,500 lines of C# code is nothing. I work with systems that have millions of lines of code. I've seen single class files that have thousands of lines of code (and vomited when I saw it). An important question here would be whether the volume of code in a system is a significant factor in the value of refactoring.
Based on their own statistics the refactoring was poorly done. Their result was more code, more complexity, and more coupling. Certainly not the work I would expect from an experienced software developer, but certainly something I would expect to see from undergraduate students who don't fully understand what they are doing.
I think the last sentence in the actual study sums it up pretty well - "Furthermore, it would be better that the same
experimental setup can be executed in industry environment with the industry experts and with
the industry level matured source code."
That's hilarious, I have web apps (I'm stuck with) having individual pages larger than that, including tons of other crap. Refactoring allows following the DRY principle and removing duplicated code. It allows moving SQL statements all the heck over the place into single places where they can easily be tested and updated when bugs are found.
They're basically working with a program that's not really that awful in the first place and making it a little bit nicer. How about starting with absolute junk and making it useable? Unmaintainable code is a consequence of technical debt, refactoring pays that debt down and keeps things manageable. Sure you may not need to refactor right now, but taking the time to do it once in a while keeps things from getting out of control.
Cwm, fjord-bank glyphs vext quiz
There are so many problems with that study.
First, they use C#. There is no reason to think that it's not language independent.
Second, All the code was from one code base with 4500 lines! How can you extract anything statistically significant from basically 1 data point!
Third, supposedly 10 canonical re-factoring techniques were used. Could it be that these re-factoring techniques are useless? Of course, they are not discussed at all in the article. We don't know what re-factoring techniques they used (out of a big set from a different paper).
Fourth:
In order to apply 10 refactoring techniques a small scale project with bad smells was selected as
the source code. The selected application was a system which was implemented in the
Department of Industrial Management, University of Kelaniya and used by academic staff at the
department to schedule their personal and professional events and to manage their online
documents repository. The source code contained around 4500 lines of codes. The relevant bad
smells were identified and all the selected refactoring techniques were applied to the source code.
What?? They sniffed the code??? Makes absolutely no sense. I'd say the authors are idiots...
Oh yea, neither of the authors have background in CS. They're MIS (i.e. trained to be PHB).
I've seen the before-and-after when crap code was rewritten and refactored by hand by a good coder.
The improvement was huge.
Was it better than if the same coder wrote the code "from scratch" from the problem-description or design document? I don't know, but my point is that crap can be turned into gold by a good coder, and that refactoring can be part of the cleanup.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Nope, it's when I take the awful, unmaintainable spaghetti code someone else produced when they were in a deadline crunch and convert it into something maintainable.
Sigh... I wish I could say that with a straight face.
Interestingly, in my experience, poorly structured code seems to come about often less often because of "rushed code" but instead a lack of foresight in the original structure of a system to deal with continuously evolving features (which happens in most projects), along with a lack of willingness to refactor those systems as soon as it's apparent it's starting to break down.
This is the "golden time" to refactor code, because it's just now become apparent where the structural flaws are in the architecture, but it's still early enough to refactor without causing a significant amount of pain. It's often hard to justify, because you've only got a couple of ugly special cases that complicate things here and there. However, if you procrastinate too long, you're going to start piling on more and more "ugly special cases", and the code is going to get harder and harder to read and maintain.
Irony: Agile development has too much intertia to be abandoned now.
It needs a lot more qualifiers than that.
For a start, as with an unfortunate number of academic studies, it appears that the sample population consisted of undergraduates and recent graduates. That alone completely invalidates any conclusions as they might apply to experienced professionals with better judgement about when and how to use refactoring techniques.
Even without that, there seem to be a number of fundamental concerns about the data.
One obvious example is that they consider lines of code to be a metric that tells you anything useful beyond the width you need to allow for the line number margin in your text editor. I doubt most experienced programmers would agree that a LOC count in isolation tells us anything useful about maintainability or that the mere fact that LOC went up or down after a change necessarily meant the code had become better or worse in any useful sense.
Another concern is that they talk about "analysability", but this seems to be measured only by reference to a brief examination of a small code base in one of two versions, unrefactored and refactored. I'd like to know what the actual code looked like before I read anything at all into that data -- what refactoring was performed, what was the motivation for each change, and how do they know those two small code bases were representative of either refactoring in general or the effectiveness of refactoring on larger code bases or code bases that developers have more time to study and work with?
I'm all for empirical data -- goodness knows, we need more objective information about what really works in an industry as hype-driven and accepting of poor quality as ours -- but I'm afraid this particular study seems to be so flawed that it really tells us very little of value.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Making a judgment about "refactoring" as a single, simplistic concept is like making a judgment about "food" or "government" without going into any further detail. Umm, it's kind of not that simple.
Refactoring in real life is a whole array of different, nuanced activities. Any of them can be wise or foolish depending on the situation. Well-written code requires less of it, but some degree of it will always be needed as we can't tell the future. Each instance is a judgment call with no concrete right answer.
The #1 reason to refactor is to make code testable. The standard recipe is dependency injection with an IOC container, resulting in no live constructor calls, no dependency issues when a constructor changes, and decoupled dependencies. It's not a magic bullet but it's a big step in the right direction. Interestingly, testable code can have no tests and still be higher quality as a result.
from my blog on this, just now:
Proponents of refactoring have never ever said otherwise (unless they themselves are confused on the matter). Code is only readable if it is either simple, or clearly follows design patterns, or is clearly commented and the comments are up to date with the current version of the code. Code is only easy to change when it is readable and when all external dependencies are well known. That last part is a key thing that metrics aren't necessarily able to capture.
A refactoring project, if not refactoring to the right design patterns to address what was wrong with the structure in the first place, is not going to improve it. One must know clearly why the current structure is making a bug-fix or a new feature difficult to implement.
And while some refactorings are 'good' in that they reduce a lot of copy-paste code, others are good because they add code, or add classes (an alternative increase in complexity). Different refactorings have different effects, and are used in different situations.
And as always, if you don't need to refactor, don't. A refactoring is to improve the design, not to rewrite for its own sake.
And there-in lies the great flaw of the whole idea of such a study: you can't measure the quality of a software design. Some things you just have to judge for yourself, based on experience and attention, and no arbitrary metrics number will ever differentiate between a good design and a rubbish heap.
Disclaimer: I hate software metrics.
"But remember, most lynch mobs aren't this nice." (H.Simpson)
-- Joe
Single functions with more than 4500 lines? Sounds like a good time to do some refactoring, that sounds hard to follow and probably should be broken down. We have a 100 line max standard where I am at now.
Hey, there is only one Return and it's not of the King, it's of the Jedi.
The paper itself is a crap read to begin with. It's just screaming for a Fox News headline model like this. I think Slashdot really summarized itwbennet summarized it pretty well.
I read a bit of it looking for actual meaningful numbers, but it was clear to me that none of the people involved or focused worked on multi-million line projects. I am almost choking on my tongue thinking "Imagine the code quality of a web browser if the code wasn't regularly refactored to reduce the number of possible bugs?
Every single thing test they ran and used as proof, I can personally contradict from experience being part of implementing CSS 3 in a browser. There are thousands of contexts in CSS alone where it is business critical to refactor.
So I have a method that brute forces something, then I go back and figure out how to do it with a better big 0, and the functionality doesn't change, but that still isn't refactoring, because ... ?
That is generally considered optimizing, not refactoring. By some definitions of refactoring I guess all optimizations are a form of refactoring, but that is almost never what someone means when they say refactoring.
-- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
This is the whole problem with "narrow and deep" vs "wide and shallow." Too deep a hierarchy is just bad practice.
"Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
I once took over 30,000 lines of code that had been written by a subcontractor and trimmed it to around 4000 LOC. And you better believe it ran faster! Not because refactoring is magic, but because once all the mind-numbing almost-repetition was mucked out you could actually see what the code was doing and notice that a lot of it wasn't really necessary. Ever since then I have always maintained that coders should never ever copy and paste code. I've had people disagree, saying that a little bit of copying and pasting won't hurt, but I say if it's really such a little bit then you shouldn't mind re-typing it. Of course if you do that very soon you start putting more effort into devising ways to stop repeating yourself, which is exactly the point. Repeating yourself should be painful.
That's I think a reliable litmus test for whether you should refactor a piece of software. If it's an area of code that's been receiving a lot of maintenance, and you think you can reduce the size significantly (say by 1/3 or more) without loss of features or generality you should do it. If it's an area of code that's not taking up any maintenance time, or if you're adding speculative features nobody is asked for and the code will get larger or remain the same size, then you should leave it alone. It's almost common sense.
I don't see why anyone would think that refactoring for its own sake would necessarily improve anything. If an automotive engineer on a lark decided to redesign a transmission you wouldn't expect it to get magically better just because he fiddled with it. But if he had a specific and reasonable objective in the redesign that's a different situation. If you have a specific and sensible objective for reorganizing a piece of code, then it's reasonable to consider doing it.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
So I have a method that brute forces something, then I go back and figure out how to do it with a better big 0, and the functionality doesn't change, but that still isn't refactoring, because ... ?
Because it violates the standard definition of "refactoring".
Refactoring is about changing the structure of the code, and not the algorithms used within the code. The goal is typically to reduce coupling, increase cohesion, and (frequently) to improve testability.
Replacing an algorithm with a better algorithm isn't "refactoring", it's "rewriting".
Taking your giant brute-force method and breaking it into smaller parts in a cohesive unit (source file, class, package, etc.) with lowered coupling (perhaps by genericizing previously tightly-coupled bits), in such a way that the individual units have a smaller testing surface -- but is otherwise the same algorithm -- then you've refactored the code, by definition.
Yaz
10,000 line functions are shockingly common in industry. Shit grows over time, and is so poorly written that you can't safely refactor it, and management lacks the balls to let you clean it up, so it just festers and festers.
I hear PayPal had 90% of their processing business logic in a single, multi-million-line class! Thankfully, I don't know that one first hand.
Socialism: a lie told by totalitarians and believed by fools.
The compiler is what makes the final decision on the code, not the programmer.
Just refactoring a poor algorithm will still result in a poor performance, though the code might look better.
The whole point of refactoring in most cases is to make the code look better, not to improve performance. Very rarely do people change an algorithm as part of refactoring. The goal is instead to make the code easy to understand, easy to change, and easy to debug. Creating consistent names and API patterns, re-ordering statements to group them based on the task they're performing, finding code that is the same in many different methods and moving it into its own method, reducing dependencies between classes.... these are common refactoring techniques and have almost nothing to do with performance.
I often don't like the choices people make, but I like the fact that people make choices. That's why I'm a conservative.
This is the "golden time" to refactor code, because it's just now become apparent where the structural flaws are in the architecture, but it's still early enough to refactor without causing a significant amount of pain.
It can also be the golden time because the original writers are still around and know why certain things were done a certain way, and know what dependencies need to be addressed.
Interestingly, in my experience, poorly structured code seems to come about often less often because of "rushed code" but instead a lack of foresight in the original structure of a system to deal with continuously evolving features...
In my opinion poorly structured code too often comes from individual developers who just don't care. They show up to do a job and get paid; they got into CS because it was a way to make money; they don't really care about doing things elegantly. Or perhaps they care about quality but just don't understand the benefits of refactoring.
I often don't like the choices people make, but I like the fact that people make choices. That's why I'm a conservative.
About 5600 lines. However, because it was a glorified case statement, you were really only debugging a single case at a time, each of which was about the length of a sane function, so splitting it into functions would do little to improve readability. I like to trot out that example to terrify people, but the function itself was really quite sane and easy to maintain.
You did, however, have to fully understand the state machine as a whole, which in total was almost twenty kloc, had almost 200 instance variables in the state object, and leaned heavily on a tree object with about 30 instance variables. That's the point at which most people's heads exploded.
Either way, 4,500 lines is the size of a fairly straightforward iOS app. Most folks can dig into that and figure out enough to maintain it without spending a huge amount of time, even if the organization isn't ideal. When you hit tens of thousands of lines, that's where you have to start thinking about how you organize it and document it, because with such large projects, if you jump into the middle without a complete picture, you're likely to be hopelessly lost.
Check out my sci-fi/humor trilogy at PatriotsBooks.
Yes, after 25yrs in the industry I have come to the conclusion that "what is it supposed to do?" is invariably the most difficult question to answer.
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
I'd rather have a readable and maintainable 4500-line function than an unreadable and unmaintainable 45-line function.
I remember, years ago, standard advice to students to break up code in to functions when some process got over some number of lines. Know what we got? A lot of really odd functions of similar length, as they broke things apart at seemingly random boundaries! I'm still deeply suspicious when I hear people talk about a proper length for functions.
It's time to let that one go, and teach students that while shorter is often better, length doesn't really matter that much. They ought to factor out functions when they see them and when it makes the program more readable. I'll suggest this as a new rule-of-thumb: functions should do just one thing and be named accordingly.
(Still, 4500 lines? Damn, that's big.)
Required reading for internet skeptics
Code runs. It continues to run. And, over time, the requirements change. Maintenance is adapting the code to the new requirements. If you don't understand why this is important, then I can only assume that no one is using code that you've written.
I am TheRaven on Soylent News
Tech debt is like credit card debt: the interest is a bitch. I worded for a while at one company that nearly folded because the time required for emergency bug fixes grew to, then past 100% of development time for the team. Horrible code doesn't just require more bug fixes in the first place, each change grows progressively more expensive and unsafe.
10k lines of shipped, production code is only of value if it's working bug free and without complaint. 10k lines of buggy code, that you have to add a week to any project that modifies in any way, that has negative value.
That being said, if the code is "cleaned up" by the same team that wrote it in the first place, you likely don't come out ahead. The only reason that company "nearly" folded was monuments willingness to hire about 10 senior guys like me to rescue what we could - 6 of them quite within a few weeks, but the 4 of us who stayed managed a few core fixes that kept it limping along for enough time to find a buyer for the company before it went under.
Socialism: a lie told by totalitarians and believed by fools.