Study: Refactoring Doesn't Improve Code Quality

Posted by Soulskill on Tuesday March 3, 2015 @11:38AM from the just-makes-you-hate-it-a-bit-less dept.

itwbennett writes: A team of researchers in Sri Lanka set out to test whether common refactoring techniques resulted in measurable improvements in software quality, both externally (e.g., Is the code more maintainable?) and internally (e.g., Number of lines of code). Here's the short version of their findings: Refactoring doesn't make code easier to analyze or change (PDF); it doesn't make code run faster; and it doesn't result in lower resource utilization. But it may make code more maintainable.

19 of 247 comments

Maintainable... by pokoteng · 2015-03-03 11:43 · Score: 5, Insightful

...is pretty important, and you should refactor when needed if only just for that. It'll spread all over the rest of the code in many ways, in good ways.

--
the game
1. Re:Maintainable... by MrBigInThePants · 2015-03-03 12:36 · Score: 5, Insightful
  
  The summary is badly worded with a weird bias.
  
  For most software projects maintainability is THE most important thing for TCO (over 90% as per the article) and thus the MOST important thing. Also I find it hard to believe that for your average "REAL" project (i.e. far more than 4.5k lines of code) changeability and maintainability are not intertwined. Any study arguing otherwise needs its methodology closely inspected.
  
  Technical Debt is real, obvious and accumulates exponentially with the amount of code involved over time. This is what we are talking about here when we talk about being able to change it and maintain it. There is lots of research on this and any experienced enterprise developer will have seen this in action.
  
  MAJOR problems with this study:
  - They used Students. If you don't know why this is bad there is no hope for you.
  - 4500 lines is barely a code base at all in the real world.
  - Debt accumulation is worst over long periods of time and many iterations/changes. This is not commented on at all when describing the example. (NB: From my speed read)
  
  So take this all with a grain of salt. This is a very limited academic paper and not at all definitive or real-world applicable in and of itself...
2. Re:Maintainable... by monkeyzoo · 2015-03-03 12:47 · Score: 5, Insightful
  
  The study sounds like nonsense (at least as presented in this post).
  
  Refactoring doesn't make code easier to analyze or change.... But it may make code more maintainable.
  What is code maintenance, if not analyzing and changing the code??!?!
  Does code in Sri Lanka need to have its oil changed and tires rotated?!
3. Re:Maintainable... by MrKaos · 2015-03-03 12:51 · Score: 3, Insightful
  
  ...is pretty important, and you should refactor when needed if only just for that. It'll spread all over the rest of the code in many ways, in good ways.
  Exactly, and that good way is reliability, which is something I observe the study doesn't measure. So whilst it's good to challenge the current wisdom, there seem to be a few holes here.
  First up, I don't think 4500 lines of code is a good way to assess the interaction complexity of applications where the codebase exceeds 10 or 20 times that number. Second, I may write a functional prototype of code knowing full well that I or someone else will refactor later when we have a better idea of how things are functioning.
  Unexpected failure modes are going to exist in the software. The whole point of doing things the 'Agile' way is to provide incremental improvement so things get better. The paper talks of XP, but what if you are using DSDM instead of programming pairs? In that case you are *expecting* to refactor often as you explain the domain or new concepts are introduced. That's not scope creep, it's being honest and admitting you don't know everything.
  In my experience the most powerful concept is the vocabulary you build as you begin to understand the domain better. I've found refactoring is the opportunity to 'put things in the right place' to define the vocabulary, which makes things easier on myself and my colleagues a year or two later when someone asks if they can have this extra feature. Sure, we should be using certain design patterns when implementing code from the beginning. However, I'm certain I'm not alone in confronting a codebase, wondering why certain methods are implemented in the controller instead of an information expert, and spending the next week refactoring to avoid people's heads exploding when methods are duplicated...but they don't work the same.
  that's my 2 cents...
  
  --
  My ism, it's full of beliefs.
Easier to Analyze or Change == More Maintainable by medv4380 · 2015-03-03 11:46 · Score: 5, Insightful

How can anyone say, or write, that refactoring doesn't make code easier to analyze or change, and then follow up with it can make it more maintainable? Also, who in the world ever thought that refactoring would make code run faster?
New study shows taking medicine is ineffective. by TsuruchiBrian · 2015-03-03 11:47 · Score: 5, Insightful

We gave random medicines to groups of random people, and there was no statistical improvement in their health. Some people became healthier, but many people actually became ill.
Whathuh? by neminem · 2015-03-03 11:47 · Score: 4, Insightful

Isn't the very *definition* of making code more "maintainable" that it makes the code "easier to analyze or change"?
car by bmimatt · 2015-03-03 11:53 · Score: 5, Insightful

Car repair does not make car faster, nor more comfortable.
Re:Easier to Analyze or Change == More Maintainabl by cheesybagel · 2015-03-03 12:04 · Score: 4, Insightful

Yeah. The conclusions are nonsense piled on more nonsense. Plus it is plain bullshit. Imagine I only refactor by removing duplicated code across functions or different compilation units. Will the compiled code size become smaller? Yep. Will it be easier to read (less LOC to read)? Yep. Will it be more maintainable? Of course, you have less code to bother with.
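To make the "removing duplicated code across functions" case concrete, here is a minimal sketch in Python. Everything in it (the function names, the 8% tax rate, the prices in cents) is a made-up assumption for illustration, not code from the study. The "before" pair repeats the same loop; the "after" pair shares one helper and returns the same results.

```python
# Hypothetical example: names, tax rate, and data are assumptions for illustration.

# Before: the same tax loop is pasted into two functions.
def invoice_total_before(prices_cents):
    total = 0
    for p in prices_cents:
        total += p + p * 8 // 100
    return total


def quote_total_before(prices_cents, discount_cents):
    total = 0
    for p in prices_cents:
        total += p + p * 8 // 100
    return max(total - discount_cents, 0)


# After: the duplicated loop lives in one helper with a named constant.
TAX_PERCENT = 8


def taxed_sum(prices_cents):
    """Sum of prices with tax added per item, in integer cents."""
    return sum(p + p * TAX_PERCENT // 100 for p in prices_cents)


def invoice_total(prices_cents):
    return taxed_sum(prices_cents)


def quote_total(prices_cents, discount_cents):
    return max(taxed_sum(prices_cents) - discount_cents, 0)
```

If the tax rule ever changes, the "after" version has one place to edit instead of two, which is the maintainability argument in a nutshell.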
Re:Easier to Analyze or Change == More Maintainabl by Wraithlyn · 2015-03-03 12:05 · Score: 4, Insightful

My thoughts exactly. More maintainable code IS higher quality code, in my opinion.
Making code run faster has a completely different name, it's called optimization (and is frequently the root of all evil). And it often involves the exact opposite of things you do when refactoring. Eg, unrolling a loop to make it run faster is pretty much the exact opposite of refactoring for maintenance & readability.
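A rough sketch of the loop-unrolling contrast, written in Python purely to show the shape of the code. The function names are assumptions, and in an interpreted language like Python the unrolled version is not guaranteed to be any faster; the point is only that it is longer and easier to get wrong.

```python
def dot(a, b):
    """Readable version: one obvious loop over paired elements."""
    return sum(x * y for x, y in zip(a, b))


def dot_unrolled(a, b):
    """Manually unrolled by 4 (assumes len(a) == len(b)).

    Same result as dot(), but with index arithmetic and a cleanup loop
    that a maintainer now has to read and keep correct.
    """
    n = len(a)
    total = 0
    i = 0
    # Main body: four multiply-adds per iteration.
    while i + 4 <= n:
        total += (a[i] * b[i] + a[i + 1] * b[i + 1]
                  + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3])
        i += 4
    # Cleanup: the 0 to 3 elements left over.
    while i < n:
        total += a[i] * b[i]
        i += 1
    return total
```

Refactoring for readability would usually go from the second form back to the first, which is the "exact opposite" direction described above.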

--
"Mind, as manifested by the capacity to make choices, is to some extent present in every electron." -Freeman Dyson
Not significant - ignore by jareth · 2015-03-03 12:16 · Score: 5, Insightful

I wouldn't call that study publication-worthy.
It certainly isn't statistically significant. 4,500 lines of C# code is nothing. I work with systems that have millions of lines of code. I've seen single class files that have thousands of lines of code (and vomited when I saw it). An important question here would be whether the volume of code in a system is a significant factor in the value of refactoring.
Based on their own statistics the refactoring was poorly done. Their result was more code, more complexity, and more coupling. Certainly not the work I would expect from an experienced software developer, but certainly something I would expect to see from undergraduate students who don't fully understand what they are doing.
I think the last sentence in the actual study sums it up pretty well - "Furthermore, it would be better that the same experimental setup can be executed in industry environment with the industry experts and with the industry level matured source code."
Needs a larger sample size by CODiNE · 2015-03-03 12:17 · Score: 3, Insightful

The researchers selected a small-scale application (about 4,500 lines of C# code) used by the academic staff at the University of Kelaniya for scheduling events and managing online documents for evaluation.
That's hilarious, I have web apps (I'm stuck with) having individual pages larger than that, including tons of other crap. Refactoring allows following the DRY principle and removing duplicated code. It allows moving SQL statements scattered all the heck over the place into single places where they can easily be tested and updated when bugs are found.
They're basically working with a program that's not really that awful in the first place and making it a little bit nicer. How about starting with absolute junk and making it usable? Unmaintainable code is a consequence of technical debt; refactoring pays that debt down and keeps things manageable. Sure, you may not need to refactor right now, but taking the time to do it once in a while keeps things from getting out of control.
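As a small sketch of the "move SQL into single places" idea, assuming a made-up `users` table and hypothetical function names (none of this comes from the application in the study): the query text lives in one constant behind one lookup function, and the callers reuse it instead of pasting the SELECT everywhere. It uses Python's standard `sqlite3` module with an in-memory database so it can be run and tested on its own.

```python
import sqlite3

# Hypothetical schema and names, for illustration only.
FIND_USER_BY_EMAIL = "SELECT id, name FROM users WHERE email = ?"


def find_user_by_email(conn, email):
    """The single place that knows how a user is looked up by email."""
    return conn.execute(FIND_USER_BY_EMAIL, (email,)).fetchone()


def greeting(conn, email):
    # Caller 1: reuses the lookup instead of embedding its own SELECT.
    user = find_user_by_email(conn, email)
    return f"Hello, {user[1]}" if user else "Hello, guest"


def user_id(conn, email):
    # Caller 2: same lookup, different use of the result.
    user = find_user_by_email(conn, email)
    return user[0] if user else None


def make_test_db():
    """In-memory database with one row, so the query can be tested in isolation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.execute(
        "INSERT INTO users (name, email) VALUES (?, ?)",
        ("Ada", "ada@example.com"),
    )
    return conn
```

When a bug turns up in the lookup, there is one query string to fix and one function to cover with a test.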

--
Cwm, fjord-bank glyphs vext quiz
Re:Refactoring done right happens as you go by TsuruchiBrian · 2015-03-03 12:24 · Score: 2, Insightful

I'm not sure I can trust the coding advice from a person who thinks all the predictions in the bible have been, or will be proven to be 100% true (meanwhile claiming all other belief systems have been proven to be fakes). There are a lot of "religious" programmers out there who are sure that their preferred language, design pattern, or style is the only good one, and all others are terrible, even without bringing in actual religion.
If a scientist were to tell me he is 100% sure that the planes that crashed on 9/11 were flown by the US government via remote control, and that the people that were supposed to be on those planes are held in some secret facility, I might question their ability to think critically. It's possible that there is really good evidence to support this, and I've just never seen it, but I think it is far more likely that this scientist has horrible intuition and is probably a terrible scientist as a result.
The paper is BS... by Anonymous Coward · 2015-03-03 12:25 · Score: 2, Insightful

There are so many problems with that study.
First, they use C#. There is no reason to think that it's not language independent.
Second, All the code was from one code base with 4500 lines! How can you extract anything statistically significant from basically 1 data point!
Third, supposedly 10 canonical re-factoring techniques were used. Could it be that these re-factoring techniques are useless? Of course, they are not discussed at all in the article. We don't know what re-factoring techniques they used (out of a big set from a different paper).
Fourth:
In order to apply 10 refactoring techniques a small scale project with bad smells was selected as the source code. The selected application was a system which was implemented in the Department of Industrial Management, University of Kelaniya and used by academic staff at the department to schedule their personal and professional events and to manage their online documents repository. The source code contained around 4500 lines of codes. The relevant bad smells were identified and all the selected refactoring techniques were applied to the source code.
What?? They sniffed the code??? Makes absolutely no sense. I'd say the authors are idiots...
Oh yeah, neither of the authors has a background in CS. They're MIS (i.e. trained to be PHB).
Improving crap code by davidwr · 2015-03-03 12:35 · Score: 5, Insightful

I've seen the before-and-after when crap code was rewritten and refactored by hand by a good coder.
The improvement was huge.
Was it better than if the same coder wrote the code "from scratch" from the problem-description or design document? I don't know, but my point is that crap can be turned into gold by a good coder, and that refactoring can be part of the cleanup.

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Re:Easier to Analyze or Change == More Maintainabl by Dutch+Gun · 2015-03-03 12:44 · Score: 4, Insightful

Nope, it's when I take the awful, unmaintainable spaghetti code someone else produced when they were in a deadline crunch and convert it into something maintainable.
Sigh... I wish I could say that with a straight face.
Interestingly, in my experience, poorly structured code seems to come about less often because of "rushed code" and more often because of a lack of foresight in the original structure of a system to deal with continuously evolving features (which happens in most projects), along with a lack of willingness to refactor those systems as soon as it's apparent they're starting to break down.
This is the "golden time" to refactor code, because it's just now become apparent where the structural flaws are in the architecture, but it's still early enough to refactor without causing a significant amount of pain. It's often hard to justify, because you've only got a couple of ugly special cases that complicate things here and there. However, if you procrastinate too long, you're going to start piling on more and more "ugly special cases", and the code is going to get harder and harder to read and maintain.

--
Irony: Agile development has too much inertia to be abandoned now.
Re:on *average* by Anonymous+Brave+Guy · 2015-03-03 13:05 · Score: 5, Insightful

It needs a lot more qualifiers than that.
For a start, as with an unfortunate number of academic studies, it appears that the sample population consisted of undergraduates and recent graduates. That alone completely invalidates any conclusions as they might apply to experienced professionals with better judgement about when and how to use refactoring techniques.
Even without that, there seem to be a number of fundamental concerns about the data.
One obvious example is that they consider lines of code to be a metric that tells you anything useful beyond the width you need to allow for the line number margin in your text editor. I doubt most experienced programmers would agree that a LOC count in isolation tells us anything useful about maintainability or that the mere fact that LOC went up or down after a change necessarily meant the code had become better or worse in any useful sense.
Another concern is that they talk about "analysability", but this seems to be measured only by reference to a brief examination of a small code base in one of two versions, unrefactored and refactored. I'd like to know what the actual code looked like before I read anything at all into that data -- what refactoring was performed, what was the motivation for each change, and how do they know those two small code bases were representative of either refactoring in general or the effectiveness of refactoring on larger code bases or code bases that developers have more time to study and work with?
I'm all for empirical data -- goodness knows, we need more objective information about what really works in an industry as hype-driven and accepting of poor quality as ours -- but I'm afraid this particular study seems to be so flawed that it really tells us very little of value.

--
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
Study: This Study Is BS by engineerErrant · 2015-03-03 13:11 · Score: 3, Insightful

Making a judgment about "refactoring" as a single, simplistic concept is like making a judgment about "food" or "government" without going into any further detail. Umm, it's kind of not that simple.
Refactoring in real life is a whole array of different, nuanced activities. Any of them can be wise or foolish depending on the situation. Well-written code requires less of it, but some degree of it will always be needed as we can't tell the future. Each instance is a judgment call with no concrete right answer.
Re:Well DUH... by readin · 2015-03-03 17:31 · Score: 3, Insightful

The compiler is what makes the final decision on the code, not the programmer.
Just refactoring a poor algorithm will still result in a poor performance, though the code might look better.
The whole point of refactoring in most cases is to make the code look better, not to improve performance. Very rarely do people change an algorithm as part of refactoring. The goal is instead to make the code easy to understand, easy to change, and easy to debug. Creating consistent names and API patterns, re-ordering statements to group them based on the task they're performing, finding code that is the same in many different methods and moving it into its own method, reducing dependencies between classes.... these are common refactoring techniques and have almost nothing to do with performance.
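A minimal sketch of that distinction, using a made-up duplicate check (all names are assumptions for illustration). The first two functions use the same compare-every-pair approach, so tidying the first into the second changes how the code reads but not how it scales. Only the third, which swaps in a different algorithm, changes the performance picture, and that is a different activity from the refactorings listed above.

```python
from itertools import combinations


def has_duplicates_before(xs):
    """Original: nested index loops comparing every pair (quadratic)."""
    i = 0
    while i < len(xs):
        j = 0
        while j < len(xs):
            if i != j and xs[i] == xs[j]:
                return True
            j += 1
        i += 1
    return False


def has_duplicates(items):
    """Refactored: clearer, but still compares pairs, so still quadratic."""
    return any(a == b for a, b in combinations(items, 2))


def has_duplicates_fast(items):
    """Algorithm change, not a refactoring: one pass with a set.

    Assumes the items are hashable.
    """
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

All three return the same answers; the refactored version is simply easier to read and change, which is what the comment above says refactoring is for.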

--
I often don't like the choices people make, but I like the fact that people make choices. That's why I'm a conservative.