Study: Refactoring Doesn't Improve Code Quality
itwbennett writes: A team of researchers in Sri Lanka set out to test whether common refactoring techniques resulted in measurable improvements in software quality, both externally (e.g., is the code more maintainable?) and internally (e.g., number of lines of code). Here's the short version of their findings: Refactoring doesn't make code easier to analyze or change (PDF); it doesn't make code run faster; and it doesn't result in lower resource utilization. But it may make code more maintainable.
...is pretty important, and you should refactor when needed, if only for that. The benefits will spread through the rest of the code in many ways, all of them good.
How can anyone say, or write, that refactoring doesn't make code easier to analyze or change, and then follow up with the claim that it can make code more maintainable? Also, who in the world ever thought that refactoring would make code run faster?
We gave random medicines to groups of random people, and there was no statistical improvement in their health. Some people became healthier, but many people actually became ill.
Isn't the very *definition* of making code more "maintainable" that it makes the code "easier to analyze or change"?
Car repair does not make a car faster, nor more comfortable.
The test case basically converted procedural/structural code (structs and test cases) to object oriented code (classes and polymorphism) for a small, 4,500 line project. What they basically added was extensibility at the expense of overhead and traded individual-line complexity with architectural complexity.
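A minimal, hypothetical sketch of the kind of conversion described (the study's actual C# code is not shown here, so the shapes and names below are invented for illustration): a plain record plus a case-style dispatch, versus classes that each carry their own behaviour through polymorphism.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Procedural style: a plain record plus one function that switches on a tag.
@dataclass
class ShapeRecord:
    kind: str
    size: float


def area_procedural(shape: ShapeRecord) -> float:
    if shape.kind == "square":
        return shape.size * shape.size
    elif shape.kind == "circle":
        return math.pi * shape.size * shape.size
    raise ValueError(f"unknown shape kind: {shape.kind}")


# Object-oriented style: the dispatch moves into the class hierarchy.
class Shape(ABC):
    @abstractmethod
    def area(self) -> float: ...


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side * self.side


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius * self.radius
```

Adding a new shape in the OO version means adding a class rather than editing the dispatch function (the extensibility), but even this toy has more types and more lines than the procedural version (the architectural overhead).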
Yeah. The conclusions are nonsense piled on more nonsense. Plus it is plain bullshit. Imagine I only refactor by removing duplicated code across functions or different compilation units. Will the compiled code size become smaller? Yep. Will it be easier to read (fewer LOC to read)? Yep. Will it be more maintainable? Of course; you have less code to bother with.
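A small, hypothetical illustration of the "remove duplicated code" refactoring (the function names are made up for this sketch): the same validation lines copied into two functions, then extracted into one helper so a fix only has to be made once. Whether the compiled output actually shrinks depends on the compiler, since inlining can duplicate the helper again.

```python
# Before: identical validation logic copied into two functions.
def create_user_before(name: str) -> str:
    cleaned = name.strip()
    if not cleaned:
        raise ValueError("name must not be empty")
    return f"created {cleaned}"


def rename_user_before(name: str) -> str:
    cleaned = name.strip()
    if not cleaned:
        raise ValueError("name must not be empty")
    return f"renamed to {cleaned}"


# After: the duplicated lines live in one helper.
def _clean_name(name: str) -> str:
    cleaned = name.strip()
    if not cleaned:
        raise ValueError("name must not be empty")
    return cleaned


def create_user(name: str) -> str:
    return f"created {_clean_name(name)}"


def rename_user(name: str) -> str:
    return f"renamed to {_clean_name(name)}"
```

Behaviour is unchanged, which is the defining property of a refactoring; only the structure differs.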
My thoughts exactly. More maintainable code IS higher quality code, in my opinion.
Making code run faster has a completely different name: it's called optimization (and is frequently the root of all evil). And it often involves the exact opposite of the things you do when refactoring. E.g., unrolling a loop to make it run faster is pretty much the exact opposite of refactoring for maintenance and readability.
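A sketch of the contrast, using a hypothetical summing function: the plain loop next to a version manually unrolled by a factor of four. This only shows the shape of the transformation; in Python the unrolled version is not guaranteed to be faster, and in compiled languages the compiler often performs unrolling itself.

```python
def total(values: list[int]) -> int:
    # Straightforward loop: one element per iteration, easy to read.
    result = 0
    for v in values:
        result += v
    return result


def total_unrolled(values: list[int]) -> int:
    # Manually unrolled by 4: fewer loop iterations, but more code
    # and a separate tail loop for the leftover elements.
    result = 0
    n = len(values)
    i = 0
    while i + 4 <= n:
        result += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    while i < n:  # handle the 0 to 3 remaining elements
        result += values[i]
        i += 1
    return result
```

Both return the same sum; the unrolled one trades readability for fewer iterations, which is the opposite direction from a readability refactoring.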
I wouldn't call that study publication-worthy.
It certainly isn't statistically significant. 4,500 lines of C# code is nothing. I work with systems that have millions of lines of code. I've seen single class files that have thousands of lines of code (and vomited when I saw it). An important question here would be whether the volume of code in a system is a significant factor in the value of refactoring.
Based on their own statistics the refactoring was poorly done. Their result was more code, more complexity, and more coupling. Certainly not the work I would expect from an experienced software developer, but certainly something I would expect to see from undergraduate students who don't fully understand what they are doing.
I think the last sentence in the actual study sums it up pretty well: "Furthermore, it would be better that the same experimental setup can be executed in industry environment with the industry experts and with the industry level matured source code."
I've seen the before-and-after when crap code was rewritten and refactored by hand by a good coder.
The improvement was huge.
Was it better than if the same coder wrote the code "from scratch" from the problem-description or design document? I don't know, but my point is that crap can be turned into gold by a good coder, and that refactoring can be part of the cleanup.
Nope, it's when I take the awful, unmaintainable spaghetti code someone else produced when they were in a deadline crunch and convert it into something maintainable.
Sigh... I wish I could say that with a straight face.
Interestingly, in my experience, poorly structured code seems to come about less often because of "rushed code" than because of a lack of foresight in the original structure of a system to deal with continuously evolving features (which happens in most projects), along with a lack of willingness to refactor those systems as soon as it becomes apparent that they are starting to break down.
This is the "golden time" to refactor code, because it's just now become apparent where the structural flaws are in the architecture, but it's still early enough to refactor without causing a significant amount of pain. It's often hard to justify, because you've only got a couple of ugly special cases that complicate things here and there. However, if you procrastinate too long, you're going to start piling on more and more "ugly special cases", and the code is going to get harder and harder to read and maintain.
It needs a lot more qualifiers than that.
For a start, as with an unfortunate number of academic studies, it appears that the sample population consisted of undergraduates and recent graduates. That alone completely invalidates any conclusions as they might apply to experienced professionals with better judgement about when and how to use refactoring techniques.
Even without that, there seem to be a number of fundamental concerns about the data.
One obvious example is that they consider lines of code to be a metric that tells you anything useful beyond the width you need to allow for the line number margin in your text editor. I doubt most experienced programmers would agree that a LOC count in isolation tells us anything useful about maintainability or that the mere fact that LOC went up or down after a change necessarily meant the code had become better or worse in any useful sense.
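A small, hypothetical illustration of why a raw LOC count says little on its own: two snippets with identical behaviour whose naive line counts differ by more than a factor of three. The snippets are kept as strings so the same text can be both counted and executed; the counting rule (non-blank lines) is an assumption, and real LOC tools differ in what they count.

```python
VERBOSE = """
def is_even(n):
    remainder = n % 2
    if remainder == 0:
        result = True
    else:
        result = False
    return result
"""

TERSE = """
def is_even(n):
    return n % 2 == 0
"""


def count_loc(source: str) -> int:
    # Naive LOC metric: count non-blank lines.
    return sum(1 for line in source.splitlines() if line.strip())


def load(source: str):
    # Execute the snippet in a fresh namespace and return its is_even function.
    namespace: dict = {}
    exec(source, namespace)
    return namespace["is_even"]
```

The counts differ (7 versus 2) while the functions agree on every input, and neither number tells you which version a maintainer would find easier to work with.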
Another concern is that they talk about "analysability", but this seems to be measured only by reference to a brief examination of a small code base in one of two versions, unrefactored and refactored. I'd like to know what the actual code looked like before I read anything at all into that data -- what refactoring was performed, what was the motivation for each change, and how do they know those two small code bases were representative of either refactoring in general or the effectiveness of refactoring on larger code bases or code bases that developers have more time to study and work with?
I'm all for empirical data -- goodness knows, we need more objective information about what really works in an industry as hype-driven and accepting of poor quality as ours -- but I'm afraid this particular study seems to be so flawed that it really tells us very little of value.
10,000-line functions are shockingly common in industry. Shit grows over time, and is so poorly written that you can't safely refactor it, and management lacks the balls to let you clean it up, so it just festers and festers.
I hear PayPal had 90% of their processing business logic in a single, multi-million-line class! Thankfully, I don't know that one first hand.
About 5600 lines. However, because it was a glorified case statement, you were really only debugging a single case at a time, each of which was about the length of a sane function, so splitting it into functions would do little to improve readability. I like to trot out that example to terrify people, but the function itself was really quite sane and easy to maintain.
You did, however, have to fully understand the state machine as a whole, which in total was almost twenty kloc, had almost 200 instance variables in the state object, and leaned heavily on a tree object with about 30 instance variables. That's the point at which most people's heads exploded.
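A toy, hypothetical sketch of the "glorified case statement" shape described above (a tiny tokenizer, nowhere near twenty kloc, and not the commenter's actual code): each branch is about the size of a small function and can be debugged on its own, but every branch reads and writes the same shared state object, which is what you have to hold in your head.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    # Shared state object: every case reads and writes these fields.
    state: str = "start"
    current: str = ""
    tokens: list = field(default_factory=list)


def step(m: Machine, ch: str) -> None:
    # One big case statement over m.state; each branch is handled separately.
    if m.state == "start":
        if ch.isdigit():
            m.current, m.state = ch, "number"
        elif ch.isalpha():
            m.current, m.state = ch, "word"
        # any other character is skipped
    elif m.state in ("number", "word"):
        same_kind = ch.isdigit() if m.state == "number" else ch.isalpha()
        if same_kind:
            m.current += ch
        else:
            m.tokens.append((m.state, m.current))
            m.current, m.state = "", "start"
            step(m, ch)  # reprocess this character from the start state
    else:
        raise ValueError(f"unknown state: {m.state}")


def run(text: str) -> list:
    m = Machine()
    for ch in text + " ":  # trailing space flushes the last token
        step(m, ch)
    return m.tokens
```

For example, `run("abc 42 x7")` yields `[("word", "abc"), ("number", "42"), ("word", "x"), ("number", "7")]`.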
Either way, 4,500 lines is the size of a fairly straightforward iOS app. Most folks can dig into that and figure out enough to maintain it without spending a huge amount of time, even if the organization isn't ideal. When you hit tens of thousands of lines, that's where you have to start thinking about how you organize it and document it, because with such large projects, if you jump into the middle without a complete picture, you're likely to be hopelessly lost.
Of course, part of the reason is that I refactor as I write.
I'm not sure what that has to do with writing "badly formatted code", but I'd still caution against that. Some of the best advice I ever received: "There are no good writers, only good rewriters." I've found this to be true for code as well. It's amazing how much you can improve your code once you've distanced yourself from it a bit.