You can't afford perfect software.
The problem with that argument is that there is little evidence that it is actually true.
Let’s assume that by “perfect” what we really mean is “much better than most software today”, since no-one has yet worked out how to write literally perfect software. In that case, the difference in cost (by the time you take into account the full lifetime, including maintenance costs) seems to be rather small: it's not that great even during initial development, and the lower maintenance costs make it rather efficient to put more up-front effort into making less buggy code.
For what it’s worth, that’s not really the kind of target I was gunning for. If you’re going to program in Perl, then of course you need a reasonable level of Perl proficiency to understand the code, and your example is hardly esoteric. I would imagine that it is also comfortably within the ability of a typical Perl programmer to comprehend after a little reading, if they haven’t encountered functional style before.
It's much easier to fix a program that does an ugly job of implementing the right solution than one that does a beautiful job of implementing the wrong solution.
OK, but if your code is ugly, will you be sure that it is implementing the right solution, and that you can keep it that way as it evolves?
There are times where you need to eke out every cycle you can
That’s a fair point.
Curiously, though, such code often becomes simpler from some perspective. For example, extreme optimisation sometimes comes down to things like efficient use of pipelines and caches. Achieving these goals might lead to more straight-line code or to a “flatter” memory layout with related data stored together.
Such optimised code might not be the way you would “naturally” write the algorithm, but a quick explanatory comment goes a long way, and surely anyone working on such performance-sensitive code would be familiar with these principles even if they hadn’t seen a particular case before.
Moreover, this all tends to happen at a very low level. I’ve rarely seen a case where the performance-driven hackery couldn’t still be wrapped up in a fairly tight module and present a normal interface to the rest of the code.
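To make that concrete, here is a minimal sketch of what I mean by a tight module with a normal interface. Everything in it (the `ParticleStore` name, its fields and methods) is hypothetical and chosen just for illustration. Python won't show the cache effects themselves, so read it only for the shape: flat, contiguous storage on the inside, ordinary calls on the outside.

```python
from array import array


class ParticleStore:
    """Hypothetical module: flat, contiguous storage behind an ordinary interface."""

    def __init__(self):
        # Why: one contiguous array per field keeps values that are processed
        # together next to each other, instead of one object per particle.
        self._x = array("d")
        self._vx = array("d")

    def add(self, x, vx):
        """Add a particle and return its index."""
        self._x.append(x)
        self._vx.append(vx)
        return len(self._x) - 1

    def position(self, index):
        """Return the current position of one particle."""
        return self._x[index]

    def step(self, dt):
        """Advance every particle in one straight-line pass over the flat arrays."""
        xs, vxs = self._x, self._vx
        for i in range(len(xs)):
            xs[i] += vxs[i] * dt
```

Callers only ever see `add`, `position` and `step`, so the layout can change later without touching the rest of the code.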
Also, is it really possible to write code that "everyone" can understand? Sometimes you have to assume that whoever inherits the code has at least some basic skills, and understands common techniques and terminology.
Sure, of course. I was oversimplifying, perhaps a little too much.
That said, I do think it’s a reasonable goal that code written for a certain project should be accessible with only minimal support to anyone from that project who is likely to read it. That is, while it may sometimes be necessary to deviate from this rule for practical reasons, such deviations should be deliberate and for a specific purpose, and suitable precautions should be taken to make sure any reduction in clarity does not become a liability.
But this dogmatic "sorry, not enough documentation = Big Rewrite" is nonsense.
Well, please notice that this isn’t what I wrote. I talked about a situation where you had lost both the original developers and the knowledge they had. Documentation is one way to pass on that knowledge, but sometimes one of the least effective. If you can still recover it in other ways — for example, if the code is well-written and self-documenting — then you might still be in the maintenance stage of the project rather than servicing.
Also, remember we’re only talking about a relatively small code base here. There probably isn’t a huge gap between making tactical changes and effectively rewriting the part of the system concerned anyway.
Oftentimes - most times? - it's more effective to master and modify than to rewrite.
I agree that a rewrite is very expensive, but there is an implicit assumption in your statement that it is possible to achieve a sufficient level of mastery to make the required modification effectively instead. Maybe you feel that for a 30–40 KLOC program, such a level is always attainable, and perhaps that is correct. But in general, once you’ve lost too much, it becomes very difficult to continue performing a full range of maintenance on the project.
Nothing like being handed a steaming plate of spaghetti and hearing about how much of a "genius" its creator was.
I always thought clever code was code that everyone could understand, not code that no-one could understand.
It’s like Blaise Pascal’s apology for writing a long letter because he didn’t have the time to make it shorter: it’s often easier to produce some grandiose design that treats anything awkward as a special case than it is to identify a simpler, more consistent underlying concept and then write simpler code to model that.
If both the original developers and the knowledge they had have been lost, then it is probably already too late to perform any major maintenance on this code base. The project has already entered its “servicing” stage.
At that point, you basically have two possible approaches that actually work: you can restrict maintenance to small-scale changes, which may be sufficient if the goal is just to keep the project ticking over for a while, or you can accept The Big Rewrite (which isn’t so big in this case) in order to get a project that can be properly maintained.
If you want to go down the tactical changes path, there are a couple of approaches to finding your way around the code.
If you’re familiar with the general field of the software, just not this particular code, then you can work top-down. Start with the key, high-level concepts you know the program implements, and try to find the code that represents those:
Look at things like file names and directory structure (often a good starting point, because these tend to reflect the original design/intent behind the code).
Get a tool like Doxygen to draw some graphs of the relationships between functions/classes in the code, and chances are the big clusters of related code will match some of the concepts you’re trying to find.
Just search the code base for key words from the problem domain. Look for functions/modules/classes named after them, or that refer to them often.
Hopefully, if the code has a reasonable modular design and you just don’t know what it is yet, this sort of approach will identify the organisation of the code at a very coarse level, and then you can try to break down each area in more detail the same way.
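As a rough illustration of the keyword search, something like the following would do for a first pass. It is only a sketch: the function name, the default file suffixes and the example keywords are my own assumptions, and it does plain case-insensitive substring matching so that identifiers like `invoice_total` still count as hits.

```python
import re
from collections import Counter
from pathlib import Path


def rank_files_by_keywords(root, keywords, suffixes=(".c", ".h", ".py")):
    """Count case-insensitive keyword hits per source file under root.

    Returns (relative_path, hit_count) pairs, busiest files first.
    The suffix list is an assumption; adjust it for the code base at hand.
    """
    if not keywords:
        raise ValueError("need at least one keyword")
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    hits = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(errors="ignore")
            count = len(pattern.findall(text))
            if count:
                hits[str(path.relative_to(root))] = count
    return hits.most_common()
```

Calling `rank_files_by_keywords("src", ["invoice", "ledger"])` (hypothetical directory and domain terms) gives a list of files to read first, which is about all you want from this step.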
Alternatively, you can work bottom-up. Find a significant starting point, such as:
somewhere that generates some output you’re interested in
somewhere that throws an exception or trips an assertion relevant to a bug you’re trying to fix
a busy spot when you run the program through a profiler.
Examine the code near that point. Look at what kinds of data it works with. Look at what functions it calls, and what functions call it. Try to figure out the wider significance of the code you started with, and the other code to which it relates. Then move up a level: what is the purpose of all of that code collectively? Repeat until you’ve explored as far as you need to.
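If the code base happens to be Python, the "what does it call, and what calls it" step can be sketched with the standard `ast` module. This is an assumption-laden toy: the function names are mine, it only sees direct calls by simple name (not method calls such as `obj.run()`), and calls inside a nested function are also attributed to the enclosing one. For other languages you would use whatever cross-reference tool is available.

```python
import ast


def call_edges(source):
    """Return (caller, callee) pairs for direct calls by simple name."""
    edges = set()
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Walk the whole function body, including any nested functions.
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    edges.add((func.name, node.func.id))
    return edges


def neighbours(edges, name):
    """Split the edges around one function into (what it calls, what calls it)."""
    callees = sorted(callee for caller, callee in edges if caller == name)
    callers = sorted(caller for caller, callee in edges if callee == name)
    return callees, callers
```

Start from the function you found, look at its neighbours, then repeat from whichever neighbour looks most relevant.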
After some other discussions about these topics, I recently wrote up a couple of articles with some more background information than I’ve given here — link in my sig if anyone’s interested (though be warned that they are pretty long).
I've heard one person after another argue about how to comment and how much to comment, but what I've never seen is any kind of serious study attempting to measure what actually works best.
FWIW, in studies of the problems developers encounter in practice, one recurring difficulty is working out the motivation behind a given piece of code. This supports advice to comment why the code is written the way it is. See, for example:
LaToza, Thomas D., et al., "Maintaining Mental Models: A Study of Developer Work Habits", in ICSE '06: Proceedings of the 28th International Conference on Software Engineering
Ko, Andrew J., et al., "Information Needs in Collocated Software Development Teams", in ICSE '07: Proceedings of the 29th International Conference on Software Engineering
Both papers were coauthored by Robert DeLine and Gina Venolia of Microsoft Research.
In LaToza et al., 66% agreed that "understanding the rationale behind a piece of code" was a serious problem for them.
In Ko et al., "Why was the code implemented this way?" was the second most frequently unsatisfied information need among the developers observed.
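For what a "why" comment might look like in practice, here is a small made-up example. The function, its parameters and the scenario (retrying calls to some overloaded remote service) are all hypothetical; the point is only the contrast between restating the code and recording the motivation.

```python
def retry_delays(attempts, base=0.5, cap=8.0):
    """Return the delay in seconds to wait before each retry."""
    # A "what" comment would just restate the code: "double the delay each time".
    # Why: doubling backs off quickly when the (hypothetical) remote service is
    # overloaded, and the cap stops a long outage from producing waits so long
    # that the caller appears to have hung.
    return [min(cap, base * 2 ** n) for n in range(attempts)]
```

Someone inheriting this can see at a glance which parts are deliberate and what they would be trading away by changing them.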