Comments are More Important than Code
CowboyRobot writes "I was going through some code from 2002, frustrated at the lack of comments, cursing the moron who put this spaghetti together, only to realize later that I was the moron who had written it.
An essay titled Comments Are More Important Than Code goes through the arguments that seem obvious only in hindsight - that 'self-documenting' code is good but not enough, that we should be able to write code based on good documentation, not the other way around, and that the thing that separates human-written code from computer-generated code is that our stuff is readable to future programmers.
But I go through this argument with my colleagues, who say that using short, descriptive variable names 'should' be enough as long as the code is well-organized.
Who's right?"
I used to grade student's code as a TA at my university, and I'll tell you what is more annoying than NO comments, this:
/* print "Encrypt message..." to the console */
printf("Encrypt message...");
and then followed by about 150 lines of uncommented spaghetti code
____
~ |rip/\/\aster /\/\onkey
Has he read the ones here?
Before any liberals are tempted to mod up one of my comments, a word of warning: I'm actually making fun of you.
...where comments are more important than articles.
Take off every sig. For great justice.
This works for code I write that nobody else will ever maintain. Even then I can get tripped up, I'll have to lean back in my chair and try to remember what I was thinking when I wrote the code.
But if you write code you're getting paid for, or code for an organization, anything but personal stuff, write good comments. Variable names might give a good idea about what data the variable holds, but it does not tell us much about how it is used.
When I took my first programming class, the most frustrating part was the documentation. I thought it was retarded and stupid and a waste of time. But now I realize it is very important once you write something more significant than "Hello World".
Rosco: "If brains were gunpowder, Enos couldn't blow his nose."
In my opinion, comments are useful -- but literate programming is where it's at if you're looking for the best way to document your code.
Knuth did a lot of work in the area -- if I remember correctly, all of the sources to TeX are written in a style understood by the "web" literate programming tool.
There was also a good article by one of the Perl folks (Nathan Torkington? M.J. Dominus? Chromatic? I can't remember.) on POD, and how although POD wasn't literate programming, it was still useful. That article was great in that it showed a middle ground that may be more palatable to your non-LP-fanatic programmer.
That being said, I prefer full-on LP for large projects.
unixkb.com -- articles on practical Unix issues.
i do a lot of code reviews at work and nothing makes me happier than good comments.
but just putting a bunch of blocks of comments that are like "get customer", "build record", etc are basically useless. If you use programming by intent then its more or less obvious that the code
GetCustomerFromDatabase(foo)
is going to do something with a database and get a customer. a comment telling me that is useless.
what i like to do is write a few paragraphs of text at the top of a function. it explains my general approach, why im doing certain things the way i am, why im not doing other things, and why the function even exists.
essentially the comments should be enough that anyone that knew the problem space ought to be able to read them and come up with more or less a similar implementation.
then, in the body of the method anytime i do something that i feel isn't completely obvious, i put a 1 or 2 liner infront of, i.e. "we have to do this because zergs are sensitive"
the end result of all of this is that code reviews can see what you were thinking at the time the code was written.. and you're documenting assumptions about the problem, the apis, your understanding of the language, etc, all right in the code. it makes finding errors pretty easy.. someone that can't even read source code can read the comments and get an idea of the correctness of your approach.
My opinions are my own, and do not necessarily represent those of my employer.
"Don't get suckered in by the comments -- they can be terribly misleading. Debug only code."
--Dave Storer
I want a new world. I think this one is broken.
Variable names should be proportional to the size of their scope within the code.
"There's a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:
0 69.html
It's harder to read code than to write it."
From Joel on Software
http://www.joelonsoftware.com/articles/fog0000000
Always comment.
I hope that after I die the one word people use to describe me is "resurrected."
If the comments are clear, a better programmer than you can come along later and say "What the hell was this guy doing?" and then replace your lines of fumbling crap with much cleaner/clearer code.
It's the difference between being able to see what you were trying to do vs. figuring out what you actually did.
Call it "Intent Oriented Programming" if you want.
Simple Machines in Higher Dimensions
Comments are a maintenence nightmare. They get out of sync with the code, and (especially when the code is bad to begin with) people read them instead of the code.
That means over time the human's understanding of what the program does starts to diverge from the computer's understanding.
This is not good.
If something is too hard to understand the way it is written without comments, it should be rewritten. You will save time in the long run, trust me.
Remember the old adage: Don't get suckered in by the comments--the bug is in the code.
--MarkusQ
UsngShrtCmtsIsOftNotEnghAsOneMyNdToReWrtShtInTheFt r.
The problem is usually functions that are too long and are not orthogonal. Write short, orthogonal functions and you'll see your need for heavy commenting go away along with the need for long variable names.
JEF RASKIN, professor of computer science at the University of Chicago, is best known for his book, The Humane Interface (Addison-Wesley, 2000), and for having created the Macintosh project at Apple. He holds many interface patents, consults for companies around the world, and is often called upon as a speaker at conferences, seminars, and universities. His current project, The Humane Environment (http://humane.sourceforge.net/home/index.html), is attracting interest in both the computer science and business worlds.
For those who don't know (which apparently includes whoever is in charge of the linked article), Jef Raskin passed away this february. You can view the official press release, or read more about his contributions to computer science. I don't know when the article was written, but it seems it should mention that Raskin has passed away. In any case, his advice about commenting is good, just as his advice on user-interface design has always been lucid and helpful.
Imagine my relief when I came upon a helpful comment:
All it took was one comment to put my mind to rest: no, it's not just me being stupid in the present. This code seemed this terrible back then, too.
Comments save the day once again.
The above would be far more useful like this:Now the intent of the method is clear, and anyone coming along who wonders why it's hard coded will know under what circumstances they should change it to a formula (namely, if MyClass becomes capable of meeting the Magilla criteria).
Comments can be good, but they should always be a guidepost to the intent of a block of code, and not attempt to explain how the code achieves that goal.
Stop-Prism.org: Opt Out of Surveillance
Of course, there are always exceptions. When I was writing low-level code that manipulated hardware registers, I wrote a multi-line comment before each line of bit-fiddling code, complete with what the code did and a cross-reference into the hardware manual. Something like:
What a fool believes, he sees, no wise man has the power to reason away.
Now with that out of the way, here's my philosophy on variable names... Every variable name should be as long as necessary to describe what the variable is. Having said that, the shorter, the better. If you have a lot of long variable names, then you probably have not found the most elegant solution to your problem.
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
I find that for simple code that requires little thought, little in the way of comments is required. Sometimes comments just get in the way as the code progresses over time, and the comments don't. It's better to code with long descriptive variable names, and structures then depend on comments.
//Customer ID structure // ID number for the customer // Agent Name // Customer Name
For example which would you rather read.
#define AgentNameLen 41
#define CustomerNameLen 51
struct CustomerID
{
int CustomerID;
char AgentName[AgentNameLen];
char CustomerName[CustomerNameLen];
};
or
struct CoID
{
int ID;
char AName[41];
char CName[51];
};
To me the first is much more clear, and throughout the code will be obvious as to it's use. The second is mildly clear, but will degrade as new things get added to the structure.
The first also allows things like
for(int i=0;iAgentNameLen;i++)
which makes it very clear that you are iterating throught the agent name.
Algorithm's should be documented, as much as possible, at the top of the function, and any function that takes more then a screen should be looked at to see if it can be broken up into smaller functions.
(Almost) Always code towards maintainability, never speed. Usually it pays off within the year.
So to sum up. In order of importance:
80% code clearly with an eye towards maintainablity
20% comment
Don't laugh. As an old school COBOL programmer in my youth, documentation was everything. Now that other languages have come along, you really need to document what you do, you can't just be lazy and not be willing to explain what your code does, This is what is happening to every program tha I see (C++. and M). Poor documentation, poorly written code, and very little commentary on what the particular function does.
My spin on this, your are a very poor communicator to your co-workers if you can not put a few simple words to a function in your program.
Having RTFA, I can see what he's trying to get at, but as someone who has (unfortunately) found himself spending most of my 10 year career in programming cleaning up other people's poop. At first I thought it was because I must have done something wrong that I kept ending up being assigned this work, but as I came to realise, it was because I make the code better than I found it and I have a knack for fixing stuff other people give up on. I also had silly managers who assign work to the people least qualified to do it.
At any rate, some observations:
1. 20 lines of comments "documenting" your code before you write it (or even after you write it) is far less useful than writing the code COHERENTLY and CORRECTLY in the first place.
Last month, I had a 1200 (yes 1,200) line method with huge blocks of documentation before big pieces of code. I still can't quite tell you what it thought it was doing. The code was a for loop wrapping around code to handle 3 different and mutually exclusive situations. Instead of identifying which of the 3 situations it was and creating a method for each situation, the person just stuffed it all in with lots of comments documenting everything the article's author said. The code was still unmaintainable.
2. Comments are useless unless they are kept up to date
Part of the reason that code was so difficult to figure out was because most of those big verbose documentation comments referred to a completely different implementation. After the programmer had written the first case, she encountered some other bad cases and eventually had to completely change a block of code embedded in this 1200 line for-loop. The code was now correct, but the comments no longer had anything to do with that block of code.
3. Don't be clever when you can be clear
I have made a solemn vow to hunt down and hurt anyone who puts "clever" code in my project. I am so sick of trying to figure out what some obfuscated piece of code in C, C++ or even SmallTalk is doing. And find out it was just a "clever" way of doing something pretty straight forward like iterating over a list. There was no speed gain from the clever trick, and the code wasn't even a bottleneck to begin with. *sigh*
4. If you don't know how to solve the problem, write some experiment code in a separate app to figure it out. Then take the time to do the "right" thing in the production code.
3 days from final for a video game. The CD streaming library for the Sony Playstation was making this strange "hic-up" sound at rare moments. By this time, the original author of this code has long since gone to another company. So I plunge into the code and found that the original programmer didn't know how to write streaming code so he created this hack of a hack of a hack of a test (ad nauseum). The code was programmed by accident, not design. No amount of comment before coding could help this. If the author had dumped the code, wrote documentation describing everything he learned then wrote the code, things would have been a lot better.
5. Unrelated to comments, but use variable names that make sense. Don't name them arbitrarily or to amuse yourself!
That CD sound streamer code I mentioned above used quirky names for variables. Can you tell what "little_ninja" is supposed to be just from the name? When I confronted the coder about this quirk of his (in another library he wrote), he got all huffy and didn't understand why people didn't appreciate his little puzzles or his sense of humor. It galls me he still earns a paycheck in the industry.
The bitter lessons of a veteran coder: http://bitterprogrammer.blogspot.com
I kid you not, this is real code my supervisor writes.
Note that this is matlab code, where commas are both an end-of-statement indicator (it's also possible to use just a semicolon), and an array index separator. The nice thing about this code, is that I can at least guess where most of the variable names come from. Oh, and there was *no* line break in the original code. Hooray for the avoidance of those wasteful '/n' characters?!?!
To answer the original poster: yes, comments are of vital importance.
Then again, if a program is structured right, things can be organized into sections, each of which is then commented, as opposed to a bunch of seemingly random lines with comments spaced throughout. Sometimes the layout of code, in conjunction with good variable names, is the best possible method of commenting it. The one thing comments are good for is to assure that someone not familiar with the particular language being used, will still understand the purpose of the code.
In "Literate Programming" the comments are all important and the code itself is trivialized. The code, as Jef told me, is like "raisons in the muffin of the comments." There are paragraphs of verbiage which might go on about the history of the project, why certain features were discarded, etc., etc, and might not even explain what the following line or two of code was concerned with.
It's really very difficult to deal with code that has been written in this style ("literatized?" ) since the actual structure of the program is severly obscured. It serves as an example of how overdoing a good thing is usually a bad thing.
Jef was a nice guy, and I recommend his book, "The Humane Interface" for its many interesting ideas. His attempt to put them into practice in the Archy project http://rchi.raskincenter.org/aboutrchi/index.php/, was not completed before his death. Even for that, Archy is very close to his vision.
But since Jef was in many ways an extremist, one can demonstrate in Archy his pushing of his concepts to the limit resulted in the end fully maching his goals. Somewhat like "Literate Programming", in fact.
The true useful skill lies in reading sloppy and/or wizardly code. Some people think that they have job security if they write impenetrable code, but then they can just be fired and all their code rewritten. If you can read others' "unmaintainable" code, you enable your employer to save money by not having to rewrite everything the guy they just fired wrote. So they'll want to keep you around as they fire/downsize everyone else. I It doesn't really matter what kind of code you write, since you can read whatever. advise everyone to start reading up on the Obfuscated C Contest, and practice figuring out what evertyhing does. Then you can handle any kind of code thrown at you, and the code you actually write becomes of secondary importance.
My learning experience about commenting code was a difficult one. Like many, while in college I wrote the code and then went back to comment it so the profs were happy.
Then I did a co-op with an automated storage/retreival systems company in their software department. One of the processes involved in a communications system needed some work. The code was licensed from another company in another country. There was no documentation for this communications system. There was very little commenting in the code. Luckily it wasn't in a foreign language. Unluckily it was wrong, apparently the structure of this program was similar to that of another, which was mostly gutted and rewritten, but a few old-program comments survived to be the ONLY comments in the new program.
Sure, the sources could be reverse engineered to provide the documentation required. I did it. It took a few weeks.
After that, I didn't leave comments for last anymore. It's been a good thing. I now work for a semiconductor design company and often write perl scripts or skill-language scripts to automate tedious tasks. I think I'm abou thte only one in the office that comments such scripts in any way. It's nice to read what stuff does when I have to revisit code many months or years later. I hate having to revisit someone else's code because it's nearly guaranteed to be completely barren of anything human-readable.
Listen up kids! Commenting is GOOD! Your professors aren't just being jerks. Learn the easy way and hopefully save yourself a great deal of trouble with your own code. Other people's code will always suck, but your own shouldn't have to.
I had a co-worker who needed a little help with his program. It was C code, but all the semicolons were in column 80, arranged in a nice column. Yikes. He casually said "what, you've never seen that?". Turns out that part of his problem was that he was using float to represent a byte position in an extremely large file. He knew enough that UINT32 wouldn't do it, but ...
HIV Crosses Species Barrier... into Muppets
1. Comment each function
- Function name
- what it does
- parameters
parameter name - what is is for and any restrictions on it (i.e., must not be null)
- return value (all possible return values)
2. Add comments to each file you modify so that over time, the file becomes better documented
3. Add ASSERT() like comments and ASSERT() or equivalent to your code
4. Use dividing comments like a line of dashes to seperate blocks of code
5. Put in a '?' comment for code that you do not understand (good for function headers)
6. Avoid stupid naming schemes for your local variables since that makes it harder to comment
7. Review your code for both logic and comment completeness after you code it before committing it to version control
8. Tag your bug fixes, code enhancements with a comment followed by a dash, date, and your initials. This is essential for large projects or for anything you will be working on for more than 6 months.
9. Format your comments so that multiple line comments line up on the left hand side (increases readability)
10. Don't count on java doc or equivalent as being good enough code documentation.
GRANDPARENT: Especially if you change the code and now the comments are wrong
PARENT: You're incompetent if you don't change the comments to match the code. You're equally incompetent if you come across incorrect comments and leave them in. You're supposed to the job, so do it...
PARENT: As Fred Brooks said, "There is no silver bullet."
A database backend would go a long ways towards providing a silver bullet, i.e. if instead of writing your code to an ASCII text file, you were writing to a document management system that kept doubly linked associations between the lines of code and the comments associated with those lines of code, and if code/comment pairs had dirty bits, so that if you changed one [e.g. the code], then the dirty bit wouldn't get changed to clean until you verified that the other [e.g. the comment] was correct, then that would go a long way towards solving the problem.
I think we are still in the infancy of code/documentation/database integration, however.
I have written programs in both raw and literate programming style, i prefer the latter. In fact, i wrote a literate-program pre-processor to write programs, and it made the program easier to write and more bug-free.
In literate programming, you rely on a pre-processor to make the output production, so you are free to put things together as you want. What this means, is that bracketing code (eg open, close files), can be written in the same block, which are invoked separately.
The main program then ends up looking like a rough scetch, full of commands to include other bits. With wing comments, it is easy to see what is going on.
One uses a folding editor to search for strings like "!topic". This will not show you a consolidated index, but you can use it to also show where you're are, and any missing bugs.
On the main, Jon Bentley's comments on Literate Programming are fair (that is, it creates a good environment for writing single-purpose code), but one needs to consider the context the program is written for.
The form i use was specifically designed to allow all sorts of text-output, so the same file can make as output, eg .CMD, .REX, .TXT and .HTM output, which means that when you run the script you get a perfectly matched set of files, all correctly pointing to each other.
OS/2 - because choice is a terrible thing to waste.
I love telling this story...
// break
Last year I had a brief stint at a small software company that had just taken a project in-house that was developed by an outside contractor. My job was to take the code they'd just inherited (which no one there knew anything about) and add some features to it on a tight schedule. Documentation? What documentation?
The extent of all of the code comments it had was the following (and no, I'm not making this up):
if(...) {
break;
}
If that wasn't bad enough, I knew the original developer personally. She was a former professor of mine and I'd worked for her company only a few months before she had taken that contract.
As someone who has had to deal with code with descriptive names and no comments or docs to go with them: If you write such code, may you rot in the lowest level of hell along with traitors, used car salesmen, and people who answer cell phones during movies.
--GrouchoMarx
Card-carrying member of the EFF, FSF, and ACLU. Are you?
I'd say you're right, comments are more important. Clearly-written code should make how it's doing things obvious, yes. Comments, though, should say what is being done and why it's being done the way it is.
I want to laugh when I read it.
Some of it is funny.
Some of it is just scary.
The human mind can be a freakishly messed up thing.
http://mindprod.com/unmaindocumentation.html
It is part of a larger essay about writting crappy code.
Anybody that even comes close to software development
should check it out.
--- begin excerpts ---
Avoid Documenting the "Obvious" :
writing an airline reservation system, make sure there are at
least 25 places in the code that need to be modified if you were
to add another airline. Never document where they are. People who
come after you have no business modifying your code without
thoroughly understanding every line of it.
Units of Measure :
variable, input, output or parameter. e.g. feet, metres, cartons.
This is not so important in bean counting, but it is very important
in engineering work.
As a corollary, never document the units of measure of any conversion
constants, or how the values were derived.
It is mild cheating, but very effective, to salt the code with some
incorrect units of measure in the comments.
If you are feeling particularly malicious, make up your own unit of
measure; name it after yourself or some obscure person and never
define it. If somebody challenges you, tell them you did so that
you could use integer rather than floating point arithmetic.
On the Proper Use of Design Documents :
complicated algorithm, use the classic software engineering principles
of doing a sound design before beginning coding. Write an extremely
detailed design document that describes each step in a very complicated
algorithm. The more detailed this document is, the better.
In fact, the design doc should break the algorithm down into a hierarchy
of structured steps, described in a hierarchy of auto-numbered individual
paragraphs in the document. Use headings at least 5 deep. Make sure that
when you are done, you have broken the structure down so completely that
there are over 500 such auto-numbered paragraphs.
For example, one paragraph might be: (this is a real example)then... (and this is the kicker) when you write the code, for each of these paragraphs
you write a corresponding global function named:Do not document these functions. After all, that's what the
design document is for!
Since the design doc is auto-numbered, it will be extremely difficult
to keep it up to date with changes in the code (because the function
names, of course, are static, not auto-numbered.) This isn't a problem
for you because you will not try to keep the document up to date.
In fact, do everything you can to destroy all traces of the document.
Those who come after you should only be able to find one or two
contradictory, early drafts of the design document hidden on some
dusty shelving in the back room near the dead 286 computers.
--- end excerpts ---
The code should be written so that it is obviously WHAT it does.
:-) Like when I go "ok, it's doing that... But WHY?", I know that I should have put in a comment. But if I go "WHAT the f**k is it doing here", those lines should be rewritten.
The comment should explain WHY.
Not that I use that convention myself, but I often wish that I did
It's not just that the programs to write are small, it's that they're write-only. You write them once, get graded, that's it. We churn generation after generation of students who are taught that code is written once, then never ever maintained.
Sure, you learn lots of things about design, software engineering, etc, in university, but they're pure theory. And seemingly useless theory at the moment. There is _nothing_ to illustrate there why some code organization is good, and why spagetti code is bad. All those lessons about maintenance as wasted when you never have to maintain anything, nor ever write anything big enough.
So while I'll say your idea does have merit, I think it can be done better. Don't just give them 1000 lines of someone else's code. Make them keep building and expanding the same program until the last year.
E.g., ok, in introductory programming they had to write some 100 line trivial program. But don't throw it away. When the next course comes along, give them the assignment to change or expand that original program.
E.g., if at some point you also get a computer graphics course, make them add a graphics module to that program. GUI programming? Sure, add a GUI to it. Database programming? Sure, make it save the data in a database. YACC? Ok, make them add a small scripting language to it. Different language? Make them port it to that language. Etc.
Make it a part of the grade to explain _what_ had to be changed and _why_.
Eventually it _will_ grow to be 1000 lines, and then it will grow even larger. And more importantly it'll be example of why code has to be readable and maintainable.
A polar bear is a cartesian bear after a coordinate transform.
The three items you seem to object to all have fairly legitimate uses in some computing contexts.
1. Comment each function
- Function name
Why do you need to repeat the name of the function? It's right there in the code.
Yes, and in some contexts that is precisely the problem. Depending on the language and editor(s) you are using, embedding the function names in comments can be very useful.
For example, if I use an "FC" command in a Unisys 2200 mainframe editor (say ED or UEDIT) to show all lines starting with "C" (comment lines) in a Fortran program, the results will be confusing if the function names themselves are not specified in a comment line. All I will see is comments with no context.
By adding the subroutine or function names in the related comments themselves, that missing context is restored.
5. Put in a '?' comment for code that you do not understand (good for function headers)
Code with personal annotations should never be released.
While I agree with this part,
If you don't understand it, you shouldn't be working with it; go find someone who does understand it instead, or stop and work it out.
this part of your comment seems limited to "perfect world" scenarios only.
I've spent most of my career working on mainframe code that was either originally written by people at other organizations or companies, or that was written so long ago that the original designers and coders were either retired or deceased.
When making a first pass at understanding such code, I've found it to be quite useful to leave various comments in the code so I have a good idea about my level of understanding the next time I have to come into that area of code.
I've also found such comments useful when left in the code by others, since it reflects their level of understanding as well. Some of the stuff I'm looking at now was originally coded in the early 80's and rewritten by my predecessor, and the comments he's placed in the code have proven to be invaluable to me, even though some of them were speculative in nature. Something like "What does this code do?" in his comments often alerts me to a tricky part in the logic that I'll want to pay special attention to.
FWIW, question marks and other comments/speculations are commonly used when working on someone else's previously-written code, at least in my experience, especially when that code is only intended for use in-house (the source is not released to customers).
8. Tag your bug fixes, code enhancements with a comment followed by a dash, date, and your initials. This is essential for large projects or for anything you will be working on for more than 6 months.
Once again, nothing with personal annotations should ever be released into source control. Not only should I not need to know who wrote or changed a particular snippet of code to fix a bug, I shouldn't even be able to tell by looking at the code.
When working on code that was written in-house, and which is intended to stay in-house, I *want* to know who made a certain change and why it was done. That type of notation can save me (or another support programmer) a lot of time.
While those types of details might be handled by the source control system(s) that you work with, keep in mind that some of us work on platforms where analogs to the UNIX-like source control systems many people are used to simply do not exist (and often such a thing isn't really needed).
When I did Unisys A-series mainframe contract work at a small company in southern Minnesota, they used the unparsed end of each COBOL line as a comment area, and one always placed their initials and a date there so people knew who did which changes and when those changes were created. The editor they used (CANDE) even had a special setting to automagically place a change mark on the end of each line that was modified.
A si
Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
The Theorem Theorem: If If, Then Then.