Writing Documentation: Teach, Don't Tell
Programmer Steve Losh has written a lengthy explanation of what separates good documentation from bad, and how to go about planning and writing documentation that will actually help people. His overarching point is that documentation should be used to teach, not to dump excessive amounts of unstructured information onto a user. Losh takes many of the common documentation tropes — "read the source," "look at the tests," "read the docstrings" — and makes analogies with learning everyday skills to show how silly they can be. "This is your driving teacher, Ms. Smith. ... If you have any questions about a part of the car while you’re driving, you can ask her and she’ll tell you all about that piece. Here are the keys, good luck!" He has a similar opinion of API strings: "API documentation is like the user’s manual of a car. When something goes wrong and you need to replace a tire it’s a godsend. But if you’re learning to drive it’s not going to help you because people don’t learn by reading alphabetized lists of disconnected information." Losh's advice for wikis is simple and straightforward: "They are bad and terrible. Do not use them."
Do you really want to read the source code for ssh every time you forget whether it's -p or -P to specify the port? (It's one for ssh and another for rsync...)
As an author of three successful dead-tree programming books, I have a few observations.
1) I use the electronic versions myself because of easy search (better than an index) and copy/paste.
2) In book format, it's possible to lead a reader through topics in a sensible order that builds on prior topics.
3) The challenge with electronic/on-line documentation is that there is no expectation that readers will approach the material in any particular order. Readers type a search term into Google and up pops a page or two of documentation. How can the author make safe assumptions about the definitions of terms and prior conceptual knowledge the reader will have? Adding links to the definitions of terms and links to chapter-oriented conceptual documentation doesn't usually help because readers are impatient, and there is no good place in the middle of the documentation to start.
4) Many readers don't know the terms to type into Google and therefore aren't led to the relevant conceptual documentation, even though they would have read it had they known.
In my opinion, Stack Overflow is most often the blind leading the blind. There will be 20 wrong answers, 10 answers to the wrong question, 2 suboptimal solutions, and if you are in luck there will be 1 good solution. Now, tell me which is which. It seems to me that the good answer is almost always buried under crap.
Stack Overflow questions are often badly stated and difficult to find even when you search with the correct terms. If you don't even know the search terms, the site is useless.
There have been a few times when Stack Overflow saved me a lot of time. There have been many times when Stack Overflow has been a pointless time sink.
I hate you! You're one of those co-workers who urgently e-mails me at 1 AM asking how to use some utility I wrote. In the morning I reply, "Use the -h switch, you mother f*cker." Followed by my usual disclaimer--"Every utility I write has an -h switch, which describes the switches option-by-option, followed by a short description of the function of the utility, plus links to additional documentation."
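A minimal sketch of that kind of -h output, assuming Python's standard argparse module. The utility name "logtool", its two options, and the documentation path are hypothetical and only illustrate the layout; note that argparse prints the description before the option list rather than after it.

```python
import argparse


def build_parser():
    """Build a parser whose -h output covers options, purpose, and further docs."""
    parser = argparse.ArgumentParser(
        prog="logtool",  # hypothetical utility name
        description="Compress and rotate old log files.",
        # The epilog is printed after the option list; the path is made up.
        epilog="Further documentation: docs/logtool.txt",
    )
    parser.add_argument(
        "-d", "--days", type=int, default=7,
        help="rotate files older than this many days (default: 7)",
    )
    parser.add_argument(
        "-n", "--dry-run", action="store_true",
        help="show what would be done without touching any file",
    )
    return parser


# format_help() returns the same text that -h would print.
help_text = build_parser().format_help()
print(help_text)
```

argparse adds the -h/--help switch on its own, so the only work left is writing a useful help string for each option.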
And if you think you're going to find the -p switch in OpenSSH source code, good luck. Option argument handling is strewn about in several different files. I know, because I've had to hack on it and add options, as well as fix the parsing of forwarding option arguments, among other things. I've seen worse, but it's a long way from some utilities, where getopt and getopt_long processing is concise and easily readable.
Pro tip: readable source code has nothing to do with methods, classes, functions, or variables per se. It's the overall structure that counts, even if it's a single 10,000 SLoC function. Most C++ apps are harder to read than a gigantic ASM app.
Most people organize their code by what it literally does--by the components they learned in school or a textbook. They tediously break down blocks into myriad functions and classes based on their algorithmic role. Or they farm out "parse_int", but then have a 200-line chunk of code processing a dozen different kinds of ints (ints for timeouts, ints for userid, etc).
I don't have many simple tips for alternatives. I just know that most people are doing it horribly wrong. I like to think my code is fairly easy to read--and people have told me that--but I know I could get better.
Okay.. one simple thing people could do more often--use fewer source files, fewer classes, etc.
Also, people abstract too early, before they understand what the meaningful abstractions are. So they end up with too many abstractions, creating too much complexity. People should begin to write their applications as quickly as possible, without worrying about structure--just functionality. It's only once you're about one third or even halfway through that you have an idea of how the whole application should be structured. That's when you start over, before it's too late to re-architect, but after you have a concrete idea of what's necessary and what's superfluous.
While the difference between a textbook and a reference manual should be obvious to all, TFA still has a point: good documentation should include both.
Most docstrings I see are worthless: they add nothing to the code right below them. OTOH, a bit of tutorial-style documentation with examples can be golden. Often that makes well-written unit tests the best docs, or at the very least, if you're going to provide examples, they should also be unit tests - another test is always nice, plus you know your examples actually work.
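One way to make examples double as tests, sketched here with Python's standard doctest module. The function parse_port is hypothetical; the point is only that the examples in the docstring are executed, so they cannot silently go stale.

```python
import doctest


def parse_port(text):
    """Parse a TCP port number from a string.

    >>> parse_port("22")
    22
    >>> parse_port(" 8080 ")
    8080
    >>> parse_port("70000")
    Traceback (most recent call last):
        ...
    ValueError: port out of range: 70000
    """
    port = int(text.strip())
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


def run_docstring_examples(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, "<docstring>", 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


result = run_docstring_examples(parse_port)
print(f"{result.attempted} examples run, {result.failed} failed")
```

In a real project the same docstrings would more likely be picked up by the test runner (for instance via `python -m doctest file.py`) than by a hand-written helper like this one.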
That's how I got my entree into writing about Linux. Programmers are very smart, but not very eloquent and they are also very poor teachers.
There are any number of rules and guidelines for writing documentation, most of which are ignored since documentation is often the red-headed stepchild of the project.
Documentation should tell a story clearly and help the reader understand the 'why' and 'how' as well as the 'what'.
"I believe in Karma. That means I can do bad things to people all day long and I assume they deserve it." : Dogbert
I produce a lot of documentation along with my coding, and the one thing that makes it palatable (even to me, re-reading it) is illustrations.
I'm not talking about UML class or activity diagrams, although those things are great where appropriate. It could be anything relevant to getting your point across, like a fragment of a database table showing sample data so people can visualize how a group of tables will work together. Screen grabs with arrows and circles.
My rule of thumb: if I ever find myself drawing a picture on a whiteboard as I'm explaining my module to someone, I immediately stop and take a picture of the diagram I just drew, and ASAP afterwards I turn that picture into an illustration in the user docs. Then next time I can just whip out the docs and point to the illustration.
And for no arguments, too. Or at least print what is required to get help:
C:\>app
Crappy app 0.0.0.1a
GPL 2 (If you don't like it fix it yourself
For help type -?
C:\>app /?
Crappy app 0.0.0.1a
GPL 2 (If you don't like it fix it yourself
I said enter -?, not /? This program was barely ported using cygwin, so you have to use *NIX arguments
Don't like it, fix it yourself
C:\>_
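Joking aside, the behavior being asked for is small. A rough sketch, assuming a hypothetical program called "app" that takes one file argument: with no arguments it prints how to get help, and it accepts the Windows-style /? as well instead of scolding the user.

```python
import sys

# Hypothetical usage text for an imaginary program named "app".
USAGE_HINT = "usage: app [options] FILE\nTry 'app --help' for more information."
HELP_FLAGS = ("-h", "--help", "-?", "/?")


def main(argv):
    """Return an exit status; argv excludes the program name."""
    if not argv:
        # No arguments: say how to get help instead of failing silently.
        print(USAGE_HINT, file=sys.stderr)
        return 2
    if argv[0] in HELP_FLAGS:
        print("app: process FILE and print a summary (hypothetical behavior)")
        return 0
    print(f"processing {argv[0]}")
    return 0


print(main([]))      # prints the hint on stderr, then the status 2
print(main(["/?"]))  # prints the help line, then the status 0
```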
No. An expert may be an expert in an area, but until he's familiar with your code, he's not an expert in your system. I've spent way too much time deciphering code where a single sentence explaining what the hell the code was doing would have saved time. If you had enough time to write the system, you have enough time to document it. And if it's hard to document, that's a hint that it's a crappy system.
-h? Next time, use all three of these: -?, -help, --help. I'm probably not going to try throwing -h at a program without having a clue what it might do.
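Accepting all of those spellings costs one line in most option parsers. A sketch assuming Python's argparse; the program name "app" and the -p option are hypothetical. The built-in help switch is turned off and re-registered under every alias.

```python
import argparse


def build_parser():
    """Parser that treats -h, -?, -help and --help as the same help switch."""
    parser = argparse.ArgumentParser(
        prog="app",  # hypothetical program name
        description="Hypothetical example utility.",
        add_help=False,  # disable the default -h/--help so we can add aliases
    )
    parser.add_argument(
        "-h", "-?", "-help", "--help",
        action="help",
        help="show this help message and exit",
    )
    parser.add_argument("-p", "--port", type=int, default=22,
                        help="port to connect to (default: 22)")
    return parser


def asks_for_help(flag):
    """Return True if the flag prints help and exits with status 0."""
    try:
        build_parser().parse_args([flag])
    except SystemExit as exc:
        return exc.code == 0
    return False


for flag in ("-h", "-?", "-help", "--help"):
    print(flag, asks_for_help(flag))
```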
Then use the damn manual. That's why we write them. If you want to know how to use the manual, use the manual:
$ man man
$ man woman
No manual entry for woman
Yep. It knows everything!
Actually, no there isn't. A tutorial is one type of documentation. Tutorials are documentation for processes. Non-process subjects require other approaches. It is important to write the right types of documentation based on the likely audience and the subject matter.
I disagree with many things in this article, not because the points are invalid, but because they conflate misuse of tools with low quality of tools. For example:
Wikis are great tools for writing documentation. They make it easy for people to fix minor errors when they notice them. They make it easy to collaborate on documentation without having to deal with the relatively high overhead of source code version control systems (which are particularly awful when merging structured content like XML and HTML).
What the author is complaining about is not the wiki, but rather the fact that those projects have no one who is responsible for maintaining the documentation. If no one is responsible for writing the docs and ensuring their completeness, the documentation will inevitably be half-finished, whether they use a wiki or some other mechanism. The wiki is not an alternative to writing documentation, but rather is a tool for creating documentation.
Doc generation software is great for writing reference documentation. By placing the content into the source code, it becomes the responsibility of the programmer to update any behavior changes when they modify the behavior of a function. It also means that the documentation for the function is easily readable right there in the source code when you're trying to understand a function. By producing the generated documentation, you then have a convenient reference for all your functions, methods, classes, data types, etc. that is readily searchable, indexable, and (perhaps most importantly) is viewed in a separate app or window from your source code so it doesn't force you out of your coding flow when you need to look something up.
Once again, what the author is complaining about is not the doc generation tool, but rather the fact that those projects have no one who is responsible for writing the documentation. When used properly, the output of doc generation tools is every bit as good as documentation produced by hand. However, it takes exactly as long to write that documentation in the source code as it does to write it in a word processor. It is not a tool for saving time, but rather a tool to aid in maintaining consistency between behavior and documentation.
To do software-generated documentation correctly, you need to add comments that explain every field in every data structure, every class, every function or method, and for particularly complex functions, even documentation for many of the local variables. You should write code in your build system to warn about undocumented methods and data structure fields. For example, in one project I regularly work on, there are almost 17,000 lines of documentation comments out of just shy of 59,000 total lines of code—a whopping 28.8% of the total code volume. The result is that it is fairly easy to learn what each piece does in the context of the code while you're looking at it, and the automatically generated documentation is pretty thoroughly fleshed out reference documentation for the project. One particularly complex class by itself produces a whopping 72 pages of reference documentation.
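The "warn about undocumented methods" check can be quite small. The comment above does not say which language or doc tool the project uses, so this is only an analogous sketch in Python, where doc comments are docstrings; the Connection class and the warning wording are hypothetical.

```python
import inspect


def undocumented(cls):
    """List the class and its public methods that lack a docstring."""
    missing = []
    if not (cls.__doc__ or "").strip():
        missing.append(cls.__name__)
    # vars(cls) only sees members defined on this class, not inherited ones.
    for name, member in vars(cls).items():
        if name.startswith("_"):
            continue
        if inspect.isfunction(member) and not (member.__doc__ or "").strip():
            missing.append(f"{cls.__name__}.{name}")
    return missing


class Connection:
    """A hypothetical network connection."""

    def open(self):
        """Open the connection."""

    def close(self):
        pass


# A build step could run this over every class and print the warnings.
for name in undocumented(Connection):
    print(f"warning: {name} has no documentation comment")
```

A build script could call this for each module and fail, or just warn, when the list is non-empty.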
The problem that folks run into is that they usually don't put in any doc comments at all, or at best don't take the time to write the thorough comments needed to make the output from automatic reference documentation tools useful. As a result, when you build the reference docs, you end up with an empty skeleton that isn't of much value at all. This is not a flaw in the tool; it is a flaw in the development team. They didn't take the time to write the documentation.
And so on.
Given the number of bugs in most code, I'd suggest that it is pretty poor documentation for what the code is SUPPOSED to do.