Literate Programming and Leo
jko9 writes "First proposed almost 20 years ago by Donald Knuth, the idea of Literate Programming is basically that of making program documentation primary, and embedding code in the documentation, rather than vice versa. Despite some obvious
advantages apparent to anyone who has struggled to understand a poorly
documented program, literate programming never really caught on.
That all could change, however, with the release of a new program called Leo,
written by Edward K. Ream.
Leo supports standard literate programming
languages like noweb and
CWEB, but with a crucial
difference - Leo adds outlines. The effect is striking: overall
organization of a program is always visible and explicit. Much of the narrative of the documentation gets placed in the outline, making documentation simpler, and allowing viewers to approach the code at various levels of detail. Screenshots and tutorials for Leo are here - if
that site gets slashdotted, you can download the visual tutorials in .chm
form or html form from Leo's
Sourceforge site. Leo is an open source program written in Python. Any current practioners of Literate Programming techniques out there? People
who have tried it and given it up? Can the addition of outlines to Literate
Programming make it more powerful / popular?"
Roedy Green has written an excellent, humorous online article on writing unmaintainable code. This relates directly to Literate Programming, especially Roedy's points about maintaining existing code. He writes (here): "[the maintainence programmer] views your code through a toilet paper tube. He can only see a tiny piece of your program at a time. You want to make sure he can never get at the big picture from doing that. You want to make it as hard as possible for him to find the code he is looking for. But even more important, you want to make it as awkward as possible for him to safely ignore anything. "
Literate programming in general, and Leo in particular, would be the ultimate cure for this. It allows you to easily navigate between multiple levels of description of a program. This is critically important if you are coming fresh to an existing piece of code. You need to constantly cross-reference the high-level design and low-level implementations (and the various levels of description between these extremes).
Sailing over the event horizon
If your code requires massive documentation within the code to make it understandable, then your code likely needs to be rewritten.
:-)
// Adds two numbers together
I think you're missing the point. All code can be described at several different levels. At the highest level, you might describe a program as (for example) "an online banking application", which is a complete description of the app. However there are obviously a lot of details below this level of description
Different people need to understand a program at different levels of description. The CEO may only need to know the highest level description. At the other end of the spectrum, someone working on the optimal algorithm for maintining user session should be isolated from the implementation details of other parts of the program. The architect should be concentrating on the interconnection of modules within the code, not the implementation itself.
The code itself is good at describing some levels of description and very poor at describing others. The example you give doesn't need any documentation to understand what those two lines do, but it will need documentation to understand their relevance to the higher levels of the system.
Programmers tend to see the details and often miss the larger context. This can lead to unstated and often false assumptions about what role the code fulfills and how it interacts with the rest of the system These are the hardest bugs to find and fix.
There are many ways to solve this "levels of description" problem. Inline documentation is one very valuable tool. Of course it shouldn't be:
a = b + c;
It should describe the functional role of the code in relation to the higher-level components of the system.
As you point out, abstraction and encapsulation are good mechanisms for constructing higher-level descriptions of functionality. Why stop there? Why not try to build up beyond these levels as well? Perhaps we will evolve to high-level languages that can express these high-level designs. Until then inline docuemntation and literate programming are excellent tools to help you achieve these goals.
Sailing over the event horizon
I can't tell what your code should do if it can't find a person named Harry.
.name returns a char * that I'm supposed to free or delete [], if it returns a const char *, if it returns a string that I can modify but won't modify the original Person, if it returns a string reference which I can use to modify the original Person's name, if it returns a wstring reference which I can use to modify the original Person's name, if it returns a const string reference, or if it returns a const wstring reference, or if it uses some other string representation like a Qt one, or some custom one - heck, it could even use an MFC-style CString.
:
I can't tell what your code should do if it finds multiple people named Harry.
I can't tell how to use your code to find a person whose name requires Unicode to represent it.
I can't tell if
I don't like that the function you've called is named "findPerson" - wouldn't it be far better to call it something like "findPersonByFirstName"? Or "findFirstPersonWithFirstName"? For that matter, why am I calling "Person::findPerson"? Isn't that slightly redundant? Wouldn't "Person::find" be just as clear, and less verbose? Therefore, the function should be something like "Person::findFirstWithFirstName". Wouldn't that be much more highly documented than what you've got?
While we're on it, if it is returning the "first", by which method is it sorted? Shouldn't I be able to pass in a parameter which describes the order in which I want the results returned? And shouldn't you get an iterator instead of a reference, anyway?
Back to "name", is that their entire given name? Is it a nickname? Is it in last-name first format? Is there some additional identifier in the name if two people have the same name?
And I still don't know if I'll get a special Person which is supposed to be a Non-Person, if it can't find "Harry", or if this is going to throw an exception.
I don't like that your code uses a hard coded-value, "Harry".
I don't like that your code has the variable "p". Granted, you've got a pretty amazingly short scope in your example, but code tends to grow. It would be better if the variable had a slightly longer name.
There are all sorts of things to nit-pick about, that a new coder could be confused about, or bugs which might be on the verge of instantiation, even in code as simple as yours.
But my real point is this
If I've just walked in to your code, I don't know what behavior it's SUPPOSED to have, since you haven't documented that. All I can tell is what it DOES do. And since code changes over time, it's impossible for me to distinguish between the two, unless you document it.
Education is the silver bullet.
min = max
i++;
Is good commenting, even though it's the same number of comments.
For a project I am working on, I needed to extend CWEB to do some things Knuth hadn't thought of, and I found that excessive cleverness in the data structures made it much more difficult to extend than it should have been, so that Knuth could demonstrate clever data structures that probably add a few percent to the performance over what he could have achieved with more prosaic ones (Knuth does not document why he made these excessively clever design choices, nor whether the performance advantages they offer were significant).
Similarly, a recent thread on comp.text.tex recently asking about the extensibility of TEX produced a number of comments from those who know about how unextensible and unreusable TEX really is.
So, while I use literate programming (CWEB) to document a lot of my own code, I don't believe in all these years, that I have ever seen a good example of literate-programming that looks towards the future (refactoring, extending, reusing) as opposed to generating a fossil with that comes with a good story of its life and times.
I would like to distinguish between the techniques of literate programming and the practice of literate programming (LP) as it has always been done before Leo (traditional LP). The key technique of LP is what might be called "functional pseudocode." For example, here is a fragment of code that can be written in Leo:
The line: << do something complicated >> is a section reference. It works pretty much like a macro call. In particular, the code in the defintion of << do something complicated >> has access to the done and result variables. This is almost the entire content of noweb, one form of literate programming. It turns out that this technique can be extremely useful, as simple as it seems. Leo creates one or more "derived" files from an outline automatically when the outline is written, and Leo can update the outline from changes made to derived files when Leo reads the outline.In contrast to the technique of literate programming, the practice of traditional LP has focused on the central role of comments, and lots of them. Here is where Leo radically parts company with the LP tradition.
One's view of the proper role of documentation in a project hardly matters to Leo. You are free to use comments as you always did, though you will probably find that LP as implemented in Leo helps you out in unexpected ways. I discuss at length and in great detail the relationship between traditional LP, comments and Leo here. In short, discussions about the role of comments in programming (literate or not) do not get to the heart of Leo.
In fact, Leo often reduces the need for comments. Indeed, it is good style to organize Leo outlines like a reference book. Well-designed Leo outlines act both like self-updating tables of contents and self-updating indices. This is in marked contrast to the "stream-of-consciousness" or "narrative" style typically employed in traditional literate programming.
In my view, the essence of Leo is this: Leo makes outline organization the most important part of a program or a project. Both code and documentation could be considered secondary. At every moment, the overall big picture of a function, class, module, file or project is always at hand. Moreover, Leo makes outlines structure a part of the computer language. For example, I often define a Python class as follows:
The @others directive acts as a reference to all the text in all the outline nodes which are descendents of the node containing this class declaration. Such nodes are copied to the output (derived) file in the order in which they appear in the outline. The reference << declarations of myClass >> ensures that those declarations precede the methods. There are several other ways that outline structure is important in Leo; I won't discuss them here.
Leo fully exploits the organizational power of outlines. A single outline typically organizes an entire project. Outlines can handle large amounts of data with ease. Moreover, it is possible to clone any part of an outline so that changes to one clone affect all other clones. This is feature makes it possible for a single outline to contain multiple views of a project. For example, when fixing a bug, I clone all nodes related to the bug and gather them in a new part of the outline, called a task node. This task node effectively becomes a view of the project that focuses exclusively on the bug. Any changes I make to code are propagated to all other clones.
Earlier I mentioned that a well designed Leo outline acts like self-updating tables of contents and self-updating indices. Tables of contents you get for free: an entire outline is the table of contents. Clones create self-updating indices. For example, each task node acts like the index entry for that particular task.
- Edward K. Ream