Literate Programming and Leo

Posted by ryuzaki0 on Wednesday August 28, 2002 @04:59AM from the pod-comparisons-inevitable dept.

jko9 writes "First proposed almost 20 years ago by Donald Knuth, the idea of Literate Programming is basically that of making program documentation primary, and embedding code in the documentation, rather than vice versa. Despite some obvious advantages apparent to anyone who has struggled to understand a poorly documented program, literate programming never really caught on. That all could change, however, with the release of a new program called Leo, written by Edward K. Ream. Leo supports standard literate programming languages like noweb and CWEB, but with a crucial difference - Leo adds outlines. The effect is striking: overall organization of a program is always visible and explicit. Much of the narrative of the documentation gets placed in the outline, making documentation simpler, and allowing viewers to approach the code at various levels of detail. Screenshots and tutorials for Leo are here - if that site gets slashdotted, you can download the visual tutorials in .chm form or html form from Leo's Sourceforge site. Leo is an open source program written in Python. Any current practitioners of Literate Programming techniques out there? People who have tried it and given it up? Can the addition of outlines to Literate Programming make it more powerful / popular?"

20 of 358 comments

Literate Programming by bigjocker · 2002-08-28 05:03 · Score: 4, Interesting

My previous employer had a strict rule concerning code: you first write the JavaDoc for the whole project, then implement it. It's useful as hell ... and if you mix that with UML design before the documentation, it's a killer technique.

--
Life isn't like a box of chocolates. It's more like a jar of jalapenos. What you do today, might burn your ass tomorrow.
1. Re:Literate Programming by SerpentMage · 2002-08-28 06:45 · Score: 4, Insightful
  
  Being a professional engineer, I can tell you this is not how you approach the problem whatsoever. No engineer in their right mind writes the documentation ahead of time. Actually there are engineers that do that, but they work for the government.
  
  Real engineering is tinkering and logging what you did. In engineering there are three phases. The first involves tinkering, experimenting and doing simulation. The second phase is coming up with a game plan. The last phase is the implementation.
  
  And engineers do just jump in and do something when they know what they are doing. An engineer is an engineer because they know how to guess-estimate. That is why an engineer goes to school for 4-5 years to learn what engineering is. They learn when you need to tinker and when to jump in!
  
  The problem in IT is that you have people who do not have enough engineering education to know what they are doing. And by education I do not simply mean school education, but training or simply good mentoring.
  
  --
  
  "You can't make a race horse of a pig"
  "No," said Samuel, "but you can make a very fast pig"
Programs as flat text files - why? by Animats · 2002-08-28 05:05 · Score: 4, Interesting

It's weird, when you think about it, that programming is still done in flat text files. Almost nothing else is still done that way. One could argue for programs in HTML, with the code bracketed in XML so that the compiler could find it.
Few systems even allow multiple fonts in program text, although the original Bravo editor for the Xerox Alto did.
Just giving it a name... by wiremind · 2002-08-28 05:06 · Score: 4, Insightful

Did ANYONE learn (sic.) pseudo code ???

When I learned programming, writing pseudo code was SUCH a big deal to the teacher that by the end of the year, without even thinking, I would write out the whole program in pseudo code and then, under each line of English, add one line of code.

And has it ever paid off!

Now when I want to look at my own documentation, I just grep my java files and pull out all lines that begin with '//'
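
A minimal sketch of that grep step, written in Python rather than as a shell one-liner. The function name and the sample Java snippet are made up for illustration, and only lines that begin with '//' are picked up, not trailing comments:

```python
def comment_lines(source):
    """Return the text of every line whose first non-blank characters are '//'."""
    found = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("//"):
            # Drop the marker and surrounding spaces, keeping only the prose.
            found.append(stripped[2:].strip())
    return found


# Hypothetical Java source, held in a string so the example is self-contained.
JAVA_SAMPLE = """\
// Read the order total
double total = order.total();
    // Apply the loyalty discount
    total = total * 0.9;
"""

print(comment_lines(JAVA_SAMPLE))
```

Read top to bottom, the extracted lines are the pseudo code the program started from.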

Now when I am writing 20 pages of Java code, and all my boss sees are comments, I can tell him I am just writing Literate code!

Good day to you sir.
1. Re:Just giving it a name... by jgerman · 2002-08-28 05:40 · Score: 4, Insightful
  
  Ugh, there is certainly such a thing as over-commenting, and from the sound of it you have contracted this disease. If I were reading someone's code and saw:
  
  // set min equal to max
  min = max;
  // increment i
  i++;
  
  I'd rip his (or her) head off. There's a balance involved in commenting. Comments are only needed when program flow isn't obvious. Though a comment block summary in front of subroutines is certainly a good idea.
  
  --
  I'm the big fish in the big pond bitch.
2. Re:Just giving it a name... by gorilla · 2002-08-28 06:06 · Score: 5, Insightful
  
  That's not overcommenting, that's commenting wrong. You should be commenting why you are doing something, not what the code does.
  // Default Minimum to be same as Maximum
  min = max;
  // We have finished this data cell, move on to the next data cell
  i++;
  Is good commenting, even though it's the same number of comments.
3. Re:Just giving it a name... by Tablizer · 2002-08-28 07:20 · Score: 4, Funny
  
  If I were reading someone's code and saw:... // increment i ... i++; I'd rip his (or her) head off.
  
  I feel that punishment should mirror characteristics of the crime itself.
  
  Tie them to the ground, get a perm marker and write "eye" on their eyelids, "nose" on their nose, "neck" on their neck and so forth, and for a good summarizing comment, "STUPID!" on their forehead, and finally "Brain" on their ass.
  
  --
  Table-ized A.I.
Curing unmaintainable code by gwernol · 2002-08-28 05:20 · Score: 5, Interesting

Roedy Green has written an excellent, humorous online article on writing unmaintainable code. This relates directly to Literate Programming, especially Roedy's points about maintaining existing code. He writes (here): "[the maintenance programmer] views your code through a toilet paper tube. He can only see a tiny piece of your program at a time. You want to make sure he can never get at the big picture from doing that. You want to make it as hard as possible for him to find the code he is looking for. But even more important, you want to make it as awkward as possible for him to safely ignore anything."

Literate programming in general, and Leo in particular, would be the ultimate cure for this. It allows you to easily navigate between multiple levels of description of a program. This is critically important if you are coming fresh to an existing piece of code. You need to constantly cross-reference the high-level design and low-level implementations (and the various levels of description between these extremes).

--
Sailing over the event horizon
Been there, done that by devphil · 2002-08-28 05:26 · Score: 4, Funny

It's weird, when you think about it, that programming is still done in flat text files.

Every compiler vendor who has sold a mainstream language compiler/IDE using a "program database" or some other such approach has tanked. (Note that I mean program database as the primary means of storing the code -- a replacement of flat files, not an addition to them.) So far, it's not really been a technological lack, it's just that programmers don't like it.

I recall reading some papers written by the major language guys a decade ago, and one of the things they all wanted to see was per-function recompilation (instead of per-translation-unit), better program information (like "where is this function used?") and other things that would require a more database-like format. Still hasn't happened except in research environments. (Pity.)
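
To make the "where is this function used?" query concrete, here is a rough sketch using Python's standard `ast` module on a made-up source string. It only illustrates the kind of question a program database would answer; `find_callers` and the sample functions are hypothetical names, and only plain-name calls are matched:

```python
import ast

# Hypothetical program text, kept in a string so the example is self-contained.
SOURCE = """\
def helper(x):
    return x + 1

def first():
    return helper(1)

def second():
    value = helper(2)
    return helper(value)
"""


def find_callers(source, target):
    """Map each function that calls `target` to the line numbers of those calls."""
    tree = ast.parse(source)
    usage = {}
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Note: calls inside nested functions are also attributed to the
        # enclosing function; the sample has no nesting.
        for node in ast.walk(func):
            # Only plain-name calls such as helper(...) are matched here;
            # method calls like obj.helper(...) are ignored.
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == target):
                usage.setdefault(func.name, []).append(node.lineno)
    return {name: sorted(lines) for name, lines in usage.items()}


print(find_callers(SOURCE, "helper"))  # {'first': [5], 'second': [8, 9]}
```

The flat file stays the primary store here; the index is derived from it on demand, which is the "addition, not replacement" arrangement.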

One could argue for programs in HTML, with the code bracketed in XML

One could, but one would be a lunatic.

(I'm too tired to write it all down now, but I'll just summarize by saying that XML is not a silver bullet.)

--
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
1. Re:Been there, done that by RelentlessWeevilHowl · 2002-08-28 08:14 · Score: 4, Interesting
  
  IBM's Visual Age for Java used something similar, adapted from their Visual Age Smalltalk. My problem with VAJ was that you couldn't do anything in their environment except what they had specifically designed for you to do. If you have files on disk, you can run whatever tools you want on them. But in VAJ or Visual Studio .NET? "I dunno, what's in the context menu?"
  
  To avoid flat text files, you'd need an interactive scripting language powerful enough to perform any task you'd care to think of (viz., Emacs). Plus you'd need enough support libraries available to you to interact with third-party utilities, and finally bindings for the abstract syntax trees of all the languages you want to program in, so you could manipulate them programmatically.
The Problem With Literate Programming by raytracer · 2002-08-28 05:29 · Score: 4, Insightful

The biggest problem with literate programming is that most people don't write programs that are worthy of exposition. Most programs are written under extreme time constraints to solve immediate or practical problems, and their complexity arises from handling exceptions, special cases, and last minute or ill conceived extensions. Documenting these with prose actually doesn't help very much, as the prose reads pretty much as the code does: as a set of ill conceived exceptions rather than bold themes. Making the prose flow well is just work that could be used to make the code better.

If your code doesn't have these faults, then the code is already an expression of the program ideas, and one that you can execute, so in that case literate programming techniques are needed to a much smaller degree.

There is no doubt that literate programming (like extreme programming) has its benefits, but their principal benefit is to encourage an attitude of critical evaluation toward your coding efforts. This criticism is encouraged in literate programming but is not a unique feature of that approach.

--
There is much pleasure to be gained in useless knowledge.
The right balance by teetam · 2002-08-28 05:43 · Score: 4, Interesting

Too much documentation is just as bad as too little documentation, even when the documentation is good. It is very difficult to strike a balance.
For example, many of the core Java APIs are well written and well documented. If you see the HTML javadocs, you can get a pretty good idea of the class.
However, when you open the source code of the same class, it is not good looking anymore. Why? Because each method is preceded with dozens of lines of javadoc, each of which is embedded with HTML markup. That is good when the javadoc HTML pages are finally generated, but not so good when you look at the source itself. C# is worse with its XML based documentation!
When I look at the source code, I want to see the flow of the code easily. All the documentation in the source should only aid this and not hinder this. Javadoc does both. The explanation part of the javadoc can be very useful in understanding what the author's intent was when he/she wrote the method, but I am not so sure about the rest. The param, return and exception tags are no doubt useful, but often developers don't explain these very well. Plus, these are the tags that can easily become outdated.
I would prefer short and succinct pieces of information documenting the code, preferably close to the line of code that it documents.
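
As a rough illustration of how parameter tags drift out of date, here is a small Python sketch. It is an analogue, not Javadoc: it assumes Sphinx-style `:param name:` lines in a docstring, and `stale_params` and `transfer` are invented names:

```python
import inspect
import re


def stale_params(func):
    """Return documented parameter names that the signature no longer has."""
    doc = inspect.getdoc(func) or ""
    # Assumed convention: one ":param name:" entry per documented parameter.
    documented = re.findall(r":param\s+(\w+)\s*:", doc)
    actual = set(inspect.signature(func).parameters)
    return [name for name in documented if name not in actual]


def transfer(source, target, amount):
    """Move money between two accounts.

    :param source: account to debit
    :param dest: account to credit
    :param amount: sum to move
    """


print(stale_params(transfer))  # ['dest']
```

The parameter was renamed from `dest` to `target` and the tag was left behind, which is exactly the kind of rot a short comment next to the code is less prone to.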

--
All your favorite sites in one place!
Re:Inline Documentation is evil by gwernol · 2002-08-28 05:44 · Score: 5, Insightful

If your code requires massive documentation within the code to make it understandable, then your code likely needs to be rewritten.

I think you're missing the point. All code can be described at several different levels. At the highest level, you might describe a program as (for example) "an online banking application", which is a complete description of the app. However there are obviously a lot of details below this level of description :-)

Different people need to understand a program at different levels of description. The CEO may only need to know the highest level description. At the other end of the spectrum, someone working on the optimal algorithm for maintaining user sessions should be isolated from the implementation details of other parts of the program. The architect should be concentrating on the interconnection of modules within the code, not the implementation itself.

The code itself is good at describing some levels of description and very poor at describing others. The example you give doesn't need any documentation to understand what those two lines do, but it will need documentation to understand their relevance to the higher levels of the system.

Programmers tend to see the details and often miss the larger context. This can lead to unstated and often false assumptions about what role the code fulfills and how it interacts with the rest of the system. These are the hardest bugs to find and fix.

There are many ways to solve this "levels of description" problem. Inline documentation is one very valuable tool. Of course it shouldn't be:

// Adds two numbers together
a = b + c;

It should describe the functional role of the code in relation to the higher-level components of the system.

As you point out, abstraction and encapsulation are good mechanisms for constructing higher-level descriptions of functionality. Why stop there? Why not try to build up beyond these levels as well? Perhaps we will evolve to high-level languages that can express these high-level designs. Until then inline documentation and literate programming are excellent tools to help you achieve these goals.

--
Sailing over the event horizon
Re:Inline Documentation is evil by Viking+Coder · 2002-08-28 05:51 · Score: 5, Insightful

I can't tell what your code should do if it can't find a person named Harry.

I can't tell what your code should do if it finds multiple people named Harry.

I can't tell how to use your code to find a person whose name requires Unicode to represent it.

I can't tell if .name returns a char * that I'm supposed to free or delete [], if it returns a const char *, if it returns a string that I can modify but won't modify the original Person, if it returns a string reference which I can use to modify the original Person's name, if it returns a wstring reference which I can use to modify the original Person's name, if it returns a const string reference, or if it returns a const wstring reference, or if it uses some other string representation like a Qt one, or some custom one - heck, it could even use an MFC-style CString.

I don't like that the function you've called is named "findPerson" - wouldn't it be far better to call it something like "findPersonByFirstName"? Or "findFirstPersonWithFirstName"? For that matter, why am I calling "Person::findPerson"? Isn't that slightly redundant? Wouldn't "Person::find" be just as clear, and less verbose? Therefore, the function should be something like "Person::findFirstWithFirstName". Wouldn't that be much more highly documented than what you've got?

While we're on it, if it is returning the "first", by which method is it sorted? Shouldn't I be able to pass in a parameter which describes the order in which I want the results returned? And shouldn't you get an iterator instead of a reference, anyway?

Back to "name", is that their entire given name? Is it a nickname? Is it in last-name first format? Is there some additional identifier in the name if two people have the same name?

And I still don't know if I'll get a special Person which is supposed to be a Non-Person, if it can't find "Harry", or if this is going to throw an exception.

I don't like that your code uses a hard coded-value, "Harry".

I don't like that your code has the variable "p". Granted, you've got a pretty amazingly short scope in your example, but code tends to grow. It would be better if the variable had a slightly longer name.

There are all sorts of things to nit-pick about, that a new coder could be confused about, or bugs which might be on the verge of instantiation, even in code as simple as yours.

But my real point is this :

If I've just walked in to your code, I don't know what behavior it's SUPPOSED to have, since you haven't documented that. All I can tell is what it DOES do. And since code changes over time, it's impossible for me to distinguish between the two, unless you document it.
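
Here is a sketch of what documenting the SUPPOSED behavior could look like, in Python for brevity. The `Person` class, the function name, and every behavior listed in the docstring are assumptions chosen for illustration, not a reconstruction of the code being criticised:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    first_name: str
    last_name: str


def find_first_with_first_name(people, first_name) -> Optional[Person]:
    """Return the first person in `people` whose first name matches.

    Intended behaviour (the part the code alone cannot tell a reader):
    - Matching is exact and case-sensitive; names are Unicode strings.
    - With several matches, the earliest one in `people` order wins.
    - With no match, None is returned; no exception is raised.
    """
    for person in people:
        if person.first_name == first_name:
            return person
    return None


staff = [Person("Harry", "Jones"), Person("Zoë", "Li"), Person("Harry", "Smith")]
print(find_first_with_first_name(staff, "Harry"))  # the Jones record, not Smith
print(find_first_with_first_name(staff, "Sally"))  # None
```

If a later change makes the function raise on a missing name, the docstring is what tells the next reader whether that is a bug or a new contract.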

--
Education is the silver bullet.
Why this doesn't work. by FreeLinux · 2002-08-28 06:07 · Score: 4, Insightful

The following statements will be highly inflammatory to many people. They are not intended to be inflammatory but rather a simple observation.

Basically, Leo is yet another tool to automate the documentation of programming code. There are dozens, possibly hundreds, of programs available for this task. Yet, the problem that these tools were designed to solve remains very prevalent, if not pervasive.

The reason that the problem remains and that Leo will not solve the problem either is relatively simple. Simply put, the problem is garbage-in, garbage-out (GIGO). These tools are not able to determine the purpose of the code or the intent of the programmer that is writing it. These tools cannot read the minds of the programmers. The tools rely on the programmer to write out their thoughts and the intended purpose of the code.

Most programmers are unwilling or incapable of performing this critical step thoroughly. All too often, they use shorthand and expect the reader to understand what they mean. Or, they believe that the reader should be able to understand their thought process by reading the code itself. Furthermore, they assume that if the reader can't do this, they are simply not a good programmer (1337).

To go a step further, many programmers are not capable of clearly expressing their thoughts in their native tongue. These people are quite brilliant and can do amazing things with their code, but they can't express their thoughts to another person unless that person is able to read and comprehend the code itself.

Now, in fairness to the programmers, we have to look at what they do and what they are taught. Most programming languages are all about efficiency. They rely heavily on abbreviations and aliases; why do you think it's called code? They are designed to require a minimum of typing while providing a maximum of functionality. The programmers themselves are always striving for increased efficiency both in their code and in the way they get the code done. They always try to put out more, which leads to further shortcuts and abbreviations. This all tends to make programmers minimalists and their documentation clearly reflects this.

So, Leo is unlikely to provide any documentation breakthroughs. The old rules still apply: garbage-in, garbage-out. The best idea I've seen was an earlier post, where the documentation is written first and then the code is developed to match the documentation. But, honestly, which of us is going to do it that way? That's a lot of work and our ingrained habits are going to be hard to break.
Re:Bogus, truly! by alienmole · 2002-08-28 06:09 · Score: 4, Insightful

I've been a Q1 member of the IOOC 911.11 committee for programming languages since the early 90's
IOOC 911.11? Would that be the International Olive Oil Council, or the Iranian Offshore Oil Company?
Not to feed the troll, but for the benefit of any impressionable young programmers:
The goal of a programming language is to provide a machine with a set of instructions, not to sit down and read it a story.
Programming languages intended for use by humans (as opposed to languages intended primarily for machine generation) have multiple goals, three of which are to be human-writable, human-readable, and human-maintainable.
Literate programming may not be a perfect solution, but it's addressing a real issue. Current programming languages tend to be pretty horrible at expressing abstractions in a human readable way. The ideal programming language would be one that allowed you to express abstractions at the level of the problem domain, yet was able to translate that into something as efficiently executable, or close to it, as something written in a lower-level language. Literate programming allows you to do something along these lines, although it still involves a fair amount of "manual intervention" on the part of the programmer.
Amen by ArcSecond · 2002-08-28 06:28 · Score: 4, Insightful

I am more of a technical writer than a programmer (well, really, I'm not much of a programmer at all), but it was always clear to me that 90% of the software development headaches I lived with at various companies could have been resolved with minimal effort early in the project.. IF anyone cared about using a methodical approach to project documentation.

But nobody likes documentation. Writing it. Reading it. Just the word makes some people itch. For some reason, this is something that BOTH business managers and programmers don't get: documentation saves work. It is a way to produce a testable set of requirements, then a testable architecture/design, then a way to match up features and metrics in production and testing.

I mean, why does everybody think writing the manual is the LAST thing you do when you make software? With all the snarky "RTFM" comments I hear from geeks, I should start a new variant...

"PUHLEASE! BEFORE YOU START CODING, WTFM!"

--
I've got a bad attitude and karma to burn. Go ahead. Mod me down.
1. Re:Amen by G-funk · 2002-08-28 13:35 · Score: 4, Insightful
  
  The reason geeks don't like writing too much documentation is simple. It's not laziness (well not always), it's just one simple thing.
  
  Documentation written before the project completion is wrong.
  
  Always.
  
  Full stop.
  
  No matter how good your documentation is, people in charge will look at it, and go "great!" then halfway through, they look over your shoulder and say "that's not how I want that to work" and they make a "simple" change that creates a whole new use case, or sends an existing one off on a tangent. Or, a programmer halfway through will come up with a better idea himself, and discuss it with the boss, and so it changes from spec again.
  
  And the worst thing in the world definitely isn't no documentation, it's wrong documentation.
  
  --
  Send lawyers, guns, and money!
Literate programming versus continuing development by Phronesis · 2002-08-28 06:36 · Score: 5, Interesting

Although literate programming has a lot of potential, all too often literate projects become completely ossified. M.D. McIlroy's criticism of Knuth's literate programs (CACM 29, 471-83 (1986)), that they tend to be like "industrial strength Fabergé eggs" as opposed to reusable tools, is still quite valid.
For a project I am working on, I needed to extend CWEB to do some things Knuth hadn't thought of, and I found that excessive cleverness in the data structures made it much more difficult to extend than it should have been. Those data structures probably add a few percent to the performance over what he could have achieved with more prosaic ones, but Knuth does not document why he made these excessively clever design choices, nor whether the performance advantages they offer were significant.
Similarly, a recent thread on comp.text.tex asking about the extensibility of TeX produced a number of comments from those in the know about how unextensible and unreusable TeX really is.
So, while I use literate programming (CWEB) to document a lot of my own code, I don't believe that, in all these years, I have ever seen a good example of literate programming that looks towards the future (refactoring, extending, reusing) as opposed to generating a fossil that comes with a good story of its life and times.
The creator's view of Leo by edream · 2002-08-28 08:22 · Score: 5, Interesting

Hi. I am the creator of Leo and I'd like to say here what my own view of Leo is. Joe Orr has contributed greatly to Leo, and I would not characterize Leo exactly as he did in his original article. In this posting I hope to clear up misconceptions about what Leo is, what it can do, and the relationship of Leo to literate programming.
I would like to distinguish between the techniques of literate programming and the practice of literate programming (LP) as it has always been done before Leo (traditional LP). The key technique of LP is what might be called "functional pseudocode." For example, here is a fragment of code that can be written in Leo:
def spam():
    done = False; result = None
    while not done:
        << do something complicated >>
    return result

The line: << do something complicated >> is a section reference. It works pretty much like a macro call. In particular, the code in the definition of << do something complicated >> has access to the done and result variables. This is almost the entire content of noweb, one form of literate programming. It turns out that this technique can be extremely useful, as simple as it seems. Leo creates one or more "derived" files from an outline automatically when the outline is written, and Leo can update the outline from changes made to derived files when Leo reads the outline.
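
For readers who want to see the macro-like expansion spelled out, here is a toy sketch in Python. It is not Leo's or noweb's actual code: `tangle`, the `SECTIONS` dictionary and the section body are all invented for illustration, and circular references are not guarded against.

```python
import re

# A line consisting only of "<< name >>", with optional leading indentation.
SECTION_REF = re.compile(r"^(\s*)<<\s*(.+?)\s*>>\s*$")


def tangle(text, sections):
    """Expand every line that is a lone `<< name >>` reference.

    The section body is re-indented to the reference's indentation, and
    expansion is recursive so sections may refer to other sections.
    """
    out = []
    for line in text.splitlines():
        match = SECTION_REF.match(line)
        if match is None:
            out.append(line)
            continue
        indent, name = match.groups()
        body = tangle(sections[name], sections)  # KeyError if undefined
        out.extend(indent + body_line if body_line else body_line
                   for body_line in body.splitlines())
    return "\n".join(out)


ROOT = """\
def spam():
    done = False; result = None
    while not done:
        << do something complicated >>
    return result
"""

# Hypothetical definition of the section; it uses done and result directly.
SECTIONS = {
    "do something complicated": "result = 42\ndone = True",
}

program = tangle(ROOT, SECTIONS)
namespace = {}
exec(program, namespace)
print(namespace["spam"]())  # 42
```

Because the section text is pasted in place, it sees the done and result variables of spam() without any parameter passing, which is the property described above.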
In contrast to the technique of literate programming, the practice of traditional LP has focused on the central role of comments, and lots of them. Here is where Leo radically parts company with the LP tradition.
One's view of the proper role of documentation in a project hardly matters to Leo. You are free to use comments as you always did, though you will probably find that LP as implemented in Leo helps you out in unexpected ways. I discuss at length and in great detail the relationship between traditional LP, comments and Leo here. In short, discussions about the role of comments in programming (literate or not) do not get to the heart of Leo.
In fact, Leo often reduces the need for comments. Indeed, it is good style to organize Leo outlines like a reference book. Well-designed Leo outlines act both like self-updating tables of contents and self-updating indices. This is in marked contrast to the "stream-of-consciousness" or "narrative" style typically employed in traditional literate programming.
In my view, the essence of Leo is this: Leo makes outline organization the most important part of a program or a project. Both code and documentation could be considered secondary. At every moment, the overall big picture of a function, class, module, file or project is always at hand. Moreover, Leo makes outline structure a part of the computer language. For example, I often define a Python class as follows:
class myClass:
    << declarations of myClass >>
    @others

The @others directive acts as a reference to all the text in all the outline nodes which are descendants of the node containing this class declaration. Such nodes are copied to the output (derived) file in the order in which they appear in the outline. The reference << declarations of myClass >> ensures that those declarations precede the methods. There are several other ways that outline structure is important in Leo; I won't discuss them here.
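
The following toy model in Python shows the idea of @others on a one-level outline. It is a sketch under simplifying assumptions, not Leo's implementation: the `Node` class and `expand` function are invented, nodes headed `<< name >>` are treated as section definitions and left out of @others, and nested @others directives are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    headline: str
    body: str = ""
    children: list = field(default_factory=list)


def is_section(node):
    """Nodes headed `<< name >>` define named sections, not @others text."""
    return node.headline.startswith("<<") and node.headline.endswith(">>")


def descendants(node):
    """Yield every descendant of `node` in outline (pre-order) order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def expand(node):
    """Return the body of `node` with section references and @others expanded."""
    sections = {d.headline: d for d in descendants(node) if is_section(d)}
    out = []
    for line in node.body.splitlines():
        stripped = line.strip()
        indent = line[:len(line) - len(line.lstrip())]
        if stripped == "@others":
            pieces = [d.body for d in descendants(node) if not is_section(d)]
        elif stripped in sections:
            pieces = [sections[stripped].body]
        else:
            out.append(line)
            continue
        for piece in pieces:
            out.extend(indent + text if text else text
                       for text in piece.splitlines())
    return "\n".join(out)


outline = Node(
    "class myClass",
    "class myClass:\n    << declarations of myClass >>\n    @others",
    [
        Node("<< declarations of myClass >>", "count = 0"),
        Node("greet", "def greet(self):\n    return 'hello'"),
        Node("bump", "def bump(self):\n    self.count += 1\n    return self.count"),
    ],
)

source = expand(outline)
print(source)
```

Reordering the greet and bump nodes in the outline reorders the methods in the derived text, with no change to the class node itself.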
Leo fully exploits the organizational power of outlines. A single outline typically organizes an entire project. Outlines can handle large amounts of data with ease. Moreover, it is possible to clone any part of an outline so that changes to one clone affect all other clones. This feature makes it possible for a single outline to contain multiple views of a project. For example, when fixing a bug, I clone all nodes related to the bug and gather them in a new part of the outline, called a task node. This task node effectively becomes a view of the project that focuses exclusively on the bug. Any changes I make to code are propagated to all other clones.
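
One way to picture clones is as several outline positions sharing a single underlying node. The Python sketch below models that with ordinary object references; the `Node` class, the file names and the bug are all hypothetical, and this says nothing about how Leo stores clones internally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    headline: str
    body: str = ""
    children: list = field(default_factory=list)


parse = Node("parse()", "def parse(text): return text.split()")
report = Node("report()", "def report(words): print(len(words))")

project = Node("project", children=[
    Node("parser.py", children=[parse]),
    Node("report.py", children=[report]),
])

# A task node gathers clones of the nodes relevant to one bug.  Here a
# "clone" is modelled as a second reference to the same Node object.
task = Node("task: fix empty-input bug", children=[parse, report])
project.children.append(task)

# Editing through the task view...
task.children[0].body = "def parse(text): return text.split() if text else []"

# ...changes what the original location shows, because both are one object.
print(project.children[0].children[0].body)
```

The task node is then a focused view of the project: deleting it later removes the view, not the code.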
Earlier I mentioned that a well designed Leo outline acts like self-updating tables of contents and self-updating indices. Tables of contents you get for free: an entire outline is the table of contents. Clones create self-updating indices. For example, each task node acts like the index entry for that particular task.
- Edward K. Ream