Slashdot Mirror


Literate Programming and Leo

jko9 writes "First proposed almost 20 years ago by Donald Knuth, the idea of Literate Programming is basically that of making program documentation primary, and embedding code in the documentation, rather than vice versa. Despite some obvious advantages apparent to anyone who has struggled to understand a poorly documented program, literate programming never really caught on. That all could change, however, with the release of a new program called Leo, written by Edward K. Ream. Leo supports standard literate programming languages like noweb and CWEB, but with a crucial difference - Leo adds outlines. The effect is striking: overall organization of a program is always visible and explicit. Much of the narrative of the documentation gets placed in the outline, making documentation simpler, and allowing viewers to approach the code at various levels of detail. Screenshots and tutorials for Leo are here - if that site gets slashdotted, you can download the visual tutorials in .chm form or html form from Leo's Sourceforge site. Leo is an open source program written in Python. Any current practitioners of Literate Programming techniques out there? People who have tried it and given it up? Can the addition of outlines to Literate Programming make it more powerful / popular?"

26 of 358 comments

  1. Just giving it a name... by wiremind · · Score: 4, Insightful

    Did ANYONE learn (sic.) pseudo code ???

    When I learned programming, writing pseudo code was SUCH a big deal to the teacher that by the end of the year, without even thinking, I would write out the whole program in pseudo code and then, under each line of English, add one line of code.

    And has it ever paid off!

    Now when I want to look at my own documentation, I just grep my Java files and pull out all lines that begin with '//'.

    Now when I am writing 20 pages of Java code and all my boss sees are comments, I can tell him I'm just writing Literate code!

    Good day to you sir.
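
A rough sketch of the "pull out all lines that begin with '//'" trick described above, written in Python rather than as an actual grep command. The function name `extract_comment_lines` and the sample Java snippet are invented for illustration. It only picks up lines that start with the marker, so trailing comments on code lines are ignored.

```python
from typing import List


def extract_comment_lines(source: str, marker: str = "//") -> List[str]:
    """Return the text of every line whose first non-blank characters are `marker`."""
    outline = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            # Drop the marker itself and any padding around the comment text.
            outline.append(stripped[len(marker):].strip())
    return outline


# Hypothetical Java source written "pseudo code first, one line of code under each".
java_source = """\
// read the order total
double total = order.getTotal();
// apply the loyalty discount
total = total * 0.9;
"""

for pseudo_code_line in extract_comment_lines(java_source):
    print(pseudo_code_line)
```

Read top to bottom, the extracted lines are the original pseudo code, which is the "documentation" the poster is talking about.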

    1. Re:Just giving it a name... by jgerman · · Score: 4, Insightful
      Ugh, there is certainly such a thing as over-commenting, and from the sound of it you have contracted this disease. If I were reading someone's code and saw:

      // set min equal to max
      min = max;
      // increment i
      i++;


      I'd rip his (or her) head off. There's a balance involved in commenting. Comments are only needed when program flow isn't obvious. Though a comment block summary in front of subroutines is certainly a good idea.

      --
      I'm the big fish in the big pond bitch.
    2. Re:Just giving it a name... by gorilla · · Score: 5, Insightful
      That's not overcommenting, that's commenting wrong. You should be commenting why you are doing something, not what the code does.

      // Default Minimum to be same as Maximum
      min = max;
      // We have finished this data cell, Move onto next data cell
      i++;

      Is good commenting, even though it's the same number of comments.

    3. Re:Just giving it a name... by wkitchen · · Score: 2, Insightful

      Pseudo code works especially well with languages that are inherently hard to read. Thanks to pseudo code, I can still easily understand PIC assembly language programs I wrote 10 years ago. Without it, it can be hard to comprehend something I wrote 10 days ago.

      The assembler uses a semicolon to identify comments. For my pseudo code lines, I put a slash immediately after the semicolon so I can extract the pseudo code but ignore other miscellaneous comments.

      Funny thing is, having no formal training as a programmer, I hadn't heard of pseudo code before I reinvented it for myself. I even called it by that name, well before discovering that it was already a common technique.
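
The semicolon-slash convention described above could be handled with something like the following sketch. It assumes pseudo code lines are marked with `;/` and every other comment with a plain `;`; the function name `split_comments` and the PIC snippet are made up for the example. The search for `;` is naive and would be fooled by a semicolon inside a character literal.

```python
from typing import List, Tuple


def split_comments(asm_source: str,
                   pseudo_marker: str = ";/",
                   comment_marker: str = ";") -> Tuple[List[str], List[str]]:
    """Separate pseudo code comments from miscellaneous comments."""
    pseudo_code, other_comments = [], []
    for line in asm_source.splitlines():
        start = line.find(comment_marker)
        if start == -1:
            continue  # no comment on this line
        comment = line[start:]
        if comment.startswith(pseudo_marker):
            pseudo_code.append(comment[len(pseudo_marker):].strip())
        else:
            other_comments.append(comment[len(comment_marker):].strip())
    return pseudo_code, other_comments


# Hypothetical PIC assembly fragment using the convention.
asm_source = """\
;/ if the button is pressed, light the LED
        btfss   PORTA, 0        ; RA0 is the button input
        goto    main_loop
        bsf     PORTB, 1        ; RB1 drives the LED
;/ wait for the button to be released
"""

pseudo_code, other_comments = split_comments(asm_source)
print("\n".join(pseudo_code))
```

Keeping the two kinds of comment apart is the point of the extra slash: the pseudo code can be printed as a readable outline while the register-level notes stay with the instructions.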

    4. Re:Just giving it a name... by j7953 · · Score: 3, Insightful
      // Default Minimum to be same as Maximum
      min = max;

      I'm not sure if this is a good comment. Of course it depends on the context, but if I read this comment, I'd immediately wonder why the default minimum is the same as the maximum. Imho it would be much better to explain the complete algorithm at the beginning of the routine, and then have only few comments within the code. However, as I said, this depends on the context and in some situations the above comment might be useful.

      // We have finished this data cell, Move onto next data cell
      i++;

      This is not a good comment, imho. Or at least an unnecessary one. If it is not clear from the context (e.g. the loop is short enough) what the variable i is being used for, you should give it a more explanatory name. Your example could be much better written as

      cellIndex++;

      Using too many comments instead of self-explaining code is not only unnecessary, it often also causes the problem of the comments not being updated when the code is modified.

      --
      Sig (appended to the end of comments I post, 54 chars)
  2. good code is... by jukal · · Score: 3, Insightful

    literate, without literate programming :)

  3. Questions. by bons · · Score: 3, Insightful
    When programming in a literate system do you describe the objects and methods from a programming viewpoint, a business viewpoint, or from a metaphor viewpoint?

    When we build systems, we work directly with the client and we are able to describe the system in three equal, but very different ways. Depending on the documentation required and the target audience, we can describe the system in a way that allows everyone involved to communicate effectively. This is an advantage I don't want to lose.

    From what I've read, literate programming seems to be a discipline that works best when the programmers are isolated from the client. How it works when the programmers and the client closely interact is something I simply don't understand.

    1. Re:Questions. by Matthew+Weigel · · Score: 3, Insightful

      Blockquoth bons:

      When programming in a literate system do you describe the objects and methods from a programming viewpoint, a business viewpoint, or from a metaphor viewpoint?

      At its heart, literate programming creates multiple documents from a single master document. The common case is creating two documents - a document that is a paper on a program, and a document that compiles to the program - from the master document; but it's entirely possible to create more than just the two documents with a tool like noweb.

      As an example, you could produce API documentation, algorithm descriptions, a description of the interaction of the whole shebang, and the program source itself from a single set of master documents.

      And, again, the gain of literate programming is that you can keep all these forms of documentation close to each other and close to the code, which is a win.

      Now, noweb isn't perfect: it's optimized for creating just one set of documentation, so the other documentation would have to be treated as code. It would be a lot better if you could name documentation blocks just like code blocks, but oh well...
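
To make the "several documents from one master" idea above concrete, here is a toy sketch in Python. It uses noweb's chunk notation (`<<name>>=` opens a code chunk, a line starting with `@` returns to documentation, `<<name>>` inside code refers to another chunk, and `<<*>>` is the default root), but it is not noweb's implementation. The names `split_master` and `tangle` are invented, and references are only recognised when they sit alone on a line.

```python
import re
from typing import Dict, List, Tuple

CHUNK_DEF = re.compile(r"^<<(.+)>>=\s*$")
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")


def split_master(master: str) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split a master document into prose lines and named code chunks."""
    prose: List[str] = []
    chunks: Dict[str, List[str]] = {}
    current = None  # lines of the code chunk being read, or None while in prose
    for line in master.splitlines():
        definition = CHUNK_DEF.match(line)
        if definition:
            # Repeated definitions of the same chunk are concatenated.
            current = chunks.setdefault(definition.group(1), [])
        elif line == "@" or line.startswith("@ "):
            current = None
            if line[2:]:
                prose.append(line[2:])
        elif current is not None:
            current.append(line)
        else:
            prose.append(line)
    return prose, chunks


def tangle(chunks: Dict[str, List[str]], root: str = "*", indent: str = "") -> List[str]:
    """Expand chunk references recursively to produce the program source."""
    output = []
    for line in chunks[root]:
        reference = CHUNK_REF.match(line)
        if reference:
            output.extend(tangle(chunks, reference.group(2), indent + reference.group(1)))
        else:
            output.append(indent + line)
    return output


master = """\
@ Greeting program. The overall shape is simple.
<<*>>=
def main():
    <<print the greeting>>

main()
@ The greeting itself is kept in its own chunk.
<<print the greeting>>=
print("hello, literate world")
"""

prose, chunks = split_master(master)
print("\n".join(tangle(chunks)))
```

The prose list is the raw material for the "paper on the program", and the tangled lines are the document that compiles. Producing a third output, such as API documentation, would need named documentation blocks, which is the limitation the post points out.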

      --
      --Matthew
  4. The Problem With Literate Programming by raytracer · · Score: 4, Insightful

    The biggest problem with literate programming is that most people don't write programs that are worthy of exposition. Most programs are written under extreme time constraints to solve immediate or practical problems, and their complexity arises from handling exceptions, special cases, and last minute or ill conceived extensions. Documenting these with prose actually doesn't help very much, as the prose reads pretty much as the code does: as a set of ill conceived exceptions rather than bold themes. Making the prose flow well is just work that could be used to make the code better.

    If your code doesn't have these faults, then the code is already an expression of the program ideas, and one that you can execute, so in that case literate programming techniques are needed to a much smaller degree.

    There is no doubt that literate programming (like extreme programming) has its benefits, but its principal benefit is to encourage an attitude of critical evaluation toward your coding efforts. This criticism is encouraged in literate programming but is not a unique feature of that approach.

  5. More focus on API Doc and Unit Testing by one9nine · · Score: 3, Insightful

    I don't think what he has is bad, but I think there are better ways to achieve cleaner code.

    Many people have mentioned that writing cleaner code is the best form of documentation. This is definitely true; unfortunately, you still have people who use single letters for significant variables (i.e. not loop indexes), who don't format their code, or who try to do too much in one line of code.

    I think a better approach to documentation is the test driven approach that is used in XP and with packages such as JUnit and Cactus. Basically, you write your test cases first, which will force you to pin down the exact functionality for your components. These unit tests are essentially documentation on how your components should work. Granted, this doesn't document the specific code, but I think that one of the reasons why so much code is hard to read is because the functionality was not clearly thought through.

    I also think API documentation is more important. A lot of times I am trying to use an open source package and I have a hard time understanding how to use the API to achieve certain functionality. I can read the code just fine but it isn't clear how to use the objects themselves.
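
A small sketch of the tests-as-documentation idea from the post above, using Python's `unittest` in place of JUnit. The component `percent_change` and its rules are hypothetical; the point is that each test name and assertion states one decision about how the component should behave.

```python
import unittest


def percent_change(old: float, new: float) -> float:
    """Hypothetical component: change from `old` to `new`, as a percentage of `old`."""
    if old == 0:
        raise ValueError("percent change from zero is undefined")
    return (new - old) / abs(old) * 100.0


class PercentChangeSpec(unittest.TestCase):
    """Each test pins down one piece of intended behaviour."""

    def test_growth_is_positive(self):
        self.assertAlmostEqual(percent_change(50, 75), 50.0)

    def test_decline_is_negative(self):
        self.assertAlmostEqual(percent_change(200, 150), -25.0)

    def test_negative_baseline_keeps_the_direction_of_change(self):
        self.assertAlmostEqual(percent_change(-10, -5), 50.0)

    def test_zero_baseline_is_rejected(self):
        with self.assertRaises(ValueError):
            percent_change(0, 10)
```

Saved in a module, these can be run with `python -m unittest`. Someone who only reads the test names already learns what happens with a zero or negative baseline, which is the kind of question the code alone tends to leave open.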

  6. Re:Inline Documentation is evil by gwernol · · Score: 5, Insightful

    If your code requires massive documentation within the code to make it understandable, then your code likely needs to be rewritten.

    I think you're missing the point. All code can be described at several different levels. At the highest level, you might describe a program as (for example) "an online banking application", which is a complete description of the app. However there are obviously a lot of details below this level of description :-)

    Different people need to understand a program at different levels of description. The CEO may only need to know the highest level description. At the other end of the spectrum, someone working on the optimal algorithm for maintaining user sessions should be isolated from the implementation details of other parts of the program. The architect should be concentrating on the interconnection of modules within the code, not the implementation itself.

    The code itself is good at describing some levels of description and very poor at describing others. The example you give doesn't need any documentation to understand what those two lines do, but it will need documentation to understand their relevance to the higher levels of the system.

    Programmers tend to see the details and often miss the larger context. This can lead to unstated and often false assumptions about what role the code fulfills and how it interacts with the rest of the system. These are the hardest bugs to find and fix.

    There are many ways to solve this "levels of description" problem. Inline documentation is one very valuable tool. Of course it shouldn't be:

    // Adds two numbers together
    a = b + c;

    It should describe the functional role of the code in relation to the higher-level components of the system.

    As you point out, abstraction and encapsulation are good mechanisms for constructing higher-level descriptions of functionality. Why stop there? Why not try to build up beyond these levels as well? Perhaps we will evolve to high-level languages that can express these high-level designs. Until then inline documentation and literate programming are excellent tools to help you achieve these goals.

    --
    Sailing over the event horizon
  7. Re:Inline Documentation is evil by Viking+Coder · · Score: 5, Insightful

    I can't tell what your code should do if it can't find a person named Harry.

    I can't tell what your code should do if it finds multiple people named Harry.

    I can't tell how to use your code to find a person whose name requires Unicode to represent it.

    I can't tell if .name returns a char * that I'm supposed to free or delete [], if it returns a const char *, if it returns a string that I can modify but won't modify the original Person, if it returns a string reference which I can use to modify the original Person's name, if it returns a wstring reference which I can use to modify the original Person's name, if it returns a const string reference, or if it returns a const wstring reference, or if it uses some other string representation like a Qt one, or some custom one - heck, it could even use an MFC-style CString.

    I don't like that the function you've called is named "findPerson" - wouldn't it be far better to call it something like "findPersonByFirstName"? Or "findFirstPersonWithFirstName"? For that matter, why am I calling "Person::findPerson"? Isn't that slightly redundant? Wouldn't "Person::find" be just as clear, and less verbose? Therefore, the function should be something like "Person::findFirstWithFirstName". Wouldn't that be much more highly documented than what you've got?

    While we're on it, if it is returning the "first", by which method is it sorted? Shouldn't I be able to pass in a parameter which describes the order in which I want the results returned? And shouldn't you get an iterator instead of a reference, anyway?

    Back to "name", is that their entire given name? Is it a nickname? Is it in last-name first format? Is there some additional identifier in the name if two people have the same name?

    And I still don't know if I'll get a special Person which is supposed to be a Non-Person, if it can't find "Harry", or if this is going to throw an exception.

    I don't like that your code uses a hard coded-value, "Harry".

    I don't like that your code has the variable "p". Granted, you've got a pretty amazingly short scope in your example, but code tends to grow. It would be better if the variable had a slightly longer name.

    There are all sorts of things to nit-pick about, that a new coder could be confused about, or bugs which might be on the verge of instantiation, even in code as simple as yours.

    But my real point is this :

    If I've just walked in to your code, I don't know what behavior it's SUPPOSED to have, since you haven't documented that. All I can tell is what it DOES do. And since code changes over time, it's impossible for me to distinguish between the two, unless you document it.

    --
    Education is the silver bullet.
  8. Why this doesn't work. by FreeLinux · · Score: 4, Insightful

    The following statements will be highly inflammatory to many people. They are not intended to be inflammatory but, rather, a simple observation.

    Basically, Leo is yet another tool to automate the documentation of programming code. There are dozens, possibly hundreds, of programs available for this task. Yet the problem that these tools were designed to solve remains very prevalent, if not pervasive.

    The reason that the problem remains and that Leo will not solve the problem either is relatively simple. Simply put, the problem is garbage-in, garbage-out (GIGO). These tools are not able to determine the purpose of the code or the intent of the programmer that is writing it. These tools cannot read the minds of the programmers. The tools rely on the programmer to write out their thoughts and the intended purpose of the code.

    Most programmers are unwilling or incapable of performing this critical step thoroughly. All too often, they use shorthand and expect the reader to understand what they mean. Or, they believe that the reader should be able to understand their thought process by reading the code itself. Furthermore, they assume that if the reader can't do this, they are simply not a good programmer (1337).

    To go a step further, many programmers are not capable of clearly expressing their thoughts in their native tongue. These people are quite brilliant and can do amazing things with their code but, they can't express their thoughts to another person unless that person is indeed, able to read and comprehend the code itself.

    Now, in fairness to the programmers, we have to look at what they do and what they are taught. Most programming languages are all about efficiency. They rely heavily on abbreviations and aliases; why do you think it's called code? They are designed to require a minimum of typing while providing a maximum of functionality. The programmers themselves are always striving for increased efficiency both in their code and in the way they get the code done. They always try to put out more, which leads to further shortcuts and abbreviations. This all tends to make programmers minimalists and their documentation clearly reflects this.

    So, Leo is unlikely to provide any documentation breakthroughs. The old rules still apply, garbage-in, garbage-out. The best idea I've seen was an earlier post, where the documentation is written first and then the code is developed to match the documentation. But, honestly, which of us is going to do it that way? That's a lot of work and our ingrained habits are going to be hard to break.

  9. Re:Bogus, truly! by alienmole · · Score: 4, Insightful
    I've been a Q1 member of the IOOC 911.11 committee for programming languages since the early 90's

    IOOC 911.11? Would that be the International Olive Oil Council, or the Iranian Offshore Oil Company?

    Not to feed the troll, but for the benefit of any impressionable young programmers:

    The goal of a programming language is to provide a machine with a set of instructions, not to sit down and read it a story.

    Programming languages intended for use by humans (as opposed to languages intended primarily for machine generation) have multiple goals, three of which are to be human-writable, human-readable, and human-maintainable.

    Literate programming may not be a perfect solution, but it's addressing a real issue. Current programming languages tend to be pretty horrible at expressing abstractions in a human readable way. The ideal programming language would be one that allowed you to express abstractions at the level of the problem domain, yet was able to translate that into something as efficiently executable, or close to it, as something written in a lower-level language. Literate programming allows you to do something along these lines, although it still involves a fair amount of "manual intervention" on the part of the programmer.

  10. Re:Inline Documentation is evil by shaper · · Score: 3, Insightful

    Nope. You've given an example that is far more simple than any real-world situation where you might encounter uncertainty about code functionality. But I'll match you strawman for strawman. Same code sample...

    Person &p = Person::findPerson("Harry");
    cout << p.name() << endl;

    Questions: what do you do when findPerson() doesn't find Harry? Come to think of it, what are the preconditions for using the Person class in the first place? Do you have to set up a JNDI datasource first? Or maybe it uses an LDAP server so you need to have one for it to work? Why in the world is it looking for "Harry" in the first place? Who is this Harry person and why do we care about him at this point in the code? Should we send him a page if we can't find him? Is it the responsibility of the caller of the code to use alternate means to locate the mysterious Harry or do we just give up and look for Jane? Uh oh, Harry quit last week! Now what?

    Oh and too bad for me that you quit last week and moved to Mongolia with Harry so I can't ask anyone these questions about the code that you failed to document and that I now have to support in my copious spare time.

  11. Re:Inline Documentation is evil by Anonymous Coward · · Score: 1, Insightful

    I like the last part of your comment, "And since code changes over time, it's impossible for me to distinguish between the two, unless you document it.", since it's total crap. Code changes over time, yes, and code is the documentation if written well; if it's not written well, the coder should find a different job. It's too hard to update comments when you are updating code. If you have code with a chain of things that must be changed for it to work right (say you figure out 5 different exploits against your server app, related to 1 or 2 main bugs, and they branch out), you will end up breaking your thought process if you update the comments while updating the code. However, if you do not update the comments, then a month or a year later, when the code is used again and needs to be changed, someone will have to go through a massive debugging session because they believe the comment is correct, when really it says something like "//int MyFunc(int foo)" when it should say something like "//int MyFunc(long foo)". Problems like that happen a lot when you comment a large project, which is why it is easier to just not comment, in the long run and in the short term as well.

  12. Amen by ArcSecond · · Score: 4, Insightful

    I am more of a technical writer than a programmer (well, really, I'm not much of a programmer at all), but it was always clear to me that 90% of the software development headaches I lived with at various companies could have been resolved with minimal effort early in the project.. IF anyone cared about using a methodical approach to project documentation.

    But nobody likes documentation. Writing it. Reading it. Just the word makes some people itch. For some reason, this is something that BOTH business managers and programmers don't get: documentation saves work. It is a way to produce a testable set of requirements, then a testable architecture/design, then a way to match up features and metrics in production and testing.

    I mean, why does everybody think writing the manual is the LAST thing you do when you make software? With all the snarky "RTFM" comments I hear from geeks, I should start a new variant...

    "PUHLEASE! BEFORE YOU START CODING, WTFM!"

    --

    I've got a bad attitude and karma to burn. Go ahead. Mod me down.

    1. Re:Amen by G-funk · · Score: 4, Insightful

      The reason geeks don't like writing too much documentation is simple. It's not laziness (well not always), it's just one simple thing.

      Documentation written before the project completion is wrong.

      Always.

      Full stop.

      No matter how good your documentation is, people in charge will look at it, and go "great!" then half way through, they look over your shoulder and say "that's not how i want that to work" and they make a "simple" change that creates a whole new use case, or sends an existing one off on a tangent. Or, a programmer half way through will come up with a better idea himself, and discuss it with the boss, and so it changes from spec again.

      And the worst thing in the world definitely isn't no documentation, it's wrong documentation.

      --
      Send lawyers, guns, and money!
  13. Re:Literate Programming by SerpentMage · · Score: 4, Insightful

    Being a professional engineer this is not how you approach the problem whatsoever. No engineer in their right mind writes the documentation ahead of time. Actually there are engineers that do that, but they work for the government.

    Real engineering is tinkering and logging what you did. In engineering there are three phases. The first involves tinkering, experimenting, and doing simulation. The second phase is coming up with a game plan. The last phase is the implementation.

    And engineers do just jump in and do something when they know what they are doing. An engineer is an engineer because they know how to guess-estimate. That is why an engineer goes to school for 4-5 years to learn what engineering is. They know when you need to tinker and when to jump in!

    The problem in IT is that you have people who do not have enough engineering education to know what they are doing. And by education I do not simply mean school education, but training or simply good mentoring.

    --

    "You can't make a race horse of a pig"
    "No," said Samuel, "but you can make very fast pig"
  14. Re:Inline Documentation is evil by maiden_taiwan · · Score: 2, Insightful

    Most of your criticisms are questions about the behavior of findPerson. These properties should be documented within findPerson, not in the caller.
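
As a sketch of what "documented within findPerson" might look like, here is a Python stand-in for the C++ example discussed in this thread. The name `find_first_by_first_name`, the `Person` fields, and every rule in the docstring are assumptions chosen for illustration; the original code never says which behaviour is intended, which is exactly the complaint.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Person:
    first_name: str
    last_name: str


def find_first_by_first_name(people: List[Person], first_name: str) -> Optional[Person]:
    """Return the first person in `people` whose first name equals `first_name`.

    Contract (what the function is SUPPOSED to do, not just what it does):
    - Matching is exact and case-sensitive; names are Unicode `str` values.
    - If several people match, the one that appears earliest in `people` wins.
    - If nobody matches, the result is None; no exception is raised.
    - The list is not modified, and the returned Person is immutable.
    """
    for person in people:
        if person.first_name == first_name:
            return person
    return None


staff = [Person("Jane", "Doe"), Person("Harry", "Smith"), Person("Harry", "Jones")]
print(find_first_by_first_name(staff, "Harry"))
```

With the contract stated once at the definition, callers do not need to repeat it, and a later change in behaviour can be checked against the stated intent.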

  15. programmers can't write: the fly in the ointment by Xtifr · · Score: 3, Insightful

    There's an old saying (was on a "Murphy's Laws of Computing" poster I used to have): "make it easy for programmers to write in English, and you'll find that programmers can't write in English."

    Others have pointed out the all-too-common case where the code gets edited but the comments don't. This is bad, but not as bad as another common case: the programmer tries to comment the code, but his/her grasp of English isn't up to the task. This may be because English is a second language, or simply because the person specializes in computer languages, not human ones. In any case, the result is frequently misleading or incomprehensible comments that either do no good, or worse than no good. And, of course, deadline pressures never help.

    I think Literate programming is a wonderful idea, but I don't think it's a practical one in many (most?) real-world environments.

  16. Re:It still won't take off.. by Louis_Wu · · Score: 3, Insightful
    Combine this with some well placed comments, and you achieve a very high degree of readability.
    [sarcasm]

    Good writers know how to spell, and will catch spelling errors while proofreading for content and style. Besides, all good writers have dictionaries sitting on the desk for clarification of subtle meaning of words, and thesauri to remind them of better ways to express the idea. Knowing this, spellcheckers are unnecessary, and often counterproductive. I can't tell you how many times I've been writing a technical paper and had some stupid spellchecker choke on acronyms or technical terms! A good writer's skill nullifies the primary benefit of a spellchecker.

    [/sarcasm]

    But seriously, the problem isn't that it is IMpossible to write good, well documented code with Your-IDE-Of-Choice, but that Literate-Programming + Leo might make it easier to write well documented code. Hmm, sounds like the language selection process for a project; text manipulation in Perl, sound driver in C. You could write your text mangler in C, but Perl makes text processing easier. That's the point of Leo, make documentation easier.

    Consider any spelling errors intentional. :) BTW, I tried to post this two hours ago, but /. disappeared from the net. Since the discussion continued, I can only conclude that it's the computers at work which were being stupid.

  17. Limits of Javadoc by fm6 · · Score: 3, Insightful
    Well, specifying the API before you write is certainly a good idea. But you hardly need Javadoc to do that. The problem with Javadoc -- and all LP tools I've seen -- is that it confuses documentation with specification. A specification just has to be clear to others working on the project. It can be written by someone with no training in technical communication. The writer doesn't even have to have a full grasp of the language he or she is writing in -- computer terms are pretty universal.

    None of that is true for technical writing. It's a discipline unto itself. It's not just about good writing. (I've known computer scientists who'd written award-winning papers and articles, but couldn't write technical docs worth beans.) It's about understanding your audience and the (often painfully boring) task of writing in the clearest possible language.

    Not every project needs technical writers. If you're a small software shop, and you're building a set of components with an uncomplicated API, and hiring a professional writer isn't cost effective -- then yeah, use Javadoc or some other LP tools.

    But for big projects... Back in 1998, I was in charge of production for the doc set of a large Java framework. Having the API docs embedded in the source code was a nightmare. Javadoc was supposed to allow any of the engineers who wanted to to do their own API docs -- but many botched it, because they didn't understand Javadoc or HTML very well. We had professional writers, but many of them couldn't be trusted with source code. Hell, some of them didn't understand why they couldn't edit the SCCS archives!

    Worst of all was when the release cycle entered code freeze. Document freeze is always later than code freeze -- but you cannot let people modify the release code base during code freeze. The only solution was to split the source, then merge the docs back in after release. Very painful.

  18. Re:Literate Programming by ipjohnson · · Score: 2, Insightful

    Actually there are many ... many different ways to measure software. One of the ones we use is "McCabe Complexity", alongside a handful of other metrics. Hell, CMU came up with a rating system for software engineering groups called CMM that evaluates your process as well as your process for changing your process (defect reduction and whatnot).

    I'm not saying I agree with them, but they are out there. I personally feel coding is a craft and not a science ... but management doesn't like to hear that because it means results are less reproducible. That's a whole other can of worms.
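
For readers who have not met the McCabe metric mentioned above: cyclomatic complexity is commonly computed as one plus the number of decision points in the code. The sketch below approximates that for Python source using the standard `ast` module. The function name and the choice of which nodes count as decisions are simplifications made here, and real tools usually report the figure per function rather than for a whole file.

```python
import ast

# Node types treated as a single decision point in this approximation.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` short-circuits twice, so it adds two branches.
            complexity += len(node.values) - 1
    return complexity


sample = """
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
"""

print(cyclomatic_complexity(sample))
```

The sample has two `if` statements, one `for` loop, and one `and`, so the approximation counts four decision points on top of the base path.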

  19. Re:Literate Programming by ipjohnson · · Score: 2, Insightful

    The trick is to have the requirements laid out. I know that's not possible some of the time, but for the most part you should not be writing production code until most everything is nailed down. That said, pathfinding (i.e. writing code to test theory) should be done before sitting down to write the real code.

  20. Re:The creator's view of Leo by holon · · Score: 2, Insightful
    I'm a long time outliner (15 years). Began with MaxThink then jumped to Ecco.

    Most people don't 'get' outlining. Most people are what I call linear thinkers. They use MS Word in page layout mode thank you very much, have no need for outlining, and will never understand brainstorming and the power of organization of thoughts thru outlining.

    Same thing with programmers. Most of the programmers here just aren't going to 'get it.' But I do. You're on to something big here. Makes complete sense... an orthogonal view of the physical artifacts of the system. And, the orthogonal view is the most important one - it's the logical view. But, the key that you succeeded with is clones. I'm a long time Ecco Pro user and the same effect is implemented there. It's multiple orthogonal perspectives that makes it truly work. You can shift perspectives, isolate certain elements and create a new perspective.

    I look forward to helping Leo evolve. I see many uses for it beyond a programmer's tool of course - as obviously do you.

    Congrats and right on,

    david bolene...